A DATASET FOR DYNAMIC DISCOVERY OF SEMANTIC CHANGES IN VERSION CONTROLLED SOFTWARE HISTORIES
Chenguang Zhu, Yi Li, Julia Rubin, and Marsha Chechik
In Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017
Abstract: Over the last few years, researchers have proposed several semantic history slicing approaches that identify the set of semantically related commits implementing a particular software functionality. However, there is no comprehensive benchmark for evaluating these approaches, making it difficult to assess their capabilities.
This paper presents a dataset of 81 semantic changes collected from 8 real-world projects. The dataset was created for benchmarking semantic history slicing techniques. We provide details on the data collection process and the storage format. We also discuss usage and possible extensions of the dataset.