Semantic Slicing of Software Version Histories

Yi Li, Chenguang Zhu, Julia Rubin, and Marsha Chechik


Software developers often need to transfer functionality, e.g., a set of commits implementing a new feature or a bug fix, from one branch of a configuration management system to another. This can be challenging because existing configuration management tools lack support for matching high-level, semantic functionality with low-level version histories. The developer thus has to either manually identify the exact set of semantically related commits implementing the functionality of interest or sequentially port a whole segment of the change history, “inheriting” additional, unwanted functionality.

Here is a motivating example:

Fig. 1. Change history of Foo.java.

Fig. 2. Foo.java before and after semantic slicing.

We tackle this problem by providing automated support for identifying the set of semantically related commits implementing a particular functionality, where that functionality is specified by a set of tests. We formally define the semantic slicing problem, provide an algorithm for identifying a set of commits that constitute a slice, and propose techniques to minimize the produced slice.
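The slicing and minimization steps can be illustrated with a toy model. The sketch below is a simplified, hypothetical illustration and not CSlicer's actual algorithm. It assumes that each commit is summarized by the set of code entities (e.g., methods) it changes and by the commits it must be applied on top of, and that the entities exercised by the tests are already known (for instance, from a coverage run). It keeps every commit that touches an exercised entity, closes that set under the dependency relation, and then greedily drops single commits for as long as a caller-supplied `tests_pass` oracle still succeeds. The names `Commit`, `semantic_slice`, `minimize`, `tests_pass`, and the entities `Foo.f`, `Foo.g`, `Bar.h` are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set


@dataclass(frozen=True)
class Commit:
    sha: str
    # Hypothetical summary: names of code entities (e.g. methods) this commit modifies.
    changed: FrozenSet[str]
    # Hypothetical summary: commits that must already be applied for this one to apply/compile.
    depends_on: FrozenSet[str] = frozenset()


def semantic_slice(history: List[Commit], covered: Set[str]) -> List[str]:
    """Select commits touching test-exercised entities, closed under depends_on.

    Returns commit ids in their original history order.
    """
    by_sha = {c.sha: c for c in history}
    # Seed: every commit that changes at least one entity the tests exercise.
    selected = {c.sha for c in history if c.changed & covered}
    worklist = list(selected)
    # Transitive closure: pull in everything the selected commits rely on.
    while worklist:
        for dep in by_sha[worklist.pop()].depends_on:
            if dep not in selected:
                selected.add(dep)
                worklist.append(dep)
    return [c.sha for c in history if c.sha in selected]


def minimize(slice_shas: List[str], tests_pass: Callable[[List[str]], bool]) -> List[str]:
    """Greedy 1-minimal reduction: drop single commits while tests_pass still holds."""
    current = list(slice_shas)
    changed = True
    while changed:
        changed = False
        for sha in list(current):  # iterate over a snapshot; current may shrink
            candidate = [s for s in current if s != sha]
            if tests_pass(candidate):
                current = candidate
                changed = True
    return current


# Toy history (hypothetical entity names, oldest commit first).
HISTORY = [
    Commit("c1", frozenset({"Foo.f"})),
    Commit("c2", frozenset({"Foo.g"}), frozenset({"c1"})),
    Commit("c3", frozenset({"Bar.h"})),
    Commit("c4", frozenset({"Foo.g"})),  # e.g. a comment-only edit to a covered method
]
```

With the toy history, `semantic_slice(HISTORY, {"Foo.g"})` selects `c2` and `c4`, the dependency of `c2` pulls in `c1`, and the result is `["c1", "c2", "c4"]`. Calling `minimize` on that slice with the oracle `lambda s: {"c1", "c2"} <= set(s)` then discards `c4`, returning `["c1", "c2"]`. In a real setting the oracle would replay the candidate commits and run the tests, so removing a commit that others depend on would fail the check; the toy oracle simply encodes that outcome.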

Fig. 3. High-level overview of the semantic slicing algorithms.

We then instantiate the overall approach, CSlicer, in a specific implementation for Java projects managed in Git and evaluate its correctness and effectiveness on a set of open-source software repositories. We show that it identifies subsets of change histories that maintain the functionality of interest but are substantially smaller than the original ones.
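Once a slice is known, porting it to another Git branch amounts to replaying its commits in their original order. The sketch below is an assumption about how one might do this by hand with standard Git commands, not a description of CSlicer's own tooling. Commit order is taken from `git rev-list --reverse`, which lists commits oldest first (the `rev` argument can be a range such as `main..feature`), and each sliced commit is applied with its own `git cherry-pick` on the currently checked-out target branch. The function names are hypothetical.

```python
import subprocess
from typing import Iterable, List


def history_oldest_first(repo_dir: str, rev: str) -> List[str]:
    """List the commits selected by `rev`, oldest first, via `git rev-list --reverse`."""
    result = subprocess.run(
        ["git", "rev-list", "--reverse", rev],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def cherry_pick_commands(history: List[str], slice_shas: Iterable[str]) -> List[List[str]]:
    """Build one `git cherry-pick` command per sliced commit, preserving history order."""
    keep = set(slice_shas)
    return [["git", "cherry-pick", sha] for sha in history if sha in keep]


def replay(commands: List[List[str]], repo_dir: str) -> None:
    """Run each command on the currently checked-out (target) branch.

    check=True raises CalledProcessError at the first failure, e.g. a
    merge conflict, leaving the repository for manual resolution.
    """
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Building the command list separately from running it makes the plan easy to inspect before touching the repository: for a history `["a", "b", "c", "d"]` and a slice `{"c", "a"}`, `cherry_pick_commands` yields the `a` pick followed by the `c` pick.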


CSlicer Bitbucket repository: LINK


Related Publications: