2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE)
Download PDF

Abstract

Understanding code changes is of crucial importance in a wide range of software evolution activities. The traditional approach is to use textual differencing, as done with success since the 1970s with the ubiquitous diff tool. However, textual differencing has the important limitation of not aligning the changes to the syntax of the source code. To overcome these issues, structural (i.e. syntactic) differencing has been proposed in the literature, notably GumTree which was one of the pioneering approaches. The main drawback of GumTree's algorithm is the use of an optimal, but expensive tree-edit distance algorithm that makes it difficult to diff large ASTs. In this article, we describe a less expensive heuristic that enables GumTree to scale to large ASTs while yielding results of better quality than the original GumTree. We validate this new heuristic against 4 datasets of changes in two different languages, where we generate edit-scripts with a median size 50% smaller and a total speedup of the matching time between 50x and 281x.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles