Abstract
When it is necessary to express changes between two source code files as a list of edit actions (an edit script), modern tree differencing algorithms are superior to most text-based approaches because they take code movements into account and express source code changes more accurately. We present 5 general optimizations that can be added to state-of-the-art tree differencing algorithms to shorten the resulting edit scripts. Applied to Gumtree, RTED, JSync, and ChangeDistiller, they lead to shorter scripts for 1898% of the changes in the histories of 9 open-source software repositories. These optimizations also are parts of our novel Move-optimized Tree DIFFerencing algorithm (MTD-IFF) that has a higher accuracy in detecting moved code parts. MTDIFF (which is based on the ideas of ChangeDistiller) further shortens the edit script for another 20% of the changes in the repositories. MTDIFF and all the benchmarks are available under an open-source license.