srcDiff is an infrastructure for syntactic differencing to support research on software change. The infrastructure will also benefit software development in practice. The core of the infrastructure is the srcDiff algorithm. srcDiff is built on srcML and its associated XML format, which embeds abstract syntactic information directly into source code.
The results of the srcDiff infrastructure are represented in the srcDiff format. This format makes it possible to query and analyze the differences, and to perform various manipulations and transformations on them. More importantly, since the srcDiff format is XML, the full range of XML technologies can be applied to it.
libsrcDiff will also be designed to be interoperable with libsrcml (the srcML API). libsrcDiff will provide extended behavior specialized to the format and to change-analysis tasks.
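Because the format is XML, differences can be queried with ordinary XML tooling. The following is a minimal sketch, not part of srcDiff itself: it uses Python's standard `xml.etree.ElementTree` on a hand-written, heavily simplified fragment. The namespace URI, the sample document, and the helper name `change_counts` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the diff: prefix (illustrative, not confirmed by the text).
NS = {"diff": "http://www.srcML.org/srcDiff"}

# Hand-written, simplified sample; real srcDiff output also carries srcML markup.
SAMPLE = (
    '<unit xmlns:diff="http://www.srcML.org/srcDiff">'
    '<diff:common>int x;</diff:common>'
    '<diff:delete>int y;</diff:delete>'
    '<diff:insert>long y;</diff:insert>'
    '<diff:insert>int z;</diff:insert>'
    '</unit>'
)

def change_counts(xml_text):
    """Count inserted and deleted regions with a namespaced path query."""
    root = ET.fromstring(xml_text)
    return {
        kind: len(root.findall(f".//diff:{kind}", NS))
        for kind in ("insert", "delete")
    }

counts = change_counts(SAMPLE)
print(counts)
```

The same query could be expressed in XPath, XSLT, or XQuery; ElementTree is used here only because it ships with Python.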
The current version of srcDiff supports C and C++. Extending it to work for C# and Java requires little or no change to the underlying algorithm.
The srcDiff format extends the srcML format with four XML tags (diff:common, diff:delete, diff:insert, and diff:ws). With these tags, a single document contains both the original and the modified source code (i.e., any two versions), with the delta, that is, the set of changes to the original source code (base version), marked up explicitly.
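To illustrate how one document can hold both versions, the sketch below recovers the original and the modified text from a small hand-written fragment by skipping the insert or the delete regions, respectively. It is an illustration under stated assumptions, not the srcDiff tooling: the namespace URIs, the simplified sample markup, and the helper `version_text` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (illustrative).
SRC_NS = "http://www.srcML.org/srcML/src"
DIFF_NS = "http://www.srcML.org/srcDiff"

# Hand-written, simplified sample showing a renamed identifier.
SAMPLE = (
    f'<unit xmlns="{SRC_NS}" xmlns:diff="{DIFF_NS}">'
    '<expr_stmt><expr><name>'
    '<diff:delete>count</diff:delete><diff:insert>total</diff:insert>'
    '</name> = 0</expr>;</expr_stmt>'
    '</unit>'
)

def version_text(elem, skip_tag):
    """Concatenate all text under elem, omitting subtrees tagged skip_tag."""
    if elem.tag == skip_tag:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(version_text(child, skip_tag))
        # The tail text belongs to the parent, so keep it even if child is skipped.
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SAMPLE)
original = version_text(root, f"{{{DIFF_NS}}}insert")  # drop inserted text
modified = version_text(root, f"{{{DIFF_NS}}}delete")  # drop deleted text
print(original)
print(modified)
```

Dropping the inserted regions yields the base version, and dropping the deleted regions yields the modified version, which is why the markup can be stripped to get back either file.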
Typically, syntactic-differencing methods support additional edits (e.g., update node value, move). Because srcDiff marks up text directly (e.g., the renamed identifier in Figure 1), it does not need a separate edit for an update. In srcDiff, a move is marked as a delete tag (moved from) and an insert tag (moved to), both with an attribute move and a unique identifier. Currently, the srcDiff prototype supports limited detection of moves within a file as part of the approach. Additional stages can be added as post-processing to detect further moves, and more move detection may be added as the infrastructure matures.
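The move markup described above can be paired up with a short script. This is a hedged sketch: it assumes the unique identifier is stored as the value of the move attribute (e.g., move="1"), which the text does not spell out, and the namespace URI, sample document, and helper `pair_moves` are hypothetical.

```python
import xml.etree.ElementTree as ET

DIFF_NS = "http://www.srcML.org/srcDiff"  # assumed namespace URI

# Hand-written sample: a(); is moved below b();, and c(); is a plain insert.
SAMPLE = (
    f'<unit xmlns:diff="{DIFF_NS}">'
    '<diff:delete move="1">a();</diff:delete>'
    '<diff:common>b();</diff:common>'
    '<diff:insert move="1">a();</diff:insert>'
    '<diff:insert>c();</diff:insert>'
    '</unit>'
)

def pair_moves(root):
    """Group moved-from and moved-to text by the assumed move identifier."""
    moves = {}
    for kind in ("delete", "insert"):
        for elem in root.iter(f"{{{DIFF_NS}}}{kind}"):
            move_id = elem.get("move")
            if move_id is not None:  # plain inserts and deletes carry no move id
                moves.setdefault(move_id, {})[kind] = "".join(elem.itertext())
    return moves

moves = pair_moves(ET.fromstring(SAMPLE))
print(moves)
```

Inserts and deletes without the attribute are left out of the result, so ordinary edits and moves can be analyzed separately.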