srcDiff is a syntactic differencing infrastructure for analyzing how source code changes over time. It combines efficient sequence differencing with language‑aware rules to produce deltas that are accurate, readable, and aligned with how developers actually edit code. srcDiff is built on srcML (see srcML.org), whose XML representation embeds abstract syntactic information directly into the source code, fully preserving code, comments, whitespace, and preprocessor directives.
Traditional tools such as GNU Diff report changes only at the line level, which can be coarse and hard to read. srcDiff instead works at the syntactic level, so the differences it reports are more precise and easier to understand.
The infrastructure outputs an XML format by default (the srcDiff format) that enables querying, analysis, and transformation of changes. Because the format is XML, it integrates cleanly with standard XML tooling and pipelines.
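As a rough illustration of querying the format with standard XML tooling, the sketch below uses Python's built-in xml.etree.ElementTree to list the deleted and inserted regions in a small document. The sample XML is hand-written for illustration and is not actual srcDiff output. The element names diff:delete and diff:insert and both namespace URIs are assumptions, so check them against real output before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs and element names; verify against real srcDiff output.
NS = {
    "src": "http://www.srcML.org/srcML/src",
    "diff": "http://www.srcML.org/srcDiff",
}

# Hand-written illustrative fragment, NOT produced by srcDiff.
SAMPLE = (
    '<unit xmlns="http://www.srcML.org/srcML/src" '
    'xmlns:diff="http://www.srcML.org/srcDiff">'
    '<expr_stmt><expr><name>x</name> = '
    '<diff:delete><literal>1</literal></diff:delete>'
    '<diff:insert><literal>2</literal></diff:insert>'
    '</expr>;</expr_stmt>'
    '</unit>'
)


def changed_text(root, kind):
    """Return the source text of every diff:<kind> region, in document order."""
    return ["".join(e.itertext()) for e in root.iterfind(f".//diff:{kind}", NS)]


root = ET.fromstring(SAMPLE)
deleted = changed_text(root, "delete")
inserted = changed_text(root, "insert")
print("deleted:", deleted)
print("inserted:", inserted)
```

The same query could be written in XPath or XSLT; ElementTree is used here only because it ships with Python.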
Human‑readable views: In addition to the XML, srcDiff provides side‑by‑side and unified views suitable for developers and reviewers.
libsrcdiff (under development) is designed to interoperate with libsrcml (the srcML API), adding change‑analysis capabilities specialized for srcDiff.
The current version of srcDiff supports C, C++, and Java. Extending it to work for Python and other languages requires little or no change to the underlying algorithm.
The srcDiff format extends the srcML format by introducing four additional XML tags:
The format captures both the original (base) and modified versions in a single multi‑version document, in which the delta, that is, the set of changes to the original source code, is marked explicitly.
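Since one document holds both versions, each version can in principle be recovered by skipping the regions that belong only to the other. The following is a minimal sketch of that idea, not srcDiff's own tooling. It assumes inserted and deleted code are wrapped in elements named diff:insert and diff:delete in the namespace shown; the sample XML is hand-written and the helper version_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI and tag names; verify against real srcDiff output.
DIFF_NS = "http://www.srcML.org/srcDiff"
INSERT = f"{{{DIFF_NS}}}insert"
DELETE = f"{{{DIFF_NS}}}delete"

# Hand-written illustrative fragment, NOT produced by srcDiff.
SAMPLE = (
    '<unit xmlns="http://www.srcML.org/srcML/src" '
    'xmlns:diff="http://www.srcML.org/srcDiff">'
    '<expr_stmt><expr><name>x</name> = '
    '<diff:delete><literal>1</literal></diff:delete>'
    '<diff:insert><literal>2</literal></diff:insert>'
    '</expr>;</expr_stmt>'
    '</unit>'
)


def version_text(elem, drop):
    """Concatenate all text under elem, skipping subtrees whose tag is `drop`."""
    if elem.tag == drop:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(version_text(child, drop))
        # Tail text follows the child in the parent, so keep it even if
        # the child itself was skipped.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
original = version_text(root, INSERT)   # drop what exists only in the modified version
modified = version_text(root, DELETE)   # drop what exists only in the original version
print(original)
print(modified)
```

Because the markup preserves all source text, stripping the tags and one side of the delta is enough here; no separate copy of either file is needed.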