Efficient Change Impact Quantification by Global AST Hashing

Modern software projects become more and more complex. For example, Linux consists of millions of lines of code contributed by thousands of developers. Additionally, thousands of configuration options increase the complexity further. Therefore, reviewing proposed changes to ensure code quality and minimize unwanted side effects is imperative.

However, deducing affected components from a mere textual change is an intricate problem and correctly assessing the risk is even more difficult. Further, manually inferring the complete potential impact of a change on other components (e.g., calling functions) is nearly impossible. Hence, a complete field of research was founded around the topic of change impact analysis.

In previous work, we have developed a semantic hashing algorithm on the abstract-syntax-tree level of the compiler to generate a hash of each component (i.e., functions, global variables) in a translation unit. We implemented this algorithm via a compiler plugin that also extracts references between components. Based on this information we create a reference graph containing all components of a program. This graph is then used to compute a global hash for each component from its local hash and the global hash of all referenced components.

Goal of this thesis is to utilize the graph to identify components affected by a change and to quantify its impact. For this purpose, a suitable metric must be developed and evaluated on the basis of the development history of different open-source projects.

USENIX Conference A Best Paper Award
cHash: Detection of Redundant Compilations via AST Hashing
Christian Dietrich, Valentin Rothberg, Ludwig Füracker, Andreas Ziegler, Daniel LohmannProceedings of the 2017 USENIX Annual Technical Conference (USENIX '17)USENIX Association2017Best Paper Award.
PDF Details Slides Raw Data [BibTex]