research-article

Toward a Theory of Causation for Interpreting Neural Code Models

Authors:

Denys Poshyvanyk

IEEE Transactions on Software Engineering, Volume 50, Issue 5

Pages 1215 - 1243

https://doi.org/10.1109/TSE.2024.3379943

Published: 01 May 2024

Abstract

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces do$_{\textbf{code}}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. do$_{\textbf{code}}$ is based upon causal inference to enable programming language-oriented explanations.
While the theoretical underpinnings of do$_{\textbf{code}}$ are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of do$_{\textbf{code}}$, we illustrate the insights that our framework can provide by performing a case study on two popular deep learning architectures and ten NCMs. The results of this case study illustrate that our studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias as compared to other programming language constructs. These insights demonstrate the potential of do$_{\textbf{code}}$ as a useful method to detect and facilitate the elimination of confounding bias in NCMs.
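As a rough illustration of what "confounding bias" means in this setting, the toy sketch below compares a naive difference in means with a stratified (backdoor-adjusted) estimate. It is not the paper's method or data: the function names `naive_effect` and `adjusted_effect`, the choice of snippet length as the confounder, and every number in `rows` are assumptions invented for illustration.

```python
from collections import defaultdict

def naive_effect(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for t, z, y in rows if t == 1]
    control = [y for t, z, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def adjusted_effect(rows):
    """Backdoor adjustment by stratification: per-stratum effect weighted by P(z)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for t, z, y in rows:
        strata[z][t].append(y)
    total = len(rows)
    effect = 0.0
    for groups in strata.values():
        n_z = len(groups[0]) + len(groups[1])
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))
        effect += (n_z / total) * diff
    return effect

# Hypothetical toy data, one tuple per code snippet: (t, z, y)
#   t: 1 if a syntax change was applied to the snippet, else 0
#   z: 1 for a long snippet, 0 for a short one (assumed confounder)
#   y: a made-up model loss on the snippet
rows = (
    [(1, 1, 3.0)] * 8 + [(0, 1, 2.5)] * 2    # long snippets, mostly changed
    + [(1, 0, 1.5)] * 2 + [(0, 0, 1.0)] * 8  # short snippets, mostly unchanged
)

print("naive estimate:   ", round(naive_effect(rows), 2))
print("adjusted estimate:", round(adjusted_effect(rows), 2))
```

In this made-up data the syntax change raises the loss by 0.5 within each length group, but long snippets are both more likely to be changed and harder for the model, so the naive comparison overstates the effect (about 1.4 versus 0.5 after adjustment).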

Cited By

Cao S, Sun X, Wu X, Lo D, Bo L, Li B, Liu X, Lin X, Liu W, Filkov V, Ray B, Zhou M (2024). Snopy: Bridging Sample Denoising with Causal Graph Learning for Effective Vulnerability Detection. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 606-618. doi:10.1145/3691620.3695057. Online publication date: 27-Oct-2024
https://dl.acm.org/doi/10.1145/3691620.3695057


Published In

IEEE Transactions on Software Engineering Volume 50, Issue 5

May 2024

291 pages

Publisher

IEEE Press
