research-article

Toward a Theory of Causation for Interpreting Neural Code Models

Published: 01 May 2024

Abstract

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As a result, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often reveal only a portion of their real-world performance. Although the performance of NCMs generally appears promising, much is currently unknown about how such models arrive at their decisions. To this end, this paper introduces $\textit{do}_{\textbf{code}}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. $\textit{do}_{\textbf{code}}$ builds on causal inference to enable programming-language-oriented explanations.
While the theoretical underpinnings of $\textit{do}_{\textbf{code}}$ can be extended to explore different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of $\textit{do}_{\textbf{code}}$, we perform a case study on two popular deep learning architectures and ten NCMs, illustrating the insights our framework can provide. The results of this case study show that the studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias than other programming language constructs. These insights demonstrate the potential of $\textit{do}_{\textbf{code}}$ as a useful method for detecting confounding bias in NCMs and facilitating its elimination.

Cited By

  • Snopy: Bridging Sample Denoising with Causal Graph Learning for Effective Vulnerability Detection. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2024), pp. 606-618. DOI: 10.1145/3691620.3695057. Online publication date: 27 Oct 2024.


Information

Published In

IEEE Transactions on Software Engineering, Volume 50, Issue 5
May 2024
291 pages

Publisher

IEEE Press
