DOI: 10.1145/3472883.3486995

Towards Reliable AI for Source Code Understanding

Published: 01 November 2021

Abstract

Cloud maturity and popularity have resulted in the proliferation of open source software (OSS). In turn, managing OSS code quality has become critical to ensuring sustainable Cloud growth. On this front, AI modeling has gained popularity in source code understanding tasks, promoted by the ready availability of large open codebases. However, we have been observing certain peculiarities with these black boxes, motivating a call for their reliability to be verified before they offset traditional code analysis. In this work, we highlight and organize the different reliability issues affecting AI-for-code into the three stages of an AI pipeline: data collection, model training, and prediction analysis. We highlight the need for concerted efforts from the research community to ensure credibility, accountability, and traceability for AI-for-code. For each stage, we discuss the unique opportunities afforded by the source code and software engineering setting to improve AI reliability.

Supplementary Material

MP4 File (Day2_8-4.mp4)
Presentation video


Cited By

  • Incorporating Signal Awareness in Source Code Modeling: An Application to Vulnerability Detection. ACM Transactions on Software Engineering and Methodology 32, 6 (2023), 1--40. https://doi.org/10.1145/3597202
  • Study of Distractors in Neural Models of Code. In 2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE). 1--7. https://doi.org/10.1109/InteNSE59150.2023.00005
  • Vulnerability Dataset Construction Methods Applied To Vulnerability Detection: A Survey. In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). 141--146. https://doi.org/10.1109/DSN-W54100.2022.00032


Published In

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing
November 2021
685 pages
ISBN: 9781450386388
DOI: 10.1145/3472883
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. explainability
  2. machine learning
  3. reliability
  4. signal awareness

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '21: ACM Symposium on Cloud Computing
November 1--4, 2021
Seattle, WA, USA

Acceptance Rates

Overall acceptance rate: 169 of 722 submissions (23%)
