DOI: 10.1145/3472883.3486995

Towards Reliable AI for Source Code Understanding

Published: 01 November 2021

Abstract

Cloud maturity and popularity have resulted in the proliferation of open source software (OSS). In turn, managing OSS code quality has become critical to ensuring sustainable Cloud growth. On this front, AI modeling has gained popularity in source code understanding tasks, promoted by the ready availability of large open codebases. However, we have been observing certain peculiarities with these black boxes, motivating a call for their reliability to be verified before they offset traditional code analysis. In this work, we highlight and organize the different reliability issues affecting AI-for-code into the three stages of an AI pipeline: data collection, model training, and prediction analysis. We highlight the need for concerted efforts from the research community to ensure credibility, accountability, and traceability for AI-for-code. For each stage, we discuss the unique opportunities afforded by the source code and software engineering setting to improve AI reliability.

Supplementary Material

MP4 File (Day2_8-4.mp4)
Presentation video


Cited By

  • Incorporating Signal Awareness in Source Code Modeling: An Application to Vulnerability Detection. ACM Transactions on Software Engineering and Methodology 32, 6 (2023), 1--40. https://doi.org/10.1145/3597202
  • Study of Distractors in Neural Models of Code. In 2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE). 1--7. https://doi.org/10.1109/InteNSE59150.2023.00005
  • Vulnerability Dataset Construction Methods Applied To Vulnerability Detection: A Survey. In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). 141--146. https://doi.org/10.1109/DSN-W54100.2022.00032


Published In

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing
November 2021
685 pages
ISBN: 9781450386388
DOI: 10.1145/3472883
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. explainability
  2. machine learning
  3. reliability
  4. signal awareness

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '21: ACM Symposium on Cloud Computing
November 1--4, 2021
Seattle, WA, USA

Acceptance Rates

Overall acceptance rate: 169 of 722 submissions (23%)
