A Survey of Machine Learning for Big Code and Naturalness

Authors: Miltiadis Allamanis, Premkumar Devanbu, Charles Sutton

ACM Computing Surveys (CSUR), Volume 51, Issue 4

Article No.: 81, Pages 1 - 37

https://doi.org/10.1145/3212695

Published: 31 July 2018

Abstract

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.

Supplementary Material

a81-allamanis-suppl.pdf (allamanis.zip)

Supplemental movie, appendix, image, and software files for A Survey of Machine Learning for Big Code and Naturalness.

References

[1]

Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. 2007. Mining API patterns as partial orders from source code: From usage scenarios to specifications. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’07).

Digital Library

[2]

Karan Aggarwal, Mohammad Salameh, and Abram Hindle. 2015. Using Machine Translation for Converting Python 2 to Python 3 Code. Technical Report.

[3]

Alex A. Alemi, Francois Chollet, Geoffrey Irving, Christian Szegedy, and Josef Urban. 2016. DeepMath--Deep sequence models for premise selection. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’16).

Digital Library

[4]

Miltiadis Allamanis, Earl T. Barr, Christian Bird, Premkumar Devanbu, Mark Marron, and Charles Sutton. 2016. Mining Semantic Loop Idioms from Big Code. Technical Report. Retrieved from https://www.microsoft.com/en-us/research/publication/mining-semantic-loop-idioms-big-code/.

[5]

Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2014. Learning natural coding conventions. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14).

Digital Library

[6]

Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2015. Suggesting accurate method and class names. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’15).

Digital Library

[7]

Miltiadis Allamanis and Marc Brockschmidt. 2017. SmartPaste: Learning to adapt source code. arXiv Preprint arXiv:1705.07867 (2017).

[8]

Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to represent programs with graphs. In Proceedings of the International Conference on Learning Representations (ICLR’18).

[9]

Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli, and Charles Sutton. 2017. Learning continuous semantic representations of symbolic expressions. In Proceedings of the International Conference on Machine Learning (ICML’17).

[10]

Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the International Conference on Machine Learning (ICML’16).

[11]

Miltiadis Allamanis and Charles Sutton. 2013. Mining source code repositories at massive scale using language modeling. In Proceedings of the Working Conference on Mining Software Repositories (MSR’13).

Digital Library

[12]

Miltiadis Allamanis and Charles Sutton. 2014. Mining idioms from source code. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14).

Digital Library

[13]

Miltiadis Allamanis, Daniel Tarlow, Andrew Gordon, and Yi Wei. 2015. Bimodal modelling of source code and natural language. In Proceedings of the International Conference on Machine Learning (ICML’15).

Digital Library

[14]

Sven Amann, Sebastian Proksch, Sarah Nadi, and Mira Mezini. 2016. A study of visual studio usage in practice. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER’16).

[15]

Gene M. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the Spring Joint Computer Conference.

Digital Library

[16]

Matthew Amodio, Swarat Chaudhuri, and Thomas Reps. 2017. Neural attribute machines for program generation. arXiv Preprint arXiv:1705.09231 (2017).

[17]

Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to compose neural networks for question answering. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’16).

[18]

Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, and Konrad Rieck. 2014. DREBIN: Effective and explainable detection of android malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium.

[19]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR’15).

[20]

Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to write programs. In Proceedings of the International Conference on Learning Representations (ICLR’17).

[21]

Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers) 2 (2017), 314--319.

[22]

Rohan Bavishi, Michael Pradel, and Koushik Sen. 2017. Context2Name: A deep learning-based approach to infer natural variable names from usage contexts. TU Darmstadt, Department of Computer Science.

[23]

Tony Beltramelli. 2018. pix2code: Generating code from a graphical user interface screenshot. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, 3 pages.

Digital Library

[24]

Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. 2010. A few billion lines of code later: Using static analysis to find bugs in the real world. Communications of the ACM 53, 2 (2010), 66--75.

Digital Library

[25]

Sahil Bhatia and Rishabh Singh. 2018. Automated correction for syntax errors in programming assignments using recurrent neural networks. In Proceedings of the International Conference on Software Engineering (ICSE’18).

[26]

Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, and Sebastian Riedel. 2016. Learning Python code suggestion with a sparse pointer network. arXiv Preprint arXiv:1611.08307 (2016).

[27]

Benjamin Bichsel, Veselin Raychev, Petar Tsankov, and Martin Vechev. 2016. Statistical deobfuscation of android applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.

Digital Library

[28]

Pavol Bielik, Veselin Raychev, and Martin Vechev. 2015. Programming with “big code”: Lessons, techniques and applications. In Proceedings of the LIPIcs-Leibniz International Proceedings in Informatics.

[29]

Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic model for code. In Proceedings of the International Conference on Machine Learning (ICML’16).

Digital Library

[30]

David M. Blei. 2012. Probabilistic topic models. Communications of the ACM 55, 4 (2012), 77--84.

Digital Library

[31]

Marc Brockschmidt, Yuxin Chen, Pushmeet Kohli, Siddharth Krishna, and Daniel Tarlow. 2017. Learning shape analysis. In Proceedings of the International Static Analysis Symposium. Springer.

[32]

Peter John Brown. 1979. Software Portability: An Advanced Course. CUP Archive.

[33]

Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from examples to improve code completion systems. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’09).

Digital Library

[34]

Raymond P. L. Buse and Westley Weimer. 2012. Synthesizing API usage examples. In Proceedings of the International Conference on Software Engineering (ICSE’12).

Digital Library

[35]

Joshua Charles Campbell, Abram Hindle, and José Nelson Amaral. 2014. Syntax errors just aren’t natural: Improving error reporting with language models. In Proceedings of the Working Conference on Mining Software Repositories (MSR’14).

Digital Library

[36]

Lei Cen, Christoher S. Gates, Luo Si, and Ninghui Li. 2015. A probabilistic discriminative model for Android malware detection with decompiled source code. IEEE Transactions on Dependable and Secure Computing 12, 4 (2015), 400--412.

Digital Library

[37]

Luigi Cerulo, Massimiliano Di Penta, Alberto Bacchelli, Michele Ceccarelli, and Gerardo Canfora. 2015. Irish: A hidden markov model to detect coded information islands in free text. Science of Computer Programming 105 (2015), 26--43.

Digital Library

[38]

Kwonsoo Chae, Hakjoo Oh, Kihong Heo, and Hongseok Yang. 2017. Automatically generating features for learning program analysis heuristics for C-like languages. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages 8 Applications (OOPSLA’17).

Digital Library

[39]

Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41, 3 (2009), 15.

Digital Library

[40]

Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech and Language 13, 4 (1999), 359--394.

Digital Library

[41]

Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder--Decoder approaches. In Syntax, Semantics and Structure in Statistical Translation (2014).

[42]

Edmund Clarke, Daniel Kroening, and Karen Yorav. 2003. Behavioral consistency of C and verilog programs using bounded model checking. In Proceedings of the 40th Annual Design Automation Conference.

Digital Library

[43]

Trevor Cohn, Phil Blunsom, and Sharon Goldwater. 2010. Inducing tree-substitution grammars. Journal of Machine Learning Research 11, Nov (2010), 3053--3096.

Digital Library

[44]

Christopher S. Corley, Kostadin Damevski, and Nicholas A. Kraft. 2015. Exploring the use of deep learning for feature location. In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME’15).

Digital Library

[45]

Patrick Cousot, Radhia Cousot, Jerôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2005. The ASTRÉE analyzer. In ESPO. Springer.

[46]

William Croft. 2008. Evolutionary linguistics. Ann. Rev. Anthropol. (2008).

[47]

Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. 2017. End-to-end deep learning of optimization heuristics. In Proceedings of the 26th International Conference on Parallel Computing Technologies (PACT'17). IEEE, 219--232.

[48]

Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. 2017. Synthesizing benchmarks for predictive modeling. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’17). IEEE, 86--99.

[49]

Hoa Khanh Dam, Truyen Tran, and Trang Pham. 2016. A deep language model for software code. arXiv Preprint arXiv:1608.02715 (2016).

[50]

Florian Deißenböck and Markus Pizka. 2006. Concise and consistent naming. Software Quality Journal 14, 3 (2006), 261--282.

Digital Library

[51]

Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, and Alexander M. Rush. 2017. Image-to-markup generation with coarse-to-fine attention. In Proceedings of the International Conference on Machine Learning (ICML’17). 980--989.

[52]

Premkumar Devanbu. 2015. New initiative: The naturalness of software. In Proceedings of the International Conference on Software Engineering (ICSE’15).

Digital Library

[53]

Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel rahman Mohamed, and Pushmeet Kohli. 2017. Robustfill: Neural program learning under noisy I/O. In Proceedings of the International Conference on Machine Learning (ICML’17).

[54]

Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. 2013. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. In Proceedings of the International Conference on Software Engineering (ICSE’13).

Digital Library

[55]

Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Joshua B. Tenenbaum. 2017. Learning to infer graphics programs from hand-drawn images. arXiv Preprint arXiv:1707.09627 (2017).

[56]

Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. 2001. Bugs as deviant behavior: A general approach to inferring errors in systems code. In ACM SIGOPS Operating Systems Review.

Digital Library

[57]

Michael D. Ernst. 2017. Natural language is a programming language: Applying natural language processing to software development. In Proceedings of the LIPIcs-Leibniz International Proceedings in Informatics.

[58]

Ethan Fast, Daniel Steffee, Lucy Wang, Joel R. Brandt, and Michael S. Bernstein. 2014. Emergent, crowd-scale programming practice in the IDE. In Proceedings of the Annual ACM Conference on Human Factors in Computing Systems.

Digital Library

[59]

John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, and Daniel Tarlow. 2017. Neural functional programming. InProceedings of the International Conference on Learning Representations (ICLR’17).

[60]

Matthew Finifter, Adrian Mettler, Naveen Sastry, and David Wagner. 2008. Verifiable functional purity in java. In Proceedings of the 15th ACM Conference on Computer and Communications Security. ACM, 161--174.

Digital Library

[61]

Eclipse Foundation. Code Recommenders. Retrieved June 2017 from www.ecli pse.org/recommenders.

[62]

Jaroslav Fowkes, Pankajan Chanthirasegaran, Razvan Ranca, Miltos Allamanis, Mirella Lapata, and Charles Sutton. 2017. Autofolding for source code summarization. IEEE Transactions on Software Engineering 43, 12 (2017), 1095--1109.

Digital Library

[63]

Jaroslav Fowkes and Charles Sutton. 2015. Parameter-free probabilistic API mining at GitHub Scale. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15).

Digital Library

[64]

Christine Franks, Zhaopeng Tu, Premkumar Devanbu, and Vincent Hellendoorn. 2015. Cacheca: A cache language model based code suggestion tool. In Proceedings of the International Conference on Software Engineering (ICSE’15).

Digital Library

[65]

Wei Fu and Tim Menzies. 2017. Easy over hard: A case study on deep learning. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17).

Digital Library

[66]

Mark Gabel and Zhendong Su. 2008. Javert: Fully automatic mining of general temporal properties from dynamic traces. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’08).

Digital Library

[67]

Mark Gabel and Zhendong Su. 2010. A study of the uniqueness of source code. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’10).

Digital Library

[68]

Rosalva E. Gallardo-Valencia and Susan Elliott Sim. 2009. Internet-scale code search. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation.

Digital Library

[69]

Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. 2016. TerpreT: A probabilistic programming language for program induction. arXiv Preprint arXiv:1608.04428 (2016).

[70]

Spandana Gella, Mirella Lapata, and Frank Keller. 2016. Unsupervised visual sense disambiguation for verbs using multimodal embeddings. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’16).

[71]

Elena L. Glassman, Jeremy Scott, Rishabh Singh, Philip J. Guo, and Robert C. Miller. 2015. OverCode: Visualizing variation in student solutions to programming problems at scale. ACM Transactions on Computer-Human Interaction (TOCHI) 22, 2 (2015), 7 pages.

Digital Library

[72]

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

Digital Library

[73]

Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. 2014. Probabilistic programming. In Proceedings of the International Conference on Software Engineering (ICSE’14).

[74]

Orlena Gotel, Jane Cleland-Huang, Jane Huffman Hayes, Andrea Zisman, Alexander Egyed, Paul Grünbacher, Alex Dekhtyar, Giuliano Antoniol, Jonathan Maletic, and Patrick Mäder. 2012. Traceability fundamentals. In Software and Systems Traceability. Springer, 3--22.

[75]

Alex Graves, Greg Wayne, and Ivo Danihelka. 2014. Neural Turing machines. arXiv Preprint arXiv:1410.5401 (2014).

[76]

Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. 2016. Deep API learning. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’16).

Digital Library

[77]

Sumit Gulwani and Mark Marron. 2014. NLyze: Interactive programming by natural language for spreadsheet data analysis and manipulation. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data.

Digital Library

[78]

Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, and others. 2017. Program synthesis. In Foundations and Trends® in Programming Languages 4, 1--2 (2017), 1--119.

[79]

Jin Guo, Jinghui Cheng, and Jane Cleland-Huang. 2017. Semantically enhanced software traceability using deep learning techniques. In Proceedings of the International Conference on Software Engineering (ICSE’17).

Digital Library

[80]

Rahul Gupta, Aditya Kanade, and Shirish Shevade. 2018. Deep reinforcement learning for programming language correction. arXiv Preprint arXiv:1801.10467 (2018).

[81]

Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing common C language errors by deep learning. In Proceedings of the Conference of Artificial Intelligence (AAAI’17).

[82]

Tihomir Gvero and Viktor Kuncak. 2015. Synthesizing java expressions from free-form queries. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages 8 Applications (OOPSLA’15).

Digital Library

[83]

Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The unreasonable effectiveness of data. IEEE Intelligent Systems 24, 2 (2009), 8--12.

Digital Library

[84]

Vincent J. Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code? In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17).

Digital Library

[85]

Vincent J. Hellendoorn, Premkumar T. Devanbu, and Alberto Bacchelli. 2015. Will they like this?: Evaluating code contributions with language models. In Proceedings of the Working Conference on Mining Software Repositories (MSR’15).

Digital Library

[86]

Felix Hill, KyungHyun Cho, Anna Korhonen, and Yoshua Bengio. 2016. Learning to understand phrases by embedding the dictionary. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).

[87]

Abram Hindle, Earl T. Barr, Mark Gabel, Zhendong Su, and Premkumar Devanbu. 2016. On the naturalness of software. Communications of the ACM 59, 5 (2016), 122--131.

Digital Library

[88]

Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012. On the naturalness of software. In Proceedings of the International Conference on Software Engineering (ICSE’12).

Digital Library

[89]

G. E. Hinton, J. L. McClelland, and D. E. Rumelhart. 1986. Distributed representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 77--109.

Digital Library

[90]

C. A. R. Hoare. 1969. An axiomatic basis for computer programming. Commun. ACM 12, 10 (Oct. 1969), 576--580.

Digital Library

[91]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.

Digital Library

[92]

Reid Holmes, Robert J. Walker, and Gail C. Murphy. 2005. Strathcona example recommendation tool. In ACM SIGSOFT Software Engineering Notes 30, 5 (2005), 237--240.

Digital Library

[93]

Chun-Hung Hsiao, Michael Cafarella, and Satish Narayanasamy. 2014. Using web corpus statistics for program analysis. In ACM SIGPLAN Notices 49, 10 (2014), 49--65.

Digital Library

[94]

Xing Hu, Yuhan Wei, Ge Li, and Zhi Jin. 2017. CodeSum: Translate program language to natural language. arXiv Preprint arXiv:1708.01837 (2017).

[95]

Andrew Hunt and David Thomas. 2000. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley Professional.

Digital Library

[96]

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).

[97]

Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In Proceedings of the International Conference on Automated Software Engineering (ASE’17).

Digital Library

[98]

Daniel D. Johnson. 2016. Learning graphical state transitions. In Proceedings of the International Conference on Learning Representations (ICLR’16).

[99]

Dan Jurafsky. 2000. Speech 8 Language Processing (3rd. ed.). Pearson Education.

[100]

René Just, Darioush Jalali, and Michael D. Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA’14).

Digital Library

[101]

Neel Kant. 2018. Recent advances in neural program synthesis. arXiv Preprint arXiv:1802.02353 (2018).

[102]

Svetoslav Karaivanov, Veselin Raychev, and Martin Vechev. 2014. Phrase-based statistical translation of programming languages. In Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming 8 Software. ACM, 173--184.

Digital Library

[103]

Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and understanding recurrent networks. arXiv Preprint arXiv:1506.02078 (2015).

[104]

Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Procdedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP’95). 1 (1995), 181--184.

[105]

Donald Ervin Knuth. 1984. Literate programming. The Computer Journal 27, 2 (1984), 97--111.

Digital Library

[106]

Ugur Koc, Parsa Saadatpanah, Jeffrey S. Foster, and Adam A. Porter. 2017. Learning a classifier for false positive error reports emitted by static code analysis tools. In Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages.

Digital Library

[107]

Rainer Koschke. 2007. Survey of research on software clones. In Dagstuhl Seminar Proceedings. Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

[108]

Ted Kremenek, Andrew Y. Ng, and Dawson R. Engler. 2007. A factor graph model for software bug finding. In Proceedings of the International Joint Conference on Artifical intelligence (IJCAI’07).

Digital Library

[109]

Roland Kuhn and Renato De Mori. 1990. A cache-based natural language model for speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 6 (1990), 570--583.

Digital Library

[110]

Nate Kushman and Regina Barzilay. 2013. Using semantic unification to generate regular expressions from natural language. In Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’13).

[111]

Tessa Lau. 2001. Programming by Demonstration: A Machine Learning Approach. Ph.D. Dissertation. University of Washington.

Digital Library

[112]

Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning (ICML’14).

Digital Library

[113]

Tien-Duy B. Le, Mario Linares-Vásquez, David Lo, and Denys Poshyvanyk. 2015. Rclinker: Automated linking of issue reports and commits leveraging rich contextual information. In Proceedings of the International Conference on Program Comprehension (ICPC’15).

Digital Library

[114]

Dor Levy and Lior Wolf. 2017. Learning to align the source code to the compiled object code. In Proceedings of the International Conference on Machine Learning (ICML’17).

[115]

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In Proceedings of the International Conference on Learning Representations (ICLR’16).

[116]

Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Learning programs: A hierarchical bayesian approach. In Proceedings of the International Conference on Machine Learning (ICML’10).

Digital Library

[117]

Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. 2005. Scalable statistical bug isolation. In ACM SIGPLAN Notices 40, 6 (2005), 15--26.

Digital Library

[118]

Xi Victoria Lin, Chenglong Wang, Deric Pang, Kevin Vu, Luke Zettlemoyer, and Michael D. Ernst. 2017. Program Synthesis from Natural Language using Recurrent Neural Networks. Technical Report UW-CSE-17-03-01. University of Washington Department of Computer Science and Engineering, Seattle, WA.

[119]

Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, and Michael D. Ernst. 2018. NL2Bash: A corpus and semantic parser for natural language interface to the linux operating system. In Proceedings of the International Conference on Language Resources and Evaluation.

[120]

Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, Andrew Senior, Fumin Wang, and Phil Blunsom. 2016. Latent predictor networks for code generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).

[121]

Han Liu. 2016. Towards better program obfuscation: Optimization via language models. In Proceedings of the 38th International Conference on Software Engineering Companion.

Digital Library

[122]

Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, and Anindya Banerjee. 2009. Merlin: Specification inference for explicit information flow problems. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI’09).

Digital Library

[123]

Sarah M. Loos, Geoffrey Irving, Christian Szegedy, and Cezary Kaliszyk. 2017. Deep network guided proof search. In Proceedings of the International Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR’17).

[124]

Pablo Loyola, Edison Marrese-Taylor, and Yutaka Matsuo. 2017. A neural architecture for generating natural language descriptions from source code changes. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2 (2017), 287--292.

[125]

Yanxin Lu, Swarat Chaudhuri, Chris Jermaine, and David Melski. 2017. Data-Driven program completion. arXiv Preprint arXiv:1705.09042 (2017).

[126]

Chris Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the International Conference on Machine Learning (ICML’14).

Digital Library

[127]

Ravi Mangal, Xin Zhang, Aditya V. Nori, and Mayur Naik. 2015. A user-guided approach to program analysis. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15).

Digital Library

[128]

Collin Mcmillan, Denys Poshyvanyk, Mark Grechanik, Qing Xie, and Chen Fu. 2013. Portfolio: Searching for relevant functions and their usages in millions of lines of code. ACM Transactions on Software Engineering and Methodology (TOSEM) 22, 4 (2013), 37 pages.

Digital Library

[129]

Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, and Adam Kalai. 2013. A machine learning framework for programming by example. In Proceedings of the International Conference on Machine Learning (ICML’13).

Digital Library

[130]

Kim Mens and Angela Lozano. 2014. Source code-based recommendation systems. In Recommendation Systems in Software Engineering. Springer, 93--130.

[131]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv Preprint arXiv:1301.3781 (2013).

[132]

Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional neural networks over tree structures for programming language processing. In Proceedings of the Conference of Artificial Intelligence (AAAI’16).

Digital Library

[133]

Dana Movshovitz-Attias and William W. Cohen. 2013. Natural language models for predicting programming comments. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’13).

[134]

Dana Movshovitz-Attias and William W. Cohen. 2015. KB-LDA: Jointly learning a knowledge base of hierarchy, relations, and facts. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’15).

[135]

Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. 2018. Neural sketch learning for conditional program generation. In Proceedings of the International Conference on Learning Representations (ICLR).

[136]

Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine. 2017. Bayesian specification learning for finding API usage errors. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 151--162.

Digital Library

[137]

Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. 2015. Neural programmer: Inducing latent programs with gradient descent. In Proceedings of the International Conference on Learning Representations (ICLR’15).

[138]

Graham Neubig. 2016. Survey of methods to generate natural language from source code. Retrieved from http://www.languageandcode.org/nlse2015/neubig15nlse-survey.pdf.

[139]

Anh Tuan Nguyen and Tien N. Nguyen. 2015. Graph-based statistical language model for code. In Proceedings of the International Conference on Software Engineering (ICSE’15).

[140]

Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2013. Lexical statistical machine translation for language migration. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’13).

[141]

Anh T. Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2015. Divide-and-conquer approach for multi-phase statistical migration for source code. In Proceedings of the International Conference on Automated Software Engineering (ASE’15).

[142]

Trong Duc Nguyen, Anh Tuan Nguyen, and Tien N. Nguyen. 2016. Mapping API elements for code migration with vector representations. In Proceedings of the International Conference on Software Engineering (ICSE’16).

[143]

Trong Duc Nguyen, Anh Tuan Nguyen, Hung Dang Phan, and Tien N. Nguyen. 2017. Exploring API embedding for API usages and applications. In Proceedings of the International Conference on Software Engineering (ICSE’17).

[144]

Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2013. A statistical semantic language model for source code. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’13).

[145]

Haoran Niu, Iman Keivanloo, and Ying Zou. 2017. Learning to rank code examples for code search engines. Empirical Software Engineering (ESEM’16) 22, 1 (2017), 259--291.

[146]

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation. In Proceedings of the International Conference on Automated Software Engineering (ASE’15).

[147]

Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a strategy for adapting a program analysis via Bayesian optimisation. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA’15).

[148]

Cyrus Omar. 2013. Structured statistical syntax tree prediction. In Proceedings of the Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH’13).

[149]

Cyrus Omar, Ian Voysey, Michael Hilton, Joshua Sunshine, Claire Le Goues, Jonathan Aldrich, and Matthew A. Hammer. 2017. Toward semantic foundations for program editors. arXiv preprint arXiv:1703.08694.

[150]

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’02).

[151]

Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. 2017. Neuro-symbolic program synthesis. In Proceedings of the International Conference on Learning Representations (ICLR’17).

[152]

Terence Parr and Jurgen J. Vinju. 2016. Towards a universal code formatter through machine learning. In Proceedings of the International Conference on Software Language Engineering (SLE’16).

[153]

Jibesh Patra and Michael Pradel. 2016. Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data. TU Darmstadt, Department of Computer Science, TUD-CS-2016-14664.

[154]

Hung Viet Pham, Phong Minh Vu, Tung Thanh Nguyen, and others. 2016. Learning API usages from bytecode: A statistical approach. In Proceedings of the International Conference on Software Engineering (ICSE’16).

[155]

Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, and Leonidas J. Guibas. 2015. Learning program embeddings to propagate feedback on student code. In Proceedings of the International Conference on Machine Learning (ICML’15).

[156]

Matt Post and Daniel Gildea. 2009. Bayesian learning of a tree substitution grammar. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’09).

[157]

Michael Pradel and Koushik Sen. 2017. Deep learning to find bugs. TU Darmstadt, Department of Computer Science.

[158]

Sebastian Proksch, Sven Amann, Sarah Nadi, and Mira Mezini. 2016. Evaluating the evaluations of code recommender systems: A reality check. In Proceedings of the International Conference on Automated Software Engineering (ASE’16).

[159]

Sebastian Proksch, Johannes Lerch, and Mira Mezini. 2015. Intelligent code completion with Bayesian networks. ACM Transactions on Software Engineering and Methodology (TOSEM) 25, 1 (2015), 3.

[160]

Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: A neural program corrector for MOOCs. In Proceedings of the Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH’16).

[161]

Chris Quirk, Raymond Mooney, and Michel Galley. 2015. Language to code: Learning semantic parsers for if-this-then-that recipes. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’15).

[162]

Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract syntax networks for code generation and semantic parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17).

[163]

Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, and Premkumar Devanbu. 2016. On the naturalness of buggy code. In Proceedings of the International Conference on Software Engineering (ICSE’16).

[164]

Veselin Raychev, Pavol Bielik, Martin Vechev, and Andreas Krause. 2016. Learning programs from noisy data. In Proceedings of the Symposium on Principles of Programming Languages (POPL’16).

[165]

Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from “big code.” In Proceedings of the Symposium on Principles of Programming Languages (POPL’15).

[166]

Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI’14).

[167]

Scott Reed and Nando de Freitas. 2016. Neural programmer-interpreters. In Proceedings of the International Conference on Learning Representations (ICLR’16).

[168]

Sebastian Riedel, Matko Bosnjak, and Tim Rocktäschel. 2017. Programming with a differentiable Forth interpreter. In Proceedings of the International Conference on Machine Learning (ICML’17).

[169]

Martin Robillard, Robert Walker, and Thomas Zimmermann. 2010. Recommendation systems for software engineering. IEEE Software 27, 4 (2010), 80--86.

[170]

Martin P. Robillard, Walid Maalej, Robert J. Walker, and Thomas Zimmermann. 2014. Recommendation Systems in Software Engineering. Springer.

[171]

Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end differentiable proving. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’17).

[172]

Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How developers search for code: A case study. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15).

[173]

Juliana Saraiva, Christian Bird, and Thomas Zimmermann. 2015. Products, developers, and milestones: How should I build my N-gram language model. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15).

[174]

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).

[175]

Abhishek Sharma, Yuan Tian, and David Lo. 2015. NIRMAL: Automatic identification of software relevant tweets leveraging language model. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER’15).

[176]

Rishabh Singh and Sumit Gulwani. 2015. Predicting a correct program in programming by example. In Proceedings of the International Conference on Computer Aided Verification.

[177]

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’14).

[178]

Suresh Thummalapenta and Tao Xie. 2007. Parseweb: A programmer assistant for reusing open source code on the web. In Proceedings of the International Conference on Automated Software Engineering (ASE’07).

[179]

Christoph Treude and Martin P. Robillard. 2016. Augmenting API documentation with insights from Stack Overflow. In Proceedings of the International Conference on Software Engineering (ICSE’16).

[180]

Zhaopeng Tu, Zhendong Su, and Premkumar Devanbu. 2014. On the localness of software. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14).

[181]

Bogdan Vasilescu, Casey Casalnuovo, and Premkumar Devanbu. 2017. Recovering clear, natural identifiers from obfuscated JS names. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17).

[182]

Lisa Wang, Angela Sy, Larry Liu, and Chris Piech. 2017. Deep knowledge tracing on programming exercises. In Proceedings of the Conference on Learning @ Scale.

[183]

Song Wang, Devin Chollak, Dana Movshovitz-Attias, and Lin Tan. 2016. Bugram: Bug detection with n-gram language models. In Proceedings of the International Conference on Automated Software Engineering (ASE’16).

[184]

Song Wang, Taiyue Liu, and Lin Tan. 2016. Automatically learning semantic features for defect prediction. In Proceedings of the International Conference on Software Engineering (ICSE’16).

[185]

Xin Wang, Chang Liu, Richard Shin, Joseph E. Gonzalez, and Dawn Song. 2016. Neural Code Completion. Retrieved from https://openreview.net/pdf?id=rJbPBt9lg.

[186]

Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig. 2007. Detecting object usage anomalies. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’07).

[187]

Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep learning code fragments for code clone detection. In Proceedings of the International Conference on Automated Software Engineering (ASE’16).

[188]

Martin White, Christopher Vendome, Mario Linares-Vásquez, and Denys Poshyvanyk. 2015. Toward deep learning software repositories. In Proceedings of the Working Conference on Mining Software Repositories (MSR’15).

[189]

Chadd C. Williams and Jeffrey K. Hollingsworth. 2005. Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering 31, 6 (2005), 466--480.

[190]

Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

[191]

W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707--740.

[192]

Tao Xie and Jian Pei. 2006. MAPO: Mining API usages from open source repositories. In Proceedings of the Working Conference on Mining Software Repositories (MSR’06).

[193]

Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. arXiv Preprint arXiv:1304.5634 (2013).

[194]

Shir Yadid and Eran Yahav. 2016. Extracting code from programming tutorial videos. In Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software.

[195]

Eran Yahav. 2015. Programming with “big code.” In Asian Symposium on Programming Languages and Systems. Springer, 3--8.

[196]

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17).

[197]

Wojciech Zaremba and Ilya Sutskever. 2014. Learning to execute. arXiv Preprint arXiv:1410.4615 (2014).

[198]

Alice X. Zheng, Michael I. Jordan, Ben Liblit, and Alex Aiken. 2003. Statistical debugging of sampled programs. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’03).

[199]

Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. 2006. Statistical debugging: Simultaneous identification of multiple bugs. In Proceedings of the International Conference on Machine Learning (ICML’06).

[200]

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv Preprint arXiv:1709.00103 (2017).

[201]

Thomas Zimmermann, Andreas Zeller, Peter Weissgerber, and Stephan Diehl. 2005. Mining version histories to guide software changes. IEEE Transactions on Software Engineering 31, 6 (2005), 429--445.

Information

Published In

ACM Computing Surveys Volume 51, Issue 4

July 2019

765 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3236632

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 July 2018

Accepted: 01 April 2018

Revised: 01 March 2018

Received: 01 September 2017

Published in CSUR Volume 51, Issue 4

Qualifiers

Survey
Research
Refereed

Bibliometrics

Article Metrics

Total Citations: 436
Total Downloads: 10,417
Downloads (Last 12 months): 1,716
Downloads (Last 6 weeks): 146

Reflects downloads up to 13 Sep 2024

Cited By

Tian Z, Li H, Sun H, Chen Y, Chen L (2025) HardVD: High-capacity cross-modal adversarial reprogramming for data-efficient vulnerability detection. Information Sciences 686 (121370). Online publication date: Jan-2025
https://doi.org/10.1016/j.ins.2024.121370
Chughtai M, Bibi I, Karim S, Shah S, Laghari A, Khan A (2024) Deep learning trends and future perspectives of web security and vulnerabilities. Journal of High Speed Networks 30:1 (115-146). Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.3233/JHS-230037
Kuramitsu K, Obara M, Sato M, Akinobu Y (2024) Training AI Model that Suggests Python Code from Student Requests in Natural Language. Journal of Information Processing 32 (69-76). Online publication date: 2024
https://doi.org/10.2197/ipsjjip.32.69
Zhang F, Li M, Wu H, Wu T (2024) Intelligent code search aids edge software development. Journal of Cloud Computing: Advances, Systems and Applications 13:1. Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1186/s13677-024-00629-5
Chi K, Li C, Ge J, Luo B (2024) An Empirical Study on Code Search Pre-trained Models: Academic Progresses vs. Industry Requirements. Proceedings of the 15th Asia-Pacific Symposium on Internetware (41-50). Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3672580
Xu S, Shen J, Li Y, Yao Y, Yu P, Xu F, Ma X (2024) On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference. Proceedings of the 15th Asia-Pacific Symposium on Internetware (1-10). Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3671389
Ma W, Liu S, Zhao M, Xie X, Wang W, Hu Q, Zhang J, Liu Y (2024) Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities. ACM Transactions on Software Engineering and Methodology 33:7 (1-29). Online publication date: 26-Aug-2024
https://dl.acm.org/doi/10.1145/3664606
Yang Z, Liu F, Yu Z, Keung J, Li J, Liu S, Hong Y, Ma X, Jin Z, Li G (2024) Exploring and Unleashing the Power of Large Language Models in Automated Code Translation. Proceedings of the ACM on Software Engineering 1:FSE (1585-1608). Online publication date: 12-Jul-2024
https://dl.acm.org/doi/10.1145/3660778
Mir A, Keshani M, Proksch S, Spinellis D, Constantinou E, Bacchelli A (2024) On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical Study. Proceedings of the 21st International Conference on Mining Software Repositories (457-468). Online publication date: 15-Apr-2024
https://dl.acm.org/doi/10.1145/3643991.3644897
Corso V, Mariani L, Micucci D, Riganelli O, Baysal O, Linares-Vasquez M, Moran K, Steinmacher I (2024) Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (13-23). Online publication date: 15-Apr-2024
https://dl.acm.org/doi/10.1145/3643916.3644402
