Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3387904.3389252acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

GGF: A Graph-based Method for Programming Language Syntax Error Correction

Published: 12 September 2020 Publication History

Abstract

Syntax errors combined with obscure error messages generated by compilers usually annoy programmers and cause them to waste a lot of time on locating errors. The existing models do not utilize the structure in the code and just treat the code as token sequences. It causes low accuracy and poor performance on this task. In this paper, we propose a novel deep supervised learning model, called Graph-based Grammar Fix(GGF), to help programmers locate and fix the syntax errors. GGF treats the code as a mixture of the token sequences and graphs. The graphs build upon the Abstract Syntax Tree (AST) structure information. GGF encodes an erroneous code with its sub-AST structure, predicts the error position using pointer network and generates the right token. We utilized the DeepFix dataset which contains 46500 correct C programs and 6975 programs with errors written by students taking an introductory programming course. GGF is trained with the correct programs from the DeepFix dataset with intentionally injected syntax errors. After training, GGF could fix 4054 (58.12%) of the erroneous code, while the existing state of the art tool DeepFix fixes 1365 (19.57%) of the erroneous code.

References

[1]
Umair Z Ahmed, Pawan Kumar, Amey Karkare, Purushottam Kar, and Sumit Gulwani. 2018. Compilation error repair: for the student programs, from the student programs. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET). IEEE, 78--87.
[2]
Alfred V Aho and Thomas G Peterson. 1972. A minimum distance error-correcting parser for context-free languages. SIAM J. Comput. 1, 4 (1972), 305--312.
[3]
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2017. Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740 (2017).
[4]
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2018. code2vec: Learning Distributed Representations of Code. arXiv preprint arXiv:1803.09473 (2018).
[5]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[6]
Sahil Bhatia and Rishabh Singh. 2016. Automated correction for syntax errors in programming assignments using recurrent neural networks. arXiv preprint arXiv:1603.06129 (2016).
[7]
Joshua Charles Campbell, Abram Hindle, and José Nelson Amaral. 2014. Syntax errors just aren't natural: improving error reporting with language models. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 252--261.
[8]
Martin Chodorow, Joel R Tetreault, and Na-Rae Han. 2007. Detection of grammatical errors involving prepositions. In Proceedings of the fourth ACL-SIGSEM workshop on prepositions. Association for Computational Linguistics, 25--30.
[9]
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[10]
Frank DeRemer and Thomas Pennello. 1982. Efficient computation of LALR (1) look-ahead sets. ACM Transactions on Programming Languages and Systems (TOPLAS) 4, 4 (1982), 615--649.
[11]
Tao Ge, Furu Wei, and Ming Zhou. 2018. Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study. arXiv preprint arXiv:1807.01270 (2018).
[12]
Michael Goodrich, Roberto Tamassia, and David Mount. 2007. DATA STRUCTURES AND ALOGORITHMS IN C++. John Wiley & Sons.
[13]
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. In AAAI. 1345--1351.
[14]
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. http://www.iisc-seal.net/deepfix [Online; accessed 09-04-2018].
[15]
Na-Rae Han, Martin Chodorow, and Claudia Leacock. 2006. Detecting errors in English article usage by non-native speakers. Natural Language Engineering 12, 2 (2006), 115--129.
[16]
John J Hopfield. 1982. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences 79, 8 (1982), 2554--2558.
[17]
Clinton L Jeffery. 2003. Generating LR syntax error messages from examples. ACM Transactions on Programming Languages and Systems (TOPLAS) 25, 5 (2003), 631--640.
[18]
Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong, and Jianfeng Gao. 2017. A nested attention neural hybrid model for grammatical error correction. arXiv preprint arXiv:1707.02026 (2017).
[19]
Michael I Jordan. 1997. Serial order: A parallel distributed processing approach. In Advances in psychology. Vol. 121. Elsevier, 471--495.
[20]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[21]
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
[22]
Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task. 1--14.
[23]
Sagar Parihar, Ziyaan Dadachanji, Praveen Kumar Singh, Rajdeep Das, Amey Karkare, and Arnab Bhattacharya. 2017. Automatic grading and feedback using program repair for introductory programming courses. In Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education. ACM, 92--97.
[24]
Terence Parr and Kathleen Fisher. 2011. LL (*): the foundation of the ANTLR parser generator. ACM Sigplan Notices 46, 6 (2011), 425--436.
[25]
Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, Dan Roth, and Nizar Habash. 2014. The Illinois-Columbia system in the CoNLL-2014 shared task. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task. 34--42.
[26]
Alla Rozovskaya and Dan Roth. 2010. Generating confusion sets for context-sensitive error correction. In Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics, 961--970.
[27]
Eddie A Santos, Joshua C Campbell, Abram Hindle, and José Nelson Amaral. 2017. Finding and correcting syntax errors using recurrent neural networks. PeerJ PrePrints (2017).
[28]
Rishabh Singh, Sumit Gulwani, and Armando Solar-Lezama. 2013. Automated feedback generation for introductory programming assignments. Acm Sigplan Notices 48, 6 (2013), 15--26.
[29]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.
[30]
Richard A Thompson. 1976. Language correction using probabilistic grammars. IEEE Trans. Comput. 100, 3 (1976), 275--286.
[31]
Jooyong Yi, Umair Z Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury. 2017. A feasibility study of using automated program repair for introductory programming assignments. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 740--751.

Cited By

View all
  • (2024)A survey on machine learning techniques applied to source codeJournal of Systems and Software10.1016/j.jss.2023.111934209:COnline publication date: 14-Mar-2024
  • (2023)A Survey of Learning-based Automated Program RepairACM Transactions on Software Engineering and Methodology10.1145/363197433:2(1-69)Online publication date: 23-Dec-2023
  • (2022)Seq2Parse: neurosymbolic parse error repairProceedings of the ACM on Programming Languages10.1145/35633306:OOPSLA2(1180-1206)Online publication date: 31-Oct-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ICPC '20: Proceedings of the 28th International Conference on Program Comprehension
July 2020
481 pages
ISBN:9781450379588
DOI:10.1145/3387904
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 September 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Deep Learning
  2. GGNN
  3. Syntax Error Correction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the National Natural Science Foundation of China
  • the Key Project of Science and Technology Department of Jiangsu

Conference

ICPC '20
Sponsor:

Upcoming Conference

ICSE 2025

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)2
Reflects downloads up to 02 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A survey on machine learning techniques applied to source codeJournal of Systems and Software10.1016/j.jss.2023.111934209:COnline publication date: 14-Mar-2024
  • (2023)A Survey of Learning-based Automated Program RepairACM Transactions on Software Engineering and Methodology10.1145/363197433:2(1-69)Online publication date: 23-Dec-2023
  • (2022)Seq2Parse: neurosymbolic parse error repairProceedings of the ACM on Programming Languages10.1145/35633306:OOPSLA2(1180-1206)Online publication date: 31-Oct-2022
  • (2021)ArCode: A Tool for Supporting Comprehension and Implementation of Architectural Concerns2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC)10.1109/ICPC52881.2021.00056(485-489)Online publication date: May-2021

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media