GGF: A Graph-based Method for Programming Language Syntax Error Correction

Authors:

Tao Zheng

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension

Pages 139 - 148

https://doi.org/10.1145/3387904.3389252

Published: 12 September 2020

Abstract

Syntax errors, combined with the obscure error messages that compilers generate, frustrate programmers and cost them considerable time locating errors. Existing models do not exploit the structure of the code and treat it only as a token sequence, which leads to low accuracy on this task. In this paper, we propose a novel deep supervised learning model, called Graph-based Grammar Fix (GGF), to help programmers locate and fix syntax errors. GGF treats the code as a mixture of token sequences and graphs, where the graphs are built from Abstract Syntax Tree (AST) structure information. GGF encodes an erroneous program together with its sub-AST structure, predicts the error position using a pointer network, and generates the correct token. We use the DeepFix dataset, which contains 46500 correct C programs and 6975 erroneous programs written by students taking an introductory programming course. GGF is trained on the correct programs from the DeepFix dataset with intentionally injected syntax errors. After training, GGF fixes 4054 (58.12%) of the 6975 erroneous programs, while the existing state-of-the-art tool DeepFix fixes 1365 (19.57%).
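
The training setup described in the abstract, where correct programs are corrupted on purpose and the model must recover the error position and the right token, can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the regex tokenizer, the `inject_error` and `apply_fix` helpers, and the error model (deleting or replacing one structural token) are all assumptions made for the example.

```python
import random
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single
# non-space character. Real C lexing is considerably more involved.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Assumed set of tokens that the synthetic error model may corrupt.
STRUCTURAL = {";", "(", ")", "{", "}", ","}


def tokenize(source):
    """Split C source text into a flat list of token strings."""
    return TOKEN_RE.findall(source)


def inject_error(tokens, rng):
    """Corrupt one structural token and return a training example.

    Returns (broken_tokens, position, action, correct_token), where
    `position` and `correct_token` are the labels a repair model would
    be trained to predict. The error model is an assumption, not the
    one used in the paper.
    """
    candidates = [i for i, tok in enumerate(tokens) if tok in STRUCTURAL]
    if not candidates:
        raise ValueError("no structural token to corrupt")
    pos = rng.choice(candidates)
    correct = tokens[pos]
    broken = list(tokens)
    if rng.random() < 0.5:
        # Drop the token: the repair is to insert it back at `pos`.
        del broken[pos]
        action = "insert"
    else:
        # Swap in a different structural token: the repair is a replacement.
        broken[pos] = rng.choice(sorted(STRUCTURAL - {correct}))
        action = "replace"
    return broken, pos, action, correct


def apply_fix(broken, pos, action, token):
    """Apply a predicted (position, action, token) repair to a token list."""
    fixed = list(broken)
    if action == "insert":
        fixed.insert(pos, token)
    else:
        fixed[pos] = token
    return fixed


if __name__ == "__main__":
    program = tokenize("int main(){ return 0; }")
    broken, pos, action, correct = inject_error(program, random.Random(0))
    print(" ".join(broken))
    print(pos, action, correct)
    # Applying the ground-truth labels restores the original program.
    assert apply_fix(broken, pos, action, correct) == program
```

Because each corruption is recorded as a position plus a token, the same pair serves as the supervision signal for a position predictor and a token generator, which is the split the abstract describes.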

Index Terms

GGF: A Graph-based Method for Programming Language Syntax Error Correction
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis

Published In

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension

July 2020

481 pages

ISBN:9781450379588

DOI:10.1145/3387904

Copyright © 2020 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

the National Natural Science Foundation of China
the Key Project of Science and Technology Department of Jiangsu

Conference

ICPC '20

Sponsor:

SIGSOFT

ICPC '20: 28th International Conference on Program Comprehension

July 13 - 15, 2020

Seoul, Republic of Korea

Cited By

Sharma T, Kechagia M, Georgiou S, Tiwari R, Vats I, Moazen H, and Sarro F. 2024. A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024.
https://dl.acm.org/doi/10.1016/j.jss.2023.111934

Zhang Q, Fang C, Ma Y, Sun W, and Chen Z. 2023. A Survey of Learning-based Automated Program Repair. ACM Transactions on Software Engineering and Methodology 33:2 (1-69). DOI: 10.1145/3631974. Online publication date: 23-Dec-2023.
https://dl.acm.org/doi/10.1145/3631974

Sakkas G, Endres M, Guo P, Weimer W, and Jhala R. 2022. Seq2Parse: neurosymbolic parse error repair. Proceedings of the ACM on Programming Languages 6:OOPSLA2 (1180-1206). DOI: 10.1145/3563330. Online publication date: 31-Oct-2022.
https://dl.acm.org/doi/10.1145/3563330

Shokri A and Mirakhorli M. 2021. ArCode: A Tool for Supporting Comprehension and Implementation of Architectural Concerns. 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC) (485-489). DOI: 10.1109/ICPC52881.2021.00056. Online publication date: May-2021.
https://doi.org/10.1109/ICPC52881.2021.00056
