DOI: 10.1109/ICSE-SEET.2019.00022

Automatic grading of programming assignments: an approach based on formal semantics

Published: 27 May 2019

Abstract

Grading programming assignments manually is time-consuming and error-prone. Existing tools generate feedback from failing test cases, but this method is inefficient and its results are incomplete. In this paper, we present AutoGrader, a tool that automatically determines the correctness of programming assignments and provides counterexamples, given a single reference implementation of the problem. Instead of counting passed tests, our tool searches for semantically different execution paths between a student's submission and the reference implementation. If such a difference is found, the submission is deemed incorrect; otherwise, it is judged to be a correct solution. We use weakest preconditions and symbolic execution to capture the semantics of execution paths and to detect potential path differences. AutoGrader is the first automated grading tool that relies on program semantics and generates feedback with counterexamples based on path deviations. It also reduces the human effort of writing test cases and makes grading more complete. We implement AutoGrader and evaluate its effectiveness and performance on real-world programming problems and student submissions collected from an online programming site. Our experiments reveal no false negatives for our proposed method, and we detected 11 errors in the judges of online platforms.
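
To make the path comparison concrete: a straight-line path such as y := x + 1 has weakest precondition ret == x + 1 with respect to the postcondition ret == y, so every execution path can be summarized as a formula over the program's inputs. Deciding whether two paths are semantically different then reduces to asking a constraint solver whether some input makes the two summaries disagree. Below is a minimal, hypothetical sketch of that final query using the Z3 SMT solver's Python bindings (pip install z3-solver); the summaries here are hand-written stand-ins for the formulas AutoGrader derives automatically (the paper works on binaries), and the sign() problem is only an illustrative toy.

    # Sketch of the semantic-difference query; not AutoGrader's actual pipeline.
    # Requires: pip install z3-solver
    from z3 import Int, If, Solver, sat

    x = Int("x")  # symbolic input shared by both programs

    # Hand-written stand-ins for per-path weakest-precondition summaries.
    ref = If(x > 0, 1, If(x < 0, -1, 0))   # reference implementation of sign(x)
    student = If(x > 0, 1, -1)             # submission that mishandles x == 0

    # The two programs differ semantically iff some input yields different
    # outputs; ask the solver for such an input.
    s = Solver()
    s.add(ref != student)

    if s.check() == sat:
        print("Incorrect; counterexample input: x =", s.model()[x])  # prints x = 0
    else:
        print("No distinguishing input exists; submission accepted")

A sat result yields a model that is exactly the counterexample input reported back to the student, while unsat across all compared path pairs corresponds to accepting the submission as correct.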


Published In

ICSE-SEET '19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering Education and Training
May 2019
234 pages

Publisher

IEEE Press

Author Tags

  1. automatic grader
  2. equivalence checking
  3. programming assignments
  4. weakest precondition

Qualifiers

  • Research-article


Cited By

  • (2024) Scalable Autograding for Quantum Programming Assignments. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, pp. 457-463. DOI: 10.1145/3649217.3653619. Online publication date: 3-Jul-2024.
  • (2024) Automated Grading and Feedback Tools for Programming Education: A Systematic Review. ACM Transactions on Computing Education, vol. 24, no. 1, pp. 1-43. DOI: 10.1145/3636515. Online publication date: 19-Feb-2024.
  • (2024) A Testing Extension for Scratch. Proceedings of the 2024 ACM Southeast Conference, pp. 266-271. DOI: 10.1145/3603287.3651217. Online publication date: 18-Apr-2024.
  • (2024) A Clustering-Based Computational Model to Group Students With Similar Programming Skills From Automatic Source Code Analysis Using Novel Features. IEEE Transactions on Learning Technologies, vol. 17, pp. 428-444. DOI: 10.1109/TLT.2023.3273926. Online publication date: 1-Jan-2024.
  • (2023) Concept-Based Automated Grading of CS-1 Programming Assignments. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 199-210. DOI: 10.1145/3597926.3598049. Online publication date: 12-Jul-2023.
  • (2023) Evaluation of Submission Limits and Regression Penalties to Improve Student Behavior with Automatic Assessment Systems. ACM Transactions on Computing Education, vol. 23, no. 3, pp. 1-24. DOI: 10.1145/3591210. Online publication date: 20-Jun-2023.
  • (2023) Improving Educational Outcomes: Developing and Assessing Grading System (ProGrader) for Programming Courses. Human Interface and the Management of Information, pp. 322-342. DOI: 10.1007/978-3-031-35129-7_24. Online publication date: 23-Jul-2023.
  • (2022) Stop Reinventing the Wheel! Promoting Community Software in Computing Education. Proceedings of the 2022 Working Group Reports on Innovation and Technology in Computer Science Education, pp. 261-292. DOI: 10.1145/3571785.3574129. Online publication date: 27-Dec-2022.
  • (2022) An automatic grading system for a high school-level computational thinking course. Proceedings of the 4th International Workshop on Software Engineering Education for the Next Generation, pp. 20-27. DOI: 10.1145/3528231.3528357. Online publication date: 17-May-2022.
  • (2022) Teaching software engineering as programming over time. Proceedings of the 4th International Workshop on Software Engineering Education for the Next Generation, pp. 51-58. DOI: 10.1145/3528231.3528353. Online publication date: 17-May-2022.
