DOI: 10.1109/ICSE-SEET.2019.00022

Automatic grading of programming assignments: an approach based on formal semantics

Published: 27 May 2019

Abstract

Grading programming assignments manually is time-consuming and error-prone. Existing tools generate feedback from failing test cases, but this method is inefficient and its results are incomplete. In this paper, we present AutoGrader, a tool that automatically determines the correctness of programming assignments and provides counterexamples, given a single reference implementation of the problem. Instead of counting passed tests, our tool searches for semantically different execution paths between a student's submission and the reference implementation. If such a difference is found, the submission is deemed incorrect; otherwise, it is judged to be a correct solution. We use weakest preconditions and symbolic execution to capture the semantics of execution paths and to detect potential path differences. AutoGrader is the first automated grading tool that relies on program semantics and generates feedback with counterexamples based on path deviations. It also reduces the human effort of writing test cases and makes grading more complete. We implement AutoGrader and evaluate its effectiveness and performance on real-world programming problems and student submissions collected from an online programming site. Our experiments reveal no false negatives for our proposed method, and we detected 11 errors in the judges of online platforms.
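
To make the path comparison concrete: a straight-line path such as y := x + 1 has weakest precondition ret == x + 1 with respect to the postcondition ret == y, so every execution path can be summarized as a formula over the program's inputs. Deciding whether two paths are semantically different then reduces to asking a constraint solver whether some input makes the two summaries disagree. Below is a minimal, hypothetical sketch of that final query using the Z3 SMT solver's Python bindings (pip install z3-solver); the summaries here are hand-written stand-ins for the formulas AutoGrader derives automatically (the paper works on binaries), and the sign() problem is only an illustrative toy.

    # Sketch of the semantic-difference query; not AutoGrader's actual pipeline.
    # Requires: pip install z3-solver
    from z3 import Int, If, Solver, sat

    x = Int("x")  # symbolic input shared by both programs

    # Hand-written stand-ins for per-path weakest-precondition summaries.
    ref = If(x > 0, 1, If(x < 0, -1, 0))   # reference implementation of sign(x)
    student = If(x > 0, 1, -1)             # submission that mishandles x == 0

    # The two programs differ semantically iff some input yields different
    # outputs; ask the solver for such an input.
    s = Solver()
    s.add(ref != student)

    if s.check() == sat:
        print("Incorrect; counterexample input: x =", s.model()[x])  # prints x = 0
    else:
        print("No distinguishing input exists; submission accepted")

A sat result yields a model that is exactly the counterexample input reported back to the student, while unsat across all compared path pairs corresponds to accepting the submission as correct.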


Published In

ICSE-SEET '19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering Education and Training
May 2019
234 pages

Publisher

IEEE Press

Author Tags

  1. automatic grader
  2. equivalence checking
  3. programming assignments
  4. weakest precondition

Qualifiers

  • Research-article


Cited By

  • (2024) Scalable Autograding for Quantum Programming Assignments. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, pp. 457-463. DOI: 10.1145/3649217.3653619. Online publication date: 3-Jul-2024.
  • (2024) Automated Grading and Feedback Tools for Programming Education: A Systematic Review. ACM Transactions on Computing Education, vol. 24, no. 1, pp. 1-43. DOI: 10.1145/3636515. Online publication date: 19-Feb-2024.
  • (2024) A Testing Extension for Scratch. Proceedings of the 2024 ACM Southeast Conference, pp. 266-271. DOI: 10.1145/3603287.3651217. Online publication date: 18-Apr-2024.
  • (2024) A Clustering-Based Computational Model to Group Students With Similar Programming Skills From Automatic Source Code Analysis Using Novel Features. IEEE Transactions on Learning Technologies, vol. 17, pp. 428-444. DOI: 10.1109/TLT.2023.3273926. Online publication date: 1-Jan-2024.
  • (2023) Concept-Based Automated Grading of CS-1 Programming Assignments. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 199-210. DOI: 10.1145/3597926.3598049. Online publication date: 12-Jul-2023.
  • (2023) Evaluation of Submission Limits and Regression Penalties to Improve Student Behavior with Automatic Assessment Systems. ACM Transactions on Computing Education, vol. 23, no. 3, pp. 1-24. DOI: 10.1145/3591210. Online publication date: 20-Jun-2023.
  • (2023) Improving Educational Outcomes: Developing and Assessing Grading System (ProGrader) for Programming Courses. Human Interface and the Management of Information, pp. 322-342. DOI: 10.1007/978-3-031-35129-7_24. Online publication date: 23-Jul-2023.
  • (2022) Stop Reinventing the Wheel! Promoting Community Software in Computing Education. Proceedings of the 2022 Working Group Reports on Innovation and Technology in Computer Science Education, pp. 261-292. DOI: 10.1145/3571785.3574129. Online publication date: 27-Dec-2022.
  • (2022) An automatic grading system for a high school-level computational thinking course. Proceedings of the 4th International Workshop on Software Engineering Education for the Next Generation, pp. 20-27. DOI: 10.1145/3528231.3528357. Online publication date: 17-May-2022.
  • (2022) Teaching software engineering as programming over time. Proceedings of the 4th International Workshop on Software Engineering Education for the Next Generation, pp. 51-58. DOI: 10.1145/3528231.3528353. Online publication date: 17-May-2022.
