DOI: 10.1145/3568813.3600124

Am I Wrong, or Is the Autograder Wrong? Effects of AI Grading Mistakes on Learning

Published: 10 September 2023

Abstract

Errors in AI grading and feedback often have an intractable set of causes and are, by their nature, difficult to completely avoid. Since inaccurate feedback potentially harms learning, there is a need for designs and workflows that mitigate these harms. To better understand the mechanisms by which erroneous AI feedback impacts students’ learning, we conducted surveys and interviews that recorded students’ interactions with a short-answer AI autograder for “Explain in Plain English” code reading problems. Using causal modeling, we inferred the learning impacts of wrong answers marked as right (false positives, FPs) and right answers marked as wrong (false negatives, FNs). We further explored explanations for the learning impacts, including errors influencing participants’ engagement with feedback and assessments of their answers’ correctness, and participants’ prior performance in the class.
FPs harmed learning in large part due to participants’ failures to detect the errors. This was due to participants not paying attention to the feedback after being marked as right, and an apparent bias against admitting one’s answer was wrong once marked right. On the other hand, FNs harmed learning only for survey participants, suggesting that interviewees’ greater behavioral and cognitive engagement protected them from learning harms. Based on these findings, we propose ways to help learners detect FPs and encourage deeper reflection on FNs to mitigate the learning harms of AI errors.
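
The two error types defined above can be made concrete with a minimal sketch. It assumes each graded answer is reduced to a pair of booleans (whether the answer was actually correct, and whether the autograder marked it correct). The function names and the sample data are hypothetical illustrations, not taken from the study or its autograder.

```python
from collections import Counter

def classify_outcome(truly_correct: bool, marked_correct: bool) -> str:
    """Label one grading outcome using the abstract's terminology.

    FP: a wrong answer marked as right.
    FN: a right answer marked as wrong.
    TP/TN: the autograder agreed with the true correctness.
    """
    if marked_correct and not truly_correct:
        return "FP"
    if truly_correct and not marked_correct:
        return "FN"
    return "TP" if truly_correct else "TN"

def tally_outcomes(records):
    """Count outcome labels over (truly_correct, marked_correct) pairs."""
    return Counter(classify_outcome(t, m) for t, m in records)

# Hypothetical sample: (truly_correct, marked_correct) for five answers.
sample = [
    (True, True),    # right answer, marked right
    (False, True),   # wrong answer, marked right
    (True, False),   # right answer, marked wrong
    (False, False),  # wrong answer, marked wrong
    (False, True),   # wrong answer, marked right
]
counts = tally_outcomes(sample)
```

Note that "positive" here refers to the grader's verdict (marked right), so an FP is the case where a learner receives unearned confirmation.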

Published In

ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1
August 2023
520 pages
ISBN:9781450399760
DOI:10.1145/3568813

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI error
  2. Bayesian modeling
  3. EiPE
  4. autograder
  5. automated short answer grading
  6. computer science education
  7. explain in plain English
  8. formative feedback
  9. human-AI interaction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICER 2023

Acceptance Rates

Overall Acceptance Rate 189 of 803 submissions, 24%
