Automated test generation for Scratch programs

Published: 13 May 2023

Abstract

The importance of programming education has led to dedicated educational programming environments, where users visually arrange block-based programming constructs that typically control graphical, interactive game-like programs. The Scratch programming environment is particularly popular, with more than 90 million registered users at the time of this writing. While the block-based nature of Scratch helps learners by preventing syntactic mistakes, learners nevertheless still need feedback and support in order to implement the desired functionality. To support individual learning and classroom settings, this feedback and support should ideally be provided in an automated fashion, which requires tests to enable dynamic program analysis. In prior work, we introduced Whisker, a framework that enables automated testing of Scratch programs. However, creating these automated tests for Scratch programs is challenging. In this paper, we therefore investigate how to automatically generate Whisker tests. Generating tests for Scratch raises important challenges. First, game-like programs are typically randomised, leading to flaky tests. Second, Scratch programs usually consist of animations and interactions with long delays, inhibiting the application of classical test generation approaches. Thus, the new application domain raises the question of which test generation technique is best suited to produce high-coverage tests capable of detecting faulty behaviour. We investigate these issues using an extension of the Whisker test framework for automated test generation.
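
The two execution challenges named above, randomness causing flaky tests and long delays slowing test runs, can be illustrated with a small sketch. It is written in Python purely for illustration and is not Whisker's implementation or API: the class `ToyScratchProgram`, the function `run_test`, and the `acceleration` parameter are hypothetical. The sketch assumes that all randomness in a program is drawn from one generator, so fixing its seed makes repeated runs observe the same behaviour, and that "wait" blocks can be simulated by advancing a clock that is scaled by an acceleration factor instead of sleeping in real time.

```python
import random


class ToyScratchProgram:
    """Hypothetical stand-in for a randomised, time-driven Scratch game.

    The sprite repeatedly jumps to a random x position and then waits;
    a test observes the resulting sequence of positions.
    """

    def __init__(self, rng: random.Random, acceleration: float = 1.0) -> None:
        self.rng = rng                    # all randomness is drawn from this generator
        self.acceleration = acceleration  # > 1 means program time runs faster than real time
        self.positions: list[int] = []
        self.program_time = 0.0           # seconds of time as seen by the program

    @property
    def wall_clock(self) -> float:
        # Real time a run would take: program time divided by the speed-up factor.
        return self.program_time / self.acceleration

    def wait(self, seconds: float) -> None:
        # Simulated "wait" block: advance the clock instead of sleeping.
        self.program_time += seconds

    def run(self, steps: int, wait_seconds: float = 1.0) -> list[int]:
        for _ in range(steps):
            # The x range of the Scratch stage is -240..240.
            self.positions.append(self.rng.randint(-240, 240))
            self.wait(wait_seconds)
        return self.positions


def run_test(seed: int, acceleration: float = 1.0) -> ToyScratchProgram:
    # Seeding the generator makes the "random" program deterministic per test run.
    program = ToyScratchProgram(random.Random(seed), acceleration)
    program.run(steps=5)
    return program


fast = run_test(seed=42, acceleration=10.0)
slow = run_test(seed=42)
assert fast.positions == slow.positions  # same seed: same observed behaviour
print(slow.wall_clock, fast.wall_clock)  # 5.0 0.5
```

In this toy, acceleration only changes how much wall-clock time a run consumes, not what the program does; in a real Scratch runtime, preserving that equivalence while speeding up animations and timers is the difficult part, which the sketch deliberately sidesteps.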
Evaluation on common programming exercises, a random sample of 1000 Scratch user programs, and the 1000 most popular Scratch programs demonstrates that our approach enables Whisker to reliably accelerate test executions. Moreover, even though many Scratch programs are small and easy to cover, there are many unique challenges for which advanced search-based test generation using many-objective algorithms is needed to achieve high coverage.
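
To illustrate what treating each coverage goal as its own objective means, here is a minimal sketch of archive-based many-objective search, loosely in the spirit of algorithms such as MIO. It is not the algorithm or implementation evaluated in the paper: the test encoding (a list of integer inputs), the four goals, the mutation step of ±20, and all function names are hypothetical. Each goal has a branch distance normalised to [0, 1) via d / (d + 1); the archive keeps the best test per goal, and search effort is directed only at goals that remain uncovered.

```python
import random
from typing import Callable

Test = list[int]  # hypothetical test encoding: a sequence of integer inputs

# Hypothetical coverage goals. Each maps a test to a branch distance:
# 0 means the goal is covered, larger values mean "further away".
GOALS: dict[str, Callable[[Test], int]] = {
    "x_above_100": lambda t: min(max(0, 101 - v) for v in t),
    "x_below_minus_100": lambda t: min(max(0, v + 101) for v in t),
    "x_is_zero": lambda t: min(abs(v) for v in t),
    "sum_is_50": lambda t: abs(sum(t) - 50),
}


def normalise(d: float) -> float:
    # Map a distance in [0, inf) to [0, 1); 0 stays 0 (goal covered).
    return d / (d + 1)


def mutate(test: Test, rng: random.Random) -> Test:
    # Copy the test and nudge one randomly chosen input value.
    child = list(test)
    i = rng.randrange(len(child))
    child[i] += rng.randint(-20, 20)
    return child


def many_objective_search(budget: int, seed: int = 0) -> dict[str, Test]:
    rng = random.Random(seed)
    # Archive: best (fitness, test) found so far for each goal, kept separately.
    archive: dict[str, tuple[float, Test]] = {}
    population = [[rng.randint(-240, 240) for _ in range(3)] for _ in range(10)]
    for _ in range(budget):
        for test in population:
            for goal, distance in GOALS.items():
                fitness = normalise(distance(test))
                if goal not in archive or fitness < archive[goal][0]:
                    archive[goal] = (fitness, test)
        # Focus on uncovered goals: breed new tests from their current best tests.
        uncovered = [g for g in GOALS if archive[g][0] > 0]
        if not uncovered:
            break
        parents = [archive[g][1] for g in uncovered]
        population = [mutate(rng.choice(parents), rng) for _ in range(10)]
    # Return only the goals that were actually covered, with a covering test.
    return {g: t for g, (f, t) in archive.items() if f == 0}


covered = many_objective_search(budget=500)
print(sorted(covered))
```

Because every goal retains its own best test, progress towards one goal is never traded away for progress on another, which is the property that distinguishes a many-objective formulation from optimising a single aggregated coverage score.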

Cited By

  • (2024) A Block-Based Testing Framework for Scratch. In: Proceedings of the 24th Koli Calling International Conference on Computing Education Research, pp 1–12. doi:10.1145/3699538.3699547. Online publication date: 12 Nov 2024
  • (2024) Extending Unit Test Cases with Graphical Comparison in Blocks-based Programming Languages. In: Proceedings of the 2024 ACM Conference on International Computing Education Research, Volume 2, p 541. doi:10.1145/3632621.3671428. Online publication date: 12 Aug 2024
  • (2023) PlayTest: A Gamified Test Generator for Games. In: Proceedings of the 2nd International Workshop on Gamification in Software Development, Verification, and Validation, pp 47–51. doi:10.1145/3617553.3617884. Online publication date: 4 Dec 2023

Published In

Empirical Software Engineering, Volume 28, Issue 3
May 2023
845 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 13 May 2023
Accepted: 20 October 2022

Author Tags

  1. Search-based testing
  2. Block-based programming
  3. Scratch

Qualifiers

  • Research-article
