Enumerating Valid Non-Alpha-Equivalent Programs for Interpreter Testing

Published: 04 June 2024

Abstract

Skeletal program enumeration (SPE) can generate a large number of test programs for validating the correctness of compilers and interpreters. Classic SPE produces programs by exhaustively enumerating all possible variable usage patterns into a given syntactic structure. Although this strategy yields many test programs, it also generates a large number of invalid ones, wasting considerable testing time and resources. To address this problem, this article proposes a tree-based SPE technique. Compared to the state of the art, the key merit of the tree-based approach is that it takes dependency information into account when producing test programs, making it possible to (1) directly generate non-equivalent programs and (2) apply dominance relations to eliminate invalid test programs that use undefined variables. Hence, our approach significantly reduces the cost of the naïve SPE approach. We have implemented our approach in an automated testing tool, IFuzzer, and applied it to test eight implementations of Python interpreters: CPython, PyPy, IronPython, Jython, RustPython, GPython, Pyston, and Codon. In three months of fuzzing, IFuzzer detected 142 bugs; 87 have been confirmed as previously unknown, and 34 of those have been fixed. Compared to state-of-the-art SPE techniques, IFuzzer takes only 61.0% of the time cost given the same number of testing seeds and improves source-code function coverage by 5.3% under the same testing time budget.
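The core ideas the abstract describes — enumerating variable fillings for a fixed syntactic skeleton, pruning fillings that use a variable before it is defined (a simplified dominance check), and collapsing alpha-equivalent fillings — can be illustrated with a minimal sketch. The skeleton, its def/use structure, and the variable pool below are hypothetical toy examples, not taken from IFuzzer:

```python
from itertools import product

# A toy straight-line skeleton: each statement template records which
# holes it defines and which it uses (indices into the filling tuple).
SKELETON = [
    ("{0} = 1",       {"defs": [0], "uses": []}),
    ("{1} = {0} + 1", {"defs": [1], "uses": [0]}),
    ("print({2})",    {"defs": [],  "uses": [2]}),
]
VARS = ["a", "b"]   # hypothetical variable pool
HOLES = 3           # number of holes in the skeleton

def canonical(filling):
    """Rename variables in order of first occurrence so that
    alpha-equivalent fillings (e.g. (b,b,a) and (a,a,b)) map
    to the same canonical key."""
    names = {}
    return tuple(names.setdefault(v, f"v{len(names)}") for v in filling)

def valid(filling):
    """Simplified dominance check for straight-line code: every hole
    used by a statement must be filled with a variable that some
    earlier statement has already defined."""
    defined = set()
    for _, du in SKELETON:
        if any(filling[h] not in defined for h in du["uses"]):
            return False
        defined.update(filling[h] for h in du["defs"])
    return True

def enumerate_programs():
    seen = set()
    for filling in product(VARS, repeat=HOLES):
        if not valid(filling):
            continue                 # prune undefined-variable programs
        key = canonical(filling)
        if key in seen:
            continue                 # skip alpha-equivalent duplicates
        seen.add(key)
        yield "\n".join(t.format(*filling) for t, _ in SKELETON)

programs = list(enumerate_programs())
```

On this toy skeleton, naïve enumeration produces 2³ = 8 fillings; the dominance check discards the 2 fillings that print an unassigned variable, and canonicalization collapses the remaining 6 into 3 non-alpha-equivalent programs — the kind of reduction that lets a tree-based enumerator avoid wasting interpreter runs on invalid or duplicate inputs.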


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5
June 2024, 952 pages
EISSN: 1557-7392
DOI: 10.1145/3618079
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2024
Online AM: 12 February 2024
Accepted: 23 January 2024
Revised: 03 December 2023
Received: 08 July 2023
Published in TOSEM Volume 33, Issue 5


Author Tags

  1. Interpreter testing
  2. fuzz testing
  3. program enumeration

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
