Enumerating Valid Non-Alpha-Equivalent Programs for Interpreter Testing

Published: 04 June 2024

Abstract

Skeletal program enumeration (SPE) can generate a large number of test programs for validating the correctness of compilers and interpreters. Classic SPE produces programs by exhaustively enumerating all possible variable usage patterns into a given syntactic structure. Although this strategy yields many test programs, it also generates a large number of invalid ones, wasting considerable testing time and resources. To address this problem, this article proposes a tree-based SPE technique. Compared to the state of the art, the key merit of the tree-based approach is that it takes dependency information into account when producing test programs, making it possible to (1) directly generate non-equivalent programs and (2) apply dominance relations to eliminate invalid test programs that use undefined variables. Hence, our approach significantly reduces the cost of the naïve SPE approach. We have implemented our approach in an automated testing tool, IFuzzer, and applied it to test eight implementations of Python interpreters: CPython, PyPy, IronPython, Jython, RustPython, GPython, Pyston, and Codon. In three months of fuzzing, IFuzzer detected 142 bugs; 87 have been confirmed as previously unknown, and 34 of those have been fixed. Compared to state-of-the-art SPE techniques, IFuzzer takes only 61.0% of the time cost given the same number of testing seeds and improves source-code function coverage by 5.3% under the same testing time budget.
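The core ideas the abstract describes — enumerating variable fillings for a fixed syntactic skeleton, pruning fillings that use a variable before it is defined (a simplified dominance check), and collapsing alpha-equivalent fillings — can be illustrated with a minimal sketch. The skeleton, its def/use structure, and the variable pool below are hypothetical toy examples, not taken from IFuzzer:

```python
from itertools import product

# A toy straight-line skeleton: each statement template records which
# holes it defines and which it uses (indices into the filling tuple).
SKELETON = [
    ("{0} = 1",       {"defs": [0], "uses": []}),
    ("{1} = {0} + 1", {"defs": [1], "uses": [0]}),
    ("print({2})",    {"defs": [],  "uses": [2]}),
]
VARS = ["a", "b"]   # hypothetical variable pool
HOLES = 3           # number of holes in the skeleton

def canonical(filling):
    """Rename variables in order of first occurrence so that
    alpha-equivalent fillings (e.g. (b,b,a) and (a,a,b)) map
    to the same canonical key."""
    names = {}
    return tuple(names.setdefault(v, f"v{len(names)}") for v in filling)

def valid(filling):
    """Simplified dominance check for straight-line code: every hole
    used by a statement must be filled with a variable that some
    earlier statement has already defined."""
    defined = set()
    for _, du in SKELETON:
        if any(filling[h] not in defined for h in du["uses"]):
            return False
        defined.update(filling[h] for h in du["defs"])
    return True

def enumerate_programs():
    seen = set()
    for filling in product(VARS, repeat=HOLES):
        if not valid(filling):
            continue                 # prune undefined-variable programs
        key = canonical(filling)
        if key in seen:
            continue                 # skip alpha-equivalent duplicates
        seen.add(key)
        yield "\n".join(t.format(*filling) for t, _ in SKELETON)

programs = list(enumerate_programs())
```

On this toy skeleton, naïve enumeration produces 2³ = 8 fillings; the dominance check discards the 2 fillings that print an unassigned variable, and canonicalization collapses the remaining 6 into 3 non-alpha-equivalent programs — the kind of reduction that lets a tree-based enumerator avoid wasting interpreter runs on invalid or duplicate inputs.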


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5
June 2024, 952 pages
EISSN: 1557-7392
DOI: 10.1145/3618079
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2024
Online AM: 12 February 2024
Accepted: 23 January 2024
Revised: 03 December 2023
Received: 08 July 2023
Published in TOSEM Volume 33, Issue 5


Author Tags

  1. Interpreter testing
  2. fuzz testing
  3. program enumeration

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
