DOI: 10.1145/3611643.3616296
research-article
Open access

Semantic Debugging

Published: 30 November 2023

    Abstract

    Why does my program fail? We present a novel and general technique to automatically determine failure causes and conditions, using logical properties over input elements: “The program fails if and only if int(<length>) > len(<payload>) holds—that is, the given <length> is larger than the <payload> length.” Our AVICENNA prototype uses modern techniques for inferring properties of passing and failing inputs and validating and refining hypotheses by having a constraint solver generate supporting test cases to obtain such diagnoses. As a result, AVICENNA produces crisp and expressive diagnoses even for complex failure conditions, considerably improving over the state of the art with diagnoses close to those of human experts.
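To make the example diagnosis from the abstract concrete, here is a minimal Python sketch of a toy heartbeat-style handler whose failure condition is exactly `int(<length>) > len(<payload>)`. It is not AVICENNA's implementation or API: the names `heartbeat`, `fails`, `diagnosis`, `disagreements`, and `MEMORY_TAIL` are hypothetical, and a brute-force check over a few inputs stands in for the constraint-solver-driven test generation the abstract describes.

```python
import itertools

# Hypothetical data sitting "next to" the payload in memory (an assumption
# made for illustration; not taken from the paper).
MEMORY_TAIL = "SECRETKEY"


def heartbeat(length: str, payload: str) -> str:
    """Toy handler: echo int(length) characters, trusting the declared length."""
    memory = payload + MEMORY_TAIL
    return memory[: int(length)]


def fails(length: str, payload: str) -> bool:
    """Oracle: a run fails when the reply contains more than the payload."""
    reply = heartbeat(length, payload)
    return not payload.startswith(reply)


def diagnosis(length: str, payload: str) -> bool:
    """Candidate diagnosis from the abstract: int(<length>) > len(<payload>)."""
    return int(length) > len(payload)


def disagreements(lengths, payloads):
    """Return every input on which the oracle and the diagnosis differ.

    An empty result supports the "if and only if" reading on these inputs.
    """
    return [
        (length, payload)
        for length, payload in itertools.product(lengths, payloads)
        if fails(length, payload) != diagnosis(length, payload)
    ]


if __name__ == "__main__":
    print(disagreements(["0", "3", "5", "12"], ["", "abc", "hello"]))
```

An empty list here only shows that the diagnosis matches the oracle on the sampled inputs; the approach in the abstract instead generates additional supporting test cases to validate and refine such a hypothesis.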

    Supplementary Material

    Video (fse23main-p446-p-video.mp4)

    Cited By

    • (2024) Tests4Py: A Benchmark for System Testing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 557-561. https://doi.org/10.1145/3663529.3663798. Online publication date: 10-Jul-2024
    • (2024) Language-Based Software Testing. Communications of the ACM 67:4, 80-84. https://doi.org/10.1145/3631520. Online publication date: 25-Mar-2024

    Published In

    ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    November 2023
    2215 pages
    ISBN: 9798400703270
    DOI: 10.1145/3611643
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. behavior explanation
    2. debugging
    3. program behavior
    4. testing

    Qualifiers

    • Research-article

    Conference

    ESEC/FSE '23
    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%
