DOI:10.1145/3497775.3503691
research-article
Open access

CertiStr: a certified string solver

Published: 11 January 2022

Abstract

Theories over strings are among the most heavily researched logical theories in the SMT community in the past decade, owing to the error-prone nature of string manipulations, which often leads to security vulnerabilities (e.g. cross-site scripting and code injection). The majority of the existing decision procedures and solvers for these theories are themselves intricate; they are complicated algorithmically, and also have to deal with a very rich vocabulary of operations. This has led to a plethora of bugs in implementation, which have for instance been discovered through fuzzing.
In this paper, we present CertiStr, a certified implementation of a string constraint solver for the theory of strings with concatenation and regular constraints. CertiStr aims to solve string constraints using a forward-propagation algorithm that represents regular constraints symbolically, as symbolic automata. It returns one of three results (sat, unsat, or unknown) and is guaranteed to terminate for string constraints whose concatenation dependencies are acyclic. The implementation has been developed and proven correct in Isabelle/HOL, from which an effective solver in OCaml was generated. We demonstrate the effectiveness and efficiency of CertiStr on the standard Kaluza benchmark, in which 80.4% of the tests are in the string constraint fragment of CertiStr. Of these 80.4% of tests, CertiStr can solve 83.5% (i.e. CertiStr returns sat or unsat) within 60s.
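
To make the idea of forward propagation concrete, the following is a minimal Python sketch, not CertiStr's verified Isabelle/HOL development or its generated OCaml code. It assumes each constraint has the form x = y · z, that the equations are listed in dependency order (possible because the dependencies are acyclic), and that automaton transitions are labelled with character intervals as a simple stand-in for symbolic labels. All names (`NFA`, `word`, `star`, `concat`, `intersect`, `is_empty`, `forward_propagate`) are hypothetical. The sketch only detects unsat; how CertiStr separates sat from unknown is not described in the abstract and is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NFA:
    """Epsilon-free NFA; each transition is (src, (lo, hi), dst) over code points."""
    initial: frozenset
    accepting: frozenset
    trans: tuple


def word(s):
    """Automaton accepting exactly the string s."""
    return NFA(frozenset({0}), frozenset({len(s)}),
               tuple((i, (ord(c), ord(c)), i + 1) for i, c in enumerate(s)))


def star(lo, hi):
    """Automaton accepting [lo-hi]*."""
    return NFA(frozenset({0}), frozenset({0}), ((0, (ord(lo), ord(hi)), 0),))


def concat(a, b):
    """Automaton for L(a) . L(b), built without epsilon transitions."""
    a_eps = bool(a.initial & a.accepting)  # does a accept the empty word?
    b_eps = bool(b.initial & b.accepting)  # does b accept the empty word?
    trans = [((0, p), lab, (0, q)) for p, lab, q in a.trans]
    trans += [((1, p), lab, (1, q)) for p, lab, q in b.trans]
    # A step of a that ends in an accepting state may instead enter b.
    trans += [((0, p), lab, (1, i)) for p, lab, q in a.trans
              if q in a.accepting for i in b.initial]
    initial = {(0, i) for i in a.initial}
    if a_eps:
        initial |= {(1, i) for i in b.initial}
    accepting = {(1, f) for f in b.accepting}
    if b_eps:
        accepting |= {(0, f) for f in a.accepting}
    return NFA(frozenset(initial), frozenset(accepting), tuple(trans))


def intersect(a, b):
    """Product automaton; labels are intersected as intervals."""
    trans = []
    for p1, (lo1, hi1), q1 in a.trans:
        for p2, (lo2, hi2), q2 in b.trans:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo <= hi:  # keep only transitions with a non-empty label
                trans.append(((p1, p2), (lo, hi), (q1, q2)))
    return NFA(frozenset((i, j) for i in a.initial for j in b.initial),
               frozenset((f, g) for f in a.accepting for g in b.accepting),
               tuple(trans))


def is_empty(a):
    """True if no accepting state is reachable from an initial state."""
    seen, stack = set(a.initial), list(a.initial)
    while stack:
        s = stack.pop()
        if s in a.accepting:
            return False
        for p, _label, q in a.trans:
            if p == s and q not in seen:
                seen.add(q)
                stack.append(q)
    return True


def forward_propagate(lang, equations):
    """Refine L(x) to L(x) & (L(y) . L(z)) for each equation x = y . z.

    `equations` must be in dependency order (assumption of this sketch).
    """
    lang = dict(lang)  # do not mutate the caller's mapping
    for x, y, z in equations:
        lang[x] = intersect(lang[x], concat(lang[y], lang[z]))
        if is_empty(lang[x]):
            return "unsat"
    return "not refuted"


# x = y . z with y in [a-c]*, z = "d", and x in [a-c]*: x would have to end in 'd'.
langs = {"x": star("a", "c"), "y": star("a", "c"), "z": word("d")}
print(forward_propagate(langs, [("x", "y", "z")]))  # prints "unsat"
```

Interval labels keep the product construction independent of alphabet size, which is the main practical appeal of symbolic representations: intersecting two transitions is a single interval intersection rather than a loop over characters.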

Published In

CPP 2022: Proceedings of the 11th ACM SIGPLAN International Conference on Certified Programs and Proofs
January 2022
351 pages
ISBN:9781450391825
DOI:10.1145/3497775
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Isabelle
  2. SMT solvers
  3. string theory
  4. symbolic automata

Qualifiers

  • Research-article

Funding Sources

  • ERC Starting Grant
  • Swedish Research Council
  • Swedish Foundation for Strategic Research

Conference

CPP '22

Acceptance Rates

Overall Acceptance Rate 18 of 26 submissions, 69%

Cited By

  • (2024) SMTQuery: Analysing SMT-LIB String Benchmarks. Formal Methods: Foundations and Applications, 22-34. https://doi.org/10.1007/978-3-031-78116-2_2. Online publication date: 29-Nov-2024
  • (2024) Formally Certified Approximate Model Counting. Computer Aided Verification, 153-177. https://doi.org/10.1007/978-3-031-65627-9_8. Online publication date: 24-Jul-2024
  • (2023) On the Expressive Power of String Constraints. Proceedings of the ACM on Programming Languages, 7:POPL, 278-308. https://doi.org/10.1145/3571203. Online publication date: 11-Jan-2023
  • (2023) A Closer Look at the Expressive Power of Logics Based on Word Equations. Theory of Computing Systems, 68:3, 322-379. https://doi.org/10.1007/s00224-023-10154-8. Online publication date: 11-Dec-2023
  • (2023) Rely-Guarantee Reasoning for Causally Consistent Shared Memory. Computer Aided Verification, 206-229. https://doi.org/10.1007/978-3-031-37706-8_11. Online publication date: 17-Jul-2023
  • (2023) Carcara: An Efficient Proof Checker and Elaborator for SMT Proofs in the Alethe Format. Tools and Algorithms for the Construction and Analysis of Systems, 367-386. https://doi.org/10.1007/978-3-031-30823-9_19. Online publication date: 22-Apr-2023
