Regex matching with counting-set automata

Published: 13 November 2020


We propose a solution to the problem of efficiently matching regular expressions (regexes) with bounded repetition, such as (ab){1,100}, using deterministic automata. To this end, we introduce counting-set automata (CsAs), a novel class of automata with registers that hold sets of bounded integers and are manipulated by a limited portfolio of constant-time operations. We present an algorithm that compiles a large sub-class of regexes to deterministic CsAs. This includes (1) a novel Antimirov-style translation of regexes with counting to counting automata (CAs), nondeterministic automata with bounded counters, and (2) our main technical contribution, a determinization of CAs that outputs CsAs. The main advantage of this workflow is that the size of the produced CsAs does not depend on the repetition bounds used in the regex (while the size of the DFA is exponential in them). Our experimental results confirm that deterministic CsAs produced from practical regexes with repetition are indeed vastly smaller than the corresponding DFAs. More importantly, our prototype matcher based on CsA simulation handles practical regexes with repetition regardless of the size of the counter bounds. It easily copes with regexes with repetition where state-of-the-art matchers struggle.
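To make the idea of a register holding a set of bounded integers more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the constant-time operations are "increment every element", "insert 0", and "read the minimum or maximum", and it hand-codes a matcher for the single regex `.*a.{n}` rather than compiling it with the algorithm described in the abstract. The names `CountingSet` and `matches_a_then_n_any` are hypothetical. For this regex, a classical DFA has to remember which of the last n+1 characters were `a`, which requires on the order of 2^(n+1) states, whereas the sketch keeps one control state and one set.

```python
from collections import deque


class CountingSet:
    """Set of distinct integers in [0, bound], with O(1) operations.

    Assumption for this sketch: each element is stored as the value of a
    global offset at the moment it was inserted, so its current value is
    `offset - stored`. Incrementing every element is then a single
    addition to the offset.
    """

    def __init__(self, bound):
        self.bound = bound
        self.offset = 0
        self.items = deque()  # oldest (largest current value) on the left

    def increment_all(self):
        """Add 1 to every element and drop the one that exceeds the bound."""
        self.offset += 1
        # Elements are distinct and were all <= bound, so at most one of
        # them (the maximum) can now be bound + 1.
        if self.items and self.offset - self.items[0] > self.bound:
            self.items.popleft()

    def insert_zero(self):
        """Insert the value 0 unless it is already present."""
        if not self.items or self.items[-1] != self.offset:
            self.items.append(self.offset)

    def max_value(self):
        return self.offset - self.items[0] if self.items else None

    def min_value(self):
        return self.offset - self.items[-1] if self.items else None


def matches_a_then_n_any(text, n):
    """Full-match `text` against .*a.{n} (with '.' matching any character).

    Each element of the set counts how many characters have been read
    since one particular 'a'. The input matches if some 'a' is followed
    by exactly n characters, i.e. if the maximum equals n at the end.
    """
    counters = CountingSet(n)
    for ch in text:
        counters.increment_all()
        if ch == "a":
            counters.insert_zero()
    return counters.max_value() == n
```

Each input character costs a constant number of operations regardless of `n`, so `matches_a_then_n_any("a" + "b" * 100, 100)` does no more work per character than the same call with a bound of 2. On small inputs the result can be cross-checked against Python's `re.fullmatch` with the `re.DOTALL` flag.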

Author Tags

  1. Antimirov's derivatives
  2. ReDoS
  3. bounded repetition
  4. counting automata
  5. counting-set automata
  6. determinization
  7. regular expression matching


Cited By

  • (2024) Linear Matching of JavaScript Regular Expressions. Proceedings of the ACM on Programming Languages 8:PLDI (1336-1360). DOI: 10.1145/3656431. Online publication date: 20-Jun-2024
  • (2024) One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization (193-206). DOI: 10.1109/CGO57630.2024.10444810. Online publication date: 2-Mar-2024
  • (2024) Algebraic Reasoning Meets Automata in Solving Linear Integer Arithmetic. Computer Aided Verification (42-67). DOI: 10.1007/978-3-031-65627-9_3. Online publication date: 24-Jul-2024
  • (2023) Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics. Proceedings of the ACM on Programming Languages 7:PLDI (1026-1049). DOI: 10.1145/3591262. Online publication date: 6-Jun-2023
  • (2023) Regular Expression Matching using Bit Vector Automata. Proceedings of the ACM on Programming Languages 7:OOPSLA1 (492-521). DOI: 10.1145/3586044. Online publication date: 6-Apr-2023
  • (2023) Improving Developers’ Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies. 2023 IEEE Symposium on Security and Privacy (SP) (1238-1255). DOI: 10.1109/SP46215.2023.10179442. Online publication date: May-2023
  • (2023) Effective ReDoS Detection by Principled Vulnerability Modeling and Exploit Generation. 2023 IEEE Symposium on Security and Privacy (SP) (2427-2443). DOI: 10.1109/SP46215.2023.10179328. Online publication date: May-2023
  • (2023) String Constraints with Regex-Counting and String-Length Solved More Efficiently. Dependable Software Engineering. Theories, Tools, and Applications (1-20). DOI: 10.1007/978-981-99-8664-4_1. Online publication date: 27-Nov-2023
  • (2023) Fast Matching of Regular Patterns with Synchronizing Counting. Foundations of Software Science and Computation Structures (392-412). DOI: 10.1007/978-3-031-30829-1_19. Online publication date: 21-Apr-2023
  • (2023) Automata with Bounded Repetition in RE2. Computer Aided Systems Theory – EUROCAST 2022 (232-239). DOI: 10.1007/978-3-031-25312-6_27. Online publication date: 10-Feb-2023
