DOI: 10.1007/978-981-96-0602-3_19
Article

A Derivative-Based Membership Algorithm for Enhanced Regular Expressions

Published: 26 November 2024 Publication History

Abstract

Enhanced regular expressions (EREs), which extend standard regular expressions with shuffle and counting operators, provide exponentially more succinct descriptions of regular languages. The membership problem, determining whether a given word w belongs to the language generated by an ERE E, is fundamental to numerous applications. However, efficient solutions for the membership problem of unconstrained EREs have remained elusive. This paper introduces a derivative for the counting operator and rigorously proves its correctness. We then leverage this derivative to design a membership algorithm for unconstrained EREs and analyze its time complexity based on a lemma establishing the relationship between the size of the derivative and the expression. We further propose algorithms based on the proposed derivatives to generate positive and negative words of specific lengths for EREs. The performance of the membership algorithm is then evaluated on real-world EREs. Finally, we validate the correctness of two existing inference algorithms that previously lacked formal correctness guarantees due to the absence of practical membership algorithms for unconstrained EREs.
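The idea of derivative-based membership can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses the textbook Brzozowski derivative, the standard shuffle rule d_a(E & F) = d_a(E) & F + E & d_a(F), and, as an assumption of this sketch, the counting rule d_a(E^[m,n]) = d_a(E) · E^[max(m-1,0), n-1] for n >= 1. All class and function names (Count, Shuffle, deriv, matches) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class Re:
    """Base class for expression nodes."""


@dataclass(frozen=True)
class Empty(Re):  # the empty language
    pass


@dataclass(frozen=True)
class Eps(Re):  # the language containing only the empty word
    pass


@dataclass(frozen=True)
class Sym(Re):
    ch: str


@dataclass(frozen=True)
class Cat(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Shuffle(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Count(Re):  # body^[lo, hi]; hi=None means unbounded
    body: Re
    lo: int
    hi: Optional[int]


# Smart constructors: light simplification so derivatives stay small.
def cat(l: Re, r: Re) -> Re:
    if isinstance(l, Empty) or isinstance(r, Empty):
        return Empty()
    if isinstance(l, Eps):
        return r
    return l if isinstance(r, Eps) else Cat(l, r)


def alt(l: Re, r: Re) -> Re:
    if isinstance(l, Empty):
        return r
    return l if isinstance(r, Empty) or l == r else Alt(l, r)


def shuffle(l: Re, r: Re) -> Re:
    if isinstance(l, Empty) or isinstance(r, Empty):
        return Empty()
    if isinstance(l, Eps):
        return r
    return l if isinstance(r, Eps) else Shuffle(l, r)


def nullable(e: Re) -> bool:
    """True if the empty word belongs to L(e)."""
    if isinstance(e, Eps):
        return True
    if isinstance(e, (Cat, Shuffle)):
        return nullable(e.left) and nullable(e.right)
    if isinstance(e, Alt):
        return nullable(e.left) or nullable(e.right)
    if isinstance(e, Count):
        return e.lo == 0 or nullable(e.body)
    return False  # Empty and Sym


def deriv(e: Re, a: str) -> Re:
    """Derivative of e with respect to the symbol a."""
    if isinstance(e, Sym):
        return Eps() if e.ch == a else Empty()
    if isinstance(e, Alt):
        return alt(deriv(e.left, a), deriv(e.right, a))
    if isinstance(e, Cat):
        d = cat(deriv(e.left, a), e.right)
        return alt(d, deriv(e.right, a)) if nullable(e.left) else d
    if isinstance(e, Shuffle):  # a is consumed by either operand
        return alt(shuffle(deriv(e.left, a), e.right),
                   shuffle(e.left, deriv(e.right, a)))
    if isinstance(e, Count):
        if e.hi == 0:  # body^[0,0] accepts only the empty word
            return Empty()
        hi = None if e.hi is None else e.hi - 1
        return cat(deriv(e.body, a), Count(e.body, max(e.lo - 1, 0), hi))
    return Empty()  # Empty and Eps


def matches(e: Re, word: str) -> bool:
    """Membership test: derive symbol by symbol, then check nullability."""
    for ch in word:
        e = deriv(e, ch)
        if isinstance(e, Empty):
            return False
    return nullable(e)
```

For example, with ere = Cat(Count(Shuffle(Sym("a"), Sym("b")), 1, 2), Sym("c")), which denotes (a & b)^[1,2] · c, the call matches(ere, "bac") returns True and matches(ere, "aabbc") returns False. The sketch makes no attempt to bound derivative size beyond the simplifications shown, so it does not reflect the complexity analysis described in the abstract.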


Published In

Dependable Software Engineering. Theories, Tools, and Applications: 10th International Symposium, SETTA 2024, Hong Kong, China, November 26–28, 2024, Proceedings
Nov 2024
430 pages
ISBN:978-981-96-0601-6
DOI:10.1007/978-981-96-0602-3
Editors: Timothy Bourke, Liqian Chen, Amir Goharshady

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Membership
  2. Enhanced regular expressions
  3. Derivatives
  4. Complexity
  5. Positive and negative words
