Bit-coded Regular Expression Parsing

  • Conference paper
Language and Automata Theory and Applications (LATA 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6638)

Abstract

Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dubé and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli’s greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA-nodes, but that the latter may still perform better on regular expressions that are “more deterministic” from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.)

This work has been partially supported by the Danish Strategic Research Council under Project “TrustCare”. The order of authors is insignificant.
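
The idea of a bit-coded parse tree from the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class names (`Eps`, `Lit`, `Alt`, `Seq`, `Star`), the tree encoding (`"inl"`/`"inr"` tags, pairs, lists) and the bit convention for star (`1` for another iteration, `0` for stop) are assumptions chosen for illustration. The point is that only choices cost bits, so the regular expression plus the bit string is enough to rebuild the tree. The naive backtracking parser at the end emits the bits directly, without building a tree, and takes exponential time in the worst case.

```python
from dataclasses import dataclass

# Hypothetical regex AST; the class names are illustrative, not from the paper.
@dataclass(frozen=True)
class Eps:
    pass

@dataclass(frozen=True)
class Lit:
    ch: str  # a single character

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class Seq:
    first: object
    second: object

@dataclass(frozen=True)
class Star:
    body: object

def encode(regex, tree):
    """Bit-code a parse tree; only choices (Alt, Star) cost bits."""
    if isinstance(regex, (Eps, Lit)):
        return ""
    if isinstance(regex, Alt):
        tag, sub = tree
        if tag == "inl":
            return "0" + encode(regex.left, sub)
        return "1" + encode(regex.right, sub)
    if isinstance(regex, Seq):
        return encode(regex.first, tree[0]) + encode(regex.second, tree[1])
    # Star: assumed convention, "1" = one more iteration, "0" = stop.
    return "".join("1" + encode(regex.body, v) for v in tree) + "0"

def decode(regex, bits, pos=0):
    """Rebuild (tree, next_pos) from the bits, guided by the regex."""
    if isinstance(regex, Eps):
        return (), pos
    if isinstance(regex, Lit):
        return regex.ch, pos
    if isinstance(regex, Alt):
        if bits[pos] == "0":
            sub, pos = decode(regex.left, bits, pos + 1)
            return ("inl", sub), pos
        sub, pos = decode(regex.right, bits, pos + 1)
        return ("inr", sub), pos
    if isinstance(regex, Seq):
        v, pos = decode(regex.first, bits, pos)
        w, pos = decode(regex.second, bits, pos)
        return (v, w), pos
    items = []
    while bits[pos] == "1":
        v, pos = decode(regex.body, bits, pos + 1)
        items.append(v)
    return items, pos + 1

def bit_parses(regex, s, pos=0):
    """Yield (bits, end) for each match of a prefix of s[pos:], left choice first."""
    if isinstance(regex, Eps):
        yield "", pos
    elif isinstance(regex, Lit):
        if s.startswith(regex.ch, pos):
            yield "", pos + 1
    elif isinstance(regex, Alt):
        for bits, end in bit_parses(regex.left, s, pos):
            yield "0" + bits, end
        for bits, end in bit_parses(regex.right, s, pos):
            yield "1" + bits, end
    elif isinstance(regex, Seq):
        for b1, mid in bit_parses(regex.first, s, pos):
            for b2, end in bit_parses(regex.second, s, mid):
                yield b1 + b2, end
    else:  # Star
        for b1, mid in bit_parses(regex.body, s, pos):
            if mid > pos:  # skip empty iterations so the search terminates
                for b2, end in bit_parses(regex, s, mid):
                    yield "1" + b1 + b2, end
        yield "0", pos

def parse_bits(regex, s):
    """Return the bits of the first parse consuming all of s, or None."""
    for bits, end in bit_parses(regex, s):
        if end == len(s):
            return bits
    return None

regex = Star(Alt(Lit("a"), Lit("b")))              # (a|b)*
tree = [("inl", "a"), ("inr", "b"), ("inr", "b")]  # a parse tree of "abb"
bits = encode(regex, tree)                         # "1011110"
```

In this sketch the three-character input `abb` costs seven bits: one per star decision and one per alternative taken. Literals and sequencing contribute nothing because the regular expression already determines them.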

References

  1. Bille, P., Thorup, M.: Faster regular expression matching. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009. LNCS, vol. 5555, pp. 171–182. Springer, Heidelberg (2009)

  2. Brabrand, C., Thomsen, J.: Typed and unambiguous pattern matching on strings using regular expressions. In: Proc. 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, PPDP (2010)

  3. Cameron, R.: Source encoding using syntactic information source models. IEEE Transactions on Information Theory 34(4), 843–850 (1988)

  4. Contla, J.: Compact coding of syntactically correct source programs. Software: Practice and Experience 15(7), 625–636 (1985)

  5. Cox, R.: Regular expression matching can be simple and fast

  6. Dubé, D., Feeley, M.: Efficiently building a parse tree from a regular expression. Acta Informatica 37(2), 121–144 (2000)

  7. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004)

  8. Henglein, F., Nielsen, L.: Declarative coinductive axiomatization of regular expression containment and its computational interpretation (preliminary version). In: Proc. 38th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL) (January 2011)

  9. Hosoya, H., Vouillon, J., Pierce, B.C.: Regular expression types for XML. ACM Trans. Program. Lang. Syst. 27(1), 46–90 (2005)

  10. Institute of Electrical and Electronics Engineers (IEEE): Standard for information technology - Portable Operating System Interface (POSIX) - Part 2 (Shell and utilities), Section 2.8 (Regular expression notation), New York, IEEE Standard 1003.2 (1992)

  11. Jansson, P., Jeuring, J.: Polytypic compact printing and parsing. In: Swierstra, S.D. (ed.) ESOP 1999. LNCS, vol. 1576, p. 639. Springer, Heidelberg (1999)

  12. Kleene, S.C.: Representation of events in nerve nets and finite automata. Automata Studies 34, 3–41 (1956)

  13. Nielsen, L.: Regular expression compression parser, http://www.thelas.dk/index.php/Rcp

  14. Vansummeren, S.: Type inference for unique pattern matching. ACM Trans. Program. Lang. Syst. 28(3), 389–428 (2006)

  15. Veanes, M.V.M., de Halleux, P., Tillmann, N.: Rex: Symbolic regular expression explorer. In: Proc. 3rd Int’l Conf. on Software Testing, Verification and Validation, April 6-10. IEEE Computer Society Press, Paris (2010)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nielsen, L., Henglein, F. (2011). Bit-coded Regular Expression Parsing. In: Dediu, AH., Inenaga, S., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2011. Lecture Notes in Computer Science, vol 6638. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21254-3_32

  • DOI: https://doi.org/10.1007/978-3-642-21254-3_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21253-6

  • Online ISBN: 978-3-642-21254-3

  • eBook Packages: Computer Science, Computer Science (R0)
