
Regular-expression derivatives re-examined

Published online by Cambridge University Press: 01 March 2009

SCOTT OWENS
University of Cambridge (e-mail: Scott.Owens@cl.cam.ac.uk)

JOHN REPPY
University of Chicago (e-mail: jhr@cs.uchicago.edu)

AARON TURON
University of Chicago, Northeastern University (e-mail: turon@ccs.neu.edu)

Abstract


Regular-expression derivatives are an old, but elegant, technique for compiling regular expressions to deterministic finite-state machines. The technique easily supports extending the regular-expression operators with boolean operations, such as intersection and complement. Unfortunately, it has been lost in the sands of time, and few computer scientists are aware of it. In this paper, we re-examine regular-expression derivatives and report on our experiences in the context of two different functional-language implementations. The basic implementation is simple, and we show how to extend it to handle large character sets (e.g., Unicode). We also show that the derivatives approach leads to smaller state machines than the traditional algorithm given by McNaughton and Yamada.
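To make the abstract's description concrete, here is a minimal sketch in Python. It is not the authors' code (the paper's implementations are in functional languages); it assumes a hypothetical tuple encoding of regular expressions and a small explicit alphabet, and every helper name (`cat`, `alt`, `and_`, `not_`, `star`, `nullable`, `deriv`, `matches`, `build_dfa`, `run_dfa`) is illustrative. The derivative of an expression r with respect to a character a matches exactly the strings w for which r matches a followed by w, so a string matches r when taking one derivative per character leaves an expression that accepts the empty string. Compiling to a DFA then amounts to exploring every derivative reachable from r. The sketch normalises `+` and `&` modulo associativity, commutativity and idempotence, the kind of similarity Brzozowski (1964) used to show that only finitely many distinct derivatives arise. Intersection and complement need only a few extra lines in `nullable` and `deriv`.

```python
# Sketch only: regexes are tuples tagged with this sketch's own (hypothetical) names.
from functools import reduce

EMPTY = ("empty",)  # matches no string
EPS = ("eps",)      # matches only the empty string

def cat(r, s):
    # Smart constructor: EMPTY absorbs, EPS disappears, result is right-associated.
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    if r[0] == "cat":
        return cat(r[1], cat(r[2], s))
    return ("cat", r, s)

def _operands(tag, r, s):
    # Flatten nested uses of an associative, commutative, idempotent operator.
    items = set()
    for x in (r, s):
        items |= x[1] if x[0] == tag else {x}
    return items

def alt(r, s):
    items = _operands("alt", r, s) - {EMPTY}  # EMPTY is the unit of +
    if len(items) <= 1:
        return next(iter(items), EMPTY)
    return ("alt", frozenset(items))

def and_(r, s):
    if EMPTY in (r, s):  # EMPTY annihilates &
        return EMPTY
    items = _operands("and", r, s)
    if len(items) == 1:
        return next(iter(items))
    return ("and", frozenset(items))

def not_(r):
    return r[1] if r[0] == "not" else ("not", r)

def star(r):
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    # True iff r matches the empty string.
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag in ("alt", "and"):
        check = any if tag == "alt" else all
        return check(nullable(x) for x in r[1])
    return not nullable(r[1])  # tag == "not"

def deriv(r, a):
    # Derivative of r by character a: matches w iff r matches a followed by w.
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == a else EMPTY
    if tag == "cat":
        d = cat(deriv(r[1], a), r[2])
        return alt(d, deriv(r[2], a)) if nullable(r[1]) else d
    if tag == "star":
        return cat(deriv(r[1], a), r)
    if tag == "not":
        return not_(deriv(r[1], a))
    combine = alt if tag == "alt" else and_  # derivatives distribute over + and &
    return reduce(combine, (deriv(x, a) for x in r[1]))

def matches(r, w):
    # Take one derivative per input character, then test for the empty string.
    return nullable(reduce(deriv, w, r))

def build_dfa(r, alphabet):
    # DFA states are the normalised derivatives of r, numbered in discovery order (r is 0).
    states, trans, work = {r: 0}, {}, [r]
    while work:
        q = work.pop()
        for a in alphabet:
            d = deriv(q, a)
            if d not in states:
                states[d] = len(states)
                work.append(d)
            trans[(states[q], a)] = states[d]
    accepting = {i for q, i in states.items() if nullable(q)}
    return len(states), trans, accepting

def run_dfa(dfa, w):
    # Follow transitions from state 0; w must use only characters of the alphabet.
    _, trans, accepting = dfa
    return reduce(lambda q, a: trans[(q, a)], w, 0) in accepting
```

For instance, `build_dfa(cat(star(("chr", "a")), ("chr", "b")), "ab")` yields three states (the original expression, the empty-string expression, and the dead state). The sketch enumerates every character of the alphabet at each state, so it does not cover the large-character-set (Unicode) extension described in the abstract, and the DFAs it produces are not guaranteed to be minimal.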

Type: Article
Copyright © Cambridge University Press 2009

References

Aho, A. V., Hopcroft, J. E. & Ullman, J. D. (1974) The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley.
Aho, A. V., Sethi, R. & Ullman, J. D. (1986) Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley.
Aho, A. V. & Ullman, J. D. (1972) The Theory of Parsing, Translation, and Compiling. Vol. 1. Englewood Cliffs, NJ: Prentice Hall.
Appel, A. W. (1998) Modern Compiler Implementation in ML. Cambridge: Cambridge University Press.
Appel, A. W., Mattson, J. S. & Tarditi, D. R. (1994, Oct.) A Lexical Analyzer Generator for Standard ML. Available at: http://smlnj.org/doc/ML-Lex/manual.html.
Baxter, I., Pidgeon, C. & Mehlich, M. (2004) DMS: Program transformations for practical scalable software evolution. In International Conference on Software Engineering.
Berry, G. (1999) The Esterel v5 Language Primer Version 5.21 Release 2.0. Available at: ftp://ftp-sop.inria.fr/meije/esterel/papers/primer.pdf.
Berry, G. & Sethi, R. (1986, Dec.) From regular expressions to deterministic automata. Theoret. Comp. Sci. 48 (1), 117–126.
Brzozowski, J. A. (1964) Derivatives of regular expressions. J. ACM 11 (4), 481–494.
English, J. (1999) How to Validate XML. Available at: http://www.flightlab.com/~joe/sgml/validate.html (Accessed 24 November 2008).
Findler, R. B., Clements, J., Flanagan, C., Flatt, M., Krishnamurthi, S., Steckler, P. & Felleisen, M. (2002) DrScheme: A programming environment for Scheme. J. Funct. Prog. 12 (2), 159–182.
Fisher, C. N. & LeBlanc, R. J., Jr. (1988) Crafting a Compiler. Menlo Park, CA: Benjamin/Cummings.
McNaughton, R. & Yamada, H. (1960) Regular expressions and state graphs for automata. IEEE Trans. Elec. Comp. 9, 39–47.
Rabin, M. O. & Scott, D. (1959) Finite automata and their decision problems. IBM J. Res. Dev. 3 (2), 114–125.
Schmidt, M. (2002) Design and Implementation of a Validating XML Parser in Haskell. Master's thesis, Computer Science Department, University of Applied Sciences Wedel.
Sen, K. & Roşu, G. (2003) Generating optimal monitors for extended regular expressions. In Proceedings of Runtime Verification (RV'03), Boulder, Colorado. Electronic Notes in Theoretical Computer Science, vol. 89, no. 2, pp. 226–245. Elsevier Science.
Thompson, K. (1968) Regular expression search algorithm. Comm. ACM 11 (6), 419–422.
Unicode Consortium. (2003) The Unicode Standard, Version 4. Reading, MA: Addison-Wesley Professional.