Open access

Regular expression types for XML

Published: 01 January 2005

Abstract

We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing.

The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be EXPTIME-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.
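To make the idea of subtyping as set inclusion concrete, the following is a minimal sketch and not the algorithm of the article. It assumes a strong simplification: types describe flat sequences of element labels rather than trees, so inclusion can be checked with Brzozowski derivatives instead of tree automata. All names (`Sym`, `seq`, `alt`, `is_subtype`) and the labels `name`, `email`, `tel` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Regular expressions over element labels (flat sequences only; an assumption).
@dataclass(frozen=True)
class Empty: pass            # denotes the empty set of sequences
@dataclass(frozen=True)
class Eps: pass              # denotes only the empty sequence
@dataclass(frozen=True)
class Sym:
    label: str
@dataclass(frozen=True)
class Seq:
    left: object
    right: object
@dataclass(frozen=True)
class Alt:
    options: frozenset
@dataclass(frozen=True)
class Star:
    body: object

def seq(a, b):
    """Concatenation with basic simplification."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Seq(a, b)

def alt(*rs):
    """Alternation kept as a flattened set, so equal choices are identified."""
    opts = set()
    for r in rs:
        if isinstance(r, Alt):
            opts |= r.options
        elif not isinstance(r, Empty):
            opts.add(r)
    if not opts:
        return Empty()
    if len(opts) == 1:
        return next(iter(opts))
    return Alt(frozenset(opts))

def nullable(r):
    """True iff r accepts the empty sequence."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    return any(nullable(o) for o in r.options)  # Alt

def deriv(r, a):
    """Brzozowski derivative: what r can still accept after reading label a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Sym):
        return Eps() if r.label == a else Empty()
    if isinstance(r, Seq):
        d = seq(deriv(r.left, a), r.right)
        return alt(d, deriv(r.right, a)) if nullable(r.left) else d
    if isinstance(r, Alt):
        return alt(*(deriv(o, a) for o in r.options))
    return seq(deriv(r.body, a), r)  # Star

def labels(r):
    """All labels occurring in r."""
    if isinstance(r, Sym):
        return {r.label}
    if isinstance(r, Seq):
        return labels(r.left) | labels(r.right)
    if isinstance(r, Alt):
        return set().union(*(labels(o) for o in r.options))
    if isinstance(r, Star):
        return labels(r.body)
    return set()

def is_subtype(s, t):
    """True iff every sequence denoted by s is also denoted by t."""
    seen, todo = set(), [(s, t)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue  # already assumed to hold (coinductive reasoning)
        seen.add(pair)
        a, b = pair
        if nullable(a) and not nullable(b):
            return False  # a accepts a sequence that b rejects
        for x in labels(a):
            todo.append((deriv(a, x), deriv(b, x)))
    return True

# name, email*   versus   name, email*, tel?
narrow = seq(Sym("name"), Star(Sym("email")))
wide = seq(Sym("name"), seq(Star(Sym("email")), alt(Sym("tel"), Eps())))
```

With these definitions, `is_subtype(narrow, wide)` holds because every sequence of the form `name, email*` is also of the form `name, email*, tel?`, while `is_subtype(wide, narrow)` fails on a sequence ending in `tel`. The set of visited pairs plays a role loosely analogous to reusing shared portions of the compared expressions, but the sketch makes no claim about the complexity or techniques of the tree-based algorithm described above.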

References

[1] Aiken, A. and Murphy, B. R. 1991. Implementing regular tree expressions. In Proceedings of Functional Programming and Computer Architecture, J. Hughes, Ed. Lecture Notes in Computer Science, vol. 523. Springer-Verlag, New York.
[2] Amadio, R. M. and Cardelli, L. 1993. Subtyping recursive types. ACM Trans. Prog. Lang. Syst. 15, 4, 575--631. (Preliminary version in POPL '91 (pp. 104--118); also DEC Systems Research Center Research Report number 62, August 1990.)
[3] Brandt, M. and Henglein, F. 1998. Coinductive axiomatization of recursive type equality and subtyping. Fund. Inf. 33, 309--338.
[4] Bray, T., Paoli, J., Sperberg-McQueen, C. M., and Maler, E. 2000. Extensible markup language (XML™). http://www.w3.org/XML/.
[5] Brzozowski, J. A. 1964. Derivatives of regular expressions. J. ACM 11, 4 (Oct.), 481--494.
[6] Buneman, P., Davidson, S., Fernandez, M., and Suciu, D. 1997. Adding structure to unstructured data. In Proceedings of the International Conference on Database Theory. Lecture Notes in Computer Science, vol. 1. Springer-Verlag, New York, 336--351.
[7] Buneman, P. and Pierce, B. 1998. Union types for semistructured data. In Proceedings of the International Database Programming Languages Workshop. Lecture Notes in Computer Science, vol. 1686. Springer-Verlag, New York.
[8] Cai, J. and Paige, R. 1995. Using multiset discrimination to solve language processing problems without hashing. Theoret. Comput. Sci. 145, 1--2, 189--228.
[9] Chawathe, S. S. 1999. Comparing hierarchical data in external memory. In Proceedings of the 25th International Conference on Very Large Data Bases (Edinburgh, Scotland, U.K.). 90--101.
[10] Clark, J. 1999. XSL Transformations (XSLT). http://www.w3.org/TR/xslt.
[11] Clark, J. 2001. TREX: Tree Regular Expressions for XML. http://www.thaiopensource.com/trex/.
[12] Clark, J. and Murata, M. 2001. RELAX NG. http://www.relaxng.org.
[13] Cluet, S. and Siméon, J. 1998. Using YAT to build a web server. In Proceedings of the International Workshop on the Web and Databases (WebDB). 118--135.
[14] Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Lugiez, D., Tison, S., and Tommasi, M. 1999. Tree automata techniques and applications (Draft book; available electronically on http://www.grappa.univ-lille3.fr/tata.)
[15] Damm, F. M. 1994. Subtyping with union types, intersection types and recursive types. In Proceedings of the Theoretical Aspects of Computer Software, M. Hagiya and J. C. Mitchell, Eds. Lecture Notes in Computer Science, vol. 789. Springer-Verlag, New York, 687--706.
[16] Davidson, A., Fuchs, M., Hedin, M., Jain, M., Koistinen, J., Lloyd, C., Maloney, M., and Schwarzhof, K. 1999. Schema for object-oriented XML. http://www.w3.org/TR/NOTE-SOX/.
[17] Davies, R. 2000. Tree automata inclusion. Personal communication.
[18] Deutsch, A., Fernandez, M., Florescu, D., Levy, A., and Suciu, D. 1998. XML-QL: A Query Language for XML. http://www.w3.org/TR/NOTE-xml-ql.
[19] DOM. 2001. Document object model (DOM). http://www.w3.org/DOM/.
[20] Fallside, D. C. 2001. XML Schema Part 0: Primer, W3C Recommendation. http://www.w3.org/TR/xmlschema-0/.
[21] Fernández, M. F., Siméon, J., and Wadler, P. 2001. A semi-monad for semi-structured data. In Proceedings of 8th International Conference on Database Theory (ICDT 2001), J. V. den Bussche and V. Vianu, Eds. Lecture Notes in Computer Science, vol. 1973. Springer-Verlag, New York, 263--300.
[22] Freeman, T. and Pfenning, F. 1991. Refinement types for ML. In Proceedings of the SIGPLAN '91 Symposium on Language Design and Implementation (Toronto, Ont. Canada). ACM, New York.
[23] Frisch, A., Castagna, G., and Benzaken, V. 2002. Semantic subtyping. In Proceedings of the 17th Annual IEEE Symposium on Logic In Computer Science. 137--146.
[24] Gapeyev, V., Levin, M., and Pierce, B. 2000. Recursive subtyping revealed. In Proceedings of the International Conference on Functional Programming (ICFP). 221--232.
[25] Gilleron, R., Tison, S., and Tommasi, M. 1999. Set constraints and automata. Inf. Comput. 149, 1, 1--41.
[26] Goldman, R. and Widom, J. 1997. Dataguides: Enabling query formulation and optimization in semistructured databases. In VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds. Morgan Kaufmann, 436--445.
[27] Hopcroft, J. E. and Ullman, J. D. 1979. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, Mass.
[28] Hornung, T. 1996. Labelled trees and their recognition. In Publ. Math. Debrecen. Number 3--4 in 48. 309--316.
[29] Hosoya, H. 2003. Regular expression pattern matching---A simpler design. Tech. Rep. 1397, RIMS, Kyoto University.
[30] Hosoya, H. and Pierce, B. C. 2002. Regular expression pattern matching for XML. J. Funct. Prog. 13, 6, 961--1004. (Short version appeared in Proceedings of the 25th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 67--80, 2001.)
[31] Hosoya, H. and Pierce, B. C. 2003. XDuce: A typed XML processing language. ACM Trans. Internet Tech. 3, 2, 117--148. (Short version appeared in Proceedings of 3rd International Workshop on the Web and Databases (WebDB2000), volume 1997 of Lecture Notes in Computer Science. Springer-Verlag, New York, 2000, pp. 226--244.)
[32] Klarlund, N., Møller, A., and Schwartzbach, M. I. 2000. DSD: A schema language for XML. http://www.brics.dk/DSD/.
[33] Kuper, G. M. and Siméon, J. 2001. Subsumption for XML types. In Proceedings of the International Conference on Database Theory (ICDT'2001). London.
[34] Meijer, E. and Shields, M. 1999. XMλ: A functional programming language for constructing and manipulating XML documents. Manuscript.
[35] Milo, T., Suciu, D., and Vianu, V. 2000. Typechecking for XML transformers. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, New York, 11--22.
[36] Murata, M. 2000. Hedge automata: A formal model for XML schemata. http://www.xml.gr.jp/relax/hedge_nice.html.
[37] Murata, M. 2001. RELAX (REgular LAnguage description for XML). http://www.xml.gr.jp/relax/.
[38] Papakonstantinou, Y. and Vianu, V. 2000. DTD Inference for views of XML data. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Dallas, Tex.). ACM, New York, 35--46.
[39] Seidl, H. 1990. Deciding equivalence of finite tree automata. SIAM J. Comput. 19, 3 (June), 424--437.
[40] Shields, M. and Meijer, E. 2001. Type-indexed rows. In Proceedings of the 25th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (London, United Kingdom). ACM New York.
[41] Wallace, M. and Runciman, C. 1999. Haskell and XML: Generic combinators or type-based translation? In Proceedings of the 4th ACM SIGPLAN International Conference on Functional Programming (ICFP'99). ACM SIGPLAN Notices, vol. 34-9. ACM, New York, 148--159.


Published In

ACM Transactions on Programming Languages and Systems, Volume 27, Issue 1
January 2005
184 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/1053468

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2005
Published in TOPLAS Volume 27, Issue 1


Author Tags

  1. Type systems
  2. XML
  3. subtyping

Qualifiers

  • Article
