Complexity of Decision Problems for XML Schemas and Chain Regular Expressions

Published: 01 November 2009

Abstract

We study the complexity of the inclusion, equivalence, and intersection problems for extended chain regular expressions (eCHAREs). These are regular expressions with a very simple structure: each is a concatenation of factors, where each factor is a disjunction of strings, possibly extended with “$*$”, “$+$”, or “$?$”. Despite their simple form, such expressions are widely used: eCHAREs constitute, for instance, a superclass of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problems carry over to the corresponding decision problems for extended context-free grammars and for single-type and restrained competition tree grammars. These grammars form abstractions of document type definitions (DTDs), XML schema definitions (XSDs), and the class of one-pass preorder typeable XML Schemas, respectively. For the intersection problem, we show that the obtained complexities carry over only to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions in which the number of occurrences of alphabet symbols is bounded by a constant.
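
To make the shape of these expressions concrete, the following is a minimal sketch in Python, assuming only the informal description given in the abstract: an expression is a list of factors, and each factor is a disjunction of non-empty strings with an optional "*", "+", or "?". The names Factor, to_regex, matches, and included_up_to are hypothetical and do not come from the article. The bounded inclusion test simply enumerates short words; it is a brute-force illustration of what the inclusion problem asks, not one of the algorithms or bounds the article establishes.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Factor:
    """One factor: a disjunction of non-empty strings plus an optional modifier."""
    strings: tuple          # e.g. ("b", "c") stands for (b + c)
    modifier: str = ""      # one of "", "*", "+", "?"


def to_regex(factors):
    """Translate a list of factors into Python regex syntax."""
    parts = []
    for f in factors:
        if f.modifier not in ("", "*", "+", "?"):
            raise ValueError(f"unsupported modifier: {f.modifier!r}")
        if not f.strings or any(s == "" for s in f.strings):
            raise ValueError("each factor needs at least one non-empty string")
        alternatives = "|".join(re.escape(s) for s in f.strings)
        parts.append(f"(?:{alternatives}){f.modifier}")
    return "".join(parts)


def matches(factors, word):
    """Membership test: does the whole word belong to the language?"""
    return re.fullmatch(to_regex(factors), word) is not None


def included_up_to(e1, e2, alphabet, max_len):
    """Bounded inclusion: every word of length <= max_len in L(e1) is in L(e2).

    Exponential in max_len and only checks short words, so a True result
    is evidence, not a proof, of inclusion.
    """
    r1 = re.compile(to_regex(e1))
    r2 = re.compile(to_regex(e2))
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if r1.fullmatch(w) and not r2.fullmatch(w):
                return False
    return True


# Illustrative expressions (assumed for this sketch): a+ (b + c)?  and  a* (b + c + d)?
e1 = [Factor(("a",), "+"), Factor(("b", "c"), "?")]
e2 = [Factor(("a",), "*"), Factor(("b", "c", "d"), "?")]
```

In this sketch, included_up_to(e1, e2, "abcd", 4) holds, while the reverse direction fails already on the empty word, which e2 accepts and e1 does not.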

Published In

SIAM Journal on Computing, Volume 39, Issue 4
September 2009
447 pages

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. XML schemas
  2. complexity
  3. equivalence
  4. inclusion
  5. intersection
  6. regular expressions

Cited By

  • (2022) Towards Theory for Real-World Data. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 261-276. DOI: 10.1145/3517804.3526066. Online publication date: 12-Jun-2022
  • (2022) The Complexity of Regular Trail and Simple Path Queries on Undirected Graphs. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 165-174. DOI: 10.1145/3517804.3524149. Online publication date: 12-Jun-2022
  • (2019) Containment of Shape Expression Schemas for RDF. Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 303-319. DOI: 10.1145/3294052.3319687. Online publication date: 25-Jun-2019
  • (2019) Split-Correctness in Information Extraction. Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 149-163. DOI: 10.1145/3294052.3319684. Online publication date: 25-Jun-2019
  • (2019) Context-Free Grammars for Deterministic Regular Expressions with Interleaving. Theoretical Aspects of Computing – ICTAC 2019, pp. 235-252. DOI: 10.1007/978-3-030-32505-3_14. Online publication date: 31-Oct-2019
  • (2018) Document Spanners for Extracting Incomplete Information. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 125-136. DOI: 10.1145/3196959.3196968. Online publication date: 27-May-2018
  • (2018) Conjunctive query containment over trees using schema information. Acta Informatica, 55:1, pp. 17-56. DOI: 10.1007/s00236-016-0282-1. Online publication date: 1-Feb-2018
  • (2018) Inference of a Concise Regular Expression Considering Interleaving from XML Documents. Advances in Knowledge Discovery and Data Mining, pp. 389-401. DOI: 10.1007/978-3-319-93037-4_31. Online publication date: 3-Jun-2018
  • (2017) Games for Active XML Revisited. Theory of Computing Systems, 61:1, pp. 84-155. DOI: 10.1007/s00224-016-9682-4. Online publication date: 1-Jul-2017
  • (2016) Bounded Repairability for Regular Tree Languages. ACM Transactions on Database Systems, 41:3, pp. 1-45. DOI: 10.1145/2898995. Online publication date: 30-Jun-2016
