DOI: 10.1145/1989284.1989316
research-article

The complexity of text-preserving XML transformations

Published: 13 June 2011

Abstract

While XML is nowadays adopted as the de facto standard for data exchange, historically, its predecessor SGML was invented for describing electronic documents, i.e., marked up text. Actually, today there are still large volumes of such XML texts. We consider simple transformations which can change the internal structure of documents, that is, the mark-up, and can filter out parts of the text but do not disrupt the ordering of the words. Specifically, we focus on XML transformations where the transformed document is a subsequence of the input document when ignoring mark-up. We call the latter text-preserving XML transformations. We characterize such transformations as copy- and rearrange-free transductions. Furthermore, we study the problem of deciding whether a given XML transducer is text-preserving over a given tree language. We consider top-down transducers as well as the abstraction of XSLT called DTL. We show that deciding whether a transformation is text-preserving over an unranked regular tree language is in PTime for top-down transducers, EXPTime-complete for DTL with XPath, and decidable for DTL with MSO patterns. Finally, we obtain that for every transducer in one of the above mentioned classes, the maximal subset of the input schema can be computed on which the transformation is text-preserving.
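The central definition in the abstract, that the output text is a subsequence of the input text once mark-up is ignored, can be illustrated for a single input/output pair. The following is only a minimal sketch and not the paper's decision procedure: it checks one concrete pair of documents rather than a transducer over a whole tree language, and the helper names `text_words` and `is_text_preserving`, as well as the choice to split text on whitespace, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def text_words(xml_source: str) -> list[str]:
    """Return the words of all text nodes in document order, ignoring mark-up."""
    root = ET.fromstring(xml_source)
    # itertext() yields text content in document order; splitting on
    # whitespace is an illustrative assumption about what counts as a "word".
    return [word for chunk in root.itertext() for word in chunk.split()]


def is_subsequence(candidate: list[str], reference: list[str]) -> bool:
    """True if candidate can be obtained from reference by deleting elements."""
    remaining = iter(reference)
    # "word in remaining" advances the iterator, so matches must occur in
    # order and no reference word can be matched twice.
    return all(word in remaining for word in candidate)


def is_text_preserving(input_xml: str, output_xml: str) -> bool:
    """Check the subsequence condition for one input/output pair (hypothetical helper)."""
    return is_subsequence(text_words(output_xml), text_words(input_xml))


source = "<doc><title>XML basics</title><p>Trees are <b>everywhere</b> today</p></doc>"
filtered = "<out><h>XML</h><t>Trees everywhere</t></out>"  # restructures and filters
rearranged = "<out><t>Trees</t><h>XML</h></out>"           # swaps word order
copied = "<out>XML XML</out>"                              # duplicates a word
```

Under these assumptions, `is_text_preserving(source, filtered)` holds, while the rearranged and copied outputs fail the check, which mirrors the characterization of text-preserving transformations as copy- and rearrange-free.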

Cited By

  • (2012) Design Independent Query Interfaces. IEEE Transactions on Knowledge and Data Engineering 24:10 (1819-1832). DOI: 10.1109/TKDE.2012.57. Online publication date: 1-Oct-2012

Published In

PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2011
332 pages
ISBN:9781450306607
DOI:10.1145/1989284
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. transducers
  2. trees
  3. xml

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '11

Acceptance Rates

PODS '11 Paper Acceptance Rate 25 of 113 submissions, 22%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%
