XML Schema Mappings: Data Exchange and Metadata Management

Published: 24 April 2014

Abstract

Relational schema mappings have been extensively studied in connection with data integration and exchange problems, but mappings between XML schemas have not received the same amount of attention. Our goal is to develop a theory of expressive XML schema mappings. Such mappings should be able to use various forms of navigation in a document, and specify conditions on data values. We develop a language for XML schema mappings, and study both data exchange with such mappings and metadata management problems. Specifically, we concentrate on four types of problems: complexity of mappings, query answering, consistency issues, and composition.
We first analyze the complexity of mappings, that is, recognizing pairs of documents such that one can be mapped into the other, and provide a classification based on sets of features used in mappings. Next, we chart the tractability frontier for the query answering problem. We show that the problem is tractable for expressive schema mappings and simple queries, but not vice versa. Then, we move to static analysis. We study the complexity of the consistency problem, that is, deciding whether it is possible to map some document of a source schema into a document of the target schema. Finally, we look at composition of XML schema mappings. We analyze its complexity and show that it is harder to achieve closure under composition for XML than for relational mappings. Nevertheless, we find a robust class of XML schema mappings that, in addition to being closed under composition, have good complexity properties with respect to the main data management tasks. Due to its good properties, we suggest this class as the class to use in applications of XML schema mappings.
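
To make the membership question concrete (does a given pair of documents satisfy a mapping?), here is a minimal sketch in Python. It is not the mapping language developed in the article. It assumes a toy form of dependency, invented here for illustration: every tuple of attribute values found at source nodes selected by one path must also appear at target nodes selected by another path. The names `matches`, `is_member`, and `deps`, and the sample documents, are hypothetical. Schema conformance, richer navigation, and other conditions on data values are left out.

```python
import xml.etree.ElementTree as ET


def matches(doc, path, attrs):
    """Collect the tuples of attribute values found at nodes selected by `path`."""
    return {tuple(node.get(a) for a in attrs) for node in doc.findall(path)}


def is_member(source, target, dependencies):
    """Toy membership test: each dependency (src_path, src_attrs, tgt_path, tgt_attrs)
    requires every value tuple matched in the source to be matched in the target."""
    for src_path, src_attrs, tgt_path, tgt_attrs in dependencies:
        src_values = matches(source, src_path, src_attrs)
        tgt_values = matches(target, tgt_path, tgt_attrs)
        if not src_values <= tgt_values:
            return False
    return True


# Hypothetical documents: a source with books, two candidate targets with entries.
source = ET.fromstring(
    '<db><book title="A" year="2009"/><book title="B" year="2010"/></db>'
)
good_target = ET.fromstring(
    '<lib><entry name="A"/><entry name="B"/><entry name="C"/></lib>'
)
bad_target = ET.fromstring('<lib><entry name="A"/></lib>')

# One toy dependency: every book title must occur as the name of some entry.
deps = [("./book", ["title"], "./entry", ["name"])]

print(is_member(source, good_target, deps))
print(is_member(source, bad_target, deps))
```

The first target may contain extra entries and still be accepted, since the dependency only constrains what must appear in the target. The second target is rejected because the title "B" has no matching entry.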

Supplementary Material

a12-amano-apndx.pdf (amano.zip)
Supplemental movie, appendix, image, and software files for XML Schema Mappings: Data Exchange and Metadata Management

    Published In

    Journal of the ACM  Volume 61, Issue 2
    April 2014
    206 pages
    ISSN:0004-5411
    EISSN:1557-735X
    DOI:10.1145/2605175

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 April 2014
    Accepted: 01 November 2013
    Revised: 01 August 2013
    Received: 01 September 2012
    Published in JACM Volume 61, Issue 2

    Author Tags

    1. XML
    2. certain answers
    3. consistency
    4. data exchange
    5. incomplete information
    6. membership
    7. query answering
    8. schema mappings

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2020) Consistency and Certain Answers in Relational to RDF Data Exchange with Shape Constraints. New Trends in Databases and Information Systems, 97-107. DOI: 10.1007/978-3-030-54623-6_9. Online publication date: 17-Aug-2020
    • (2018) Reflections on Schema Mappings, Data Exchange, and Metadata Management. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 107-109. DOI: 10.1145/3196959.3196991. Online publication date: 27-May-2018
    • (2018) Conjunctive query containment over trees using schema information. Acta Informatica 55:1, 17-56. DOI: 10.1007/s00236-016-0282-1. Online publication date: 1-Feb-2018
    • (2018) A general framework for big data knowledge discovery and integration. Concurrency and Computation: Practice and Experience 30:13. DOI: 10.1002/cpe.4422. Online publication date: 25-Jan-2018
    • (2017) Schema Mappings for Data Graphs. Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 389-401. DOI: 10.1145/3034786.3056113. Online publication date: 9-May-2017
    • (2017) Schema Mappings: From Data Translation to Data Cleaning. A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, 203-217. DOI: 10.1007/978-3-319-61893-7_12. Online publication date: 31-May-2017
    • (2016) An Approach of XML Schema Matching Using Top-K Mapping. 2016 3rd International Conference on Information Science and Control Engineering (ICISCE), 174-178. DOI: 10.1109/ICISCE.2016.47. Online publication date: Jul-2016
