DOI: 10.1145/1871437.1871479

Automatic schema merging using mapping constraints among incomplete sources

Published: 26 October 2010

Abstract

Schema merging is the process of consolidating multiple schemas into a unified view. The task becomes particularly challenging when the schemas are highly heterogeneous and autonomous. Classical data integration systems rely on a mediated schema created by human experts through an intensive design process.
In this paper, we present a novel approach for merging multiple relational data sources related by a collection of mapping constraints in the form of P2P-style tuple-generating dependencies (tgds). In the data integration scenario, we opt for minimal mediated schemas that are complete with respect to the certain answers of conjunctive queries. Under the Open World Assumption (OWA), we characterize the semantics of schema merging by properties of the output mapping system between the source schemas and the mediated schema. We propose a merging algorithm that follows a redundancy reduction paradigm and prove that its output satisfies the desired logical properties. Because multiple plausible mediated schemas may coexist, we employ a variant of the Apriori algorithm to enumerate alternative mediated schemas. Output mappings in the form of data dependencies are generated to support the mediated schemas, which enables query processing. We have evaluated our merging approach over a collection of real-world data sets, and the results demonstrate its applicability and effectiveness in practice.
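
The abstract does not say how the Apriori variant works, so the following is only a minimal sketch of the generic level-wise idea it names, not the paper's algorithm. It assumes the candidates can be treated as sets of items and that consistency is downward closed (every subset of a consistent set is consistent). The function `maximal_consistent_sets`, the toy items, and the conflict check `no_conflict` are all hypothetical names introduced for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set


def maximal_consistent_sets(
    items: Iterable[str],
    is_consistent: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Level-wise (Apriori-style) search for maximal consistent item sets.

    Assumption: is_consistent is downward closed, so a set can only be
    consistent if all of its subsets are consistent.
    """
    # Level 1: consistent singletons.
    level: Set[FrozenSet[str]] = {
        frozenset([i]) for i in items if is_consistent(frozenset([i]))
    }
    all_consistent: Set[FrozenSet[str]] = set()
    while level:
        all_consistent |= level
        size = len(next(iter(level))) + 1
        # Join step: combine sets of the current level into sets one larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: keep a candidate only if every subset one element
        # smaller survived the previous level, then run the actual check.
        level = {
            c
            for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
            and is_consistent(c)
        }
    # A set is maximal if no other consistent set strictly contains it.
    maximal = [
        s for s in all_consistent if not any(s < t for t in all_consistent)
    ]
    return sorted(maximal, key=lambda s: sorted(s))


# Hypothetical toy input: "a" and "b" cannot appear in the same alternative.
CONFLICTS = {frozenset({"a", "b"})}


def no_conflict(candidate: FrozenSet[str]) -> bool:
    """Return True if the candidate contains no conflicting pair."""
    return not any(pair <= candidate for pair in CONFLICTS)


if __name__ == "__main__":
    for alternative in maximal_consistent_sets(["a", "b", "c"], no_conflict):
        print(sorted(alternative))  # prints ['a', 'c'] then ['b', 'c']
```

In this toy run the two maximal sets play the role of alternative outputs: each is consistent on its own, and neither can be extended without introducing the conflict.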

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data integration
  2. model management
  3. schema mappings
  4. schema merging

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2020) A retrospective on Telos as a metamodeling language for requirements engineering. Requirements Engineering. DOI: 10.1007/s00766-020-00329-x. Online publication date: 12-Mar-2020
  • (2017) On Warehouses, Lakes, and Spaces: The Changing Role of Conceptual Modeling for Data Integration. Conceptual Modeling Perspectives, pages 231-245. DOI: 10.1007/978-3-319-67271-7_16. Online publication date: 13-Oct-2017
  • (2014) Visual data integration based on description logic reasoning. Proceedings of the 18th International Database Engineering & Applications Symposium, pages 19-28. DOI: 10.1145/2628194.2628215. Online publication date: 7-Jul-2014
  • (2014) High-Level Operations for Creation and Maintenance of Temporal and Conventional Schema in the tauXSchema Framework. 2014 21st International Symposium on Temporal Representation and Reasoning, pages 101-110. DOI: 10.1109/TIME.2014.14. Online publication date: Sep-2014
  • (2014) Target-driven merging of taxonomies with Atom. Information Systems, 42, pages 1-14. DOI: 10.1016/j.is.2013.11.001. Online publication date: 1-Jun-2014
  • (2014) Data-centric intelligent information integration--from concepts to automation. Journal of Intelligent Information Systems, 43:3, pages 437-462. DOI: 10.1007/s10844-014-0340-5. Online publication date: 1-Dec-2014
  • (2011) Merging relational views. Proceedings of the 30th international conference on Conceptual modeling, pages 379-392. DOI: 10.5555/2075144.2075182. Online publication date: 31-Oct-2011
