research-article

Meta-mappings for schema mapping reuse

Published: 01 January 2019

Abstract

The definition of mappings between heterogeneous schemas is a critical activity of any database application. Existing tools provide high-level interfaces for the discovery of correspondences between elements of schemas, but schema mappings must still be specified manually from scratch every time, even when the scenario at hand is similar to one that has already been addressed. The problem is that schema mappings are defined over a specific pair of schemas and cannot be reused directly in different scenarios. We tackle this challenge by generalizing schema mappings as meta-mappings: formalisms that describe transformations between generic data structures called meta-schemas. We formally characterize schema mapping reuse and explain how meta-mappings are able to: (i) capture enterprise knowledge from previously defined schema mappings and (ii) use this knowledge to suggest new mappings. We develop techniques to infer meta-mappings from existing mappings, to organize them into a searchable repository, and to leverage the repository to propose mappings suited to users' needs. We study effectiveness and efficiency in an extensive evaluation over real-world scenarios and show that our system can infer, store, and search millions of meta-mappings in seconds.
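To make the generalize, store, and reuse cycle concrete, here is a toy sketch in Python. It assumes a drastically simplified meta-schema in which an attribute is characterized only by whether it is a key and by its type; the meta-schemas and meta-mappings in the paper are richer formalisms, and this is not its inference or search algorithm. All names (`Attribute`, `lift`, `instantiate`, `Repository`, and the example relations `emp`, `person`, `cust`, `client`) are hypothetical. `lift` replaces the attribute names of an existing mapping with their meta-level roles, the repository stores the resulting pattern, and `instantiate` binds it to a new schema pair when every role matches exactly one attribute.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical meta-schema: an attribute has a schema-specific name plus two
# meta-level properties (key flag and type). A meta-mapping keeps only the latter.
@dataclass(frozen=True)
class Attribute:
    name: str
    is_key: bool
    type: str


Role = Tuple[bool, str]                # (is_key, type): an attribute with its name abstracted away
MetaMapping = List[Tuple[Role, Role]]  # one (source role, target role) pair per correspondence


def role(attr: Attribute) -> Role:
    return (attr.is_key, attr.type)


def lift(source: List[Attribute], target: List[Attribute],
         correspondences: List[Tuple[str, str]]) -> MetaMapping:
    """Generalize a concrete mapping (source name -> target name) into role pairs."""
    src = {a.name: a for a in source}
    tgt = {a.name: a for a in target}
    return [(role(src[s]), role(tgt[t])) for s, t in correspondences]


def unique_by_role(attrs: List[Attribute], r: Role) -> Optional[Attribute]:
    """Return the single attribute playing role r, or None if zero or several do."""
    matches = [a for a in attrs if role(a) == r]
    return matches[0] if len(matches) == 1 else None


def instantiate(meta_mapping: MetaMapping, source: List[Attribute],
                target: List[Attribute]) -> Optional[List[Tuple[str, str]]]:
    """Turn a meta-mapping back into a concrete mapping for a new schema pair.
    Returns None when some role cannot be bound unambiguously."""
    result = []
    for src_role, tgt_role in meta_mapping:
        s = unique_by_role(source, src_role)
        t = unique_by_role(target, tgt_role)
        if s is None or t is None:
            return None
        result.append((s.name, t.name))
    return result


class Repository:
    """Toy repository: stores distinct meta-mappings and suggests by linear scan."""

    def __init__(self) -> None:
        self._meta_mappings: List[MetaMapping] = []

    def add(self, meta_mapping: MetaMapping) -> None:
        if meta_mapping not in self._meta_mappings:  # ignore exact duplicates
            self._meta_mappings.append(meta_mapping)

    def suggest(self, source: List[Attribute],
                target: List[Attribute]) -> List[List[Tuple[str, str]]]:
        suggestions = []
        for mm in self._meta_mappings:
            concrete = instantiate(mm, source, target)
            if concrete is not None:
                suggestions.append(concrete)
        return suggestions


# Usage: learn from an existing emp -> person mapping, reuse it on cust -> client.
emp = [Attribute("id", True, "int"), Attribute("name", False, "str"),
       Attribute("salary", False, "float")]
person = [Attribute("pid", True, "int"), Attribute("fullname", False, "str")]
cust = [Attribute("cid", True, "int"), Attribute("cname", False, "str"),
        Attribute("since", False, "date")]
client = [Attribute("code", True, "int"), Attribute("label", False, "str")]

repo = Repository()
repo.add(lift(emp, person, [("id", "pid"), ("name", "fullname")]))
print(repo.suggest(cust, client))  # [[('cid', 'code'), ('cname', 'label')]]
```

The unique-role requirement in `instantiate` is an assumption chosen so that a stored pattern either binds unambiguously or is rejected; a target with two non-key string attributes, for instance, makes the example meta-mapping return `None`. The linear scan in `suggest` is the simplest possible search and is not meant to reflect how the authors' system indexes and searches millions of meta-mappings.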

References

[1] D. Abadi et al. The Beckman report on database research. Commun. ACM, 59(2):92--99, Jan. 2016.
[2] B. Alexe, B. ten Cate, P. G. Kolaitis, and W. C. Tan. Characterizing schema mappings via data examples. ACM Trans. Database Syst., 36(4):23, 2011.
[3] B. Alexe, B. ten Cate, P. G. Kolaitis, and W. C. Tan. Designing and refining schema mappings via data examples. In SIGMOD, pages 133--144, 2011.
[4] P. Atzeni, L. Bellomarini, P. Papotti, and R. Torlone. Metamappings for schema mapping reuse. Full version, 2018. http://www.eurecom.fr/~papotti/files/GaiaTR18.pdf.
[5] Z. Bellahsene, A. Bonifati, and E. Rahm, editors. Schema Matching and Mapping. Data-Centric Systems and Applications. Springer, 2011.
[6] L. Bellomarini, G. Gottlob, A. Pieris, and E. Sallinger. Swift logic for big data and knowledge graphs. In IJCAI, pages 2--10. ijcai.org, 2017.
[7] J. L. Bentley. Multidimensional divide-and-conquer. Commun. ACM, 23(4):214--229, 1980.
[8] P. A. Bernstein, J. Madhavan, and E. Rahm. Generic schema matching, ten years later. PVLDB, 4(11):695--701, 2011.
[9] P. A. Bernstein and S. Melnik. Model management 2.0: manipulating richer mappings. In SIGMOD, 2007.
[10] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.
[11] A. Bonifati, U. Comignani, E. Coquery, and R. Thion. Interactive mapping specification with exemplar tuples. In SIGMOD, pages 667--682, 2017.
[12] C. Chen, B. Golshan, A. Y. Halevy, W. Tan, and A. Doan. BigGorilla: An open-source ecosystem for data preparation and integration. IEEE Data Eng. Bull., 41(2):10--22, 2018.
[13] L. Chiticariu and W. C. Tan. Debugging schema mappings with routes. In VLDB, pages 79--90, 2006.
[14] R. Fagin, L. M. Haas, M. A. Hernández, R. J. Miller, L. Popa, and Y. Velegrakis. Clio: Schema mapping creation and data exchange. In Conceptual Modeling, 2009.
[15] R. Fagin, P. Kolaitis, R. Miller, and L. Popa. Data exchange: Semantics and query answering. In ICDT, 2003.
[16] R. C. Fernandez, Z. Abedjan, S. Madden, and M. Stonebraker. Towards large-scale data discovery. In ExploreDB, pages 3--5, 2016.
[17] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-oriented Software. Addison-Wesley, 1995.
[18] G. Gottlob and P. Senellart. Schema mapping discovery from data instances. J. ACM, 57(2), 2010.
[19] P. Graham. Better Bayesian filtering. In Proceedings of Spam Conference, 2003.
[20] L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio grows up: from research prototype to industrial tool. In SIGMOD, pages 805--810. ACM, 2005.
[21] A. Y. Halevy, F. Korn, N. F. Noy, C. Olston, N. Polyzotis, S. Roy, and S. E. Whang. Managing Google's data lake: an overview of the Goods system. IEEE Data Eng. Bull., 39(3):5--14, 2016.
[22] J. Heer, J. M. Hellerstein, and S. Kandel. Predictive interaction for data transformation. In CIDR, 2015.
[23] M. A. Hernández, P. Papotti, and W. C. Tan. Data exchange with data-metadata translations. PVLDB, 1(1):260--273, 2008.
[24] V. Kantere, D. Bousounis, and T. K. Sellis. A tool for mapping discovery over revealing schemas. In EDBT, 2009.
[25] N. Konstantinou, M. Koehler, E. Abel, C. Civili, B. Neumayr, E. Sallinger, A. A. A. Fernandes, G. Gottlob, J. A. Keane, L. Libkin, and N. W. Paton. The VADA architecture for cost-effective data wrangling. In SIGMOD, pages 1599--1602. ACM, 2017.
[26] J. Madhavan, P. A. Bernstein, A. Doan, and A. Halevy. Corpus-based schema matching. In ICDE. IEEE, 2005.
[27] R. J. Miller, L. M. Haas, and M. A. Hernández. Schema mapping as query discovery. In VLDB, pages 77--88, 2000.
[28] T. Milo, S. Novgorodov, and W. Tan. Interactive rule refinement for fraud detection. In EDBT, pages 265--276, 2018.
[29] P. Papotti and R. Torlone. Schema exchange: Generic mappings for transforming data and metadata. Data Knowl. Eng., 68(7):665--682, 2009.
[30] B. ten Cate, P. G. Kolaitis, K. Qian, and W. Tan. Approximation algorithms for schema-mapping discovery from data examples. ACM Trans. Database Syst., 42(2):12:1--12:41, 2017.
[31] I. G. Terrizzano, P. M. Schwarz, M. Roth, and J. E. Colino. Data wrangling: The challenging yourney from the wild to the lake. In CIDR, 2015.
[32] R. Wisnesky, M. A. Hernández, and L. Popa. Mapping polymorphism. In ICDT, pages 196--208, 2010.


Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 5
January 2019
163 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 12, Issue 5

Contributors

P. Atzeni, L. Bellomarini, P. Papotti, and R. Torlone


Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 2
Reflects downloads up to 23 Dec 2024

Cited By

  • (2024) Alfa: active learning for graph neural network-based semantic schema alignment. The VLDB Journal, 33(4):981-1011. DOI: 10.1007/s00778-023-00822-z. Online publication date: 1-Jul-2024.
  • (2022) Understanding Queries by Conditional Instances. Proceedings of the 2022 International Conference on Management of Data, pages 355-368. DOI: 10.1145/3514221.3517898. Online publication date: 10-Jun-2022.
  • (2022) WDA: A Domain-Aware Database Schema Analysis for Improving OBDA-Based Event Log Extractions. Advanced Data Mining and Applications, pages 297-309. DOI: 10.1007/978-3-031-22137-8_22. Online publication date: 30-Nov-2022.
  • (2020) Neither in the Programs Nor in the Data: Mining the Hidden Financial Knowledge with Knowledge Graphs and Reasoning. Mining Data for Financial Applications, pages 119-134. DOI: 10.1007/978-3-030-66981-2_10. Online publication date: 18-Sep-2020.
