DOI: 10.1145/2949689.2949698
research-article

Functional Dependencies Unleashed for Scalable Data Exchange

Published: 18 July 2016

Abstract

We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds occur naturally in many DE scenarios, including those in the Life Sciences, in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, the evaluation of target fds remains a bottleneck in state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results through a careful ordering of chase steps that interleaves fds and (chosen) tgds. As a direct consequence, we significantly reduce the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction leads to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare our algorithm with the latest DE engines and show that it outperforms them.
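
To make the terminology concrete, the following is a minimal sketch of a naive textbook chase, not the interleaved algorithm the paper proposes. It assumes a hypothetical scenario that does not come from the paper: source relations Emp(name, dept) and Mgr(dept, manager), a target relation Works(name, dept, manager), two s-t tgds that fill unknown positions with labeled nulls, and one target fd dept -> manager. All names (`Null`, `apply_st_tgds`, `chase_fd`) are illustrative.

```python
from itertools import count


class Null:
    """A labeled null: a placeholder for an unknown target value."""
    _ids = count(1)

    def __init__(self):
        self.id = next(Null._ids)

    def __repr__(self):
        return f"N{self.id}"


def apply_st_tgds(emp_rows, mgr_rows):
    """Apply two hypothetical s-t tgds, inventing one fresh null per firing.

    Emp(n, d) -> exists M . Works(n, d, M)
    Mgr(d, m) -> exists N . Works(N, d, m)
    """
    target = []
    for name, dept in emp_rows:
        target.append((name, dept, Null()))
    for dept, manager in mgr_rows:
        target.append((Null(), dept, manager))
    return target


def chase_fd(rows, lhs, rhs):
    """Chase the fd lhs -> rhs (given as column indexes) to a fixpoint.

    Two rows that agree on lhs must agree on rhs: a null is replaced
    everywhere by the other value; two distinct constants make the chase fail.
    """
    rows = [tuple(r) for r in rows]
    changed = True
    while changed:
        changed = False
        first_seen = {}
        for row in rows:
            key = tuple(row[i] for i in lhs)
            other = first_seen.setdefault(key, row)
            a, b = other[rhs], row[rhs]
            if a == b:
                continue
            if isinstance(a, Null):
                old, new = a, b
            elif isinstance(b, Null):
                old, new = b, a
            else:
                raise ValueError(f"fd violated by constants {a!r} and {b!r}")
            # Substitute the null in the whole instance, then rescan.
            rows = [tuple(new if v is old else v for v in r) for r in rows]
            changed = True
            break
    return list(dict.fromkeys(rows))  # drop duplicate tuples, keep order


emp = [("alice", "sales"), ("bob", "sales")]
mgr = [("sales", "carol")]
works = chase_fd(apply_st_tgds(emp, mgr), lhs=(1,), rhs=2)
print(works)
```

In this toy run the two placeholder managers invented for alice and bob are both resolved to "carol" by the fd. The sketch applies every tgd first and rescans the full target after each substitution, which is the kind of cost that grows with the size of the intermediate result; the abstract attributes the paper's gains to ordering and interleaving the steps so that fds are applied over a smaller scope.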

    Published In

    SSDBM '16: Proceedings of the 28th International Conference on Scientific and Statistical Database Management
    July 2016
    290 pages
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Chase
    2. functional dependencies
    3. parallelization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SSDBM '16

    Acceptance Rates

    Overall Acceptance Rate 56 of 146 submissions, 38%

    Cited By

    • (2023) Exploiting the Power of Equality-Generating Dependencies in Ontological Reasoning. Proceedings of the VLDB Endowment 15:13 (3976-3988). DOI: 10.14778/3565838.3565850. Online publication date: 20-Jan-2023
    • (2022) iWarded: A Versatile Generator to Benchmark Warded Datalog+/– Reasoning. Rules and Reasoning (113-129). DOI: 10.1007/978-3-031-21541-4_8. Online publication date: 14-Dec-2022
    • (2021) Materializing knowledge bases via trigger graphs. Proceedings of the VLDB Endowment 14:6 (943-956). DOI: 10.14778/3447689.3447699. Online publication date: 12-Apr-2021
    • (2019) MapRepair. Proceedings of the 2019 International Conference on Management of Data (1873-1876). DOI: 10.1145/3299869.3320228. Online publication date: 25-Jun-2019
    • (2019) Dependencies for Graphs. ACM Transactions on Database Systems 44:2 (1-40). DOI: 10.1145/3287285. Online publication date: 13-Feb-2019
    • (2019) Graph Data Integration and Exchange. Encyclopedia of Big Data Technologies (815-822). DOI: 10.1007/978-3-319-77525-8_209. Online publication date: 20-Feb-2019
    • (2019) Too Much Information: Can AI Cope with Modern Knowledge Graphs? Formal Concept Analysis (17-31). DOI: 10.1007/978-3-030-21462-3_2. Online publication date: 23-May-2019
    • (2018) The Vadalog system. Proceedings of the VLDB Endowment 11:9 (975-987). DOI: 10.14778/3213880.3213888. Online publication date: 1-May-2018
    • (2018) Parallel Reasoning of Graph Functional Dependencies. 2018 IEEE 34th International Conference on Data Engineering (ICDE) (593-604). DOI: 10.1109/ICDE.2018.00060. Online publication date: Apr-2018
    • (2018) Efficient Model Construction for Horn Logic with VLog. Automated Reasoning (680-688). DOI: 10.1007/978-3-319-94205-6_44. Online publication date: 30-Jun-2018