Research article. DOI: 10.1145/2247596.2247640

Efficient distributed query processing for autonomous RDF databases

Published: 27 March 2012

Abstract

    The inherent flexibility of the RDF data model has led to its notable adoption in many domains, especially in the life sciences. Some of these domains have an emerging need to access data integrated from various distributed sources of information. It is not always possible to implement this by simply loading all data into one central RDF store. For example, in the context of inter-institutional collaboration for drug development and clinical research, participants often want to maintain control over their local databases. Alternatively, distributed query processing techniques can be used to evaluate queries by accessing the remote data sources only on demand and in conformance with local authorization models. In this paper, we present an efficient approach to distributed query processing for large autonomous RDF databases. The groundwork is laid by a comprehensive RDF-specific schema- and instance-level synopsis. We present an optimizer that uses this synopsis to generate compact execution plans by precisely determining, at compile time, those sources that are relevant to a query. Furthermore, we present a tightly integrated query engine that further reduces the volume of intermediate results at run time. An extensive evaluation shows that our approach improves query execution times by up to two orders of magnitude and transferred data volumes by up to three orders of magnitude compared to a naïve implementation.
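
The abstract does not spell out how the synopsis is structured, so the following is only a minimal sketch of the general idea of compile-time source selection, not the authors' method. It assumes a deliberately simple synopsis that records, per source, which predicates and which `rdf:type` classes occur there. The class `SourceSynopsis`, the function `select_sources`, the source names, and the `ex:` vocabulary are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSynopsis:
    """Hypothetical per-source summary (an assumption, not the paper's synopsis)."""
    name: str
    predicates: set = field(default_factory=set)  # predicates seen at this source
    classes: set = field(default_factory=set)     # rdf:type objects seen at this source

    def may_match(self, pattern):
        """Return False only if the source certainly holds no match for the pattern."""
        _subject, predicate, obj = pattern
        # A bound predicate that never occurs at the source rules the source out.
        if not predicate.startswith("?") and predicate not in self.predicates:
            return False
        # A bound class in an rdf:type pattern must be known at the source.
        if predicate == "rdf:type" and not obj.startswith("?") and obj not in self.classes:
            return False
        return True


def select_sources(patterns, synopses):
    """Map each triple pattern to the names of the sources that may contribute to it."""
    return {
        pattern: [syn.name for syn in synopses if syn.may_match(pattern)]
        for pattern in patterns
    }


# Two illustrative autonomous sources and a two-pattern query.
synopses = [
    SourceSynopsis("drugs", {"rdf:type", "ex:targets"}, {"ex:Drug"}),
    SourceSynopsis("trials", {"rdf:type", "ex:testsDrug"}, {"ex:Trial"}),
]
query = [("?d", "rdf:type", "ex:Drug"), ("?t", "ex:testsDrug", "?d")]
plan = select_sources(query, synopses)
```

In this sketch each pattern is routed to a single source, so a plan built from `plan` would contact `drugs` only for the first pattern and `trials` only for the second. A real synopsis would also need instance-level information to prune sources whose predicates match but whose values cannot join.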


    Published In

    EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
    March 2012
    643 pages
    ISBN: 9781450307901
    DOI: 10.1145/2247596

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. RDF
    2. SPARQL
    3. distributed query processing

    Qualifiers

    • Research-article

    Conference

    EDBT '12

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%


    Cited By

    • (2024) A systematic overview of data federation systems. Semantic Web 15:1 (107-165). DOI: 10.3233/SW-223201. Online publication date: 12-Jan-2024
    • (2024) Answering Property Path Queries over Federated RDF Systems. Web and Big Data (16-31). DOI: 10.1007/978-981-97-2387-4_2. Online publication date: 28-Apr-2024
    • (2023) Optimizing Keyword Search Over Federated RDF Systems. IEEE Transactions on Big Data 9:3 (918-935). DOI: 10.1109/TBDATA.2022.3224749. Online publication date: 1-Jun-2023
    • (2023) A Cost-Driven Top-K Queries Optimization Approach on Federated RDF Systems. IEEE Transactions on Big Data 9:2 (665-676). DOI: 10.1109/TBDATA.2022.3156090. Online publication date: 1-Apr-2023
    • (2021) Subgraph matching over graph federation. Proceedings of the VLDB Endowment 15:3 (437-450). DOI: 10.14778/3494124.3494129. Online publication date: 1-Nov-2021
    • (2019) Optimizing Multi-Query Evaluation in Federated RDF Systems. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2019.2947050. Online publication date: 2019
    • (2019) Partitioning Large-Scale Property Graph for Efficient Distributed Query Processing. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (1643-1650). DOI: 10.1109/HPCC/SmartCity/DSS.2019.00225. Online publication date: Aug-2019
    • (2019) Federated RDF Query Processing. Encyclopedia of Big Data Technologies (754-761). DOI: 10.1007/978-3-319-77525-8_228. Online publication date: 20-Feb-2019
    • (2018) Multi-query Optimization in Federated RDF Systems. Database Systems for Advanced Applications (745-765). DOI: 10.1007/978-3-319-91452-7_48. Online publication date: 13-May-2018
    • (2018) Federated RDF Query Processing. Encyclopedia of Big Data Technologies (1-8). DOI: 10.1007/978-3-319-63962-8_228-1. Online publication date: 21-Feb-2018
