Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3299869.3319864acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

An Efficient Index for RDF Query Containment

Published: 25 June 2019 Publication History
  • Get Citation Alerts
  • Abstract

    Query containment is a fundamental operation used to expedite query processing in view materialisation and query caching techniques. Since query containment has been shown to be NP-complete for arbitrary conjunctive queries on RDF graphs, we introduce a simpler form of conjunctive queries that we name f-graph queries. We first show that containment checking for f-graph queries can be solved in polynomial time. Based on this observation, we propose a novel indexing structure, named mv-index, that allows for fast containment checking between a single f-graph query and an arbitrary number of stored queries. Search is performed in polynomial time in the combined size of the query and the index. We then show how our algorithms and structures can be extended for arbitrary conjunctive queries on RDF graphs by introducing f-graph witnesses, i.e., f-graph representatives of conjunctive queries. F-graph witnesses have the following interesting property, a conjunctive query for RDF graphs is contained in another query only if its corresponding f-graph witness is also contained in it. The latter allows to use our indexing structure for the general case of conjunctive query containment. This translates in practice to microseconds or less for the containment test against hundreds of thousands of queries that are indexed within our structure.

    References

    [1]
    D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Sw-store: a vertically partitioned dbms for semantic web data management. VLDB J., 18(2):385--406, 2009.
    [2]
    F. N. Afrati, M. Damigos, and M. Gergatsoulis. Query containment under bag and bag-set semantics. Information Processing Letters, 110(10):360--369, 2010.
    [3]
    S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In VLDB, volume 2000, pages 496--505, 2000.
    [4]
    G. Alucc, O. Hartig, M. T. Özsu, and K. Daudjee. Diversified stress testing of rdf data management systems. In ISWC, pages 197--212, 2014.
    [5]
    R. Angles and C. Gutierrez. The expressive power of sparql. ISWC, pages 114--129, 2008.
    [6]
    M. Arenas, B. C. Grau, E. Kharlamov, S. Marciuska, and D. Zheleznyakov. Faceted search over rdf-based knowledge graphs. J. Web Sem., 37--38:55--74, 2016.
    [7]
    M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix bit loaded: a scalable lightweight join query processor for rdf data. In WWW, pages 41--50. ACM, 2010.
    [8]
    S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus for a web of open data. ISWC/ASWC, pages 722--735, 2007.
    [9]
    C. Bizer and A. Schultz. The berlin sparql benchmark. International Journal on Semantic Web and Information Systems (IJSWIS), 5(2):1--24, 2009.
    [10]
    D. Bleco and Y. Kotidis. Graph analytics on massive collections of small graphs. In EDBT, pages 523--534. Citeseer, 2014.
    [11]
    D. Bleco and Y. Kotidis. Using entropy metrics for pruning very large graph cubes. Information Systems, 81:49--62, 2019.
    [12]
    A. Bonifati, W. Martens, and T. Timm. An analytical study of large sparql query logs. PVLDB, 11(2):149--161, 2017.
    [13]
    S. Borgwardt, T. Mailis, R. Pe naloza, and A.-Y. Turhan. Answering fuzzy conjunctive queries over finitely valued fuzzy ontologies. Journal on Data Semantics, 5(2):55--75, 2016.
    [14]
    D. Brickley and R. V. Guha. Rdf vocabulary description language 1.0: Rdf schema. https://www.w3.org/TR/rdf-schema/, 2004.
    [15]
    J. Broekstra, A. Kampman, and F. Van Harmelen. Sesame: A generic architecture for storing and querying rdf and rdf schema. In ISWC, pages 54--68, 2002.
    [16]
    B. Cautis and E. Kharlamov. Answering queries using views over probabilistic XML: complexity and tractability. PVLDB, 5(11):1148--1159, 2012.
    [17]
    A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC. ACM, 1977.
    [18]
    S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS, pages 59--70. ACM, 1993.
    [19]
    M. W. Chekol, J. Euzenat, P. Genevès, and N. Laya"ida. Sparql query containment under rdfs entailment regime. In IJCAR, 2012.
    [20]
    C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. ICDT, pages 56--70, 1997.
    [21]
    T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009.
    [22]
    S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava, M. Tan, et al. Semantic data caching and replacement. VLDB, 96:330--341, 1996.
    [23]
    V. Dritsou, P. Constantopoulos, A. Deligiannakis, and Y. Kotidis. Optimizing query shortcuts in rdf databases. ESWC, pages 77--92, 2011.
    [24]
    O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The ldbc social network benchmark: Interactive workload. In SIGMOD, pages 619--630. ACM, 2015.
    [25]
    R. Giugno, V. Bonnici, N. Bombieri, A. Pulvirenti, A. Ferro, and D. Shasha. Grapes: A software for parallel searching on biological graphs targeting multi-core architectures. PloS one, 8(10):e76911, 2013.
    [26]
    F. Goasdoué, K. Karanasos, J. Leblay, and I. Manolescu. View selection in semantic web databases. PVLDB, 5(2):97--108, 2011.
    [27]
    G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. Journal of the ACM, 48(3):431--498, 2001.
    [28]
    Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. J. Web Semant., 3(2--3):158--182, 2005.
    [29]
    A. Gupta, I. S. Mumick, et al. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Eng. Bull., 18(2):3--18, 1995.
    [30]
    S. Gurajada, S. Seufert, I. Miliaraki, and M. Theobald. Triad: a distributed shared-nothing rdf engine based on asynchronous message passing. In SIGMOD, pages 289--300. ACM, 2014.
    [31]
    C. Gutierrez, C. A. Hurtado, A. O. Mendelzon, and J. Pérez. Foundations of semantic web databases. JCSS, 77(3):520--541, 2011.
    [32]
    V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. ACM SIGMOD Record, 25:205--216, 1996.
    [33]
    J. Huang, D. J. Abadi, and K. Ren. Scalable sparql querying of large rdf graphs. PVLDB, 4(11):1123--1134, 2011.
    [34]
    Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. TODS, 20(3):288--324, 1995.
    [35]
    A. Jena. semantic web framework for java, 2007.
    [36]
    D. S. Johnson and A. Klug. Optimizing conjunctive queries that contain untyped variables. SIAM Journal on Computing, 12(4):616--640, 1983.
    [37]
    G. Karvounarakis, A. Magganaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle. Querying the semantic web with rql. Computer networks, 42(5):617--640, 2003.
    [38]
    E. Kharlamov, S. Brandt, E. Jimé nez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. L. Ö zcc ep, C. Pinkel, C. Svingos, D. Zheleznyakov, I. Horrocks, Y. E. Ioannidis, and R. Mö ller. Ontology-based integration of streaming and static relational data with optique. In SIGMOD, pages 2109--2112, 2016.
    [39]
    E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö. Özcc ep, C. Svingos, D. Zheleznyakov, S. Brandt, I. Horrocks, Y. E. Ioannidis, S. Lamparter, and R. Möller. Towards analytics aware ontology based access to static and streaming data. In ISWC, pages 344--362, 2016.
    [40]
    E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö. Özcc ep, C. Svingos, D. Zheleznyakov, Y. Ioannnidis, S. Lamparter, R. Möller, and A. Waaler. An ontology-mediated analytics-aware approach to support monitoring and diagnostics of static and streaming data. J. Web Semant., 2019.
    [41]
    K. Klein, N. Kriege, and P. Mutzel. Ct-index: Fingerprint-based graph indexing combining cycles and trees. In ICDE, pages 1115--1126, 2011.
    [42]
    A. Klug. On conjunctive queries containing inequalities. JACM, 35(1):146--160, 1988.
    [43]
    Y. Kotidis and N. Roussopoulos. A case for dynamic view management. TODS, 26(4):388--423, 2001.
    [44]
    A. Letelier, J. Pérez, R. Pichler, and S. Skritek. Static analysis and optimization of semantic web queries. TODS, 38(4):25, 2013.
    [45]
    A. Y. Levy, A. O. Mendelzon, and Y. Sagiv. Answering queries using views. In PODS, pages 95--104. ACM, 1995.
    [46]
    T. Mailis, G. Stoilos, and G. Stamou. Expressive reasoning with horn rules and fuzzy description logics. Knowledge and information systems, 25(1):105--136, 2010.
    [47]
    S. Malyshev, M. Krötzsch, L. González, J. Gonsior, and A. Bielefeldt. Getting the most out of wikidata: Semantic technology usage in wikipedia's knowledge graph. In ISWC, pages 376--394, 2018.
    [48]
    B. McBride. Jena: Implementing the rdf model and syntax specification. In ISWC, pages 23--28. CEUR-WS. org, 2001.
    [49]
    M. Meimaris, G. Papastefanatos, N. Mamoulis, and I. Anagnostopoulos. Extended characteristic sets: graph indexing for sparql query optimization. In ICDE, pages 497--508, 2017.
    [50]
    H. Mistry, P. Roy, S. Sudarshan, and K. Ramamritham. Materialized view selection and maintenance using multi-query optimization. In ACM SIGMOD Record, volume 30, pages 307--318. ACM, 2001.
    [51]
    K. Morfonios, S. Konakas, Y. Ioannidis, and N. Kotsis. Rolap implementations of the data cube. ACM Computing Surveys (CSUR), 39(4):12, 2007.
    [52]
    D. R. Morrison. Patricia--practical algorithm to retrieve information coded in alphanumeric. JACM, 15(4):514--534, 1968.
    [53]
    Y. Nenov, R. Piro, B. Motik, I. Horrocks, Z. Wu, and J. Banerjee. Rdfox: A highly-scalable rdf store. In ISWC, pages 3--20, 2015.
    [54]
    T. Neumann and G. Weikum. x-rdf-3x: fast querying, high update rates, and consistency for rdf databases. PVLDB, 3(1--2):256--263, 2010.
    [55]
    J. Neyman. X--outline of a theory of statistical estimation based on the classical theory of probability. Phil. Trans. R. Soc. Lond. A, 236(767):333--380, 1937.
    [56]
    N. Papailiou, D. Tsoumakos, P. Karras, and N. Koziris. Graph-aware, workload-adaptive sparql query caching. In SIGMOD, pages 1777--1792. ACM, 2015.
    [57]
    J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of sparql. In ISWC, pages 30--43, 2006.
    [58]
    F. Picalausa and S. Vansummeren. What are real sparql queries like? In SWIM, page 7. ACM, 2011.
    [59]
    R. Pichler and S. Skritek. Containment and equivalence of well-designed sparql. In PODS, pages 39--50. ACM, 2014.
    [60]
    A. Polleres. From sparql to rules (and back). In WWW, pages 787--796. ACM, 2007.
    [61]
    E. Prud, A. Seaborne, et al. Sparql query language for rdf. https://www.w3.org/TR/rdf-sparql-query/, 2006.
    [62]
    Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. JACM, 27(4):633--655, 1980.
    [63]
    G. Serfiotis, I. Koffina, V. Christophides, and V. Tannen. Containment and minimization of rdf/s query patterns. In ISWC, 2005.
    [64]
    L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold. Column-store support for rdf data management: not all swans are white. PVLDB, 1(2):1553--1563, 2008.
    [65]
    A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jimé nez-Ruiz, M. Giese, M. G. Skjæveland, D. Hovland, R. Schlatte, S. Brandt, H. Lie, and I. Horrocks. Optiquevqs: A visual query system over ontologies for industry. Semantic Web, 9(5):627--660, 2018.
    [66]
    V. Spyropoulos and Y. K. Kotidis. Digree: Building a distributed graph processing engine out of single-node graph database installations. ACM SIGMOD Record, 46(4):22--27, 2018.
    [67]
    C. Svingos, T. Mailis, H. Kllapi, L. Stamatogiannakis, Y. Kotidis, and Y. Ioannidis. Real time processing of streaming and static information. In IEEE Big Data, pages 410--415, 2016.
    [68]
    T. Tran, G. Ladwig, and S. Rudolph. Managing structured and semistructured rdf data using structure indexes. IEEE Transactions on Knowledge and Data Engineering, 25(9):2076--2089, 2013.
    [69]
    J. Wang, Z. Liu, S. Ma, N. Ntarmos, and P. Triantafillou. GC: A graph caching system for subgraph/supergraph queries. PVLDB, 11(12):2022--2025, 2018.
    [70]
    J. Wang, N. Ntarmos, and P. Triantafillou. Indexing query graphs to speedup graph query processing. In EDBT, pages 41--52, 2016.
    [71]
    J. Wang, N. Ntarmos, and P. Triantafillou. Graphcache: A caching system for graph queries. In EDBT, pages 13--24, 2017.
    [72]
    C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1):1008--1019, 2008.
    [73]
    M. Wudage, J. Euzenat, P. Genevès, and N. Layaida. Sparql query containment under shi axioms. In Proceedings 26th AAAI Conference on Artificial Intelligence, pages 10--16, 2012.
    [74]
    M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82--94, 1981.
    [75]
    X. Zhang, L. Chen, Y. Tong, and M. Wang. Eagre: Towards scalable i/o efficient sparql query evaluation on the cloud. In ICDE, 2013.

    Cited By

    View all

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
    June 2019
    2106 pages
    ISBN:9781450356435
    DOI:10.1145/3299869
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 June 2019

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. database views
    2. polynomial time
    3. ptime
    4. query containment
    5. radix tree
    6. rdf schema
    7. resource description framework (rdf)
    8. view materialisation

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '19
    Sponsor:
    SIGMOD/PODS '19: International Conference on Management of Data
    June 30 - July 5, 2019
    Amsterdam, Netherlands

    Acceptance Rates

    SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)17
    • Downloads (Last 6 weeks)0

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Checking Pattern Query Containment Under Shape ExpressionSN Computer Science10.1007/s42979-023-02142-z4:6Online publication date: 12-Oct-2023
    • (2022)Optimisation Techniques for Flexible SPARQL QueriesACM Transactions on the Web10.1145/353285516:4(1-44)Online publication date: 16-Nov-2022
    • (2022)Relational schema optimization for RDF-based knowledge graphsInformation Systems10.1016/j.is.2021.101754104:COnline publication date: 1-Feb-2022
    • (2021)View selection over knowledge graphs in triple storesProceedings of the VLDB Endowment10.14778/3484224.348422714:13(3281-3294)Online publication date: 28-Oct-2021
    • (2020)GRaCe: A Relaxed Approach for Graph Query CachingSOFSEM 2020: Theory and Practice of Computer Science10.1007/978-3-030-38919-2_55(657-666)Online publication date: 20-Jan-2020

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media