DOI: 10.1145/3295500.3356212
research-article
Public Access

Semantic query transformations for increased parallelization in distributed knowledge graph query processing

Published: 17 November 2019
    Abstract

    Ontologies have become an increasingly popular semantic layer for integrating multiple heterogeneous datasets. However, significant challenges remain in supporting efficient and scalable processing of queries over data linked with ontologies (ontological queries). Processing ontological queries requires that explicitly defined query patterns be expanded to capture implicit ones, based on the available ontology inference axioms. In practice, for example in the biomedical domain, the complexity of the ontological axioms results in very large query expansions that present-day query processing infrastructure cannot support. In particular, it remains unclear how to effectively parallelize such queries.
    In this paper, we propose data and query transformations that enable inter-operator parallelism of ontological queries on Hadoop platforms. Our transformation techniques exploit ontological axioms, second-order data types, and operator rewritings to eliminate expensive query substructures for increased parallelizability. Comprehensive experiments conducted on benchmark datasets show up to a 25X performance improvement over existing approaches.
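
    The query expansion described in the abstract can be illustrated with a small sketch. It is not the transformation technique proposed in the paper; it only shows how a single explicit type pattern grows into a union of branches once subclass axioms are taken into account. The toy class hierarchy, the `:` prefix, and the function names are all hypothetical and chosen for illustration.

```python
from collections import deque

# Hypothetical toy ontology: child class -> direct superclasses (rdfs:subClassOf).
SUBCLASS_OF = {
    "Enzyme": ["Protein"],
    "Kinase": ["Enzyme"],
    "Antibody": ["Protein"],
    "Protein": ["Macromolecule"],
}


def subclasses_of(cls, subclass_of=SUBCLASS_OF):
    """Return cls plus every class that is transitively a subclass of it."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    # Breadth-first walk down the hierarchy, starting at cls.
    seen, queue = {cls}, deque([cls])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def expand_type_pattern(var, cls):
    """Expand the explicit pattern `?var rdf:type :cls` into a SPARQL-style
    UNION with one branch per class whose instances are implicitly of cls."""
    branches = [f"{{ ?{var} rdf:type :{c} }}" for c in subclasses_of(cls)]
    return " UNION ".join(branches)


if __name__ == "__main__":
    print(expand_type_pattern("x", "Protein"))
```

    With this four-axiom hierarchy, one pattern on `:Protein` already becomes four branches. When a query contains several such patterns and the unions are distributed over the joins, the number of conjunctive branches is the product of the per-pattern alternatives, which suggests why expansions over large biomedical ontologies grow so quickly.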

    Cited By

    • (2023) Decentralized Stream Reasoning Agents. Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems, 203-206. DOI: 10.1145/3583678.3603286. Online publication date: 27-Jun-2023.
    • (2022) Bring orders into uncertainty. Proceedings of the 36th ACM International Conference on Supercomputing, 1-14. DOI: 10.1145/3524059.3532379. Online publication date: 28-Jun-2022.
    • (2020) Proactive Digital Companions in Pervasive Hypermedia Environments. 2020 IEEE 6th International Conference on Collaboration and Internet Computing (CIC), 54-59. DOI: 10.1109/CIC50333.2020.00017. Online publication date: Dec-2020.

      Published In

      SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
      November 2019
      1921 pages
      ISBN:9781450362290
      DOI:10.1145/3295500
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      In-Cooperation

      • IEEE CS

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. algorithms
      2. and data management
      3. and resource management including dynamic resource provisioning
      4. cloud workflow
      5. data
      6. data analytics and frameworks supporting data analytics
      7. graph and network algorithms
      8. improved models
      9. metadata
      10. namespaces
      11. performance or scalability of specific applications and respective software
      12. scalable storage

      Qualifiers

      • Research-article

      Conference

      SC '19
      Acceptance Rates

      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
