DOI: 10.1145/3106426.3106534
Research article

Efficient processing of SPARQL queries over GraphFrames

Published: 23 August 2017

Abstract

The growth of data management systems that store very large volumes of data creates a need for efficient data analytics techniques for knowledge discovery at different levels of granularity. The Resource Description Framework (RDF), developed mainly for the Semantic Web, is a plausible option for graph databases that handle large real-world data. RDF models information as triples <subject, predicate, object> and is a useful way to store graph data (also known as linked data), since each edge can be stored as a triple. Because so much linked data exists, mostly in the form of graphs, graph mining has attracted researchers from different fields to the efficient handling (storage, indexing, retrieval, etc.) of graph data. As a result, APIs such as GraphX and GraphFrames have been developed to facilitate relational queries over graph data. Although GraphX is older than GraphFrames and SPARQL query processing over GraphX has been explored by some researchers, to the best of our knowledge SPARQL query processing over GraphFrames has not yet been explored. In this paper, we present an initial study of a query-specific search space pruning and query optimization approach for processing SPARQL queries over GraphFrames efficiently. The experimental results, measured as low query response times, are encouraging and motivate further research in this direction.
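
The abstract does not spell out the pruning or optimization steps, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "query-specific search space pruning" can be approximated by discarding triples whose predicate never appears in the query, and it evaluates a basic graph pattern with plain Python joins rather than the GraphFrames API. The function names (`prune`, `unify`, `match`), the sample triples, and the LUBM-like predicates are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# An RDF triple, i.e. one labelled edge of the graph: (subject, predicate, object).
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def prune(triples: List[Triple], patterns: List[Triple]) -> List[Triple]:
    """Assumed pruning rule: keep only triples whose predicate occurs in the query."""
    predicates = [p for _, p, _ in patterns]
    if any(p.startswith("?") for p in predicates):
        return list(triples)  # a variable predicate can match any edge
    wanted = set(predicates)
    return [t for t in triples if t[1] in wanted]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend a binding so that the pattern matches the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or reject if it is already bound to another value.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := unify(pattern, triple, binding)) is not None
        ]
    return bindings


# Hypothetical data: each tuple is one edge of the RDF graph.
triples = [
    ("alice", "advisor", "bob"),
    ("alice", "takesCourse", "c1"),
    ("bob", "teacherOf", "c1"),
    ("carol", "advisor", "dave"),
    ("carol", "takesCourse", "c2"),
    ("dave", "teacherOf", "c3"),
    ("alice", "name", "Alice"),
]

# Students who take a course taught by their own advisor.
query = [
    ("?s", "advisor", "?p"),
    ("?p", "teacherOf", "?c"),
    ("?s", "takesCourse", "?c"),
]

pruned = prune(triples, query)
results = match(pruned, query)
print(len(triples), len(pruned), results)
```

Under this assumption, pruning only shrinks the set of edges that the joins scan; matching the pruned and the unpruned triples yields the same bindings. In a GraphFrames setting the same query shape would typically be expressed as a motif over a vertex and edge DataFrame, which this sketch does not attempt to reproduce.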


Published In

WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN:9781450349512
DOI:10.1145/3106426
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GraphX
  2. SPARQL query processing
  3. graph mining
  4. graphframes
  5. linked data mining

Conference

WI '17
Acceptance Rates

WI '17 Paper Acceptance Rate 118 of 178 submissions, 66%;
Overall Acceptance Rate 118 of 178 submissions, 66%
