research-article

G-SQL: fast query processing via graph exploration

Published: 01 August 2016

Abstract

Much real-life data is graph-structured. However, only recently have businesses begun to exploit the connectedness of data for business insights. RDBMSs, on the other hand, are a mature technology for data management, but they are not designed for graph processing. Take graph traversal, a common graph operation, as an example: it relies heavily on a graph primitive that accesses a given node's neighborhood. If an RDBMS is used to manage graph data, we need to join tables following foreign keys to access the nodes in the neighborhood. Graph exploration is a fundamental building block of many graph algorithms, but this simple operation is costly in an RDBMS because of the large volume of I/O caused by the many table joins. In this paper, we present G-SQL, our effort toward the integration of an RDBMS and a native in-memory graph processing engine. G-SQL leverages the fast graph exploration capability provided by the graph engine to answer multi-way join queries. Meanwhile, it uses RDBMSs to provide mature data management functionalities, such as reliable data storage and additional data access methods. Specifically, G-SQL is a SQL dialect augmented with graph exploration functionalities, and it dispatches query tasks to the in-memory graph engine and its underlying RDBMS. The G-SQL runtime coordinates the two query processors via a unified cost model to ensure that the entire query is processed efficiently. Experimental results show that our approach greatly expands the capabilities of RDBMSs and delivers exceptional performance for SQL-graph hybrid queries.
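
The contrast the abstract draws between join-based neighborhood access and direct graph exploration can be illustrated with a small sketch. This is not G-SQL's implementation or syntax; the edge table `EDGES` and all function names are hypothetical, and plain Python lists and dictionaries stand in for the relational tables and the in-memory graph engine. It answers the same two-hop neighbor query in both styles.

```python
from collections import defaultdict

# Hypothetical edge table, stored the way a relational table might hold it:
# each row is a (src, dst) pair of foreign keys into a node table.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]


def two_hop_by_join(edges, start):
    """Emulate a self-join of the edge table (e1.dst = e2.src, e1.src = start).

    A naive nested-loop join rescans the whole table for every matching row.
    """
    result = set()
    for src1, dst1 in edges:          # scan e1
        if src1 != start:
            continue
        for src2, dst2 in edges:      # rescan e2 for each matching e1 row
            if src2 == dst1:
                result.add(dst2)
    return result


def build_adjacency(edges):
    """Build an in-memory adjacency list: node -> list of out-neighbors."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def two_hop_by_exploration(adjacency, start):
    """Follow neighbor lists directly, touching only the nodes that are reached."""
    result = set()
    for mid in adjacency.get(start, []):
        result.update(adjacency.get(mid, []))
    return result


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print(two_hop_by_join(EDGES, 1))
    print(two_hop_by_exploration(adjacency, 1))
```

Both functions return the same set of nodes. The join version does work proportional to the table size for each hop, while the exploration version does work proportional to the size of the neighborhoods it visits, which is the gap the abstract attributes to table joins.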


Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment, Volume 9, Issue 12
August 2016
345 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 August 2016
Published in PVLDB Volume 9, Issue 12

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 38
  • Downloads (last 6 weeks): 1
Reflects downloads up to 04 Oct 2024

Citations

Cited By

  • (2021) Fast and Accurate Optimizer for Query Processing over Knowledge Graphs. Proceedings of the ACM Symposium on Cloud Computing, pages 503-517. DOI: 10.1145/3472883.3486991. Online publication date: 1-Nov-2021
  • (2017) GraphGen. Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems, pages 1-7. DOI: 10.1145/3078447.3078456. Online publication date: 19-May-2017
  • (2017) Extracting and Analyzing Hidden Graphs from Relational Databases. Proceedings of the 2017 ACM International Conference on Management of Data, pages 897-912. DOI: 10.1145/3035918.3035949. Online publication date: 9-May-2017
  • (2016) Fast in-memory SQL analytics on typed graphs. Proceedings of the VLDB Endowment, 10:3, pages 265-276. DOI: 10.14778/3021924.3021941. Online publication date: 1-Nov-2016
