research-article

Making RDBMSs efficient on graph workloads through predefined joins

Authors:

Semih Salihoglu

Proceedings of the VLDB Endowment, Volume 15, Issue 5

Pages 1011 - 1023

https://doi.org/10.14778/3510397.3510400

Published: 01 January 2022

Abstract

Joins in native graph database management systems (GDBMSs) are predefined to the system as edges, which are indexed in adjacency list indices and serve as pointers. This contrasts with, and can be more performant than, value-based joins in RDBMSs. Existing approaches to integrating predefined joins into RDBMSs adopt a strict separation of graph and relational data and processors, where a graph-specific processor uses left-deep plans and index nested loop joins (INLJ) for a subset of joins. In this paper, we study and experimentally evaluate this technique's performance against an alternative technique based on hash joins that use system-level row IDs (RIDs). In this alternative approach, when a join between two tables is predefined to the system, the RIDs of joining tuples are materialized in extended tables and optionally in RID indices. Instead of using the RID index to perform the join directly, we use it primarily in hash joins to generate filters that can be passed to scans using sideways information passing (sip), which ensures that scans remain sequential. We further compare these two approaches against: (i) the default value-based joins of an RDBMS; and (ii) materialized views, which can avoid evaluating predefined joins completely and instead replace them with scans. We integrated our alternative approach into DuckDB and call the resulting system GRainDB. Our evaluation demonstrates that the existing INLJ-based approach can be very efficient when entity relations contain very selective filters. However, GRainDB's approach is more robust and is either competitive with or outperforms the INLJ-based approach across a wide range of settings. We further demonstrate that GRainDB substantially improves the performance of DuckDB, which uses default value-based joins, on relational and graph workloads with large many-to-many joins, making it competitive with a state-of-the-art GDBMS, while incurring no major overheads otherwise.
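
The RID-based technique described in the abstract can be illustrated with a small Python sketch. This is not GRainDB's implementation (which lives inside DuckDB); the tables, column names, and the functions `predefine_join` and `friends_of` are hypothetical and chosen only for illustration. The sketch assumes that a tuple's position in its table acts as its RID. It materializes RIDs in an extended edge table and a RID index, then uses the index only to build a filter that is passed sideways to a sequential scan feeding a hash join.

```python
from collections import defaultdict

# Hypothetical toy data: a row's list position doubles as its row ID (RID).
person = [
    {"id": 101, "name": "Ada"},
    {"id": 205, "name": "Bob"},
    {"id": 309, "name": "Cy"},
    {"id": 412, "name": "Di"},
]
knows = [
    {"src": 101, "dst": 205},
    {"src": 101, "dst": 309},
    {"src": 205, "dst": 412},
    {"src": 309, "dst": 101},
]


def predefine_join(nodes, edges):
    """Materialize the RIDs of joining tuples in an extended table and a RID index."""
    rid_of = {row["id"]: rid for rid, row in enumerate(nodes)}
    # Extended edge table: each edge also stores the RIDs of the tuples it joins.
    extended = [
        dict(e, src_rid=rid_of[e["src"]], dst_rid=rid_of[e["dst"]]) for e in edges
    ]
    # RID index: RID of a source tuple -> RIDs of its edge tuples.
    rid_index = defaultdict(list)
    for edge_rid, e in enumerate(extended):
        rid_index[e["src_rid"]].append(edge_rid)
    return extended, rid_index


def friends_of(nodes, extended, rid_index, predicate):
    """Hash join on RIDs, with a filter passed sideways to the probe-side scan."""
    # Build side: hash table keyed on the RIDs of tuples that pass the filter.
    build = {rid: row for rid, row in enumerate(nodes) if predicate(row)}
    # Sideways information passing: the RID index turns the build keys into
    # a bitmask over the edge table instead of performing the join itself.
    mask = [False] * len(extended)
    for rid in build:
        for edge_rid in rid_index.get(rid, ()):
            mask[edge_rid] = True
    # Probe side: one sequential scan that skips rows the mask rules out.
    result = []
    for edge_rid, e in enumerate(extended):
        if mask[edge_rid]:
            result.append((build[e["src_rid"]]["name"], e["dst"]))
    return result


extended, rid_index = predefine_join(person, knows)
print(friends_of(person, extended, rid_index, lambda r: r["name"] == "Ada"))
```

The point of the design is that a selective filter on the build side shrinks the probe-side scan without turning it into random lookups, which is where this approach differs from an INLJ that follows the index one tuple at a time.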

Index Terms
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Published In

Proceedings of the VLDB Endowment Volume 15, Issue 5

January 2022

134 pages

ISSN:2150-8097

Editors:
Fatma Özcan (Google), Juliana Freire (New York University), Xuemin Lin (University of New South Wales)

Publisher

VLDB Endowment

Badges

Artifacts Available / v1.1

Qualifiers

Research-article
