research-article

Making RDBMSs efficient on graph workloads through predefined joins

Published: 01 January 2022

Abstract

Joins in native graph database management systems (GDBMSs) are predefined to the system as edges, which are indexed in adjacency list indices and serve as pointers. This contrasts with, and can be more performant than, value-based joins in RDBMSs. Existing approaches to integrating predefined joins into RDBMSs adopt a strict separation of graph and relational data and processors, where a graph-specific processor uses left-deep and index nested loop joins (INLJ) for a subset of joins. In this paper we study and experimentally evaluate this technique's performance against an alternative technique based on hash joins that use system-level row IDs (RIDs). In this alternative approach, when a join between two tables is predefined to the system, the RIDs of joining tuples are materialized in extended tables and optionally in RID indices. Instead of using the RID index to perform the join directly, we use it primarily in hash joins to generate filters that can be passed to scans using sideways information passing (sip), ensuring sequential scans. We further compare these two approaches against: (i) the default value-based joins of an RDBMS; and (ii) materialized views, which can avoid evaluating predefined joins completely and instead replace them with scans. We integrated our alternative approach into DuckDB and call the resulting system GRainDB. Our evaluation demonstrates that the existing INLJ-based approach can be very efficient when entity relations contain very selective filters. However, GRainDB's approach is more robust and is either competitive with or outperforms the INLJ-based approach across a wide range of settings. We further demonstrate that GRainDB substantially improves the performance of DuckDB, which uses default value-based joins, on relational and graph workloads with large many-to-many joins, making it competitive with a state-of-the-art GDBMS, while incurring no major overheads otherwise.
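The RID-based hash join described in the abstract can be illustrated with a toy sketch. This is not GRainDB's implementation: the tables `PERSON` and `KNOWS`, the helper `build_rid_index`, and the function `names_known_by` are hypothetical names chosen for illustration, and a row's position in a Python list stands in for its system-level RID. The sketch assumes the join between the two tables has been predefined, so the edge table already stores the RIDs of the joining rows.

```python
# Toy tables. A row's list position plays the role of its system-level RID
# (an assumption of this sketch, not how a real RDBMS stores rows).
PERSON = [("alice", 31), ("bob", 17), ("carol", 45)]  # (name, age)

# "Extended" edge table: once the join is predefined, each edge row carries
# the materialized RIDs of its source and destination PERSON rows.
KNOWS = [(0, 1), (1, 2), (2, 0), (0, 2)]  # (src_rid, dst_rid)


def build_rid_index(edges):
    """RID index: source RID -> RIDs of the edge rows that start there."""
    index = {}
    for edge_rid, (src_rid, _dst_rid) in enumerate(edges):
        index.setdefault(src_rid, []).append(edge_rid)
    return index


RID_INDEX = build_rid_index(KNOWS)


def names_known_by(min_age):
    """Pairs (person, acquaintance) for persons with age >= min_age."""
    # 1. Build side of the hash join: filter PERSON, hash on RID (not on value).
    build = {rid: name for rid, (name, age) in enumerate(PERSON) if age >= min_age}

    # 2. Sideways information passing: the RID index is not used to run the
    #    join itself, only to derive a filter over edge RIDs for the scan.
    edge_mask = [False] * len(KNOWS)
    for src_rid in build:
        for edge_rid in RID_INDEX.get(src_rid, ()):
            edge_mask[edge_rid] = True

    # 3. Probe side: one sequential pass over KNOWS that skips masked-out rows,
    #    so every surviving row is guaranteed to find a match in the hash table.
    result = []
    for edge_rid, (src_rid, dst_rid) in enumerate(KNOWS):
        if edge_mask[edge_rid]:
            # Simplification: the destination row is fetched directly by RID.
            result.append((build[src_rid], PERSON[dst_rid][0]))
    return result


print(names_known_by(40))
```

The filter on `PERSON` is selective for `min_age=40`, so the mask lets the scan of `KNOWS` skip most rows while still reading the table in order. An INLJ-style plan would instead follow the index entries for each qualifying person one at a time.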


Cited By

  • (2024) Towards a Converged Relational-Graph Optimization Framework. Proceedings of the ACM on Management of Data 2:6 (1-27). DOI: 10.1145/3698828. Online publication date: 20-Dec-2024

Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 5
January 2022
134 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 January 2022
Published in PVLDB Volume 15, Issue 5
