Vertex-centric Parallel Computation of SQL Queries
Pages 1664 - 1677
Abstract
We present a scheme for parallel execution of SQL queries on top of any vertex-centric BSP graph processing engine. The scheme comprises a graph encoding of relational instances and a vertex program specification of our algorithm called TAG-join, which matches the theoretical communication and computation complexity of state-of-the-art join algorithms. When run on top of the vertex-centric TigerGraph database engine on a single multi-core server, TAG-join exploits thread parallelism and is competitive with (and often outperforms) reference RDBMSs on the TPC benchmarks they are traditionally tuned for. In a distributed cluster, TAG-join outperforms the popular Spark SQL engine.
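To make the vertex-centric idea concrete, the following is a minimal sketch of how a two-way equi-join can be phrased as message passing in BSP supersteps. It is not the paper's TAG-join algorithm, and the abstract does not specify the graph encoding. The sketch assumes a hypothetical encoding with one vertex per tuple and one vertex per join-attribute value; the function name `bsp_equi_join` and its parameters are illustrative only.

```python
from collections import defaultdict


def bsp_equi_join(r_rows, s_rows, r_key, s_key):
    """Join two lists of tuples on r_rows[i][r_key] == s_rows[j][s_key].

    Assumed encoding (not taken from the paper): every tuple is a vertex,
    every distinct join-key value is a vertex, and a tuple vertex has an
    edge to the vertex of its key value.
    """
    # Superstep 1: each tuple vertex sends itself along its edge to the
    # value vertex of its join key. The inbox models the messages each
    # value vertex receives at the synchronization barrier.
    inbox = defaultdict(lambda: {"R": [], "S": []})
    for row in r_rows:
        inbox[row[r_key]]["R"].append(row)
    for row in s_rows:
        inbox[row[s_key]]["S"].append(row)

    # Superstep 2: each value vertex works only on its own inbox, pairing
    # the R-tuples with the S-tuples it received. Value vertices are
    # independent of each other, so this loop could run in parallel.
    joined = []
    for messages in inbox.values():
        for r in messages["R"]:
            for s in messages["S"]:
                joined.append(r + s)
    return joined


# Small illustrative instance: R(a, b) and S(b, c), joined on b.
R = [(1, "x"), (2, "y")]
S = [("x", 10), ("x", 11), ("z", 12)]
result = bsp_equi_join(R, S, r_key=1, s_key=0)
```

In this sketch only the value vertex for `"x"` receives messages from both relations, so only it contributes output tuples. Vertices that hear from one side alone produce nothing.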
Published In
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
- General Chairs:
- Guoliang Li,
- Zhanhuai Li,
- Program Chairs:
- Stratos Idreos,
- Divesh Srivastava
Copyright © 2021 Owner/Author.
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 18 June 2021
Qualifiers
- Research-article
Conference
SIGMOD/PODS '21
SIGMOD/PODS '21: International Conference on Management of Data
June 20 - 25, 2021
Virtual Event, China
Acceptance Rates
Overall Acceptance Rate 785 of 4,003 submissions, 20%