DOI: 10.1145/3448016.3457314
Research article · Open access

Vertex-centric Parallel Computation of SQL Queries

Published: 18 June 2021

Abstract

We present a scheme for parallel execution of SQL queries on top of any vertex-centric BSP graph processing engine. The scheme comprises a graph encoding of relational instances and a vertex program specification of our algorithm called TAG-join, which matches the theoretical communication and computation complexity of state-of-the-art join algorithms. When run on top of the vertex-centric TigerGraph database engine on a single multi-core server, TAG-join exploits thread parallelism and is competitive with (and often outperforms) reference RDBMSs on the TPC benchmarks they are traditionally tuned for. In a distributed cluster, TAG-join outperforms the popular Spark SQL engine.
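The abstract names two ingredients: a graph encoding of relational instances, and a vertex program that runs in BSP supersteps. The sketch below illustrates the general idea on a single two-relation equi-join. It is not TAG-join itself, and it makes several assumptions. The encoding is hypothetical: every tuple and every (attribute, value) pair becomes a vertex, and an edge links each tuple to each of its values. BSP is simulated in-process, so messages sent in one superstep are delivered only after the barrier, in the next superstep. The names `encode`, `superstep`, and `bsp_equi_join` are illustrative and are not TigerGraph or paper APIs. The sketch also assumes that `left` and `right` are distinct relations.

```python
from collections import defaultdict


def encode(relations):
    """Hypothetical tuple/value graph encoding (an assumption, not the paper's TAG spec).

    relations maps a relation name to (attribute_names, rows). Each row becomes a
    tuple vertex ("t", relation, row_index); each (attribute, value) pair becomes a
    value vertex ("v", attribute, value). Undirected edges link a tuple vertex to
    the value vertices of its fields.
    """
    adj = defaultdict(set)
    for rel, (attrs, rows) in relations.items():
        for i, row in enumerate(rows):
            t = ("t", rel, i)
            for attr, val in zip(attrs, row):
                v = ("v", attr, val)
                adj[t].add(v)
                adj[v].add(t)
    return adj


def superstep(inbox, compute):
    """Run compute(vertex, messages) on every vertex that received messages.

    Messages sent now go to a fresh outbox and become visible only in the next
    superstep, mimicking the BSP barrier.
    """
    outbox = defaultdict(list)
    for vertex, msgs in inbox.items():
        for target, msg in compute(vertex, msgs):
            outbox[target].append(msg)
    return outbox


def bsp_equi_join(relations, left, right, attr):
    """Equi-join two distinct relations on attr using three message-passing supersteps."""
    adj = encode(relations)
    lattrs, lrows = relations[left]
    rattrs, rrows = relations[right]

    # Superstep 1: each left tuple vertex sends its row to its value vertex for attr.
    def send_to_value(vertex, _msgs):
        row = lrows[vertex[2]]
        return [(nbr, row) for nbr in adj[vertex] if nbr[1] == attr]

    # Superstep 2: each value vertex forwards the rows it received to adjacent right tuples.
    def forward(vertex, msgs):
        return [(nbr, m) for nbr in adj[vertex]
                if nbr[0] == "t" and nbr[1] == right for m in msgs]

    # Superstep 3: each right tuple vertex combines incoming left rows with its own row.
    def emit(vertex, msgs):
        rrow = rrows[vertex[2]]
        extra = tuple(x for a, x in zip(rattrs, rrow) if a != attr)
        return [("out", lrow + extra) for lrow in msgs]

    inbox = {("t", left, i): [None] for i in range(len(lrows))}  # activate left tuples
    for step in (send_to_value, forward, emit):
        inbox = superstep(inbox, step)
    return sorted(inbox["out"])
```

For example, take R(a, b) = {(1, 10), (2, 20), (3, 10)} and S(b, c) = {(10, 'x'), (10, 'y'), (30, 'z')}. Then `bsp_equi_join(rels, "R", "S", "b")` returns `[(1, 10, 'x'), (1, 10, 'y'), (3, 10, 'x'), (3, 10, 'y')]`.

In this sketch, matching tuples meet at the shared value vertex for their join value. The engine's message routing therefore does the work that an explicit hash table or shuffle would do in a relational engine. Multiway joins and the other SQL operators the abstract refers to are outside the scope of this sketch.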

Supplementary Material

MP4 File (3448016.3457314.mp4)
We present a scheme for parallel execution of SQL queries on top of any vertex-centric BSP graph processing engine. The scheme comprises a graph encoding of relational instances and a vertex program specification of our algorithm. The algorithm matches the theoretical communication and computation complexity of the state-of-the-art parallel join algorithms. When run on top of the TigerGraph database engine, it is competitive with (and often outperforms) reference RDBMSs on the TPC benchmarks they are traditionally tuned for.

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021, 2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. BSP parallel SQL evaluation
  2. vertex-centric graph processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Total citations: 0
  • Total downloads: 651
  • Downloads (last 12 months): 104
  • Downloads (last 6 weeks): 21
Reflects downloads up to 12 Jan 2025.