Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3318464.3389769acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Bitvector-aware Query Optimization for Decision Support Queries

Published: 31 May 2020 Publication History

Abstract

Bitvector filtering is an important query processing technique that can significantly reduce the cost of execution, especially for complex decision support queries with multiple joins. Despite its wide application, however, its implication to query optimization is not well understood. In this work, we study how bitvector filters impact query optimization. We show that incorporating bitvector filters into query optimization straightforwardly can increase the plan space complexity by an exponential factor in the number of relations in the query. We analyze the plans with bitvector filters for star and snowflake queries in the plan space of right deep trees without cross products. Surprisingly, with some simplifying assumptions, we prove that, the plan of the minimal cost with bitvector filters can be found from a linear number of plans in the number of relations in the query. This greatly reduces the plan space complexity for such queries from exponential to linear. Motivated by our analysis, we propose an algorithm that accounts for the impact of bitvector filters in query optimization. Our algorithm optimizes the join order for an arbitrary decision support query by choosing from a linear number of candidate plans in the number of relations in the query. We implement our algorithm in a commercial database DBMS-X as a transformation rule. Our evaluation on both industry standard benchmarks and customer workload shows that, compared with DBMS-X, our technique reduces the total CPU execution time by 22%-64% for the workloads, with up to two orders of magnitude reduction in CPU execution time for individual queries.

Supplementary Material

MP4 File (3318464.3389769.mp4)
Presentation Video

References

[1]
Paulo Sérgio Almeida, Carlos Baquero, Nuno Preguicc a, and David Hutchison. 2007. Scalable bloom filters. Inform. Process. Lett., Vol. 101, 6 (2007), 255--261.
[2]
Lyublena Antova, Amr El-Helw, Mohamed A. Soliman, Zhongxian Gu, Michalis Petropoulos, and Florian Waas. 2014. Optimizing Queries over Partitioned Tables in MPP Systems. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD'14). Association for Computing Machinery, New York, NY, USA, 373--384. https://doi.org/10.1145/2588555.2595640
[3]
M. M. Astrahan, M. W. Blasgen, D. D. Chamberlin, K. P. Eswaran, J. N. Gray, P. P. Griffiths, W. F. King, R. A. Lorie, P. R. McJones, J. W. Mehl, and et al. 1976. System R: Relational Approach to Database Management. ACM Trans. Database Syst., Vol. 1, 2, 97--137. https://doi.org/10.1145/320455.320457
[4]
Catriel Beeri and Raghu Ramakrishnan. 1991. On the power of magic. The journal of logic programming, Vol. 10, 3--4, 255--299.
[5]
Philip A. Bernstein and Dah-Ming W. Chiu. 1981. Using Semi-Joins to Solve Relational Queries. J. ACM, Vol. 28, 1, 25--40. https://doi.org/10.1145/322234.322238
[6]
Burton H. Bloom. 1970. Space/Time Trade-Offs in Hash Coding with Allowable Errors. Commun. ACM, Vol. 13, 7, 422--426. https://doi.org/10.1145/362686.362692
[7]
Kjell Bratbergsengen. 1984. Hashing Methods and Relational Algebra Operations. In Tenth International Conference on Very Large Data Bases, August 27--31, 1984, Singapore, Proceedings. Morgan Kaufmann, 323--333.
[8]
Chee-Yong Chan and Yannis E. Ioannidis. 1998. Bitmap Index Design and Evaluation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD'98). Association for Computing Machinery, New York, NY, USA, 355--366. https://doi.org/10.1145/276304.276336
[9]
M. Chen and P. S. Yu. 1992. Interleaving a join sequence with semijoins in distributed query processing. IEEE Transactions on Parallel and Distributed Systems, Vol. 3, 5, 611--621. https://doi.org/10.1109/71.159044
[10]
Ming-Syan Chen, Hui-I Hsiao, and Philip S. Yu. 1993. Applying Hash Filters to Improving the Execution of Bushy Trees. In 19th International Conference on Very Large Data Bases, August 24--27, 1993, Dublin, Ireland, Proceedings. Morgan Kaufmann, 505--516.
[11]
Ming-Syan Chen, Hui-I Hsiao, and Philip S. Yu. 1997. On Applying Hash Filters to Improving the Execution of Multi-Join Queries. VLDB J., Vol. 6, 2, 121--131. https://doi.org/10.1007/s007780050036
[12]
Dinesh Das, Jiaqi Yan, Mohamed Zait, Satyanarayana R. Valluri, Nirav Vyas, Ramarajan Krishnamachari, Prashant Gaharwar, Jesse Kamp, and Niloy Mukherjee. 2015. Query Optimization in Oracle 12c Database In-Memory. PVLDB, Vol. 8, 12, 1770--1781. https://doi.org/10.14778/2824032.2824074
[13]
Bailu Ding, Surajit Chaudhuri, and Vivek Narasayya. 2020. Bitvector-aware query optimization for decision support queries (extended version), MSR-TR-2020--8.
[14]
Bin Fan, Dave G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher. 2014. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT'14). Association for Computing Machinery, New York, NY, USA, 75--88. https://doi.org/10.1145/2674005.2674994
[15]
P. Fender, G. Moerkotte, T. Neumann, and V. Leis. 2012. Effective and Robust Pruning for Top-Down Join Enumeration Algorithms. In 2012 IEEE 28th International Conference on Data Engineering. 414--425. https://doi.org/10.1109/ICDE.2012.27
[16]
C. A. Galindo-Legaria, T. Grabs, S. Gukal, S. Herbert, A. Surna, S. Wang, W. Yu, P. Zabback, and S. Zhang. 2008. Optimizing Star Join Queries for Data Warehousing in Microsoft SQL Server. In 2008 IEEE 24th International Conference on Data Engineering. 1190--1199. https://doi.org/10.1109/ICDE.2008.4497528
[17]
Goetz Graefe. 1993. Query Evaluation Techniques for Large Databases. ACM Comput. Surv., Vol. 25, 2, 73--169. https://doi.org/10.1145/152610.152611
[18]
Goetz Graefe. 1995. The Cascades framework for query optimization. IEEE Data Eng. Bull., Vol. 18, 3, 19--29.
[19]
G. Graefe and W. J. McKenna. 1993. The Volcano optimizer generator: extensibility and efficient search. In Proceedings of IEEE 9th International Conference on Data Engineering. 209--218. https://doi.org/10.1109/ICDE.1993.344061
[20]
Hui-I Hsiao, Ming-Syan Chen, and Philip S. Yu. 1994. On Parallel Execution of Multiple Pipelined Hash Joins. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD'94). Association for Computing Machinery, New York, NY, USA, 185--196. https://doi.org/10.1145/191839.191879
[21]
Nikos Karayannidis, Aris Tsois, Timos K. Sellis, Roland Pieringer, Volker Markl, Frank Ramsak, Robert Fenk, Klaus Elhardt, and Rudolf Bayer. 2002. Processing Star Queries on Hierarchically-Clustered Fact Tables. In Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, August 20--23, 2002. Morgan Kaufmann, 730--741. https://doi.org/10.1016/B978--155860869--6/50070--6
[22]
T. Lahiri, S. Chavan, M. Colgan, D. Das, A. Ganesh, M. Gleeson, S. Hase, A. Holloway, J. Kamp, T. Lee, J. Loaiza, N. Macnaughton, V. Marwah, N. Mukherjee, A. Mullick, S. Muthulingam, V. Raja, M. Roth, E. Soylemez, and M. Zait. 2015. Oracle Database In-Memory: A dual format in-memory database. In 2015 IEEE 31st International Conference on Data Engineering. 1253--1258. https://doi.org/10.1109/ICDE.2015.7113373
[23]
Harald Lang, Thomas Neumann, Alfons Kemper, and Peter A. Boncz. 2019. Performance-Optimal Filtering: Bloom overtakes Cuckoo at High-Throughput. PVLDB, Vol. 12, 5, 502--515. https://doi.org/10.14778/3303753.3303757
[24]
Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query optimization through the looking glass, and what we found running the Join Order Benchmark. VLDB J., Vol. 27, 5, 643--668. https://doi.org/10.1007/s00778-017-0480--7
[25]
Guido Moerkotte and Thomas Neumann. 2006. Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12--15, 2006. ACM, 930--941. http://dl.acm.org/citation.cfm?id=1164207
[26]
Guido Moerkotte and Thomas Neumann. 2008. Dynamic Programming Strikes Back. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD'08). Association for Computing Machinery, New York, NY, USA, 539--552. https://doi.org/10.1145/1376616.1376672
[27]
Thomas Neumann. 2009a. Query Simplification: Graceful Degradation for Join-Order Optimization. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD'09). Association for Computing Machinery, New York, NY, USA, 403--414. https://doi.org/10.1145/1559845.1559889
[28]
Thomas Neumann. 2009b. Query Simplification: Graceful Degradation for Join-Order Optimization. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD'09). Association for Computing Machinery, New York, NY, USA, 403--414. https://doi.org/10.1145/1559845.1559889
[29]
Thomas Neumann and César A. Galindo-Legaria. 2013. Taking the Edge off Cardinality Estimation Errors using Incremental Execution. In Datenbanksysteme für Business, Technologie und Web (BTW), 15. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 11.-15.3.2013 in Magdeburg, Germany. Proceedings (LNI ), Vol. P-214. GI, 73--92. https://dl.gi.de/20.500.12116/17356
[30]
Kiyoshi Ono and Guy M. Lohman. 1990. Measuring the Complexity of Join Enumeration in Query Optimization. In 16th International Conference on Very Large Data Bases, August 13--16, 1990, Brisbane, Queensland, Australia, Proceedings. Morgan Kaufmann, 314--325.
[31]
Felix Putze, Peter Sanders, and Johannes Singler. 2007. Cache-, hash-and space-efficient bloom filters. In International Workshop on Experimental and Efficient Algorithms. Springer, 108--121.
[32]
Praveen Seshadri, Joseph M. Hellerstein, Hamid Pirahesh, T. Y. Cliff Leung, Raghu Ramakrishnan, Divesh Srivastava, Peter J. Stuckey, and S. Sudarshan. 1996. Cost-Based Optimization for Magic: Algebra and Implementation. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD'96). Association for Computing Machinery, New York, NY, USA, 435--446. https://doi.org/10.1145/233269.233360
[33]
David Simmen, Eugene Shekita, and Timothy Malkemus. 1996. Fundamental Techniques for Order Optimization. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD'96). Association for Computing Machinery, New York, NY, USA, 57--67. https://doi.org/10.1145/233269.233320
[34]
Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, and et al. 2014. Orca: A Modular Query Optimizer Architecture for Big Data. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD'14). Association for Computing Machinery, New York, NY, USA, 337--348. https://doi.org/10.1145/2588555.2595637
[35]
Patrick Valduriez and Georges Gardarin. 1984. Join and Semijoin Algorithms for a Multiprocessor Database Machine. ACM Trans. Database Syst., Vol. 9, 1, 133--161. https://doi.org/10.1145/348.318590
[36]
Andreas Weininger. 2002. Efficient Execution of Joins in a Star Schema. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD'02). Association for Computing Machinery, New York, NY, USA, 542--545. https://doi.org/10.1145/564691.564754
[37]
Jianqiao Zhu, Navneet Potti, Saket Saurabh, and Jignesh M. Patel. 2017. Looking Ahead Makes Query Plans Robust. PVLDB, Vol. 10, 8, 889--900. https://doi.org/10.14778/3090163.3090167

Cited By

View all
  • (2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 1-Feb-2024
  • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
  • (2023)Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL ServerProceedings of the VLDB Endowment10.14778/3611479.361149416:11(2871-2883)Online publication date: 24-Aug-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bitvector filter
  2. bloom filter
  3. database
  4. join order enumeration
  5. query optimization
  6. query processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)67
  • Downloads (Last 6 weeks)15
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 1-Feb-2024
  • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
  • (2023)Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL ServerProceedings of the VLDB Endowment10.14778/3611479.361149416:11(2871-2883)Online publication date: 24-Aug-2023
  • (2023)AggFirstJoin: Optimizing Geo-Distributed Joins using Aggregation-Based Transformations2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)10.1109/CCGrid57682.2023.00046(414-425)Online publication date: May-2023
  • (2022)SQLiteProceedings of the VLDB Endowment10.14778/3554821.355484215:12(3535-3547)Online publication date: 1-Aug-2022
  • (2022)New query optimization techniques in the Spark engine of Azure synapseProceedings of the VLDB Endowment10.14778/3503585.350360115:4(936-948)Online publication date: 14-Apr-2022
  • (2021)DSBProceedings of the VLDB Endowment10.14778/3484224.348423414:13(3376-3388)Online publication date: 28-Oct-2021
  • (2021)Optimization of correlate subquery based on distributed databaseXibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University10.1051/jnwpu/2021394090939:4(909-918)Online publication date: 23-Sep-2021

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media