Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Detecting Logic Bugs of Join Optimizations in DBMS

Published: 30 May 2023 Publication History

Abstract

Generation-based testing techniques have shown their effectiveness in detecting logic bugs of DBMS, which are often caused by improper implementation of query optimizers. Nonetheless, existing generation-based debug tools are limited to single-table queries and there is a substantial research gap regarding multi-table queries with join operators. In this paper, we propose TQS, a novel testing framework targeted at detecting logic bugs derived by queries involving multi-table joins. Given a target DBMS, TQS achieves the goal with two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-truth (query, result) pairs for verification. It adopts the database normalization technique to generate a testing schema and maintains a bitmap index for result tracking. To improve debug efficiency, DSG also artificially inserts some noises into the generated data. To avoid repetitive query space search, KQE forms the problem as isomorphic graph set discovery and combines the graph embedding and weighted random walk for query generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB and PolarDB. Experimental results show that TQS is effective in finding logic bugs of join optimization in database management systems. It successfully detected 115 bugs within 24 hours, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in PolarDB respectively.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023.

References

[1]
Gü nes Alucc, Olaf Hartig, M. Tamer Ö zsu, and Khuzaima Daudjee. 2014a. Diversified Stress Testing of RDF Data Management Systems. In ISWC. Springer, 197--212.
[2]
Günecs Alucc, Olaf Hartig, M Tamer Özsu, and Khuzaima Daudjee. 2014b. Diversified stress testing of RDF data management systems. In International Semantic Web Conference. 197--212.
[3]
Akhil Arora, Sakshi Sinha, Piyush Kumar, and Arnab Bhattacharya. 2018. HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces. Proc. VLDB Endow., Vol. 11, 8 (2018), 906--919.
[4]
Guillaume Bagan, Angela Bonifati, Radu Ciucanu, George H. L. Fletcher, Auré lien Lemay, and Nicky Advokaat. 2017. gMark: Schema-Driven Generation of Graphs and Queries. In ICDE. 63--64.
[5]
Hardik Bati, Leo Giakoumakis, Steve Herbert, and Aleksandras Surna. 2007. A genetic approach for random testing of database systems. In VLDB. ACM, 1243--1251.
[6]
Carsten Binnig, Donald Kossmann, Eric Lo, and M. Tamer Ö zsu. 2007. QAGen: generating query-aware test databases. In SIGMOD. ACM, 341--352.
[7]
Joachim Biskup. 1995. Achievements of Relational Database Schema Design Theory Revisited. In Semantics in Databases. Springer, 29--54.
[8]
Joachim Biskup, Umeshwar Dayal, and Philip A. Bernstein. 1979. Synthesizing Independent Database Schemas. In SIGMOD, Philip A. Bernstein (Ed.). ACM, 143--151.
[9]
Christian Bizer and Andreas Schultz. 2009. The berlin sparql benchmark. IJSWIS, Vol. 5, 2 (2009), 1--24.
[10]
Daniel Blum and Sara Cohen. 2011. Grr: generating random RDF. In Extended Semantic Web Conference. Springer, 16--30.
[11]
Manuel Bodirsky. 2015. Graph homomorphisms and universal algebra course notes. TU Dresden (2015).
[12]
Robert Brummayer and Armin Biere. 2009. Fuzzing and delta-debugging SMT solvers. In Proceedings of the 7th International Workshop on Satisfiability Modulo Theories. 1--5.
[13]
Nicolas Bruno and Surajit Chaudhuri. 2005. Flexible Database Generators. In VLDB. ACM, 1097--1107.
[14]
Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas. 2006. Generating Queries with Cardinality Constraints for DBMS Testing. IEEE Trans. Knowl. Data Eng., Vol. 18, 12 (2006), 1721--1725.
[15]
Wei Cao, Yingqiang Zhang, Xinjun Yang, Feifei Li, Sheng Wang, Qingda Hu, Xuntao Cheng, Zongzhi Chen, Zhenjun Liu, Jing Fang, Bo Wang, Yuhui Wang, Haiqing Sun, Ze Yang, Zhushi Cheng, Sen Chen, Jian Wu, Wei Hu, Jianwei Zhao, Yusong Gao, Songlu Cai, Yunyang Zhang, and Jiawang Tong. 2021. PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers. In SIGMOD. ACM, 2477--2489.
[16]
Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John Regehr, Boris Yakobowski, and Xuejun Yang. 2012. Testing Static Analyzers with Randomly Generated Programs. In NASA Formal Methods - 4th International Symposium, NFM. Springer, 120--125.
[17]
Angjela Davitkova, Damjan Gjurovski, and Sebastian Michel. 2022. LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs. In EDBT. OpenProceedings.org, 2:169--2:182.
[18]
DB-Engines. 2018. DB-Engines Ranking. [EB/OL]. https://db-engines.com/en/ranking.
[19]
Jim Diederich and Jack Milton. 1988. New Methods and Fast Algorithms for Database Normalization. ACM Trans. Database Syst., Vol. 13, 3 (1988), 339--365.
[20]
Chi Thang Duong, Dung Hoang, Hongzhi Yin, Matthias Weidlich, Quoc Viet Hung Nguyen, and Karl Aberer. 2021. Efficient Streaming Subgraph Isomorphism with Graph Neural Networks. VLDB, Vol. 14, 5 (2021), 730--742.
[21]
Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and Peter Boncz. 2015. The LDBC social network benchmark: Interactive workload. In SIGMOD. 619--630.
[22]
M. R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.
[23]
Jim Gray, Prakash Sundaresan, Susanne Englert, Kenneth Baclawski, and Peter J. Weinberger. 1994. Quickly Generating Billion-Record Synthetic Databases. In SIGMOD. ACM Press, 243--252.
[24]
Zhongxian Gu, Mohamed A Soliman, and Florian M Waas. 2012. Testing the accuracy of query optimizers. In Proceedings of the Fifth International Workshop on Testing Database Systems. 1--6.
[25]
Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. 2005. LUBM: A benchmark for OWL knowledge base systems. Journal of Web Semantics, Vol. 3, 2--3 (2005), 158--182.
[26]
Kenneth Houkjær, Kristian Torp, and Rico Wind. 2006. Simple and Realistic Data Generation. In VLDB. ACM, 1243--1246.
[27]
Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et al. 2020. TiDB: a Raft-based HTAP database. VLDB, Vol. 13, 12 (2020), 3072--3084.
[28]
Yka Huhtala, Juha K"arkk"ainen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An efficient algorithm for discovering functional and approximate dependencies. The computer journal, Vol. 42, 2 (1999), 100--111.
[29]
Jinho Jung, Hong Hu, Joy Arulraj, Taesoo Kim, and Woon-Hak Kang. 2019. APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems. VLDB, Vol. 13, 1 (2019), 57--70.
[30]
Timotej Kapus and Cristian Cadar. 2017. Automatic testing of symbolic execution engines via program generation and differential testing. In ASE. IEEE Computer Society, 590--600.
[31]
Shadi Abdul Khalek, Bassem Elkarablieh, Yai O. Laleye, and Sarfraz Khurshid. 2008. Query-Aware Test Generation Using a Relational Constraint Solver. In ASE. IEEE Computer Society, 238--247.
[32]
Shadi Abdul Khalek and Sarfraz Khurshid. 2010. Automated SQL query generation for systematic testing of database engines. In ASE. ACM, 329--332.
[33]
Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. In PLDI. ACM, 216--226.
[34]
Eric Lo, Carsten Binnig, Donald Kossmann, M. Tamer Ö zsu, and Wing-Kai Hon. 2010. A framework for testing DBMS features. VLDB J., Vol. 19, 2 (2010), 203--230.
[35]
David Maier. 1983. The Theory of Relational Databases. Computer Science Press.
[36]
MariaDB. 2022. MariaDB hints. [EB/OL]. https://mariadb.com/kb/en/optimizer-switch/.
[37]
Mariadb. 2022. Mariadb Homepage. [EB/OL]. https://mariadb.org/.
[38]
William M. McKeeman. 1998. Differential Testing for Software. Digit. Tech. J., Vol. 10, 1 (1998), 100--107.
[39]
Chaitanya Mishra, Nick Koudas, and Calisto Zuzarte. 2008. Generating targeted queries for database testing. In SIGMOD. ACM, 499--510.
[40]
MySQL. 2022a. MySQL hints. [EB/OL]. https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html.
[41]
MySQL. 2022b. MySQL Homepage. [EB/OL]. https://www.mysql.com.
[42]
Stack Overflow. 2021. Developer Survey Results. [EB/OL]. https://insights.stackoverflow.com/survey/2021.
[43]
Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD, FatmaÖzcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 821--833.
[44]
Thorsten Papenbrock and Felix Naumann. 2017. Data-driven Schema Normalization. In EDBT. OpenProceedings.org, 342--353.
[45]
Meikel Poess and John M. Stephens. 2004. Generating Thousand Benchmark Queries in Seconds. In VLDB. Morgan Kaufmann, 1045--1053.
[46]
Kim-Thomas Rehmann, Changyun Seo, Dongwon Hwang, Binh Than Truong, Alexander Boehm, and Dong Hun Lee. 2016. Performance Monitoring in SAP HANA's Continuous Integration Process. SIGMETRICS Perform. Evaluation Rev., Vol. 43, 4 (2016), 43--52.
[47]
Manuel Rigger and Zhendong Su. 2020a. Detecting optimization bugs in database engines via non-optimizing reference engine construction. In ACM Joint Meeting on ESEC and FSE. 1140--1152.
[48]
Manuel Rigger and Zhendong Su. 2020b. Finding bugs in database systems via query partitioning. Proceedings of the ACM on Programming Languages, Vol. 4, OOPSLA (2020), 1--30.
[49]
Manuel Rigger and Zhendong Su. 2020c. SQLancer. [EB/OL]. https://github.com/sqlancer/sqlancer.
[50]
Manuel Rigger and Zhendong Su. 2020d. Testing database engines via pivoted query synthesis. In OSDI 20. 667--682.
[51]
Michael Schmidt, Thomas Hornung, Georg Lausen, and Christoph Pinkel. 2009. SP^ 2Bench: a SPARQL performance benchmark. In ICDE. IEEE, 222--233.
[52]
Andreas Seltenreich. 2020. SQLSmith. [EB/OL]. https://github.com/anse1/sqlsmith.
[53]
Donald R. Slutz. 1998. Massive Stochastic Testing of SQL. In VLDB, Ashish Gupta, Oded Shmueli, and Jennifer Widom (Eds.). Morgan Kaufmann, 618--622.
[54]
TiDB. 2022. TiDB hints. [EB/OL]. https://docs.pingcap.com/tidb/v5.3/optimizer-hints.
[55]
Manasi Vartak, Venkatesh Raghavan, and Elke A. Rundensteiner. 2010. QRelX: generating meaningful queries that provide cardinality assurance. In SIGMOD. ACM, 1215--1218.
[56]
Ziheng Wei and Sebastian Link. 2019. Embedded Functional Dependencies and Data-completeness Tailored Database Design. VLDB, Vol. 12, 11 (2019), 1458--1470.
[57]
Kesheng Wu, Ekow J. Otoo, and Arie Shoshani. 2002. Compressing Bitmap Indexes for Faster Search Operations. In SSDBM. IEEE Computer Society, 99--108.
[58]
Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, and Allison W. Lee. 2018. Snowtrail: Testing with Production Queries on a Cloud Database. In SIGMOD. ACM, 4:1--4:6.
[59]
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In SIGPLAN, PLDI. ACM, 283--294.
[60]
Yixing Yang, Yixiang Fang, Maria E. Orlowska, Wenjie Zhang, and Xuemin Lin. 2021. Efficient Bi-triangle Counting for Large Bipartite Networks. VLDB, Vol. 14, 6 (2021), 984--996.
[61]
Rex Ying, Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, and Jure Leskovec. 2020. Neural Subgraph Matching. CoRR, Vol. abs/2007.03092 (2020).
[62]
Rui Zhong, Yongheng Chen, Hong Hu, Hangfan Zhang, Wenke Lee, and Dinghao Wu. 2020. SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback. In CCS. ACM, 955--970.

Cited By

View all
  • (2024)Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database ConstructionProceedings of the VLDB Endowment10.14778/3659437.365944517:8(1884-1897)Online publication date: 31-May-2024
  • (2024)Step-wise Execution of Data-Centric SystemsCompanion Proceedings of the 2024 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity10.1145/3689491.3691821(19-21)Online publication date: 20-Oct-2024
  • (2024)Keep It Simple: Testing Databases via Differential Query PlansProceedings of the ACM on Management of Data10.1145/36549912:3(1-26)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Permissions

Request permissions for this article.

Badges

  • Best Paper

Author Tags

  1. database
  2. join optimization
  3. logic bug

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)869
  • Downloads (Last 6 weeks)86
Reflects downloads up to 16 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database ConstructionProceedings of the VLDB Endowment10.14778/3659437.365944517:8(1884-1897)Online publication date: 31-May-2024
  • (2024)Step-wise Execution of Data-Centric SystemsCompanion Proceedings of the 2024 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity10.1145/3689491.3691821(19-21)Online publication date: 20-Oct-2024
  • (2024)Keep It Simple: Testing Databases via Differential Query PlansProceedings of the ACM on Management of Data10.1145/36549912:3(1-26)Online publication date: 30-May-2024
  • (2024)Towards automating microservices orchestration through data-driven evolutionary architecturesService Oriented Computing and Applications10.1007/s11761-024-00387-x18:1(1-12)Online publication date: 27-Feb-2024
  • (2023)A Demonstration of DLBD: Database Logic Bug Detection SystemProceedings of the VLDB Endowment10.14778/3611540.361158416:12(3914-3917)Online publication date: 1-Aug-2023

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media