DOI: 10.1145/3372297.3417260

SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback

Published: 02 November 2020

Abstract

Fuzzing is an increasingly popular technique for verifying software functionalities and finding security vulnerabilities. However, current mutation-based fuzzers cannot effectively test database management systems (DBMSs), which strictly check inputs for valid syntax and semantics. Generation-based testing can guarantee the syntax correctness of the inputs, but it does not utilize any feedback, like code coverage, to guide the path exploration.
In this paper, we develop Squirrel, a novel fuzzing framework that considers both language validity and coverage feedback to test DBMSs. We design an intermediate representation (IR) to maintain SQL queries in a structural and informative manner. To generate syntactically correct queries, we perform type-based mutations on IR, including statement insertion, deletion and replacement. To mitigate semantic errors, we analyze each IR to identify the logical dependencies between arguments, and generate queries that satisfy these dependencies. We evaluated Squirrel on four popular DBMSs: SQLite, MySQL, PostgreSQL and MariaDB. Squirrel found 51 bugs in SQLite, 7 in MySQL and 5 in MariaDB. Of these bugs, 52 have been fixed and 12 CVEs have been assigned. In our experiment, Squirrel achieves 2.4× to 243.9× higher semantic correctness than state-of-the-art fuzzers, and explores 2.0× to 10.9× more new edges than mutation-based tools. These results show that Squirrel is effective in finding memory errors in database management systems.
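
To make the two ideas in the abstract concrete, the following is a minimal sketch and not Squirrel's actual implementation. It assumes a toy IR in which a query is a list of typed statements (the `Stmt` class and the helpers `mutate`, `instantiate` and `is_valid` are hypothetical names introduced here). `mutate` applies one statement-level insertion, deletion or replacement, and `instantiate` resolves the one dependency this toy models: a table must be created before it is used. Coverage feedback is not modeled.

```python
import random
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Toy IR node (hypothetical): a statement type plus the table it refers to."""
    kind: str          # "create", "insert" or "select"
    table: str = ""    # left empty until instantiate() fills it in

    def to_sql(self) -> str:
        if self.kind == "create":
            return f"CREATE TABLE {self.table} (c INT);"
        if self.kind == "insert":
            return f"INSERT INTO {self.table} VALUES (1);"
        return f"SELECT c FROM {self.table};"


KINDS = ["create", "insert", "select"]


def mutate(query, rng):
    """Apply one structural mutation: statement insertion, deletion or replacement."""
    out = list(query)
    new = Stmt(rng.choice(KINDS))
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" or not out:
        out.insert(rng.randrange(len(out) + 1), new)
    elif op == "delete":
        del out[rng.randrange(len(out))]
    else:
        # A statement is only ever swapped for another statement,
        # so the result keeps the shape of a statement list.
        out[rng.randrange(len(out))] = new
    return out


def instantiate(query, rng):
    """Fill in table names so that every use refers to a table created earlier."""
    defined, out = [], []
    for stmt in query:
        if stmt.kind == "create":
            name = f"t{len(defined)}"      # fresh name for each new table
            defined.append(name)
        else:
            if not defined:                # no table yet: create one first
                defined.append("t0")
                out.append(Stmt("create", "t0"))
            name = rng.choice(defined)
        out.append(Stmt(stmt.kind, name))
    return out


def is_valid(query):
    """Run the query against an in-memory SQLite database; True if nothing fails."""
    conn = sqlite3.connect(":memory:")
    try:
        for stmt in query:
            conn.execute(stmt.to_sql())
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

Under these assumptions, a fuzzing loop would repeatedly call `mutate` on a seed, pass the result through `instantiate`, and send the rendered SQL to the target. A select on a table that was never created is rejected by SQLite, while the instantiated version of the same statement list executes.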

      Published In

      CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
      October 2020
      2180 pages
      ISBN:9781450370899
      DOI:10.1145/3372297
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. coverage-guided testing
      2. database security
      3. language validity

      Qualifiers

      • Research-article

      Conference

      CCS '20

      Acceptance Rates

      Overall Acceptance Rate 1,261 of 6,999 submissions, 18%
