research-article

SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback

Authors:

Dinghao Wu

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

Pages 955 - 970

https://doi.org/10.1145/3372297.3417260

Published: 02 November 2020

Abstract

Fuzzing is an increasingly popular technique for verifying software functionality and finding security vulnerabilities. However, current mutation-based fuzzers cannot effectively test database management systems (DBMSs), which strictly check inputs for valid syntax and semantics. Generation-based testing can guarantee the syntactic correctness of the inputs, but it does not use any feedback, such as code coverage, to guide path exploration.

In this paper, we develop Squirrel, a novel fuzzing framework that considers both language validity and coverage feedback to test DBMSs. We design an intermediate representation (IR) to maintain SQL queries in a structural and informative manner. To generate syntactically correct queries, we perform type-based mutations on the IR, including statement insertion, deletion, and replacement. To mitigate semantic errors, we analyze each IR to identify the logical dependencies between arguments and generate queries that satisfy these dependencies. We evaluated Squirrel on four popular DBMSs: SQLite, MySQL, PostgreSQL, and MariaDB. Squirrel found 51 bugs in SQLite, 7 in MySQL, and 5 in MariaDB. Of these bugs, 52 have been fixed and 12 CVEs have been assigned. In our experiments, Squirrel achieved 2.4×-243.9× higher semantic correctness than state-of-the-art fuzzers and explored 2.0×-10.9× more new edges than mutation-based tools. These results show that Squirrel is effective in finding memory errors in database management systems.
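The two ideas in the abstract, structural mutation on an IR and dependency-aware instantiation, can be illustrated with a minimal Python sketch. This is not Squirrel's implementation. The `Stmt` class, the `POOL` table, and the functions `mutate`, `instantiate`, and `runs_cleanly` are hypothetical names invented here; the IR is reduced to a flat list of statement templates; coverage feedback is omitted; and SQLite's in-memory mode stands in for the DBMS under test.

```python
import random
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Toy IR node: a statement type plus a SQL skeleton with name holes."""
    kind: str      # "create", "insert", or "select"
    template: str  # "{t}" marks a table hole, "{c}" a column hole


# Hypothetical per-type pools; a replacement only swaps within one type.
POOL = {
    "create": [Stmt("create", "CREATE TABLE {t} ({c} INT)")],
    "insert": [Stmt("insert", "INSERT INTO {t} ({c}) VALUES (1)"),
               Stmt("insert", "INSERT INTO {t} ({c}) VALUES (-7)")],
    "select": [Stmt("select", "SELECT {c} FROM {t}"),
               Stmt("select", "SELECT {c} FROM {t} WHERE {c} > 0"),
               Stmt("select", "SELECT count({c}) FROM {t}")],
}


def mutate(program, rng):
    """Apply one structural mutation: insert, delete, or same-type replace."""
    program = list(program)
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" or not program:
        kind = rng.choice(sorted(POOL))
        program.insert(rng.randint(0, len(program)), rng.choice(POOL[kind]))
    elif op == "delete":
        del program[rng.randrange(len(program))]
    else:
        i = rng.randrange(len(program))
        program[i] = rng.choice(POOL[program[i].kind])
    return program


def instantiate(program):
    """Fill name holes so every use refers to an earlier definition."""
    tables = []  # (table, column) pairs defined so far
    sql = []
    for stmt in program:
        if stmt.kind == "create":
            t, c = f"t{len(tables)}", f"c{len(tables)}"
            tables.append((t, c))
        else:
            if not tables:
                # A use with no prior definition: emit a CREATE first.
                tables.append(("t0", "c0"))
                sql.append(POOL["create"][0].template.format(t="t0", c="c0"))
            t, c = tables[-1]  # simplification: always use the newest table
        sql.append(stmt.template.format(t=t, c=c))
    return sql


def runs_cleanly(statements):
    """Execute the statements on a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        for s in statements:
            conn.execute(s)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


rng = random.Random(0)
program = [POOL["create"][0], POOL["insert"][0], POOL["select"][0]]
for _ in range(20):
    program = mutate(program, rng)
    queries = instantiate(program)
    assert runs_cleanly(queries)  # structure and names stay valid
print("; ".join(queries))
```

Because mutations act on whole typed statements rather than bytes, the output always parses. Because names are filled in only after the structure is fixed, a `SELECT` or `INSERT` never refers to a table that has not been created. A real fuzzer would additionally keep a mutated program only when it reaches new code coverage.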

Index Terms

SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback
1. Security and privacy
  1. Database and storage security
  2. Software and application security

Published In

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

October 2020

2180 pages

ISBN:9781450370899

DOI:10.1145/3372297

General Chairs: Jay Ligatti (University of South Florida, USA), Xinming Ou (University of South Florida, USA)

Program Chairs: Jonathan Katz (University of Maryland, USA), Giovanni Vigna (University of California-Santa Barbara, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation
Office of Naval Research

Conference

CCS '20

Sponsor:

SIGSAC

CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security

November 9 - 13, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%
