Research article, open access. DOI: 10.1145/3650212.3680392

Testing Gremlin-Based Graph Database Systems via Query Disassembling

Published: 11 September 2024

Abstract

Graph Database Systems (GDBs) support efficient storage and retrieval of graph data, and have become a critical component in many important applications. Many widely used GDBs utilize the Gremlin query language to create, modify, and retrieve data in graph databases, in which developers can assemble a sequence of Gremlin APIs to perform a complex query. However, incorrect implementations and optimizations in GDBs can introduce logic bugs, which can cause Gremlin queries to return incorrect results, e.g., omitting vertices in a graph database. In this paper, we propose Query Disassembling (QuDi), an effective testing technique to automatically detect logic bugs in Gremlin-based GDBs. Given a Gremlin query Q, QuDi disassembles Q into a sequence of atomic graph traversals TList whose execution semantics are equivalent to those of Q. If the execution results of Q and TList differ, a logic bug is revealed in the target GDB. We evaluate QuDi on six popular GDBs and have found 25 logic bugs in these GDBs, 10 of which have been confirmed as previously unknown bugs by GDB developers.
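
The idea of comparing Q against TList can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Gremlin or any real GDB: the toy graph, the step names `out` and `has_age_gt`, and every function below are hypothetical and exist only for illustration. One engine runs all steps in a single traversal; a second engine carries a deliberately injected bug that deduplicates results whenever two or more steps are chained. Running each step as its own traversal and feeding the intermediate results forward exposes the difference.

```python
from collections import Counter

# Toy directed graph (hypothetical data): vertex id -> properties, plus edges.
VERTICES = {1: {"age": 29}, 2: {"age": 27}, 3: {"age": 32}, 4: {"age": 35}}
EDGES = [(1, "knows", 2), (1, "knows", 3), (2, "knows", 3), (3, "knows", 4)]


def apply_step(ids, step):
    """Apply one hypothetical step to a bag of vertex ids."""
    name, arg = step
    if name == "out":  # follow outgoing edges that carry the given label
        return [d for v in ids for (s, label, d) in EDGES if s == v and label == arg]
    if name == "has_age_gt":  # keep vertices whose age exceeds arg
        return [v for v in ids if VERTICES[v]["age"] > arg]
    raise ValueError(f"unknown step: {name}")


def run_query(steps, start=None):
    """Reference engine: run the whole step sequence as one traversal."""
    ids = list(VERTICES) if start is None else list(start)
    for step in steps:
        ids = apply_step(ids, step)
    return ids


def run_query_buggy(steps, start=None):
    """Engine with an injected logic bug: a faulty 'optimization' drops
    duplicate results whenever a query chains two or more steps."""
    ids = run_query(steps, start)
    return sorted(set(ids)) if len(steps) >= 2 else ids


def disassemble(steps):
    """Split Q into single-step traversals (a simplified stand-in for TList)."""
    return [[step] for step in steps]


def run_disassembled(engine, steps):
    """Run each atomic traversal separately, feeding results forward."""
    ids = list(VERTICES)
    for atomic in disassemble(steps):
        ids = engine(atomic, start=ids)
    return ids


def check(engine, steps):
    """Return True when Q and its disassembled form agree as multisets."""
    return Counter(engine(steps)) == Counter(run_disassembled(engine, steps))


Q = [("out", "knows"), ("out", "knows"), ("has_age_gt", 30)]
print(check(run_query, Q))        # True: both executions yield 3, 4, 4
print(check(run_query_buggy, Q))  # False: the whole query loses a duplicate 4
```

Results are compared as multisets because a traversal can legitimately reach the same vertex more than once; comparing sets would hide the injected bug. A real Gremlin query carries more state than a bag of vertex ids (paths, side effects, labeled steps), so an actual disassembler has to preserve far more than this sketch does.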


Published In

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
September 2024
1928 pages
ISBN: 9798400706127
DOI: 10.1145/3650212
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Graph database systems
  2. bug detection
  3. graph traversal
  4. logic bug

Qualifiers

  • Research-article

Conference

ISSTA '24
Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
