Testing Graph Database Systems via Graph-Aware Metamorphic Relations

Published: 05 March 2024

Abstract

Graph database systems (GDBs) support many important real-world applications such as social networks, logistics, and path planning. Logic bugs are nevertheless prevalent in GDBs, leading to incorrect results and severe consequences. Prior solutions largely fail to reveal these bugs because they are unaware of the native graph structures of the graph data. In this paper, we propose Gamera (Graph-aware metamorphic relations), a novel metamorphic testing approach to uncover unknown logic bugs in GDBs. We design three classes of novel graph-aware Metamorphic Relations (MRs) based on the native graph structures. Gamera generates a set of queries according to the graph-aware MRs to test diverse and complex GDB operations, and checks whether the GDB query results conform to the chosen MRs.
We thoroughly evaluated the effectiveness of Gamera on seven widely used GDBs, including Neo4j and OrientDB. Gamera was highly effective in detecting logic bugs in GDBs. In total, it detected 39 logic bugs, of which 15 have been confirmed and three have been fixed. Our experiments also demonstrated that Gamera significantly outperformed prior solutions, including Grand, GDsmith, and GDBMeter. Gamera has been well recognized by GDB developers, and we open-source our prototype implementation to contribute to the community.
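
To make the idea of checking query results against a graph-aware relation concrete, here is a minimal sketch in Python. It is not Gamera's implementation, and the abstract does not spell out the three MR classes, so the relation used here is an illustrative assumption: reversing every edge of a directed graph should swap the roles of source and target in a reachability query. The functions `reachable`, `buggy_reachable`, and `check_reversal_mr` are hypothetical stand-ins for a GDB query engine and a test harness.

```python
from collections import deque
from itertools import product


def reachable(edges, src, dst):
    """Toy stand-in for a GDB path query: is dst reachable from src?"""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def buggy_reachable(edges, src, dst):
    """Deliberately broken engine: keeps only one outgoing edge per node."""
    first_edge = {}
    for u, v in sorted(edges):
        first_edge.setdefault(u, v)
    return reachable(set(first_edge.items()), src, dst)


def check_reversal_mr(query, edges, nodes):
    """Return the (src, dst) pairs that violate the assumed reversal relation.

    Hypothetical MR: query(G, a, b) must equal query(reverse(G), b, a).
    """
    reversed_edges = {(v, u) for u, v in edges}
    return [
        (a, b)
        for a, b in product(nodes, repeat=2)
        if query(edges, a, b) != query(reversed_edges, b, a)
    ]


edges = {("a", "b"), ("a", "c"), ("b", "c")}
nodes = ["a", "b", "c"]
print(check_reversal_mr(reachable, edges, nodes))        # correct engine
print(check_reversal_mr(buggy_reachable, edges, nodes))  # broken engine
```

The correct engine satisfies the relation on every node pair, while the broken one answers the original and the transformed query inconsistently for at least one pair. No ground-truth answer is needed: the inconsistency alone signals a logic bug, which is the core appeal of metamorphic testing.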


Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 4
December 2023
309 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
