VeriDKG: A Verifiable SPARQL Query Engine for Decentralized Knowledge Graphs

Published: 05 March 2024

Abstract

The ability to decentralize knowledge graphs (KGs) is important for exploiting the full potential of the Semantic Web and realizing the Web 3.0 vision. However, decentralization also makes KGs more prone to attacks that harm data integrity and query verifiability. While existing studies focus on ensuring data integrity, how to ensure query verifiability, and thus guard against incorrect, incomplete, or outdated query results, remains unsolved. We propose VeriDKG, the first SPARQL query engine for decentralized knowledge graphs (DKGs) that offers both data integrity and query verifiability guarantees. The core of VeriDKG is the RGB-Trie, a new blockchain-maintained authenticated data structure (ADS) that facilitates correctness proofs for SPARQL query results. VeriDKG enables verifiability of subqueries by gathering global index information on subgraphs using the RGB-Trie, which is implemented as a new variant of the Merkle prefix tree with an RGB color model. To enable verifiability of the final query result, the RGB-Trie is integrated with a cryptographic accumulator to support verifiable aggregation operations. A rigorous analysis of query verifiability in VeriDKG is presented, along with evidence from an extensive experimental study demonstrating its state-of-the-art query performance on the LargeRDFBench benchmark.
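
To make the idea of an authenticated prefix tree more concrete, the following is a minimal sketch of a plain Merkle prefix tree with membership proofs. It is not the RGB-Trie: the abstract does not specify the RGB color model, the blockchain maintenance, or the accumulator integration, so all of these are omitted. The names `Node`, `insert`, `prove`, `verify`, the hashing layout, and the example terms are assumptions made only for illustration.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so concatenation is unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def node_digest(terminal: bool, children: dict) -> bytes:
    """Hash a node from its terminal flag and its {char: child digest} map."""
    parts = [b"\x01" if terminal else b"\x00"]
    for ch in sorted(children):
        parts += [ch.encode("utf-8"), children[ch]]
    return h(*parts)


class Node:
    """One trie node; `terminal` marks the end of a stored key."""

    def __init__(self):
        self.children = {}
        self.terminal = False

    def digest(self) -> bytes:
        return node_digest(
            self.terminal, {c: n.digest() for c, n in self.children.items()}
        )


def insert(root: Node, key: str) -> None:
    node = root
    for ch in key:
        node = node.children.setdefault(ch, Node())
    node.terminal = True


def prove(root: Node, key: str):
    """Collect, for each node on the path, its flag and sibling digests."""
    path = []
    node = root
    for ch in key:
        child = node.children[ch]  # raises KeyError if the key is absent
        siblings = {c: n.digest() for c, n in node.children.items() if c != ch}
        path.append((node.terminal, siblings))
        node = child
    if not node.terminal:
        raise KeyError(key)
    below = {c: n.digest() for c, n in node.children.items()}
    return path, below


def verify(root_digest: bytes, key: str, path, below) -> bool:
    """Recompute the root digest bottom-up from the proof and compare."""
    if len(path) != len(key):
        return False
    current = node_digest(True, below)
    for ch, (terminal, siblings) in zip(reversed(key), reversed(path)):
        if ch in siblings:
            return False
        current = node_digest(terminal, {**siblings, ch: current})
    return current == root_digest


# Hypothetical usage: the verifier trusts only the root digest.
root = Node()
for term in ["dbr:Alice", "dbr:Alan", "dbo:knows"]:
    insert(root, term)
trusted_root = root.digest()
path, below = prove(root, "dbr:Alice")
print(verify(trusted_root, "dbr:Alice", path, below))  # True
print(verify(trusted_root, "dbr:Alicf", path, below))  # False
```

In this sketch the verifier needs only the root digest, which is the role a blockchain-maintained digest could play in a design like the one the abstract describes. Recomputing every child digest on each call is deliberately naive and would be cached in practice.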


Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 4
December 2023
309 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
