
Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes

Published: 25 July 2017 Publication History
    Abstract

    Existential blank nodes greatly complicate a number of fundamental operations on Resource Description Framework (RDF) graphs. In particular, the problems of determining if two RDF graphs have the same structure modulo blank node labels (i.e., if they are isomorphic), or determining if two RDF graphs have the same meaning under simple semantics (i.e., if they are simple-equivalent), have no known polynomial-time algorithms. In this article, we propose methods that can produce two canonical forms of an RDF graph. The first canonical form preserves isomorphism such that any two isomorphic RDF graphs will produce the same canonical form; this iso-canonical form is produced by modifying the well-known canonical labelling algorithm Nauty for application to RDF graphs. The second canonical form additionally preserves simple-equivalence such that any two simple-equivalent RDF graphs will produce the same canonical form; this equi-canonical form is produced by, in a preliminary step, leaning the RDF graph, and then computing the iso-canonical form. These algorithms have a number of practical applications, such as for identifying isomorphic or equivalent RDF graphs in a large collection without requiring pairwise comparison, for computing checksums or signing RDF graphs, for applying consistent Skolemisation schemes where blank nodes are mapped in a canonical manner to Internationalised Resource Identifiers (IRIs), and so forth. Likewise a variety of algorithms can be simplified by presupposing RDF graphs in one of these canonical forms. Both algorithms require exponential steps in the worst case; in our evaluation we demonstrate that there indeed exist difficult synthetic cases, but we also provide results over 9.9 million RDF graphs that suggest such cases occur infrequently in the real world, and that both canonical forms can be efficiently computed in all but a handful of such cases.
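    The two canonical forms described above can be illustrated with a small brute-force sketch. This is not the Nauty-based method of the article: it simply tries every relabelling of blank nodes (for the iso-canonical form) and every blank node mapping onto a proper subgraph (for leaning), so it is only usable on toy graphs. The triple encoding (Python tuples of strings, blank nodes prefixed with `_:`), the `ex:` terms, and all function names are assumptions made for this illustration.

```python
import hashlib
from itertools import permutations, product


def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def blank_nodes(graph):
    return sorted({t for triple in graph for t in triple if is_blank(t)})


def iso_canonical(graph):
    """Return the smallest sorted triple tuple over all blank node relabellings.

    Isomorphic graphs range over the same set of candidates, so they
    yield the same result. The cost is factorial in the number of blank nodes.
    """
    bnodes = blank_nodes(graph)
    labels = [f"_:c{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(bnodes, perm))
        candidate = tuple(sorted(
            {tuple(mapping.get(t, t) for t in triple) for triple in graph}
        ))
        if best is None or candidate < best:
            best = candidate
    return best


def lean(graph):
    """Shrink the graph until no blank node mapping sends it to a proper subgraph."""
    graph = set(graph)
    while True:
        bnodes = blank_nodes(graph)
        # Blank nodes may be mapped to any subject or object term of the graph.
        terms = sorted({t for s, _, o in graph for t in (s, o)})
        for image in product(terms, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, image))
            mapped = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
            if mapped < graph:  # proper subset: a redundant part was folded away
                graph = mapped
                break
        else:
            return graph  # no shrinking mapping exists, so the graph is lean


def equi_canonical(graph):
    # Lean first, then label, following the order given in the abstract.
    return iso_canonical(lean(graph))


def digest(canonical_form):
    # One possible checksum: SHA-256 over the canonical triples, one per line.
    text = "\n".join(" ".join(triple) for triple in canonical_form)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


g1 = {("_:a", "ex:knows", "_:b"), ("_:b", "ex:knows", "ex:alice")}
g2 = {("_:x", "ex:knows", "_:y"), ("_:y", "ex:knows", "ex:alice")}
g3 = {("_:a", "ex:p", "ex:x"), ("_:b", "ex:p", "ex:x")}  # second triple is redundant
g4 = {("_:z", "ex:p", "ex:x")}

print(iso_canonical(g1) == iso_canonical(g2))    # True: same structure
print(iso_canonical(g3) == iso_canonical(g4))    # False: not isomorphic
print(equi_canonical(g3) == equi_canonical(g4))  # True: simple-equivalent
```

    Because each graph is reduced to a single canonical value, a large collection could be grouped by `digest(iso_canonical(g))` or `digest(equi_canonical(g))` rather than compared pairwise, which is the kind of application the abstract mentions.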



    Published In

    ACM Transactions on the Web  Volume 11, Issue 4
    November 2017
    257 pages
    ISSN:1559-1131
    EISSN:1559-114X
    DOI:10.1145/3127338
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 July 2017
    Accepted: 01 March 2017
    Received: 01 June 2016
    Published in TWEB Volume 11, Issue 4


    Author Tags

    1. Semantic web
    2. isomorphism
    3. linked data
    4. signing
    5. skolemisation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Fondecyt
    • Millennium Nucleus Center for Semantic Web Research


    Cited By

    • (2024) smart-KG: Partition-Based Linked Data Fragments for querying knowledge graphs. Semantic Web, 1-45. DOI: 10.3233/SW-243571. Online publication date: 20-Mar-2024
    • (2024) SSI, from Specifications to Protocol? Formally Verify Security! Proceedings of the ACM on Web Conference 2024, 1620-1631. DOI: 10.1145/3589334.3645426. Online publication date: 13-May-2024
    • (2023) Quantifiable integrity for Linked Data on the web. Semantic Web 14:6, 1167-1207. DOI: 10.3233/SW-233409. Online publication date: 13-Dec-2023
    • (2023) VOYAGE: A Large Collection of Vocabulary Usage in Open RDF Datasets. The Semantic Web – ISWC 2023, 211-229. DOI: 10.1007/978-3-031-47243-5_12. Online publication date: 27-Oct-2023
    • (2022) Semantics and canonicalisation of SPARQL 1.1. Semantic Web 13:5, 829-893. DOI: 10.3233/SW-212871. Online publication date: 18-Aug-2022
    • (2022) Formalising Linked-Data based Verifiable Credentials for Selective Disclosure. 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 52-65. DOI: 10.1109/EuroSPW55150.2022.00013. Online publication date: Jun-2022
    • (2022) Interwoven Hash of Vicious Circle Free Graph. 2022 IEEE International Conference on Blockchain (Blockchain), 449-454. DOI: 10.1109/Blockchain55522.2022.00069. Online publication date: Aug-2022
    • (2022) Technological Foundations of Ontological Ecosystems on the 3rd Generation Blockchains. IEEE Access 10, 12487-12502. DOI: 10.1109/ACCESS.2022.3141014. Online publication date: 2022
    • (2022) Efficient Dependency Analysis for Rule-Based Ontologies. The Semantic Web – ISWC 2022, 267-283. DOI: 10.1007/978-3-031-19433-7_16. Online publication date: 23-Oct-2022
    • (2022) Self-verifying Web Resource Representations Using Solid, RDF-Star and Signed URIs. The Semantic Web: ESWC 2022 Satellite Events, 138-142. DOI: 10.1007/978-3-031-11609-4_26. Online publication date: 29-May-2022
