DOI: 10.1145/2484712.2484713
research-article

Large-scale bisimulation of RDF graphs

Published: 23 June 2013

Abstract

    RDF datasets with billions of triples are no longer unusual and continue to grow (e.g., the LOD cloud), driven by the inherent flexibility of RDF, which can represent very diverse datasets ranging from highly structured to unstructured data. Because of their size, RDF graphs are often difficult to understand and process, so methods that reduce their size while preserving as much structural information as possible become attractive. In this paper we study bisimulation as a means to reduce the size of RDF graphs according to structural equivalence. We study two bisimulation algorithms, one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, making it possible to compute the bisimulation of very large RDF graphs in a time that is far out of reach for the sequential version. Experiments on synthetic benchmark data and real data (DBpedia) show a reduction of more than 90% in the size of the RDF graph, measured as the number of nodes compared with the number of blocks in the resulting bisimulation partition.
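
To make the notion of a bisimulation partition concrete, the following is a minimal sketch of naive signature-based partition refinement on a small in-memory triple list. It is not the SQL or MapReduce algorithm studied in the paper. It assumes forward bisimulation over outgoing labelled edges, and the function name `bisimulation_partition` and the sample triples are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def bisimulation_partition(triples):
    """Map each node to a block id of the coarsest forward bisimulation.

    Assumption: two nodes are equivalent when they have the same set of
    (predicate, block of target) pairs over their outgoing edges.
    """
    out_edges = defaultdict(set)
    nodes = set()
    for s, p, o in triples:
        out_edges[s].add((p, o))
        nodes.update((s, o))

    block = {n: 0 for n in nodes}  # start with every node in one block
    num_blocks = 1
    while True:
        ids = {}
        new_block = {}
        for n in nodes:
            # Signature: the node's current block plus the blocks it reaches.
            # Including the current block means blocks can only split.
            sig = (block[n],
                   frozenset((p, block[o]) for p, o in out_edges[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        if len(ids) == num_blocks:  # no block was split: partition is stable
            return new_block
        block, num_blocks = new_block, len(ids)


triples = [
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "type", "Person"),
]
partition = bisimulation_partition(triples)
print(len(set(partition.values())), "blocks for", len(partition), "nodes")
```

Here `alice` and `bob` end up in the same block because both have a single `knows` edge into the block containing `carol`. Block ids are arbitrary, so only equality of ids is meaningful. Each round computes a signature per node and then groups nodes by signature, which is one plausible reason this style of refinement lends itself to a map step followed by a reduce step, although the paper's actual job design may differ.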



    Information

    Published In

    SWIM '13: Proceedings of the Fifth Workshop on Semantic Web Information Management
    June 2013
    50 pages
    ISBN:9781450321945
    DOI:10.1145/2484712
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. MapReduce
    2. RDF
    3. bisimulation
    4. semantic web

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'13

    Citations

    Cited By

    • (2024) Instance-Based Lossless Summarization of Knowledge Graph With Optimized Triples and Corrections (IBA-OTC). IEEE Access, 12, pages 5584-5604. DOI: 10.1109/ACCESS.2023.3340984. Online publication date: 2024
    • (2024) HERSE: Handling and Enhancing RDF Summarization Through Blank Node Elimination. Foundations of Intelligent Systems, pages 87-101. DOI: 10.1007/978-3-031-62700-2_9. Online publication date: 17-Jun-2024
    • (2023) A Novel Approach for Extracting Summarized RDF Graph from Heterogeneous Corpus. 2023 International Conference on Innovations in Intelligent Systems and Applications (INISTA), pages 1-7. DOI: 10.1109/INISTA59065.2023.10310645. Online publication date: 20-Sep-2023
    • (2023) Computing k-Bisimulations for Large Graphs: A Comparison and Efficiency Analysis. Graph Transformation, pages 223-242. DOI: 10.1007/978-3-031-36709-0_12. Online publication date: 12-Jul-2023
    • (2021) A Framework to Quantify Approximate Simulation on Graph Data. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 1308-1319. DOI: 10.1109/ICDE51399.2021.00117. Online publication date: Apr-2021
    • (2021) FLUID: A common model for semantic structural graph summaries based on equivalence relations. Theoretical Computer Science, 854, pages 136-158. DOI: 10.1016/j.tcs.2020.12.019. Online publication date: Jan-2021
    • (2021) A survey on semantic schema discovery. The VLDB Journal, 31:4, pages 675-710. DOI: 10.1007/s00778-021-00717-x. Online publication date: 27-Nov-2021
    • (2021) ABSTAT-HD: a scalable tool for profiling very large knowledge graphs. The VLDB Journal, 31:5, pages 851-876. DOI: 10.1007/s00778-021-00704-2. Online publication date: 29-Sep-2021
    • (2020) RDF graph summarization for first-sight structure discovery. The VLDB Journal, 29:5, pages 1191-1218. DOI: 10.1007/s00778-020-00611-y. Online publication date: 30-Apr-2020
    • (2019) Quality metrics for RDF graph summarization. Semantic Web, 10:3, pages 555-584. DOI: 10.3233/SW-190346. Online publication date: 1-Jan-2019
