
IDOL: Comprehensive & Complete LOD Insights

Published: 11 September 2017 Publication History

Abstract

Over the last decade, we have observed a steadily increasing number of RDF datasets made available on the web of data. The decentralized nature of the web, however, makes it hard to identify all these datasets. Moreover, even when downloadable data distributions are discovered, the available metadata is insufficient to describe the datasets properly, which poses barriers to their usefulness and reuse. In this paper, we describe an attempt to exhaustively identify the whole linked open data cloud by harvesting metadata from multiple sources, providing insights about duplicated data and the general quality of the available metadata. This was only possible by using a probabilistic data structure called a Bloom filter. Finally, we published a dump file containing metadata that can be used to enrich existing datasets.
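To illustrate why a Bloom filter makes this kind of duplicate detection feasible at scale, the following is a minimal sketch in Python. It is not the implementation used in the paper: the class `BloomFilter`, the helper `estimate_overlap`, the sizing defaults, and the choice to represent each triple as a single string are all assumptions made for illustration. The filter stores one dataset's triples in a fixed-size bit array, and a second dataset is then checked against it, so membership tests never miss a shared triple but may report a small fraction of false positives.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing formulas: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return ((h1 + i * h2) % self.size for i in range(self.num_hashes))

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def estimate_overlap(filter_a: BloomFilter, triples_b) -> float:
    """Fraction of dataset B's triples that are probably also in dataset A.

    Because of false positives, the result is an upper-biased estimate.
    """
    triples_b = list(triples_b)
    if not triples_b:
        return 0.0
    hits = sum(1 for triple in triples_b if triple in filter_a)
    return hits / len(triples_b)


# Usage sketch: triples are assumed to be serialized as one string each.
dataset_a = [f"<s{i}> <p> <o{i}> ." for i in range(100)]
dataset_b = dataset_a[:50] + [f"<x{i}> <p> <y{i}> ." for i in range(50)]

bloom = BloomFilter(expected_items=1000)
for triple in dataset_a:
    bloom.add(triple)

print(f"estimated overlap: {estimate_overlap(bloom, dataset_b):.2f}")
```

The memory cost depends only on the expected number of items and the target false-positive rate, not on the length of the triples, which is what makes comparing many large datasets pairwise tractable.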


Cited By

  • (2021) GeoLOD: A Spatial Linked Data Catalog and Recommender. Big Data and Cognitive Computing 5:2 (17). DOI: 10.3390/bdcc5020017. Online publication date: 19-Apr-2021
  • (2020) A more decentralized vision for Linked Data. Semantic Web 11:1 (101-113). DOI: 10.3233/SW-190380. Online publication date: 1-Jan-2020
  • (2019) Is the LOD cloud at risk of becoming a museum for datasets? Looking ahead towards a fully collaborative and sustainable LOD cloud. Companion Proceedings of The 2019 World Wide Web Conference (850-858). DOI: 10.1145/3308560.3317075. Online publication date: 13-May-2019
  • (2018) SHARK: A Test-Driven Framework for Design and Evolution of Ontologies. The Semantic Web: ESWC 2018 Satellite Events (314-324). DOI: 10.1007/978-3-319-98192-5_50. Online publication date: 2-Aug-2018

    Published In

    Semantics2017: Proceedings of the 13th International Conference on Semantic Systems
    September 2017
    202 pages
    ISBN:9781450352963
    DOI:10.1145/3132218

    In-Cooperation

    • St. Pölten University: St. Pölten University of Applied Sciences, Austria
    • Wolters Kluwer: Wolters Kluwer, Germany
    • Vrije Universiteit Amsterdam: Vrije Universiteit Amsterdam
    • Semantic Web Company: Semantic Web Company
    • Univ. Leipzig: Universität Leipzig

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Bloom Filter
    2. Dataset Overlap
    3. Linked Open Data
    4. RDF

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Semantics2017

    Acceptance Rates

    Overall Acceptance Rate 40 of 182 submissions, 22%
