DOI: 10.1145/3167132.3167164
research-article

Predicting incorrect mappings: a data-driven approach applied to DBpedia

Published: 09 April 2018

    Abstract

    DBpedia releases consist of more than 70 multilingual datasets that cover data extracted from different language-specific Wikipedia instances. The data extracted from those Wikipedia instances are transformed into RDF using mappings created by the DBpedia community. However, not all of these mappings are correct or consistent across the language-specific DBpedia datasets. Because the incorrect mappings are scattered among a large number of mappings, inspecting every mapping manually to ensure correctness is not feasible. The goal of this work is therefore to propose a data-driven method that detects incorrect mappings automatically by analyzing information from both instance data and ontological axioms. We propose a machine-learning-based approach to building a predictive model that can detect incorrect mappings. We evaluated different supervised classification algorithms for this task, and our best model achieves 93% accuracy. These results help us detect incorrect mappings and achieve a high-quality DBpedia.
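
    The evaluation step described in the abstract, comparing supervised classifiers by accuracy, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two features (`overlap_ratio`, `range_conflict`), the toy rows, and the two classifiers are hypothetical and are not the features, data, or algorithms used in the paper.

```python
from collections import Counter

# Hypothetical toy data: each row is (overlap_ratio, range_conflict, label).
# overlap_ratio: assumed feature, share of instance triples on which two
#                mappings agree.
# range_conflict: assumed feature, 1 if values violate the mapped ontology
#                 property's range axiom, else 0.
# label: 1 = incorrect mapping, 0 = correct mapping.
DATA = [
    (0.92, 0, 0), (0.88, 0, 0), (0.81, 0, 0), (0.75, 0, 0),
    (0.95, 0, 0), (0.15, 1, 1), (0.22, 1, 1), (0.30, 1, 1),
    (0.10, 0, 1), (0.85, 1, 0),
]


def majority_baseline(train, features):
    """Ignore the features and predict the most frequent training label."""
    return Counter(label for *_, label in train).most_common(1)[0][0]


def nearest_neighbour(train, features):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    closest = min(
        train,
        key=lambda t: (t[0] - features[0]) ** 2 + (t[1] - features[1]) ** 2,
    )
    return closest[2]


def leave_one_out_accuracy(classifier, data):
    """Hold out each row in turn, train on the rest, and return the hit rate."""
    hits = 0
    for i, row in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += classifier(train, row[:2]) == row[2]
    return hits / len(data)


if __name__ == "__main__":
    for clf in (majority_baseline, nearest_neighbour):
        print(clf.__name__, leave_one_out_accuracy(clf, DATA))
```

    Leave-one-out evaluation is used here only because the toy dataset is tiny; on a realistically sized set of labeled mappings, k-fold cross-validation with a held-out test set would be the more usual choice.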



    Information & Contributors

    Information

    Published In

    SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
    April 2018
    2327 pages
    ISBN:9781450351911
    DOI:10.1145/3167132

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. DBpedia
    2. data quality
    3. linked data
    4. machine learning
    5. mappings

    Qualifiers

    • Research-article

    Funding Sources

    • MINECO

    Conference

    SAC 2018: Symposium on Applied Computing
    April 9 - 13, 2018
    Pau, France

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 9
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 09 Aug 2024

    Citations

    Cited By

    • (2024) Knowledge-Graph-Based IoTs Entity Discovery Middleware for Nonsmart Sensor. IEEE Transactions on Industrial Informatics 20:2 (2551-2563). DOI: 10.1109/TII.2023.3292540. Online publication date: Feb-2024
    • (2023) Structured knowledge creation for Urdu language: A DBpedia approach. Expert Systems. DOI: 10.1111/exsy.13223. Online publication date: 14-Jan-2023
    • (2022) Crowdsourced Fact Validation for Knowledge Bases. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (938-950). DOI: 10.1109/ICDE53745.2022.00075. Online publication date: May-2022
    • (2022) Linked Data Quality Assessment: A Survey. Web Services – ICWS 2021 (63-76). DOI: 10.1007/978-3-030-96140-4_5. Online publication date: 18-Feb-2022
    • (2021) Representing COVID-19 information in collaborative knowledge graphs: The case of Wikidata. Semantic Web (1-32). DOI: 10.3233/SW-210444. Online publication date: 28-Sep-2021
    • (2020) Task-Oriented Uncertainty Evaluation for Linked Data Based on Graph Interlinks. Knowledge Engineering and Knowledge Management (204-215). DOI: 10.1007/978-3-030-61244-3_15. Online publication date: 27-Oct-2020
    • (2019) DBkWik: extracting and integrating knowledge from thousands of Wikis. Knowledge and Information Systems. DOI: 10.1007/s10115-019-01415-5. Online publication date: 2-Nov-2019
    • (2019) Applying Predictive Models to Support skos:ExactMatch Validation. Metadata and Semantic Research (187-193). DOI: 10.1007/978-3-030-36599-8_16. Online publication date: 4-Dec-2019
    • (2018) Challenges in Quality Assessment of Arabic DBpedia. Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics (1-4). DOI: 10.1145/3227609.3227675. Online publication date: 25-Jun-2018
    • (2018) DBkWik: A Consolidated Knowledge Graph from Thousands of Wikis. 2018 IEEE International Conference on Big Knowledge (ICBK) (17-24). DOI: 10.1109/ICBK.2018.00011. Online publication date: Nov-2018
