Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis

Authors:

Aniketh Janardhan Reddy,

Jens Lehmann

Journal of Data and Information Quality (JDIQ), Volume 9, Issue 3

Article No.: 16, Pages 1 - 26

https://doi.org/10.1145/3177873

Published: 20 February 2018

Abstract

Among the characteristics of knowledge bases, data quality is one of the most relevant for maximizing the benefits of the provided information. Knowledge base quality assessment poses a number of big data challenges, such as high volume, variety, velocity, and veracity. In this article, we focus on answering questions related to the assessment of the veracity of facts through Deep Fact Validation (DeFacto), a triple validation framework designed to assess facts in RDF knowledge bases. Despite current developments in the research area, the underlying framework faces many challenges. This article pinpoints and discusses these issues and conducts a thorough analysis of its pipeline, aiming to reduce error propagation through its components. Furthermore, we discuss recent developments related to fact validation and describe the advantages and drawbacks of state-of-the-art models. As a result of this exploratory analysis, we give insights and directions toward a better architecture to tackle the complex task of fact-checking in knowledge bases.

Index Terms

Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
  2. World Wide Web
    1. Web searching and information discovery

Published In

Journal of Data and Information Quality Volume 9, Issue 3

Special Issue on Improving the Veracity and Value of Big Data

September 2017

140 pages

ISSN:1936-1955

EISSN:1936-1963

DOI:10.1145/3183573

Editor:
Tiziana Catarci
Sapienza University of Rome, Rome, Italy

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2018

Accepted: 01 January 2018

Revised: 01 December 2017

Received: 01 April 2017

Published in JDIQ Volume 9, Issue 3

Qualifiers

Research-article
Research
Refereed
