Assessing the Quality of Sources in Wikidata Across Languages: A Hybrid Approach

Published: 15 October 2021

Abstract

Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important because Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Yet despite this essential link between content and references, Wikidata's ability to systematically assess and assure the quality of its references remains limited. To this end, we carry out a mixed-methods study to determine the relevance, ease of access, and authoritativeness of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics, and machine learning. Building on our previous work, we run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages. We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata. The findings help us ascertain the quality of references in Wikidata and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web. We also discuss ongoing editorial practices that could encourage the use of higher-quality references in a more immediate way. All data and code used in the study are available on GitHub for feedback and further improvement and deployment by the research community.
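The abstract describes a two-stage pipeline: crowdsourced microtask judgments on a sample of references, followed by machine learning models trained on the consolidated crowd labels to extend the assessment to the whole of Wikidata. The authors' actual code lives in their GitHub repository; the minimal sketch below only illustrates the general shape of the second stage. The file names, feature names, and label column here are hypothetical placeholders, not taken from the paper.

# Hypothetical sketch of the scale-up stage: a classifier trained on
# consolidated crowd labels, then applied to unlabelled references.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# One row per (claim, reference) pair; "authoritative" holds the
# consolidated crowd judgment. File and column names are illustrative.
df = pd.read_csv("crowd_assessments.csv")
features = ["is_https", "domain_is_org", "lang_matches_label",
            "reference_age_days"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["authoritative"], test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# The fitted model can then score references never shown to crowd
# workers, extending the analysis far beyond the labelled sample.
all_refs = pd.read_csv("all_references.csv")
all_refs["predicted_authoritative"] = model.predict(all_refs[features])

A random forest stands in here for the "several machine learning models" the paper compares; any scikit-learn classifier could be swapped in the same way.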


Published In

Journal of Data and Information Quality, Volume 13, Issue 4
December 2021
84 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3477245
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2021
Accepted: 01 September 2021
Revised: 01 May 2021
Received: 01 January 2021
Published in JDIQ Volume 13, Issue 4


Author Tags

  1. Wikidata
  2. crowdsourcing
  3. verifiability
  4. data quality
  5. knowledge graphs

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • European Union's Horizon 2020


Article Metrics

  • Downloads (Last 12 months): 60
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 17 Oct 2024


Cited By

  • (2024) What Makes a High-Quality Training Dataset for Large Language Models: A Practitioners' Perspective. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 656-668. https://doi.org/10.1145/3691620.3695061. Online publication date: 27-Oct-2024.
  • (2024) The State of Pilot Study Reporting in Crowdsourcing: A Reflection on Best Practices and Guidelines. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1-45. https://doi.org/10.1145/3641023. Online publication date: 26-Apr-2024.
  • (2024) Evaluating the quality of student-generated content in learnersourcing: A large language model based approach. Education and Information Technologies. https://doi.org/10.1007/s10639-024-12851-4. Online publication date: 17-Jul-2024.
  • (2024) Explainable Classification of Wiki Streams. Information Systems and Technologies, 75-84. https://doi.org/10.1007/978-3-031-45642-8_7. Online publication date: 16-Feb-2024.
  • (2023) ProVe: A pipeline for automated provenance verification of knowledge graphs against textual sources. Semantic Web, 1-34. https://doi.org/10.3233/SW-233467. Online publication date: 12-Sep-2023.
  • (2023) Optimization Techniques for Focused Web Crawlers. 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS), 1500-1507. https://doi.org/10.1109/ICTACS59847.2023.10390228. Online publication date: 1-Nov-2023.
  • (2023) Investigating the potential of the semantic web for education: Exploring Wikidata as a learning platform. Education and Information Technologies 28, 10, 12565-12614. https://doi.org/10.1007/s10639-023-11664-1. Online publication date: 13-Mar-2023.
  • (2022) Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly. Simulation Modelling Practice and Theory 120, 102616. https://doi.org/10.1016/j.simpat.2022.102616. Online publication date: Nov-2022.
  • (2022) WDV: A Broad Data Verbalisation Dataset Built from Wikidata. The Semantic Web – ISWC 2022, 556-574. https://doi.org/10.1007/978-3-031-19433-7_32. Online publication date: 23-Oct-2022.
  • (2022) Statistical and Neural Methods for Cross-lingual Entity Label Mapping in Knowledge Graphs. Text, Speech, and Dialogue, 39-51. https://doi.org/10.1007/978-3-031-16270-1_4. Online publication date: 5-Sep-2022.
