DOI: 10.1145/3308560.3316465
research-article

NIFify: Towards Better Quality Entity Linking Datasets

Published: 13 May 2019

Abstract

The Entity Linking (EL) task identifies entity mentions in a text corpus and associates each of them with a corresponding unambiguous entry in a Knowledge Base (KB). EL systems are evaluated by comparing their results against gold standards. A common format for representing gold standard datasets is the NLP Interchange Format (NIF), which uses RDF as its data model. However, creating gold standard datasets for EL is a time-consuming and error-prone process. In this paper, we propose NIFify, a tool that helps to manually generate, curate, visualize and validate EL annotations, for example when creating gold standard datasets. NIFify also serves as a benchmarking tool for assessing EL results. Using its validation features, we further explore the quality of popular EL gold standards.
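
To make the NIF representation described in the abstract concrete, the following Python sketch (standard library only) serializes one document and its entity mentions as NIF 2.0 Turtle, using the nif-core and itsrdf vocabularies that common NIF gold standards use. It also runs one simple consistency check of the kind a validator might perform: each mention's offsets must lie inside the text, and its nif:anchorOf string must equal the substring at [beginIndex, endIndex). The names Mention, to_nif_turtle and validate, the document IRI http://example.org/doc1 and the example sentence are all hypothetical; this is not NIFify's code, and NIFify's actual validation rules may differ.

```python
from dataclasses import dataclass

# Namespaces of the NIF 2.0 core ontology, ITS RDF, and XML Schema datatypes.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Mention:
    """Hypothetical in-memory form of one EL annotation."""
    begin: int    # character offset into the context, inclusive
    end: int      # character offset into the context, exclusive
    surface: str  # mention text; should equal text[begin:end]
    entity: str   # knowledge-base IRI the mention is linked to


def escape_literal(s: str) -> str:
    """Escape characters that may not appear raw inside a Turtle "..." literal."""
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))


def to_nif_turtle(doc_iri: str, text: str, mentions: list[Mention]) -> str:
    """Serialize one document and its mentions as NIF 2.0 Turtle."""
    ctx = f"<{doc_iri}#char=0,{len(text)}>"  # RFC 5147-style fragment IRI
    out = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{ctx} a nif:Context , nif:String , nif:RFC5147String ;",
        f'    nif:isString "{escape_literal(text)}" ;',
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    for m in mentions:
        out += [
            "",
            f"<{doc_iri}#char={m.begin},{m.end}> a nif:Phrase , nif:String , nif:RFC5147String ;",
            f"    nif:referenceContext {ctx} ;",
            f'    nif:anchorOf "{escape_literal(m.surface)}" ;',
            f'    nif:beginIndex "{m.begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{m.end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{m.entity}> .",
        ]
    return "\n".join(out) + "\n"


def validate(text: str, mentions: list[Mention]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for m in mentions:
        if not 0 <= m.begin < m.end <= len(text):
            problems.append(f"offsets out of range: {m.begin},{m.end}")
        elif text[m.begin:m.end] != m.surface:
            problems.append(f"anchor mismatch at {m.begin},{m.end}: "
                            f"{m.surface!r} vs {text[m.begin:m.end]!r}")
    return problems


if __name__ == "__main__":
    text = "Michael Jordan played for the Chicago Bulls."
    mentions = [
        Mention(0, 14, "Michael Jordan", "http://dbpedia.org/resource/Michael_Jordan"),
        Mention(30, 42, "Chicago Bulls", "http://dbpedia.org/resource/Chicago_Bulls"),
    ]
    print(validate(text, mentions))  # flags the second mention: end should be 43
    print(to_nif_turtle("http://example.org/doc1", text, mentions))
```

The off-by-one end offset in the example is deliberate: it is the kind of error that is easy to introduce when annotating by hand and that a string-level check catches immediately.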

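The benchmarking role described in the abstract can be illustrated with a common strict-match convention for scoring EL output against a gold standard. This is a minimal sketch, assuming annotations are reduced to hypothetical (document id, begin, end, entity IRI) tuples and that a system annotation counts as correct only when all four fields match a gold annotation. Other matching schemes (for example, crediting overlapping spans) are also used in practice, and this is not necessarily how NIFify scores results; the name strict_prf is hypothetical.

```python
from collections import Counter

# Hypothetical annotation layout: (document id, begin offset, end offset, entity IRI).
Annotation = tuple[str, int, int, str]


def strict_prf(gold: list[Annotation],
               system: list[Annotation]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under strict matching.

    A system annotation is a true positive only if an identical gold
    annotation exists (same document, same offsets, same entity IRI).
    Counter intersection takes the minimum multiplicity, so a duplicated
    system annotation is credited at most as often as it occurs in gold.
    """
    tp = sum((Counter(gold) & Counter(system)).values())
    precision = tp / len(system) if system else 0.0  # conventions for empty input vary
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    jordan = "http://dbpedia.org/resource/Michael_Jordan"
    bulls = "http://dbpedia.org/resource/Chicago_Bulls"
    gold = [("doc1", 0, 14, jordan), ("doc1", 30, 43, bulls)]
    # The system finds both spans but links the second one to a wrong IRI.
    system = [("doc1", 0, 14, jordan),
              ("doc1", 30, 43, "http://dbpedia.org/resource/Chicago")]
    print(strict_prf(gold, system))  # (0.5, 0.5, 0.5)
```

In the usage example, the system detects both mentions but disambiguates one of them incorrectly, so under strict matching precision, recall and F1 are all 0.5.
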
Cited By

  • (2021) A step further towards a consensus on linking tweets to Wikipedia. Evolutionary Intelligence 16(6), 1825–1840. DOI: 10.1007/s12065-020-00549-8. Online publication date: 1 Feb 2021.
  • (2020) Fine-Grained Entity Linking. Journal of Web Semantics 65, 100600. DOI: 10.1016/j.websem.2020.100600. Online publication date: Dec 2020.

          Published In

          WWW '19: Companion Proceedings of The 2019 World Wide Web Conference
          May 2019
          1331 pages
          ISBN:9781450366755
          DOI:10.1145/3308560
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          In-Cooperation

          • IW3C2: International World Wide Web Conference Committee

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Benchmark Dataset
          2. Entity Linking
          3. Information Extraction

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          WWW '19: The Web Conference
          May 13–17, 2019
          San Francisco, USA

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
