DOI: 10.1145/3383583.3398597
short-paper

Linking Named Entities across Languages using Multilingual Word Embeddings

Published: 01 August 2020

Abstract

Digital libraries are online collections of digital objects that can include text, images, audio, or video in several languages. It has long been observed that named entities (NEs) are key to accessing digital library portals, as they appear in most user queries. However, NEs can be spelled differently in each language, which reduces the performance of user queries that retrieve documents across languages. Cross-lingual named entity linking (XEL) connects NEs from documents in a source language to an external knowledge base in another (target) language. The XEL task is especially challenging due to the diversity of NEs across languages and contexts. This paper describes an XEL system applied to and evaluated on several language pairs, including English and various low-resource languages from different linguistic families, such as Croatian, Finnish, Estonian, and Slovenian. We tested this approach by analyzing documents and NEs in low-resource languages and linking them to the English version of Wikipedia. We present the results of this analysis and the challenges that arise with degraded documents from digital libraries. Future work will extensively analyze the impact of our approach on the XEL task with OCRed documents.
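
The general idea of linking a mention in one language to a knowledge base entry in another through a shared embedding space can be sketched as follows. This is a toy illustration only, not the system described in the paper: the 2-d vectors, the `EMBEDDINGS` and `CANDIDATES` tables, and the `link` function are all hypothetical, and averaging word vectors followed by cosine similarity is a deliberately simple stand-in for a real disambiguation model.

```python
import math

# Toy 2-d vectors standing in for multilingual word embeddings that have been
# aligned in one shared space (hypothetical values, not trained embeddings).
EMBEDDINGS = {
    # Croatian context words
    "glavni": (0.9, 0.1), "grad": (0.8, 0.2), "francuske": (1.0, 0.0),
    # English words describing knowledge base entries
    "capital": (0.9, 0.1), "city": (0.8, 0.2), "france": (1.0, 0.0),
    "american": (0.3, 0.7), "media": (0.1, 0.9), "personality": (0.0, 1.0),
}

# Hypothetical English candidate entries for the mention "Pariz".
CANDIDATES = {
    "Paris": ["capital", "city", "france"],
    "Paris_Hilton": ["american", "media", "personality"],
}


def mean_vector(words):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def link(context_words, candidates):
    """Return the candidate whose description is closest to the context."""
    context = mean_vector(context_words)
    return max(
        candidates,
        key=lambda entity: cosine(context, mean_vector(candidates[entity])),
    )


# Croatian sentence: "Paris is the capital of France".
context = "pariz je glavni grad francuske".split()
print(link(context, CANDIDATES))
```

In this toy setup the Croatian context lands next to the English description of the city rather than the celebrity, because both languages share one vector space; no translation step is involved.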

Supplementary Material

MP4 File (3383583.3398597.mp4)
Video presentation of the JCDL 2020 paper "Linking Named Entities across Languages using Multilingual Word Embeddings" by Elvys Linhares Pontes, Jose G. Moreno and Antoine Doucet

    Published In

    JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
    August 2020
    611 pages
    ISBN:9781450375856
    DOI:10.1145/3383583
    © 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-lingual named entity linking
    2. digital library
    3. indexing
    4. multilingual word embeddings

    Qualifiers

    • Short-paper

    Funding Sources

    • Horizon 2020

    Conference

    JCDL '20
    Acceptance Rates

    Overall Acceptance Rate 415 of 1,482 submissions, 28%

    Cited By

    • (2024) Intelligent Text Processing: A Review of Automated Summarization Methods. Virtual Communication and Social Networks 3:3 (203-222). DOI: 10.21603/2782-4799-2024-3-3-203-222. Online publication date: 1-Oct-2024
    • (2022) In-depth analysis of the impact of OCR errors on named entity recognition and linking. Natural Language Engineering 29:2 (425-448). DOI: 10.1017/S1351324922000110. Online publication date: 18-Mar-2022
    • (2022) MICE. Knowledge-Based Systems 235:C. DOI: 10.1016/j.knosys.2021.107606. Online publication date: 10-Jan-2022
    • (2021) MELHISSA: a multilingual entity linking architecture for historical press articles. International Journal on Digital Libraries 23:2 (133-160). DOI: 10.1007/s00799-021-00319-6. Online publication date: 29-Nov-2021
    • (2020) Entity Linking for Historical Documents: Challenges and Solutions. Digital Libraries at Times of Massive Societal Transition (215-231). DOI: 10.1007/978-3-030-64452-9_19. Online publication date: 26-Nov-2020
