Robust named entity disambiguation with random walks

Published: 01 January 2018

Abstract

Named Entity Disambiguation is the task of assigning entities from a Knowledge Graph (KG) to mentions of such entities in a textual document. State-of-the-art approaches to this task balance two disparate sources of similarity: lexical, defined as the pairwise similarity between mentions in the text and names of entities in the KG; and semantic, defined through some graph-theoretic property of a subgraph of the KG induced by the choice of entities for each mention. Departing from previous work, our notion of semantic similarity is rooted in Information Theory and is defined as the mutual information between random walks on the disambiguation graph induced by the choice of entities for each mention. We describe an iterative algorithm based on this idea and show an extension that uses learning-to-rank, which yields further improvements. Our experimental evaluation demonstrates that this approach is robust and very competitive on well-known existing benchmarks. We also justify the need for new and more difficult benchmarks, and provide an extensive experimental comparison of our method and previous work on these new benchmarks.
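
As a rough illustration of the random-walk ingredient mentioned in the abstract, the sketch below runs a random walk with restart on a tiny, made-up entity graph and picks the candidate whose walk distribution is closest to that of the document context. It is not the authors' algorithm: the graph, the entity names, the restart probability, and the use of KL divergence as a stand-in for the paper's mutual-information measure are all assumptions made for illustration.

```python
import math

# Hypothetical toy knowledge graph (undirected adjacency lists); not from the paper.
GRAPH = {
    "Michael_Jordan_(basketball)": ["Chicago_Bulls", "NBA"],
    "Chicago_Bulls": ["Michael_Jordan_(basketball)", "NBA"],
    "NBA": ["Michael_Jordan_(basketball)", "Chicago_Bulls", "United_States"],
    "United_States": ["NBA", "UC_Berkeley"],
    "UC_Berkeley": ["United_States", "Machine_learning", "Michael_Jordan_(scientist)"],
    "Machine_learning": ["UC_Berkeley", "Michael_Jordan_(scientist)"],
    "Michael_Jordan_(scientist)": ["UC_Berkeley", "Machine_learning"],
}


def random_walk_with_restart(adj, seeds, restart=0.15, iters=100):
    """Approximate the stationary distribution of a walk that restarts at the seeds.

    adj     : dict node -> list of neighbours (every node needs at least one)
    seeds   : dict node -> non-negative restart weight (normalised here)
    restart : assumed restart probability, not a value taken from the paper
    """
    nodes = sorted(adj)
    total = sum(seeds.values())
    pref = {n: seeds.get(n, 0.0) / total for n in nodes}
    dist = dict(pref)
    for _ in range(iters):
        # Restart mass goes back to the seeds; the rest spreads to neighbours.
        nxt = {n: restart * pref[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * dist[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        dist = nxt
    return dist


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a shared node set; used here only as a stand-in measure."""
    return sum(p[n] * math.log((p[n] + eps) / (q[n] + eps)) for n in p if p[n] > 0)


def disambiguate(adj, context_entities, candidates):
    """Return the candidate whose walk distribution diverges least from the context's."""
    doc = random_walk_with_restart(adj, {e: 1.0 for e in context_entities})
    scores = {
        c: kl_divergence(doc, random_walk_with_restart(adj, {c: 1.0}))
        for c in candidates
    }
    return min(scores, key=scores.get), scores


best, scores = disambiguate(
    GRAPH,
    context_entities=["Chicago_Bulls"],
    candidates=["Michael_Jordan_(basketball)", "Michael_Jordan_(scientist)"],
)
print(best)
print(scores)
```

In this toy setting the mention "Michael Jordan" appears next to an unambiguous mention of the Chicago Bulls, so the candidate on the basketball side of the graph receives the lower divergence. A real system would also combine this semantic score with the lexical similarity described in the abstract.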

Published In

Semantic Web, Volume 9, Issue 4 (2018), 157 pages
ISSN: 1570-0844
EISSN: 2210-4968

Publisher

IOS Press, Netherlands

Author Tags

1. Named entities
2. entity linking
3. entity disambiguation
4. relatedness measure
5. random walk
6. benchmarking

Qualifiers

• Research-article