DOI: 10.1145/3404835.3463258
short-paper
Open access

Conversational Entity Linking: Problem Definition and Datasets

Published: 11 July 2021

Abstract

Machine understanding of user utterances in conversational systems is of utmost importance for enabling engaging and meaningful conversations with users. Entity Linking (EL) is one of the means of text understanding, with proven efficacy for various downstream tasks in information retrieval. In this paper, we study entity linking for conversational systems. To develop a better understanding of what EL in a conversational setting entails, we analyze a large number of dialogues from existing conversational datasets and annotate references to concepts, named entities, and personal entities using crowdsourcing. Based on the annotated dialogues, we identify the main characteristics of conversational entity linking. Further, we report on the performance of traditional EL systems on our Conversational Entity Linking dataset, ConEL, and present an extension to these methods to better fit the conversational setting. The resources released with this paper include annotated datasets, detailed descriptions of crowdsourcing setups, as well as the annotations produced by various EL systems. These new resources allow for an investigation of how the role of entities in conversations is different from that in documents or isolated short text utterances like queries and tweets, and complement existing conversational datasets.

Supplementary Material

MP4 File (SIGIR21_ConEL_210610_1429.mp4)
The pre-recorded presentation for the SIGIR'21 resource paper: "Conversational Entity Linking: Problem Definition and Datasets", Hideaki Joko, Faegheh Hasibi, Krisztian Balog, and Arjen P. de Vries.


Published In

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2021
2998 pages
ISBN:9781450380379
DOI:10.1145/3404835
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conversational system
  2. datasets
  3. entity linking

Qualifiers

  • Short-paper

Conference

SIGIR '21

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 138
  • Downloads (Last 6 weeks): 22
Reflects downloads up to 15 Oct 2024.

Cited By

  • (2024) Towards Self-Contained Answers: Entity-Based Answer Rewriting in Conversational Search. Proceedings of the 2024 Conference on Human Information Interaction and Retrieval. 10.1145/3627508.3638300 (209-218). Online publication date: 10-Mar-2024
  • (2023) Learning to Relate to Previous Turns in Conversational Search. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 10.1145/3580305.3599411 (1722-1732). Online publication date: 6-Aug-2023
  • (2022) NILK: Entity Linking Dataset Targeting NIL-linking Cases. Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 10.1145/3511808.3557659 (4069-4073). Online publication date: 17-Oct-2022
  • (2022) Personal Research Knowledge Graphs. Companion Proceedings of the Web Conference 2022. 10.1145/3487553.3524654 (763-768). Online publication date: 25-Apr-2022
  • (2022) Conversational Question Answering on Heterogeneous Sources. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 10.1145/3477495.3531815 (144-154). Online publication date: 6-Jul-2022
  • (2022) Enhancing Entity Linking with Contextualized Entity Embeddings. Natural Language Processing and Chinese Computing. 10.1007/978-3-031-17189-5_19 (228-239). Online publication date: 24-Sep-2022
