A Large-Scale Dataset for Empathetic Response Generation

Anuradha Welivita; Yubo Xie; Pearl Pu

doi:10.18653/v1/2021.emnlp-main.96

Abstract

Recent developments in NLP show a strong trend toward fine-tuning pre-trained models on domain-specific datasets. This is especially true for response generation, where emotion plays an important role. However, existing empathetic datasets remain small, which holds back research in this area, for example the development of emotion-aware chatbots. One main technical challenge has been the cost of manually annotating dialogues with the right emotion labels. In this paper, we describe a large-scale silver dataset consisting of 1M dialogues annotated with 32 fine-grained emotions, eight empathetic response intents, and the Neutral category. To achieve this goal, we developed a novel data curation pipeline that starts with a small seed of manually annotated data and scales it up to the full dataset. We compare the dataset's quality against a state-of-the-art gold dataset using both offline experiments and visual validation methods. The resulting procedure can be used to create similar datasets in the same domain as well as in other domains.

Anthology ID:: 2021.emnlp-main.96
Volume:: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2021
Address:: Online and Punta Cana, Dominican Republic
Editors:: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 1251–1264
URL:: https://aclanthology.org/2021.emnlp-main.96
Cite (ACL):: Anuradha Welivita, Yubo Xie, and Pearl Pu. 2021. A Large-Scale Dataset for Empathetic Response Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1251–1264, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):: A Large-Scale Dataset for Empathetic Response Generation (Welivita et al., EMNLP 2021)
PDF:: https://aclanthology.org/2021.emnlp-main.96.pdf
Software:: 2021.emnlp-main.96.Software.zip
Video:: https://aclanthology.org/2021.emnlp-main.96.mp4
Code:: anuradha1992/edos
Data:: EmotionLines, IEMOCAP, MELD