ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction

Xun Yuan; Derek Pham; Sam Davidson; Zhou Yu

doi:10.18653/v1/2022.naacl-main.5

Abstract

Currently available grammatical error correction (GEC) datasets are compiled using essays or other long-form text written by language learners, limiting the applicability of these datasets to other domains such as informal writing and conversational dialog. In this paper, we present a novel GEC dataset consisting of parallel original and corrected utterances drawn from open-domain chatbot conversations; this dataset is, to our knowledge, the first GEC dataset targeted to a human-machine conversational setting. We also present a detailed annotation scheme which ranks errors by perceived impact on comprehension, making our dataset more representative of real-world language learning applications. To demonstrate the utility of the dataset, we use our annotated data to fine-tune a state-of-the-art GEC model. Experimental results show the effectiveness of our data in improving GEC model performance in a conversational scenario.

Anthology ID: 2022.naacl-main.5
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 76–84
URL: https://aclanthology.org/2022.naacl-main.5
DOI: 10.18653/v1/2022.naacl-main.5
Cite (ACL): Xun Yuan, Derek Pham, Sam Davidson, and Zhou Yu. 2022. ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 76–84, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction (Yuan et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.5.pdf
Video: https://aclanthology.org/2022.naacl-main.5.mp4
Code: yuanxun-yx/eracond
Data: ErAConD
