CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations

Frédéric Bechet; Cindy Aloui; Delphine Charlet; Géraldine Damnati; Johannes Heinecke; Alexis Nasr; Frédéric Herledan

doi:10.18653/v1/D19-5803

Abstract

Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing triplets (document, question, answer) were made available to the scientific community to develop supervised methods based on deep neural networks, with promising results. These methods need very large training corpora to be effective; however, such data currently exists only for English and Chinese. The aim of this study is to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora. The collection of natural questions is thereby limited to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.

Anthology ID: D19-5803
Volume: Proceedings of the 2nd Workshop on Machine Reading for Question Answering
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 19–26
URL: https://aclanthology.org/D19-5803
DOI: 10.18653/v1/D19-5803
Cite (ACL): Frederic Bechet, Cindy Aloui, Delphine Charlet, Geraldine Damnati, Johannes Heinecke, Alexis Nasr, and Frederic Herledan. 2019. CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pages 19–26, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations (Bechet et al., 2019)
PDF: https://aclanthology.org/D19-5803.pdf
