Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

PJ Lin, M Saeed, E Chang… - Proceedings of the 24th …, 2023 - isca-archive.org
Proceedings of the 24th INTERSPEECH conference, 2023
Abstract
Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we aim to improve both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus, and we propose a framework of cross-lingual adaptive training that combines continual and task adaptive training to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with improvements of up to 2.38 BLEU, and demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance.