Selective Knowledge Distillation for Neural Machine Translation

Wang, Fusheng; Yan, Jianhao; Meng, Fandong; Zhou, Jie

Computer Science > Computation and Language

arXiv:2105.12967 (cs)

[Submitted on 27 May 2021]

Abstract: Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance the model's performance by transferring the teacher model's knowledge on each training sample. However, previous work rarely discusses the different impacts of, and connections among, these samples, which serve as the medium for transferring teacher knowledge. In this paper, we design a novel protocol that can effectively analyze the different impacts of samples by comparing various partitions of the samples. Based on this protocol, we conduct extensive experiments and find that more teacher knowledge is not always better: knowledge on specific samples may even hurt the overall performance of knowledge distillation. Finally, to address these issues, we propose two simple yet effective strategies, i.e., batch-level and global-level selections, to pick suitable samples for distillation. We evaluate our approaches on two large-scale machine translation tasks, WMT'14 English->German and WMT'19 Chinese->English. Experimental results show that our approaches yield improvements of up to +1.28 and +0.89 BLEU points over the Transformer baseline, respectively.
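The abstract names two selection strategies but does not spell out their criteria. The following is a hypothetical sketch, not the authors' released code. It assumes three things: each target token is scored by the student's cross-entropy on the reference token; a fraction `ratio` of the highest-scoring tokens receives a word-level distillation term (the direction of selection is also an assumption); and the global variant approximates a corpus-wide threshold with a fixed-size FIFO queue of recent scores. The names `batch_level_mask`, `GlobalLevelSelector`, `selective_loss`, `ratio`, `queue_size` and `alpha` are illustrative only, as is the `alpha` interpolation between the gold and distillation losses.

```python
import math
from collections import deque
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_ce(student_logits: Sequence[float], gold_id: int) -> float:
    """Student cross-entropy on the reference token (assumed selection score)."""
    return -math.log(softmax(student_logits)[gold_id])


def token_kd(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """Word-level KD term: cross-entropy of the student against the teacher distribution."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))


def batch_level_mask(scores: Sequence[float], ratio: float) -> List[bool]:
    """Hypothetical batch-level rule: distill the top `ratio` share of tokens in this batch."""
    k = math.ceil(ratio * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [i in chosen for i in range(len(scores))]


class GlobalLevelSelector:
    """Hypothetical global-level rule: threshold taken from a FIFO queue of recent scores."""

    def __init__(self, ratio: float, queue_size: int):
        assert 0.0 < ratio <= 1.0
        self.ratio = ratio
        self.queue: deque = deque(maxlen=queue_size)  # oldest scores fall out automatically

    def mask(self, scores: Sequence[float]) -> List[bool]:
        self.queue.extend(scores)
        if not scores:
            return []
        ranked = sorted(self.queue, reverse=True)
        k = max(1, math.ceil(self.ratio * len(ranked)))
        threshold = ranked[k - 1]  # score of the k-th hardest token seen recently
        return [s >= threshold for s in scores]


def selective_loss(student, teacher, gold, mask, alpha: float = 0.5) -> float:
    """Gold CE on every token; KD (interpolated by the assumed `alpha`) only where mask is True."""
    total = 0.0
    for s_logits, t_logits, y, use_kd in zip(student, teacher, gold, mask):
        ce = token_ce(s_logits, y)
        if use_kd:
            total += (1 - alpha) * ce + alpha * token_kd(s_logits, t_logits)
        else:
            total += ce
    return total / len(gold)
```

For example, `batch_level_mask([0.1, 2.0, 0.5, 3.0], 0.5)` marks the two hardest tokens, `[False, True, False, True]`. A `GlobalLevelSelector(ratio=0.5, queue_size=4)` fed the batches `[1.0, 2.0]`, `[0.5, 3.0]` and `[0.2, 0.3]` in turn returns `[False, True]`, `[False, True]` and `[False, False]`. This shows the intended contrast: a batch-level rule always distills some tokens in every batch, whereas a queue-based threshold can skip a batch whose tokens are all easy relative to recent history.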

Comments:	Accepted as a long paper at ACL 2021. Code is available at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2105.12967 [cs.CL]
	(or arXiv:2105.12967v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2105.12967

Submission history

From: Fandong Meng
[v1] Thu, 27 May 2021 06:54:12 UTC (5,544 KB)
