Alleviating the Inequality of Attention Heads for Neural Machine Translation

Sun, Zewei; Huang, Shujian; Dai, Xin-Yu; Chen, Jiajun

arXiv:2009.09672 (cs)

[Submitted on 21 Sep 2020 (v1), last revised 31 Aug 2022 (this version, v2)]

Abstract: Recent studies show that the attention heads in Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose HeadMask, a simple masking method applied in two specific ways. Experiments show translation improvements on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
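
The abstract does not spell out how HeadMask selects heads, so the following is only a minimal sketch of the general idea of masking attention heads, not the paper's procedure. It assumes that "masking" means zeroing a head's output vector and that heads are chosen uniformly at random; the function name `mask_heads`, its parameters, and the toy data are hypothetical.

```python
import random


def mask_heads(head_outputs, num_masked, rng=None):
    """Zero the outputs of `num_masked` randomly chosen heads.

    head_outputs: list of per-head output vectors (lists of floats).
    Returns (masked_outputs, masked_indices).
    Assumption: uniform random selection; the paper's two variants
    are not described in the abstract and may differ.
    """
    rng = rng or random.Random()
    num_heads = len(head_outputs)
    if not 0 <= num_masked <= num_heads:
        raise ValueError("num_masked must be between 0 and the number of heads")
    # Pick distinct head indices to silence for this step.
    masked = set(rng.sample(range(num_heads), num_masked))
    out = [
        [0.0] * len(vec) if i in masked else list(vec)
        for i, vec in enumerate(head_outputs)
    ]
    return out, sorted(masked)


# Toy example: 4 heads, each producing a 2-dimensional output.
heads = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
masked_outputs, masked_idx = mask_heads(heads, num_masked=2, rng=random.Random(0))
```

In a training loop, a mask of this kind would typically be resampled at every step and disabled at inference time, so that no single head can be relied on exclusively.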

Comments:	Accepted by COLING 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2009.09672 [cs.CL]
	(or arXiv:2009.09672v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2009.09672

Submission history

From: Zewei Sun
[v1] Mon, 21 Sep 2020 08:14:30 UTC (7,171 KB)
[v2] Wed, 31 Aug 2022 11:50:22 UTC (102 KB)
