Efficient Inference For Neural Machine Translation

Hsu, Yi-Te; Garg, Sarthak; Liao, Yi-Hsiu; Chatsviorkin, Ilya

arXiv:2010.02416 (cs)

[Submitted on 6 Oct 2020 (v1), last revised 7 Oct 2020 (this version, v2)]

Abstract: Large Transformer models have achieved state-of-the-art results in neural machine translation and have become standard in the field. In this work, we look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality. We conduct an empirical study that stacks various approaches and demonstrates that a combination of replacing decoder self-attention with simplified recurrent units, adopting a deep-encoder, shallow-decoder architecture, and pruning multi-head attention can achieve up to 109% and 84% speedup on CPU and GPU, respectively, and reduce the number of parameters by 25% while maintaining the same translation quality in terms of BLEU.

Comments:	Accepted SustaiNLP 2020
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2010.02416 [cs.CL]
	(or arXiv:2010.02416v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.02416

Submission history

From: Yi-Te Hsu
[v1] Tue, 6 Oct 2020 01:21:11 UTC (7,795 KB)
[v2] Wed, 7 Oct 2020 13:48:02 UTC (7,795 KB)
