Coarse-to-Fine Output Predictions for Efficient Decoding in Neural Machine Translation

Published: 16 December 2022

Abstract

Neural Machine Translation (NMT) systems are undesirably slow because the decoder must compute probability distributions over large target vocabularies. In this work, we propose a coarse-to-fine approach that reduces the complexity of the decoding process using only the information in the weight matrix of the softmax layer. The large target vocabulary is first trimmed to a small candidate set in the coarse-grained phase, and the final top-k results are then generated from this candidate set in the fine-grained phase. Tested separately on an RNN-based NMT system and a Transformer-based NMT system, our GPU-friendly method achieves a significant speed-up without harming translation quality.
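To make the two-phase idea concrete, here is a minimal sketch of coarse-to-fine top-k prediction. The abstract only states that the candidate set is built from the softmax weight matrix itself, so the specific coarse scheme below (scoring centroids obtained by clustering the rows of that matrix), the names coarse_to_fine_topk and n_clusters_kept, and the toy sizes are all illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def coarse_to_fine_topk(hidden, W, clusters, centroids, k=5, n_clusters_kept=8):
    """Two-phase top-k over a large softmax (illustrative sketch).

    hidden:    (d,)   decoder hidden state at the current step
    W:         (V, d) softmax weight matrix (one row per target word)
    clusters:  (V,)   cluster id assigned to each row of W (precomputed offline)
    centroids: (C, d) mean of the W rows in each cluster (precomputed offline)
    """
    # Coarse phase: score C centroids instead of all V vocabulary rows,
    # then keep every word whose cluster is among the best-scoring ones.
    coarse_scores = centroids @ hidden                       # (C,)
    kept = np.argpartition(-coarse_scores, n_clusters_kept)[:n_clusters_kept]
    candidates = np.flatnonzero(np.isin(clusters, kept))     # small candidate set

    # Fine phase: exact logits only over the candidate set.
    fine_scores = W[candidates] @ hidden                     # (|candidates|,)
    order = np.argsort(-fine_scores)[:k]
    return candidates[order], fine_scores[order]

# Toy usage: random weights stand in for a trained softmax layer, and a
# random cluster assignment stands in for a real offline clustering of W.
rng = np.random.default_rng(0)
V, d, C = 30000, 512, 256
W = rng.standard_normal((V, d)).astype(np.float32)
clusters = rng.integers(0, C, size=V)
centroids = np.stack([W[clusters == c].mean(axis=0) for c in range(C)])
hidden = rng.standard_normal(d).astype(np.float32)
print(coarse_to_fine_topk(hidden, W, clusters, centroids))
```

With these toy sizes the fine phase scores on the order of a thousand rows instead of 30,000, which is where the speed-up would come from; both phases are single matrix-vector products, which is consistent with the GPU-friendliness the abstract claims.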


Cited By

  • (2023) KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction. Proceedings of the ACM Web Conference 2023, 1572–1583. DOI: 10.1145/3543507.3583300. Online publication date: 30-Apr-2023.
  • (2022) Generation of Voice Signal Tone Sandhi and Melody Based on Convolutional Neural Network. ACM Transactions on Asian and Low-Resource Language Information Processing 22(5), 1–13. DOI: 10.1145/3545569. Online publication date: 19-Sep-2022.
  • (2022) BBAE: A Method for Few-Shot Charge Prediction with Data Augmentation and Neural Network. Chinese Lexical Semantics, 58–66. DOI: 10.1007/978-3-031-28956-9_5. Online publication date: 14-May-2022.


    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6
    November 2022, 372 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3568970

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 16 December 2022
    Online AM: 07 April 2022
    Accepted: 18 March 2022
    Revised: 08 February 2022
    Received: 15 June 2021
    Published in TALLIP Volume 21, Issue 6


    Author Tags

    1. Neural networks
    2. machine translation
    3. acceleration
    4. softmax optimization

    Qualifiers

    • Short-paper
    • Refereed


