Coarse-to-Fine Output Predictions for Efficient Decoding in Neural Machine Translation

Published: 16 December 2022

Abstract

Neural Machine Translation (NMT) systems are undesirably slow because the decoder must compute probability distributions over large target vocabularies. In this work, we propose a coarse-to-fine approach that reduces the complexity of the decoding process using only the information in the weight matrix of the softmax layer. The large target vocabulary is first trimmed to a small candidate set in the coarse-grained phase, and the final top-k results are then generated from this candidate set in the fine-grained phase. Tested separately on an RNN-based NMT system and a Transformer-based NMT system, our GPU-friendly method achieves a significant speed-up without harming translation quality.
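To make the two-phase idea concrete, here is a minimal sketch of coarse-to-fine top-k prediction. The abstract only states that the candidate set is built from the softmax weight matrix itself, so the specific coarse scheme below (scoring centroids obtained by clustering the rows of that matrix), the names coarse_to_fine_topk and n_clusters_kept, and the toy sizes are all illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def coarse_to_fine_topk(hidden, W, clusters, centroids, k=5, n_clusters_kept=8):
    """Two-phase top-k over a large softmax (illustrative sketch).

    hidden:    (d,)   decoder hidden state at the current step
    W:         (V, d) softmax weight matrix (one row per target word)
    clusters:  (V,)   cluster id assigned to each row of W (precomputed offline)
    centroids: (C, d) mean of the W rows in each cluster (precomputed offline)
    """
    # Coarse phase: score C centroids instead of all V vocabulary rows,
    # then keep every word whose cluster is among the best-scoring ones.
    coarse_scores = centroids @ hidden                       # (C,)
    kept = np.argpartition(-coarse_scores, n_clusters_kept)[:n_clusters_kept]
    candidates = np.flatnonzero(np.isin(clusters, kept))     # small candidate set

    # Fine phase: exact logits only over the candidate set.
    fine_scores = W[candidates] @ hidden                     # (|candidates|,)
    order = np.argsort(-fine_scores)[:k]
    return candidates[order], fine_scores[order]

# Toy usage: random weights stand in for a trained softmax layer, and a
# random cluster assignment stands in for a real offline clustering of W.
rng = np.random.default_rng(0)
V, d, C = 30000, 512, 256
W = rng.standard_normal((V, d)).astype(np.float32)
clusters = rng.integers(0, C, size=V)
centroids = np.stack([W[clusters == c].mean(axis=0) for c in range(C)])
hidden = rng.standard_normal(d).astype(np.float32)
print(coarse_to_fine_topk(hidden, W, clusters, centroids))
```

With these toy sizes the fine phase scores on the order of a thousand rows instead of 30,000, which is where the speed-up would come from; both phases are single matrix-vector products, which is consistent with the GPU-friendliness the abstract claims.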


Cited By

  • (2023) KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction. Proceedings of the ACM Web Conference 2023, 1572–1583. DOI: 10.1145/3543507.3583300. Online publication date: 30-Apr-2023.
  • (2022) Generation of Voice Signal Tone Sandhi and Melody Based on Convolutional Neural Network. ACM Transactions on Asian and Low-Resource Language Information Processing 22(5), 1–13. DOI: 10.1145/3545569. Online publication date: 19-Sep-2022.
  • (2022) BBAE: A Method for Few-Shot Charge Prediction with Data Augmentation and Neural Network. Chinese Lexical Semantics, 58–66. DOI: 10.1007/978-3-031-28956-9_5. Online publication date: 14-May-2022.


    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6
    November 2022, 372 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3568970

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 16 December 2022
    Online AM: 07 April 2022
    Accepted: 18 March 2022
    Revised: 08 February 2022
    Received: 15 June 2021
    Published in TALLIP Volume 21, Issue 6


    Author Tags

    1. Neural networks
    2. machine translation
    3. acceleration
    4. softmax optimization

    Qualifiers

    • Short-paper
    • Refereed


