Improving Neural Machine Translation with Linear Interpolation of a Short-Path Unit

Published: 07 February 2020

Abstract

In neural machine translation (NMT), the source and target words sit at the two ends of a large deep neural network, normally mediated by a series of non-linear activations. The problem with such consecutive non-linear activations is that they significantly decrease the magnitude of the gradient in a deep neural network and thus gradually weaken the interaction between source words and their translations. As a result, a source word may be incorrectly translated into a target word outside its set of translational equivalents. In this article, we propose short-path units (SPUs) to strengthen the association between source and target words by allowing information to flow effectively across adjacent layers via linear interpolation. In particular, we enrich three critical NMT components with SPUs: (1) an enriched encoding model, which linearly interpolates source word embeddings into source annotations; (2) an enriched decoding model, which enables the source context to flow linearly into target-side hidden states; and (3) an enriched output model, which further allows linear interpolation of target-side hidden states into output states. Experimentation on Chinese-to-English, English-to-German, and low-resource Tibetan-to-Chinese translation tasks demonstrates that the linear interpolation of SPUs significantly improves overall translation quality by 1.88, 1.43, and 3.75 BLEU, respectively. Moreover, detailed analysis shows that our approaches substantially strengthen the association between source and target words. These results indicate that the proposed model is effective in both rich- and low-resource scenarios.
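
To make the idea concrete, the sketch below illustrates one plausible reading of a short-path unit: a lower-layer representation (for example, a source word embedding) is projected linearly, without any non-linear activation, and interpolated element-wise into the hidden state produced by the usual non-linear layer. This is a minimal NumPy sketch; the function name short_path_unit, the tensor shapes, and the highway-style sigmoid gate used as the interpolation weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def short_path_unit(h, x, W_g, W_x, b_g):
    """Linearly interpolate a shortcut input x into the hidden state h.

    h   : (d,)          hidden state from the usual non-linear layer
    x   : (d_in,)       lower-layer input forwarded along the short path
    W_x : (d, d_in)     linear projection of the shortcut input (no activation)
    W_g : (d, d + d_in) produces an element-wise interpolation gate
    b_g : (d,)          gate bias
    """
    x_proj = W_x @ x  # linear shortcut: no non-linearity applied
    # Sigmoid gate in (0, 1) decides, per dimension, how much of the
    # non-linear hidden state vs. the linear shortcut is kept.
    gate = 1.0 / (1.0 + np.exp(-(W_g @ np.concatenate([h, x]) + b_g)))
    return gate * h + (1.0 - gate) * x_proj  # element-wise linear interpolation

# Toy usage with random parameters (illustrative only)
d, d_in = 4, 3
rng = np.random.default_rng(0)
h = rng.standard_normal(d)
x = rng.standard_normal(d_in)
W_g = rng.standard_normal((d, d + d_in))
W_x = rng.standard_normal((d, d_in))
out = short_path_unit(h, x, W_g, W_x, np.zeros(d))
print(out.shape)  # (4,)
```

Because the shortcut term enters the output without passing through a non-linearity, gradients can flow back to the lower layer (e.g., the source embedding) along this path largely undiminished, which is the intuition behind tightening the source-target association.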




    Information

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 19, Issue 3
    May 2020
    228 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3378675
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 February 2020
    Accepted: 01 December 2019
    Revised: 01 December 2019
    Received: 01 April 2019
    Published in TALLIP Volume 19, Issue 3


    Author Tags

    1. Neural machine translation
    2. low resource
    3. neural networks
    4. short-path units

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China


    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 10 Nov 2024


    Cited By

    • (2021) An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics. Sensors 21, 22 (7595). DOI: 10.3390/s21227595. Online publication date: 16-Nov-2021.
    • (2021) Recent Developments in Tibetan NLP. ACM Transactions on Asian and Low-Resource Language Information Processing 20, 2 (1-3). DOI: 10.1145/3453692. Online publication date: 23-Apr-2021.
    • (2021) Finding Better Subwords for Tibetan Neural Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing 20, 2 (1-11). DOI: 10.1145/3448216. Online publication date: 15-Mar-2021.
    • (2021) Improving neural machine translation with latent features feedback. Neurocomputing 463, C (368-378). DOI: 10.1016/j.neucom.2021.08.019. Online publication date: 6-Nov-2021.
    • (2021) Deep Transformer modeling via grouping skip connection for neural machine translation. Knowledge-Based Systems 234, C. DOI: 10.1016/j.knosys.2021.107556. Online publication date: 25-Dec-2021.
    • (2020) Investigating Back-Translation in Tibetan-Chinese Neural Machine Translation. Journal of Physics: Conference Series 1651, 1 (012122). DOI: 10.1088/1742-6596/1651/1/012122. Online publication date: 1-Nov-2020.
