Improving Neural Machine Translation with Linear Interpolation of a Short-Path Unit

Published: 07 February 2020

Abstract

In neural machine translation (NMT), the source and target words sit at the two ends of a large deep neural network, normally mediated by a series of non-linear activations. The problem with such consecutive non-linear activations is that they significantly decrease the magnitude of the gradient in a deep neural network and thus gradually weaken the interaction between source words and their translations. As a result, a source word may be incorrectly translated into a target word outside its set of translational equivalents. In this article, we propose short-path units (SPUs) to strengthen the association between source and target words by allowing information to flow effectively across adjacent layers via linear interpolation. In particular, we enrich three critical NMT components with SPUs: (1) an enriched encoding model, which linearly interpolates source word embeddings into source annotations; (2) an enriched decoding model, which enables the source context to flow linearly into target-side hidden states; and (3) an enriched output model, which further allows linear interpolation of target-side hidden states into output states. Experimentation on Chinese-to-English, English-to-German, and low-resource Tibetan-to-Chinese translation tasks demonstrates that the linear interpolation of SPUs significantly improves overall translation quality by 1.88, 1.43, and 3.75 BLEU, respectively. Moreover, detailed analysis shows that our approaches substantially strengthen the association between source and target words. These results indicate that the proposed model is effective in both rich- and low-resource scenarios.
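
To make the idea concrete, the sketch below illustrates one plausible reading of a short-path unit: a lower-layer representation (for example, a source word embedding) is projected linearly, without any non-linear activation, and interpolated element-wise into the hidden state produced by the usual non-linear layer. This is a minimal NumPy sketch; the function name short_path_unit, the tensor shapes, and the highway-style sigmoid gate used as the interpolation weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def short_path_unit(h, x, W_g, W_x, b_g):
    """Linearly interpolate a shortcut input x into the hidden state h.

    h   : (d,)          hidden state from the usual non-linear layer
    x   : (d_in,)       lower-layer input forwarded along the short path
    W_x : (d, d_in)     linear projection of the shortcut input (no activation)
    W_g : (d, d + d_in) produces an element-wise interpolation gate
    b_g : (d,)          gate bias
    """
    x_proj = W_x @ x  # linear shortcut: no non-linearity applied
    # Sigmoid gate in (0, 1) decides, per dimension, how much of the
    # non-linear hidden state vs. the linear shortcut is kept.
    gate = 1.0 / (1.0 + np.exp(-(W_g @ np.concatenate([h, x]) + b_g)))
    return gate * h + (1.0 - gate) * x_proj  # element-wise linear interpolation

# Toy usage with random parameters (illustrative only)
d, d_in = 4, 3
rng = np.random.default_rng(0)
h = rng.standard_normal(d)
x = rng.standard_normal(d_in)
W_g = rng.standard_normal((d, d + d_in))
W_x = rng.standard_normal((d, d_in))
out = short_path_unit(h, x, W_g, W_x, np.zeros(d))
print(out.shape)  # (4,)
```

Because the shortcut term enters the output without passing through a non-linearity, gradients can flow back to the lower layer (e.g., the source embedding) along this path largely undiminished, which is the intuition behind tightening the source-target association.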




    Information

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 19, Issue 3
    May 2020
    228 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3378675
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 February 2020
    Accepted: 01 December 2019
    Revised: 01 December 2019
    Received: 01 April 2019
    Published in TALLIP Volume 19, Issue 3


    Author Tags

    1. Neural machine translation
    2. low resource
    3. neural networks
    4. short-path units

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China


    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 10 Nov 2024


    Cited By

    • (2021) An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics. Sensors 21, 22 (7595). DOI: 10.3390/s21227595. Online publication date: 16-Nov-2021.
    • (2021) Recent Developments in Tibetan NLP. ACM Transactions on Asian and Low-Resource Language Information Processing 20, 2 (1-3). DOI: 10.1145/3453692. Online publication date: 23-Apr-2021.
    • (2021) Finding Better Subwords for Tibetan Neural Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing 20, 2 (1-11). DOI: 10.1145/3448216. Online publication date: 15-Mar-2021.
    • (2021) Improving neural machine translation with latent features feedback. Neurocomputing 463, C (368-378). DOI: 10.1016/j.neucom.2021.08.019. Online publication date: 6-Nov-2021.
    • (2021) Deep Transformer modeling via grouping skip connection for neural machine translation. Knowledge-Based Systems 234, C. DOI: 10.1016/j.knosys.2021.107556. Online publication date: 25-Dec-2021.
    • (2020) Investigating Back-Translation in Tibetan-Chinese Neural Machine Translation. Journal of Physics: Conference Series 1651, 1 (012122). DOI: 10.1088/1742-6596/1651/1/012122. Online publication date: 1-Nov-2020.
