DOI: 10.5555/3298023.3298052
Article

Neural machine translation advised by statistical machine translation

Published: 04 February 2017

Abstract

Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but non-fluent translations. It is therefore natural to leverage the advantages of both models for better translations, and in this work we propose to incorporate an SMT model into the NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e.g., the generated partial translation and attention history). We then employ an auxiliary classifier to score the SMT recommendations and a gating function to combine them with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets.
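The per-step combination described in the abstract — an auxiliary classifier scoring SMT recommendations and a gate blending them with the NMT distribution — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the softmax normalization of the classifier scores, and the scalar sigmoid gate are all assumptions made for the sketch.

```python
import math

def softmax(scores):
    # Normalize raw scores into a probability distribution (max-shifted for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine(p_nmt, smt_scores, gate_logit):
    """Blend the NMT word distribution with scored SMT recommendations.

    p_nmt      -- NMT's softmax distribution over the vocabulary
    smt_scores -- auxiliary-classifier scores for SMT-recommended words
                  (one raw score per vocabulary word)
    gate_logit -- scalar produced by the (jointly trained) gating function
                  from the current decoding information
    """
    p_smt = softmax(smt_scores)
    g = sigmoid(gate_logit)  # how much to trust SMT at this step
    return [(1.0 - g) * pn + g * ps for pn, ps in zip(p_nmt, p_smt)]

# Toy vocabulary of 4 words: NMT spreads its mass, while the SMT
# recommendations strongly favor word 2; a neutral gate (g = 0.5)
# shifts probability toward the SMT-recommended word.
p_nmt = [0.4, 0.3, 0.2, 0.1]
p_out = combine(p_nmt, [0.0, 0.0, 3.0, 0.0], gate_logit=0.0)
```

In the paper the gate is learned end-to-end with the rest of the network, so it can open wider exactly where NMT tends to be inadequate and close where NMT is already confident.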

References

[1]
Arthur, P.; Neubig, G.; and Nakamura, S. 2016. Incorporating discrete translation lexicons into neural machine translation. In Proceedings of the 2016 Conference on EMNLP.
[2]
Bahdanau, D.; Cho, K.; and Bengio, Y. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
[3]
Brown, P. F.; Pietra, V. J. D.; Pietra, S. A. D.; and Mercer, R. L. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational linguistics.
[4]
Chiang, D. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd ACL.
[5]
Chitnis, R., and DeNero, J. 2015. Variable-length word encodings for neural translation models. In Proceedings of the 2015 Conference on EMNLP.
[6]
Cho, K.; van Merriënboer, B.; Bahdanau, D.; and Bengio, Y. 2014a. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint.
[7]
Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014b. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on EMNLP.
[8]
Chung, J.; Cho, K.; and Bengio, Y. 2016. A character-level decoder without explicit segmentation for neural machine translation. In Proceedings of the 54th ACL.
[9]
Cohn, T.; Hoang, C. D. V.; Vymolova, E.; Yao, K.; Dyer, C.; and Haffari, G. 2016. Incorporating structural alignment biases into an attentional neural translation model. In Proceedings of the 2016 NAACL.
[10]
Costa-jussà, M. R., and Fonollosa, J. A. R. 2016. Character-based neural machine translation. In Proceedings of the 54th ACL.
[11]
Feng, S.; Liu, S.; Li, M.; and Zhou, M. 2016. Implicit distortion and fertility models for attention-based encoder-decoder nmt model. arXiv preprint.
[12]
Gu, J.; Lu, Z.; Li, H.; and Li, V. O. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th ACL.
[13]
He, W.; He, Z.; Wu, H.; and Wang, H. 2016. Improved neural machine translation with smt features. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.
[14]
Heafield, K. 2011. KenLM: faster and smaller language model queries. In Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation.
[15]
Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735-1780.
[16]
Jean, S.; Cho, K.; Memisevic, R.; and Bengio, Y. 2015. On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd ACL and the 7th IJCNLP.
[17]
Kalchbrenner, N., and Blunsom, P. 2013. Recurrent continuous translation models. In Proceedings of the 2013 Conference on EMNLP.
[18]
Koehn, P.; Och, F. J.; and Marcu, D. 2003. Statistical phrase-based translation. In Proceedings of the 2003 NAACL.
[19]
Ling, W.; Trancoso, I.; Dyer, C.; and Black, A. W. 2015. Character-based neural machine translation. arXiv preprint.
[20]
Luong, M.-T., and Manning, C. D. 2016. Achieving open vocabulary neural machine translation with hybrid word-character models. In Proceedings of the 54th ACL.
[21]
Luong, T.; Sutskever, I.; Le, Q.; Vinyals, O.; and Zaremba, W. 2015. Addressing the rare word problem in neural machine translation. In Proceedings of the 53rd ACL and the 7th IJCNLP.
[22]
Meng, F.; Lu, Z.; Tu, Z.; Li, H.; and Liu, Q. 2016. A deep memory-based architecture for sequence-to-sequence learning. In Proceedings of ICLR-Workshop 2016.
[23]
Och, F. J., and Ney, H. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th ACL.
[24]
Och, F. J. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st ACL.
[25]
Schuster, M., and Paliwal, K. K. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11):2673-2681.
[26]
Sennrich, R.; Haddow, B.; and Birch, A. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th ACL.
[27]
Stahlberg, F.; Hasler, E.; Waite, A.; and Byrne, B. 2016. Syntactically guided neural machine translation. In Proceedings of the 54th ACL (Volume 2: Short Papers).
[28]
Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27.
[29]
Tang, Y.; Meng, F.; Lu, Z.; Li, H.; and Yu, P. L. 2016. Neural machine translation with external phrase memory. arXiv preprint arXiv:1606.01792.
[30]
Tu, Z.; Liu, Y.; Lu, Z.; Liu, X.; and Li, H. 2016a. Context gates for neural machine translation. arXiv preprint arXiv:1608.06043.
[31]
Tu, Z.; Lu, Z.; Liu, Y.; Liu, X.; and Li, H. 2016b. Modeling coverage for neural machine translation. In Proceedings of the 54th ACL.
[32]
Tu, Z.; Liu, Y.; Shang, L.; Liu, X.; and Li, H. 2017. Neural machine translation with reconstruction. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
[33]
Wuebker, J.; Green, S.; DeNero, J.; Hasan, S.; and Luong, M.-T. 2016. Models and inference for prefix-constrained machine translation. In Proceedings of the 54th ACL.
[34]
Zeiler, M. D. 2012. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.

Cited By

  • (2019) Explicitly Modeling Word Translations in Neural Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing 19(1):1-17. 10.1145/3342353. Online publication date: 23-Jul-2019.

Published In

AAAI'17: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence
February 2017, 5106 pages

    Sponsors

• Association for the Advancement of Artificial Intelligence
• Amazon
• Infosys
• Facebook
• IBM

    Publisher

    AAAI Press
