Low-resource Neural Machine Translation: Methods and Trends

Published: 15 November 2022

Abstract

Neural Machine Translation (NMT) has brought promising improvements in translation quality, but until recently these models relied on large-scale parallel corpora. Because such corpora exist for only a handful of language pairs, translation performance falls far short of expectations for the majority of low-resource languages. Developing translation techniques for low-resource languages is therefore crucial, and it has become a popular research field in neural machine translation. In this article, we present an overall review of existing deep learning techniques in low-resource NMT. We first describe the state of research as well as some widely used low-resource datasets. We then categorize the existing methods and describe representative works in detail. Finally, we summarize their common characteristics and outline future directions in this field.
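As one concrete illustration of the kind of technique this survey categorizes, consider back-translation, a widely used data-augmentation method for low-resource NMT: a reverse (target-to-source) model turns target-side monolingual text into synthetic parallel data for training the forward model. The minimal Python sketch below shows only the data-construction step; the function back_translate and the reverse_translate callback are hypothetical stand-ins for illustration, not code from the article or from any particular toolkit.

from typing import Callable, List, Tuple

def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    A target-to-source model translates each monolingual target sentence into
    a synthetic source sentence; the resulting pairs are mixed into the
    training data of the source-to-target model of interest.
    """
    pairs = []
    for tgt in target_monolingual:
        src = reverse_translate(tgt)  # translate in the target -> source direction
        pairs.append((src, tgt))     # synthetic source paired with genuine target
    return pairs

# Toy usage with a placeholder "reverse model" (illustration only):
mono = ["Ein kleines Beispiel.", "Noch ein Satz."]
for src, tgt in back_translate(mono, lambda s: "<synthetic source for: " + s + ">"):
    print(src, "|||", tgt)

In practice the synthetic pairs are combined with (and sometimes tagged to distinguish them from) the genuine parallel data before training, and the procedure can be iterated by alternately retraining the forward and reverse models.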




    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5
    September 2022, 486 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3533669

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 November 2022
    Online AM: 15 March 2022
    Accepted: 27 January 2022
    Revised: 15 December 2021
    Received: 10 June 2021
    Published in TALLIP Volume 21, Issue 5


    Author Tags

    1. Low-resource
    2. neural machine translation
    3. semi-supervised
    4. unsupervised
    5. transfer learning
    6. pivot-based methods
    7. data augmentation

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China

    Article Metrics

    • Downloads (last 12 months): 338
    • Downloads (last 6 weeks): 31
    Reflects downloads up to 02 Sep 2024

    Cited By
    • (2024) Neural Machine Translation for Low-Resource Languages from a Chinese-centric Perspective: A Survey. ACM Transactions on Asian and Low-Resource Language Information Processing 23, 5, 1–60. DOI: 10.1145/3665244. Online publication date: 21-Jun-2024.
    • (2024) More Than Syntaxes: Investigating Semantics to Zero-shot Cross-lingual Relation Extraction and Event Argument Role Labelling. ACM Transactions on Asian and Low-Resource Language Information Processing 23, 5, 1–21. DOI: 10.1145/3582261. Online publication date: 10-May-2024.
    • (2024) Emerging resources, enduring challenges: a comprehensive study of Kashmiri parallel corpus. AI & SOCIETY. DOI: 10.1007/s00146-024-01981-5. Online publication date: 14-Jun-2024.
    • (2023) A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization. Applied Sciences 13, 19, 10589. DOI: 10.3390/app131910589. Online publication date: 22-Sep-2023.
    • (2023) Human-machine Translation Model Evaluation Based on Artificial Intelligence Translation. EMITTER International Journal of Engineering Technology 11, 2, 145–159. DOI: 10.24003/emitter.v11i2.812. Online publication date: 20-Dec-2023.
    • (2023) Rule Based Fuzzy Computing Approach on Self-Supervised Sentiment Polarity Classification with Word Sense Disambiguation in Machine Translation for Hindi Language. ACM Transactions on Asian and Low-Resource Language Information Processing 22, 5, 1–21. DOI: 10.1145/3574130. Online publication date: 9-May-2023.
    • (2023) A Study of Small Corpus-based NMT for Image-based Text Recognition. 2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS), 1497–1501. DOI: 10.1109/ICACCS57279.2023.10112894. Online publication date: 17-Mar-2023.
    • Improving Access to Medical Information for Multilingual Patients using Pipelined Ensemble Average based Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3617372.
