
Learning Domain Specific Sub-layer Latent Variable for Multi-Domain Adaptation Neural Machine Translation

Published: 21 June 2024

Abstract

Domain adaptation is an effective solution to inadequate translation performance in specific domains. However, the straightforward approach of mixing data from multiple domains to obtain a multi-domain neural machine translation (NMT) model can give rise to parameter interference between domains, degrading overall performance. To address this, we introduce a multi-domain adaptive NMT method that learns domain-specific sub-layer latent variables and employs the Gumbel-Softmax reparameterization technique to train the model parameters and the domain-specific sub-layer latent variables jointly. This approach learns private domain-specific knowledge while sharing common domain-invariant knowledge, effectively mitigating the parameter interference problem. Experimental results show that our proposed method improves over the baseline model by up to 7.68 and 3.71 BLEU on public English-German and Chinese-English multi-domain datasets, respectively.
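As summarized above, the method couples a per-domain latent variable over Transformer sub-layers with Gumbel-Softmax reparameterization so that sub-layer selection remains differentiable and can be trained together with the model weights. The paper's exact formulation is not reproduced on this page; the snippet below is only a minimal PyTorch sketch of that general idea, and every name in it (DomainSublayerGate, the {use, skip} gate, the tensor sizes) is an illustrative assumption rather than the authors' implementation.

```python
# Hypothetical sketch: a Gumbel-Softmax gate that decides, per domain,
# whether each Transformer sub-layer is used ("private") or skipped.
# Names and structure are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainSublayerGate(nn.Module):
    def __init__(self, num_domains: int, num_sublayers: int):
        super().__init__()
        # Learnable logits over {use, skip} for every (domain, sub-layer) pair.
        self.logits = nn.Parameter(torch.zeros(num_domains, num_sublayers, 2))

    def forward(self, domain_id: int, temperature: float = 1.0) -> torch.Tensor:
        # Differentiable (straight-through) one-hot sample via Gumbel-Softmax.
        sample = F.gumbel_softmax(self.logits[domain_id], tau=temperature, hard=True)
        # Return the "use" indicator for each sub-layer, shape: (num_sublayers,).
        return sample[:, 0]

# Usage inside a (hypothetical) Transformer layer: scale each sub-layer's
# residual branch by its gate so gradients reach both weights and gate logits.
gate = DomainSublayerGate(num_domains=4, num_sublayers=12)
mask = gate(domain_id=2, temperature=0.5)   # one 0/1 value per sub-layer
hidden = torch.randn(8, 16, 512)            # (batch, length, d_model)
sublayer_out = torch.randn(8, 16, 512)      # output of, e.g., self-attention
hidden = hidden + mask[0] * sublayer_out    # gate the first sub-layer's branch
```

With hard=True, the forward pass uses a discrete one-hot choice while gradients flow through the soft relaxation (straight-through estimation), which is what allows the gate logits and the ordinary model parameters to be optimized concurrently, as the abstract describes.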



    Information & Contributors

    Information

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 6
    June 2024
    378 pages
    EISSN:2375-4702
    DOI:10.1145/3613597

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 June 2024
    Online AM: 29 April 2024
    Accepted: 20 April 2024
    Revised: 05 February 2024
    Received: 08 September 2023
    Published in TALLIP Volume 23, Issue 6


    Author Tags

    1. Neural machine translation
    2. multi-domain adaptation
    3. parameter interference
    4. sub-layer latent variable

    Qualifiers

    • Research-article

    Funding Sources

    • Institute of Science and Development, Chinese Academy of Sciences
    • Fundamental-Plus Plan

