
Learning Domain Specific Sub-layer Latent Variable for Multi-Domain Adaptation Neural Machine Translation

Published: 21 June 2024

Abstract

Domain adaptation is an effective solution to inadequate translation performance in specific domains. However, the straightforward approach of mixing data from multiple domains to obtain a multi-domain neural machine translation (NMT) model can give rise to parameter interference between domains, degrading overall performance. To address this, we introduce a multi-domain adaptive NMT method that learns domain-specific sub-layer latent variables and employs the Gumbel-Softmax reparameterization technique to train the model parameters and the domain-specific sub-layer latent variables jointly. This approach learns private domain-specific knowledge while sharing common domain-invariant knowledge, effectively mitigating the parameter interference problem. Experimental results show that our proposed method improves over the baseline model by up to 7.68 and 3.71 BLEU on public English-German and Chinese-English multi-domain datasets, respectively.
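As summarized above, the method couples a per-domain latent variable over Transformer sub-layers with Gumbel-Softmax reparameterization so that sub-layer selection remains differentiable and can be trained together with the model weights. The paper's exact formulation is not reproduced on this page; the snippet below is only a minimal PyTorch sketch of that general idea, and every name in it (DomainSublayerGate, the {use, skip} gate, the tensor sizes) is an illustrative assumption rather than the authors' implementation.

```python
# Hypothetical sketch: a Gumbel-Softmax gate that decides, per domain,
# whether each Transformer sub-layer is used ("private") or skipped.
# Names and structure are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainSublayerGate(nn.Module):
    def __init__(self, num_domains: int, num_sublayers: int):
        super().__init__()
        # Learnable logits over {use, skip} for every (domain, sub-layer) pair.
        self.logits = nn.Parameter(torch.zeros(num_domains, num_sublayers, 2))

    def forward(self, domain_id: int, temperature: float = 1.0) -> torch.Tensor:
        # Differentiable (straight-through) one-hot sample via Gumbel-Softmax.
        sample = F.gumbel_softmax(self.logits[domain_id], tau=temperature, hard=True)
        # Return the "use" indicator for each sub-layer, shape: (num_sublayers,).
        return sample[:, 0]

# Usage inside a (hypothetical) Transformer layer: scale each sub-layer's
# residual branch by its gate so gradients reach both weights and gate logits.
gate = DomainSublayerGate(num_domains=4, num_sublayers=12)
mask = gate(domain_id=2, temperature=0.5)   # one 0/1 value per sub-layer
hidden = torch.randn(8, 16, 512)            # (batch, length, d_model)
sublayer_out = torch.randn(8, 16, 512)      # output of, e.g., self-attention
hidden = hidden + mask[0] * sublayer_out    # gate the first sub-layer's branch
```

With hard=True, the forward pass uses a discrete one-hot choice while gradients flow through the soft relaxation (straight-through estimation), which is what allows the gate logits and the ordinary model parameters to be optimized concurrently, as the abstract describes.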



    Information & Contributors

    Information

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 6
    June 2024
    378 pages
    EISSN:2375-4702
    DOI:10.1145/3613597

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 June 2024
    Online AM: 29 April 2024
    Accepted: 20 April 2024
    Revised: 05 February 2024
    Received: 08 September 2023
    Published in TALLIP Volume 23, Issue 6


    Author Tags

    1. Neural machine translation
    2. multi-domain adaptation
    3. parameter interference
    4. sub-layer latent variable

    Qualifiers

    • Research-article

    Funding Sources

    • Institute of Science and Development, Chinese Academy of Sciences
    • Fundamental-Plus Plan

