DSISA: A New Neural Machine Translation Combining Dependency Weight and Neighbors

Published: 08 February 2024

Abstract

Most previous neural machine translation (NMT) models rely only on parallel corpora, and explicitly integrating prior syntactic structure information can further improve translation quality. In this article, we propose Syntax-Induced Self-Attention (SISA), which models the influence of dependency relations between words through the attention mechanism and fine-tunes the attention distribution over the sentence with the resulting dependency weights. We then present a new model, Double Syntax-Induced Self-Attention (DSISA), which fuses the features extracted by SISA with those of a compact convolutional neural network (CNN). SISA alleviates long-distance dependencies within a sentence, while the CNN captures the limited context formed by neighboring tokens. DSISA thus uses two different neural networks to extract complementary features for a richer semantic representation, and it replaces the first layer of the Transformer encoder. In this way, DSISA exploits both the global features of tokens in a sentence and the local features formed with adjacent tokens. Finally, experiments on standard corpora verify the performance of the new model.
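To make the architecture described above concrete, the PyTorch sketch below illustrates one plausible reading of a DSISA-style layer: a self-attention branch whose scores are rescaled by precomputed dependency weights (SISA), fused with a compact 1-D convolution over neighboring tokens, intended to stand in for the first Transformer encoder layer. The abstract does not specify the exact weighting or fusion scheme, so the module names, the multiplicative rescaling of attention scores, and the concatenate-and-project fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SyntaxInducedSelfAttention(nn.Module):
    """Single-head self-attention whose scores are rescaled by dependency weights (assumed form of SISA)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, dep_weight: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); dep_weight: (batch, seq, seq), e.g. derived from
        # dependency-tree distances (assumption: larger weight = closer relation).
        scores = torch.matmul(self.q(x), self.k(x).transpose(-2, -1)) * self.scale
        scores = scores * dep_weight              # fine-tune the attention allocation
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, self.v(x))


class DSISALayer(nn.Module):
    """Hypothetical first encoder layer: fuses the SISA branch with a compact CNN branch."""

    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.sisa = SyntaxInducedSelfAttention(d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
        self.fuse = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, dep_weight: torch.Tensor) -> torch.Tensor:
        global_feat = self.sisa(x, dep_weight)                       # global, syntax-aware feature
        local_feat = self.conv(x.transpose(1, 2)).transpose(1, 2)    # local feature from neighbors
        fused = self.fuse(torch.cat([global_feat, local_feat], dim=-1))
        return self.norm(x + fused)                                  # residual connection + layer norm


if __name__ == "__main__":
    batch, seq, d_model = 2, 7, 64
    x = torch.randn(batch, seq, d_model)
    dep_weight = torch.rand(batch, seq, seq)           # placeholder dependency weights
    print(DSISALayer(d_model)(x, dep_weight).shape)    # torch.Size([2, 7, 64])

In practice the dependency weights would come from a dependency parse of the source sentence (for example, a function of tree distance between token pairs), not from random values as in this toy usage example.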

    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 2
    February 2024
    340 pages
    EISSN:2375-4702
    DOI:10.1145/3613556

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 February 2024
    Online AM: 29 December 2023
    Accepted: 24 December 2023
    Revised: 20 September 2023
    Received: 16 January 2023
    Published in TALLIP Volume 23, Issue 2


    Author Tags

    1. Neural machine translation
    2. transformer
    3. dependency relation
4. convolutional neural network

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Sichuan Natural Science Foundation
    • Interdisciplinary Research of Southwest Jiaotong University
    • Inner Mongolia Natural Science Foundation
    • Initial Scientific Research Fund of Inner Mongolia University of Science and Technology
