DSISA: A New Neural Machine Translation Combining Dependency Weight and Neighbors

Published: 08 February 2024

Abstract

Most previous neural machine translation (NMT) models rely only on parallel corpora, and explicitly integrating prior syntactic structure information can further improve translation quality. In this article, we propose Syntax-Induced Self-Attention (SISA), which models the influence of dependency relations between words through the attention mechanism and fine-tunes the attention distribution over the sentence with the resulting dependency weights. We then present a new model, Double Syntax-Induced Self-Attention (DSISA), which fuses the features extracted by SISA with those of a compact convolutional neural network (CNN). SISA alleviates long-distance dependencies within a sentence, while the CNN captures the limited context formed by neighboring tokens. DSISA thus uses two different neural networks to extract complementary features for a richer semantic representation, and it replaces the first layer of the Transformer encoder. In this way, DSISA exploits both the global features of tokens in a sentence and the local features formed with adjacent tokens. Finally, experiments on standard corpora verify the performance of the new model.
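To make the architecture described above concrete, the PyTorch sketch below illustrates one plausible reading of a DSISA-style layer: a self-attention branch whose scores are rescaled by precomputed dependency weights (SISA), fused with a compact 1-D convolution over neighboring tokens, intended to stand in for the first Transformer encoder layer. The abstract does not specify the exact weighting or fusion scheme, so the module names, the multiplicative rescaling of attention scores, and the concatenate-and-project fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SyntaxInducedSelfAttention(nn.Module):
    """Single-head self-attention whose scores are rescaled by dependency weights (assumed form of SISA)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, dep_weight: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); dep_weight: (batch, seq, seq), e.g. derived from
        # dependency-tree distances (assumption: larger weight = closer relation).
        scores = torch.matmul(self.q(x), self.k(x).transpose(-2, -1)) * self.scale
        scores = scores * dep_weight              # fine-tune the attention allocation
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, self.v(x))


class DSISALayer(nn.Module):
    """Hypothetical first encoder layer: fuses the SISA branch with a compact CNN branch."""

    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.sisa = SyntaxInducedSelfAttention(d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
        self.fuse = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, dep_weight: torch.Tensor) -> torch.Tensor:
        global_feat = self.sisa(x, dep_weight)                       # global, syntax-aware feature
        local_feat = self.conv(x.transpose(1, 2)).transpose(1, 2)    # local feature from neighbors
        fused = self.fuse(torch.cat([global_feat, local_feat], dim=-1))
        return self.norm(x + fused)                                  # residual connection + layer norm


if __name__ == "__main__":
    batch, seq, d_model = 2, 7, 64
    x = torch.randn(batch, seq, d_model)
    dep_weight = torch.rand(batch, seq, seq)           # placeholder dependency weights
    print(DSISALayer(d_model)(x, dep_weight).shape)    # torch.Size([2, 7, 64])

In practice the dependency weights would come from a dependency parse of the source sentence (for example, a function of tree distance between token pairs), not from random values as in this toy usage example.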

    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 2
    February 2024
    340 pages
    EISSN:2375-4702
    DOI:10.1145/3613556

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 February 2024
    Online AM: 29 December 2023
    Accepted: 24 December 2023
    Revised: 20 September 2023
    Received: 16 January 2023
    Published in TALLIP Volume 23, Issue 2


    Author Tags

    1. Neural machine translation
    2. transformer
    3. dependency relation
4. convolutional neural network

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Sichuan Natural Science Foundation
    • Interdisciplinary Research of Southwest Jiaotong University
    • Inner Mongolia Natural Science Foundation
    • Initial Scientific Research Fund of Inner Mongolia University of Science and Technology
