Liu, C.; Silamu, W.; Li, Y. A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization. Appl. Sci. 2023, 13, 10589.
Abstract
Low-resource languages often lack sufficient parallel data, which leads to poor machine translation quality. One way to address this problem is data augmentation, which creates new training data by transforming existing data; in computer vision, typical transformations include flipping, cropping, rotating, and adding noise. In low-resource machine translation, pseudo-parallel corpora have traditionally been generated by randomly replacing words. However, this can introduce ambiguity, because the same word may have different meanings in different contexts. This study proposes a new approach for low-resource machine translation that generates pseudo-parallel corpora by replacing phrases rather than individual words. The approach is compared with other data augmentation methods, and combining it with them is found to improve performance further. To make the model more robust, R-Drop regularization, an effective method for improving machine translation quality, is also applied. The proposed method was evaluated on Chinese–Kazakh (Latin script) translation tasks, yielding improvements of 4.99 and 7.7 for Chinese-to-Kazakh and Kazakh-to-Chinese translation, respectively.
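As a rough illustration of the regularizer named above, the sketch below computes the standard R-Drop objective for a single sequence: the same input is passed through the model twice with different dropout masks, yielding two output distributions per target token, and the loss adds a symmetric KL term between them to the usual negative log-likelihood of both passes. This is a minimal sketch, not the authors' implementation; the function names and the weight `alpha` are assumptions, since the value used in this study is not stated here, and the toy distributions stand in for real model outputs.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Assumes q is strictly positive wherever p is (true for softmax outputs).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_token_loss(p1: Sequence[float], p2: Sequence[float],
                      target: int, alpha: float = 1.0) -> float:
    """R-Drop loss for one target token.

    p1, p2: output distributions from two forward passes of the same input
    with different dropout masks; target: index of the gold token;
    alpha: hypothetical weight on the consistency (KL) term.
    """
    # Negative log-likelihood of the gold token under both passes.
    nll = -math.log(p1[target]) - math.log(p2[target])
    # Symmetric (bidirectional) KL between the two passes.
    sym_kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    return nll + alpha * sym_kl


def r_drop_sequence_loss(dists1: Sequence[Sequence[float]],
                         dists2: Sequence[Sequence[float]],
                         targets: Sequence[int],
                         alpha: float = 1.0) -> float:
    """Sum the per-token R-Drop loss over a target sequence."""
    return sum(r_drop_token_loss(a, b, t, alpha)
               for a, b, t in zip(dists1, dists2, targets))


if __name__ == "__main__":
    # Toy outputs of two dropout passes over a 3-token vocabulary.
    pass1 = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    pass2 = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
    gold = [0, 1]
    print(r_drop_sequence_loss(pass1, pass2, gold, alpha=1.0))
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to twice the ordinary cross-entropy; the larger the disagreement caused by dropout, the more the model is penalized, which is what pushes it toward more stable predictions.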
Keywords
machine translation; data augmentation; phrase replacement; R-Drop
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.