DOI: 10.1145/3632971.3632981

Neural Machine Translation for Chinese Patent Medicine Instructions

Published: 13 February 2024

Abstract

Little research has examined the validity of datasets for translating Chinese Patent Medicine (CPM) instructions into English. Analyzing the Chinese and English texts produced by prominent translation engines, we observe that the translations are hard to read and that English translation standards are inconsistent. Few internet search platforms are designed specifically for CPM, and those that exist focus on specialized terminology related to Chinese herbal medicine. To address these problems, we first develop a Chinese Patent Medicine Instruction Dataset (CPMID) for Chinese-English translation. The dataset comprises 11,695 Chinese-English entries that are meticulously annotated and validated. We benchmark the task by training and testing multiple baselines: the traditional models Seq2Seq+Attention (LSTM) and Transformer; the pre-trained, publicly released translation models SMaLL-100, NLLB-200, and mBART-50; and ChatGPT. The results demonstrate the accuracy and effectiveness of the dataset, with an improvement of 42.5 BLEU that surpasses the prior state of the art by over 54.7%. In future R&D, the primary use of this dataset is to provide a reliable retrieval system for foreign users of CPM. We believe that CPMID has the potential to facilitate the modernization of Traditional Chinese Medicine (TCM) and to contribute significantly to the field of Modern Medicine (MM).
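
The abstract reports results in BLEU. As a rough illustration of what that metric measures, the following is a minimal sketch of corpus-level BLEU: clipped n-gram precision up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It assumes whitespace tokenization, a single reference per sentence, and no smoothing. The function names and the example sentences are hypothetical and are not taken from CPMID or from the authors' evaluation setup, which the abstract does not describe.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumptions of this sketch: whitespace tokenization, no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams present in the hypotheses, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero precision yields BLEU = 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical instruction-style sentences, for illustration only.
    hyps = ["take two tablets daily"]
    refs = ["take two tablets twice daily"]
    print(corpus_bleu(hyps, refs, max_n=2))
```

Published scores are normally computed with a standard tool and a fixed tokenizer, so numbers from a sketch like this are not comparable to the figures reported above.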

Published In

JCRAI '23: Proceedings of the 2023 International Joint Conference on Robotics and Artificial Intelligence
July 2023
216 pages
ISBN: 9798400707704
DOI: 10.1145/3632971
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Chinese Patent Medicine Instructions
2. Large Language Models
3. Neural Machine Translation
4. Transformer

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

JCRAI 2023