Abstract
Abbreviation detection in clinical texts is a widely studied and important task because it improves the readability and shareability of electronic medical records (EMRs). For low-resource languages such as Vietnamese, however, progress is limited because no labeled dataset is available for the task, so further development is needed for Vietnamese clinical texts. In addition, abbreviations are created and used in many different note types by various groups of physicians, nurses, and other stakeholders, which makes it necessary to process a wide diversity of clinical texts. To date, no existing work considers the setting in which abbreviations must be detected in clinical texts of one note type while labeled clinical texts are available only for another note type. We call this setting the cross-note abbreviation detection task. We address this task on Vietnamese clinical texts by proposing nested semisupervised learning. The resulting Nested-SSL method builds on an existing semisupervised learning method and boosts its core learning process with a fold-based enhancement scheme that favors the F-measure of the minority class. In an empirical evaluation on real EMRs, Nested-SSL always outperforms its base semisupervised learning method and some existing ones, detecting abbreviations in real Vietnamese clinical texts effectively. This better performance lays the foundations for effectively preprocessing Vietnamese clinical texts in other tasks on EMRs.
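To make the evaluation criterion concrete, the following minimal sketch computes the F-measure of a single class treated as the positive class. It is an illustration only, not the authors' implementation: the function name `f_measure`, the toy labels, and the assumption that abbreviation tokens (label 1) form the minority class are all hypothetical. It shows why accuracy alone can look good on imbalanced data while the minority-class F-measure stays modest.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = abbreviation (assumed minority), 0 = other token.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Accuracy counts every token equally, so the majority class dominates it.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# The minority-class F-measure only rewards correct abbreviation predictions.
precision, recall, f1 = f_measure(y_true, y_pred, positive=1)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

On this toy data the classifier is right on 8 of 10 tokens, yet it finds only one of the two abbreviations and raises one false alarm, so the minority-class F-measure is noticeably lower than the accuracy.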
Acknowledgment
This research is funded by Vietnam National University – Ho Chi Minh City (VNU-HCM) under grant number C2022-20-11.
In addition, our sincere thanks go to Dr. Nguyen Thi Minh Huyen and her team at the University of Science, Vietnam National University, Hanoi, Vietnam, for the helpful resources. We also sincerely thank the providers of the Vietnamese electronic medical records.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chau, V.T.N., Phung, N.H. (2023). Nested Semisupervised Learning for Cross-Note Abbreviation Detection in Vietnamese Clinical Texts. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_49
DOI: https://doi.org/10.1007/978-3-031-42430-4_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4
eBook Packages: Computer Science, Computer Science (R0)