Nested Semisupervised Learning for Cross-Note Abbreviation Detection in Vietnamese Clinical Texts

  • Conference paper
Recent Challenges in Intelligent Information and Database Systems (ACIIDS 2023)

Abstract

Abbreviation detection in clinical texts is a widely studied and significant task because it makes electronic medical records (EMRs) more readable and shareable. Nonetheless, it remains limited for low-resource languages like Vietnamese because no labeled dataset is available for the task. More development is thus needed to handle this task on Vietnamese clinical texts. In addition, abbreviations are generated and used in many different note types by various groups of physicians, nurses, and other stakeholders, so abbreviation detection must process a wide diversity of clinical texts. To date, no existing work considers the setting where abbreviations must be detected in clinical texts of one note type while the available labeled clinical texts belong to another note type. We call this challenge the cross-note abbreviation detection task. We address this task on Vietnamese clinical texts by proposing nested semisupervised learning. The resulting Nested-SSL method detects abbreviations in real Vietnamese clinical texts effectively. It builds on an existing semisupervised learning method and boosts the core semisupervised learning process with a fold-based enhancement scheme that favors the F-measure of the minority class. In the empirical evaluation with real EMRs, Nested-SSL always outperforms its base semisupervised learning method and some existing ones. Its better performance lays the foundations for effectively preprocessing Vietnamese clinical texts in other tasks on EMRs.
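
The abstract names the F-measure of the minority class as the quantity the enhancement scheme favors. The minimal Python sketch below shows the standard per-class F-measure and why it is more informative than accuracy on imbalanced labels. It is not the authors' implementation: the function name `minority_f_measure`, the label encoding (1 = abbreviation, 0 = other token), the assumption that abbreviations form the minority class, and the toy labels are all illustrative.

```python
from typing import Sequence


def minority_f_measure(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int = 1,
    beta: float = 1.0,
) -> float:
    """Standard F-beta score computed for a single class (`positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy labels: 3 abbreviation tokens among 10 tokens.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # one false positive, one miss
all_negative = [0] * len(y_true)          # a classifier that never fires

f1_model = minority_f_measure(y_true, y_pred)
f1_trivial = minority_f_measure(y_true, all_negative)
accuracy_trivial = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
```

On these toy labels the never-firing classifier still reaches 70% accuracy while its minority-class F-measure is zero, which is the usual argument for selecting models by the F-measure of the rare class rather than by accuracy.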

Acknowledgment

This research is funded by Vietnam National University – Ho Chi Minh City (VNU-HCM) under grant number C2022-20-11.

In addition, our sincere thanks go to Dr. Nguyen Thi Minh Huyen and her team at the University of Science, Vietnam National University, Hanoi, Vietnam, for the helpful resources. We are also very grateful to the providers of the Vietnamese electronic medical records.

Author information

Corresponding authors

Correspondence to Vo Thi Ngoc Chau or Nguyen Hua Phung.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chau, V.T.N., Phung, N.H. (2023). Nested Semisupervised Learning for Cross-Note Abbreviation Detection in Vietnamese Clinical Texts. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_49

  • DOI: https://doi.org/10.1007/978-3-031-42430-4_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42429-8

  • Online ISBN: 978-3-031-42430-4

  • eBook Packages: Computer Science, Computer Science (R0)
