DOI: 10.1145/2857546.2857630
research-article

A Hybrid Semi-supervised Learning Approach to Identifying Protected Health Information in Electronic Medical Records

Published: 04 January 2016

Abstract

De-identification of electronic medical records is one of the main tasks in making clinical data sharable with researchers outside the associated institutions. The task has received considerable attention worldwide, with positive research outcomes, notably from the i2b2 (Informatics for Integrating Biology and the Bedside) shared tasks in 2006 and 2014. However, it is not yet a solved problem and still needs further investigation in realistic settings. In this paper, we propose a hybrid semi-supervised learning approach to identifying protected health information (PHI) in electronic medical records. The proposed approach combines a machine learning-based method, using a conditional random fields (CRF) model, with a rule-based method in a post-processing phase to handle 8 PHI types with disambiguation. The CRF-based classification phase and the rule-based post-processing phase are then conducted in a semi-supervised learning manner. Compared with existing work, our approach has the following merits for PHI identification: (1) effectiveness, with comparable precision and recall values in experiments on the 2006 i2b2 data set; (2) practicality, since the training data set is enhanced over time to yield a more accurate classifier through a semi-supervised learning mechanism; (3) portability to clinical text in non-English languages.
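The pipeline described in the abstract (a statistical tagger, a rule-based post-processing pass, and a loop that feeds confident predictions back into the training set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `ContextTagger` is a toy count-based stand-in for a CRF, the two regexes in `RULES` are hypothetical, and the acceptance criterion in `self_train` (every PHI tag in a sentence must reach a confidence threshold) is only one possible self-training policy.

```python
import re
from collections import Counter, defaultdict

# Hypothetical post-processing rules: (pattern, PHI label). Not from the paper.
RULES = [
    (re.compile(r"\d{3}-\d{3}-\d{4}"), "PHONE"),
    (re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}"), "DATE"),
]


class ContextTagger:
    """Toy stand-in for a CRF: votes with counts for the token and the previous token."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def fit(self, sentences):
        """Count labels per feature over sentences of (token, label) pairs."""
        self.votes.clear()
        for sentence in sentences:
            prev = "<s>"
            for token, label in sentence:
                self.votes["tok=" + token.lower()][label] += 1
                self.votes["prev=" + prev][label] += 1
                prev = token.lower()

    def predict(self, tokens):
        """Return (token, label, confidence) triples; unseen context gives ("O", 0.0)."""
        tagged, prev = [], "<s>"
        for token in tokens:
            score = (self.votes.get("tok=" + token.lower(), Counter())
                     + self.votes.get("prev=" + prev, Counter()))
            if score:
                label, top = score.most_common(1)[0]
                confidence = top / sum(score.values())
            else:
                label, confidence = "O", 0.0
            tagged.append((token, label, confidence))
            prev = token.lower()
        return tagged


def apply_rules(tagged):
    """Rule-based post-processing: a full regex match overrides the tagger's label."""
    fixed = []
    for token, label, confidence in tagged:
        for pattern, phi_label in RULES:
            if pattern.fullmatch(token):
                label, confidence = phi_label, 1.0
                break
        fixed.append((token, label, confidence))
    return fixed


def self_train(labeled, unlabeled, threshold=0.9, rounds=3):
    """Grow the training set with sentences whose PHI tags are all confident."""
    train, pool = list(labeled), list(unlabeled)
    tagger = ContextTagger()
    for _ in range(rounds):
        tagger.fit(train)
        remaining = []
        for tokens in pool:
            tagged = apply_rules(tagger.predict(tokens))
            phi_conf = [c for _, lab, c in tagged if lab != "O"]
            if phi_conf and min(phi_conf) >= threshold:
                # Accept the machine-labeled sentence as new training data.
                train.append([(tok, lab) for tok, lab, _ in tagged])
            else:
                remaining.append(tokens)
        if len(remaining) == len(pool):
            break  # nothing new was accepted in this round
        pool = remaining
    tagger.fit(train)
    return tagger, train


labeled = [[("Dr.", "O"), ("Smith", "DOCTOR"), ("examined", "O"), ("her", "O")]]
unlabeled = [["Dr.", "Jones", "called", "555-123-4567"]]
tagger, train = self_train(labeled, unlabeled)
for token, label in train[-1]:
    print(token, label)
```

In this toy run the tagger labels "Jones" from its left context alone, while the phone number is caught only by the rule pass, which is the division of labor a hybrid design relies on. Non-PHI tokens in an accepted sentence are added as "O" regardless of confidence, so a real system would need a stricter acceptance policy or manual review.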

Cited By

  • (2024) De-identification of clinical free text using natural language processing. Artificial Intelligence in Medicine 151:C. DOI: 10.1016/j.artmed.2024.102845. Online publication date: 1-May-2024
  • (2020) Survey on RNN and CRF models for de-identification of medical free text. Journal of Big Data 7:1. DOI: 10.1186/s40537-020-00351-4. Online publication date: 4-Sep-2020
  • (2020) Ubiquitous healthcare: a systematic mapping study. Journal of Ambient Intelligence and Humanized Computing 14:5 (5021-5046). DOI: 10.1007/s12652-020-02513-x. Online publication date: 26-Sep-2020

      Published In

      IMCOM '16: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication
      January 2016
      658 pages
      ISBN: 9781450341424
      DOI: 10.1145/2857546
      © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. de-identification
      2. electronic medical record
      3. privacy preserving
      4. protected health information
      5. semi-supervised learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      IMCOM '16
      Acceptance Rates

      Overall Acceptance Rate 213 of 621 submissions, 34%
