research-article

Learning Multiple Diagnosis Codes for ICU Patients with Local Disease Correlation Mining

Published: 06 March 2017

Abstract

In the era of big data, there is demand for a mechanism that can automatically annotate patients’ records in medical information systems with disease codes. This work proposes a framework that automatically annotates multi-source patient data from Intensive Care Units (ICUs) with disease labels. We extract features from two main sources, medical charts and notes, and encode them with the Bag-of-Words model. Unlike most existing multi-label learning algorithms, which consider correlations between diseases globally, our model learns disease correlations locally in the patient data. To achieve this, we derive a local disease correlation representation that enriches the discriminative power of each patient record, and we embed this representation into a unified multi-label learning framework. We develop an alternating algorithm to iteratively optimize the objective function. Extensive experiments on a real-world ICU database compare our algorithm with representative multi-label learning algorithms. The evaluation results show that the proposed method achieves state-of-the-art performance in annotating multiple diagnostic codes for ICU patients. This study suggests that automated diagnosis code annotation can be reliably addressed by a multi-label learning model that exploits disease correlation. Because automated diagnosis code annotation can significantly improve the quality and management of health care for both patients and caregivers, these findings stand to benefit health care and management in ICUs.
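As a rough illustration of the two ideas named in the abstract, the sketch below encodes toy clinical notes as Bag-of-Words count vectors and then appends a simple "local" label profile to each record, computed from the labels of its most similar neighbours. This is not the authors' formulation: the neighbour-averaging step, the function names, the toy notes, and the two label columns are all assumptions made for illustration, and the paper's actual representation and alternating optimization are not reproduced here.

```python
from collections import Counter
import math


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, n in Counter(doc.lower().split()).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def local_label_profile(vectors, labels, k=1):
    """Average the label vectors of each record's k most similar other records.

    Assumed stand-in for a 'local disease correlation' feature; it needs at
    least two records so that every record has a neighbour.
    """
    n_labels = len(labels[0])
    profiles = []
    for i, vec in enumerate(vectors):
        neighbours = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: cosine(vec, vectors[j]),
            reverse=True,
        )[:k]
        profiles.append(
            [sum(labels[j][c] for j in neighbours) / len(neighbours)
             for c in range(n_labels)]
        )
    return profiles


# Hypothetical toy data: four notes, two label columns [cardiac, respiratory].
notes = [
    "chest pain elevated troponin",
    "chest pain shortness of breath",
    "fever cough shortness of breath",
    "fever cough",
]
labels = [[1, 0], [1, 1], [0, 1], [0, 1]]

vocab, X = bag_of_words(notes)
profiles = local_label_profile(X, labels, k=1)
# Augmented representation: word counts followed by the local label profile.
augmented = [x + p for x, p in zip(X, profiles)]
```

The augmented vectors could then be fed to any multi-label classifier. Averaging neighbour labels is only one way to make label information depend on where a record sits in feature space, which is the general intuition behind learning correlations locally rather than globally.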


      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 3
      August 2017
      372 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3058790

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 March 2017
      Accepted: 01 September 2016
      Revised: 01 June 2016
      Received: 01 October 2015
      Published in TKDD Volume 11, Issue 3


      Author Tags

      1. Diagnosis code annotation
      2. ICU data mining
      3. MIMIC II database
      4. local correlation exploiting
      5. multi-label learning
      6. pattern discovery

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Australian Research Council Discovery Project
      • Australian Research Council Linkage Project

