DOI: 10.1145/3269206.3271805
research-article

HeteroMed: Heterogeneous Information Network for Medical Diagnosis

Published: 17 October 2018

Abstract

With the recent availability of Electronic Health Records (EHR) and the opportunities they offer for advancing medical informatics, there has been growing interest in mining EHR to improve quality of care. Disease diagnosis, because of its sensitive nature, high cost of error, and complexity, has become an increasingly important research focus in recent years. Existing studies model EHR by capturing the co-occurrence of clinical events to learn their latent embeddings. However, relations among clinical events carry different semantics and contribute differently to disease diagnosis, which calls for more advanced modeling of the heterogeneous data types and relations in EHR data than existing solutions provide. To address these issues, we show how high-dimensional EHR data and its rich relationships can be translated into HeteroMed, a heterogeneous information network for robust medical diagnosis. Our modeling approach allows for straightforward handling of missing values and data heterogeneity. HeteroMed exploits metapaths to capture higher-level, semantically important relations that contribute to disease diagnosis. Furthermore, it employs a joint embedding framework to tailor clinical event representations to the disease diagnosis goal. To the best of our knowledge, this is the first study to use a heterogeneous information network for modeling clinical data and disease diagnosis. Experimental results show that HeteroMed outperforms prior methods in predicting exact diagnosis codes and general disease cohorts. Moreover, HeteroMed outperforms baseline models in capturing similarities between clinical events, which we examine qualitatively through case studies.
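As a rough illustration of what a metapath is in a heterogeneous information network, the following minimal sketch walks a toy typed graph along a fixed sequence of node types. It is not the authors' implementation: the node types (`patient`, `symptom`, `diagnosis`), the node names, the edges, and the helper functions `build_adjacency` and `metapath_walk` are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical toy network (not taken from the paper): each node has a type.
NODE_TYPE = {
    "p1": "patient", "p2": "patient",
    "fever": "symptom", "cough": "symptom",
    "flu": "diagnosis", "pneumonia": "diagnosis",
}

# Undirected edges linking patients to their clinical events (assumed data).
EDGES = [
    ("p1", "fever"), ("p1", "cough"), ("p1", "flu"),
    ("p2", "cough"), ("p2", "pneumonia"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, rng):
    """Sample one walk whose node types follow `metapath` in order.

    Returns the list of visited nodes, or None when the current node has
    no neighbour of the type required by the next metapath step.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    walk = [start]
    for wanted in metapath[1:]:
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            return None
        walk.append(rng.choice(candidates))
    return walk


ADJ = build_adjacency(EDGES)
METAPATH = ["symptom", "patient", "diagnosis"]

# "fever" is linked only to p1, and p1's only diagnosis is "flu",
# so this particular walk is the same for any random seed.
print(metapath_walk(ADJ, NODE_TYPE, "fever", METAPATH, random.Random(0)))
# prints ['fever', 'p1', 'flu']
```

Walks of this kind restrict which relations are followed, so a symptom is related to a diagnosis only through a shared patient. In metapath-based embedding methods, many such walks typically serve as the training context for node representations.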


Information & Contributors

Information

Published In

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN:9781450360142
DOI:10.1145/3269206
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. electronic health record
  2. health informatics
  3. heterogeneous information network
  4. network embedding

Qualifiers

  • Research-article

Funding Sources

  • NSF III-1705169
  • NSF CAREER Award 1741634

Conference

CIKM '18
Acceptance Rates

CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 33
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 03 Oct 2024

Citations

Cited By

  • (2024) TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6324-6334. DOI: 10.1145/3637528.3671594. Online publication date: 25-Aug-2024
  • (2024) Reinforced Computer-Aided Framework for Diagnosing Thyroid Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 21:4, 737-747. DOI: 10.1109/TCBB.2023.3251323. Online publication date: Jul-2024
  • (2024) Efficient symptom inquiring and diagnosis via adaptive alignment of reinforcement learning and classification. Artificial Intelligence in Medicine 148:C. DOI: 10.1016/j.artmed.2023.102748. Online publication date: 1-Feb-2024
  • (2023) A Survey on Heterogeneous Graph Embedding: Methods, Techniques, Applications and Sources. IEEE Transactions on Big Data 9:2, 415-436. DOI: 10.1109/TBDATA.2022.3177455. Online publication date: 1-Apr-2023
  • (2023) Contrastive knowledge integrated graph neural networks for Chinese medical text classification. Engineering Applications of Artificial Intelligence 122:C. DOI: 10.1016/j.engappai.2023.106057. Online publication date: 1-Jun-2023
  • (2023) Thyroidkeeper: a healthcare management system for patients with thyroid diseases. Health Information Science and Systems 11:1. DOI: 10.1007/s13755-023-00251-w. Online publication date: 17-Oct-2023
  • (2022) Conflict detection in Task Heterogeneous Information Networks. Web Intelligence 20:1, 21-35. DOI: 10.3233/WEB-210478. Online publication date: 17-May-2022
  • (2022) Heterogeneous information networks. Proceedings of the VLDB Endowment 15:12, 3807-3811. DOI: 10.14778/3554821.3554901. Online publication date: 1-Aug-2022
  • (2022) Time-aware Context-Gated Graph Attention Network for Clinical Risk Prediction. IEEE Transactions on Knowledge and Data Engineering, 1-12. DOI: 10.1109/TKDE.2022.3181780. Online publication date: 2022
  • (2022) Graph representation learning in biomedicine and healthcare. Nature Biomedical Engineering 6:12, 1353-1369. DOI: 10.1038/s41551-022-00942-x. Online publication date: 31-Oct-2022
