Rediscovery of CNN's Versatility for Text-based Encoding of Raw Electronic Health Records

Cho, Eunbyeol; Lee, Min Jae; Hur, Kyunghoon; Kim, Jiyoun; Yoon, Jinsung; Choi, Edward

arXiv:2303.08290 (cs)

[Submitted on 15 Mar 2023 (v1), last revised 10 May 2023 (this version, v2)]

Abstract: Making the most of the abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds all features in raw EHR data regardless of their form and medical code standards. That framework, however, focuses only on encoding EHR with minimal preprocessing and does not consider how to learn EHR representations that are efficient in computation and memory usage. In this paper, we search for a versatile encoder that not only reduces the large data to a manageable size but also preserves the core patient information needed to perform diverse clinical tasks. We find that a hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, making use of the inherent hierarchy of EHR data can boost performance across backbone models and clinical tasks. Through extensive experiments, we present concrete evidence for generalizing our findings to real-world practice, and we give a clear guideline for building the encoder based on the findings gathered while exploring numerous settings.

Comments:	Accepted to CHIL 2023
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2303.08290 [cs.LG]
	(or arXiv:2303.08290v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2303.08290

Submission history

From: Eunbyeol Cho
[v1] Wed, 15 Mar 2023 00:37:18 UTC (7,114 KB)
[v2] Wed, 10 May 2023 09:11:10 UTC (6,965 KB)
