MedLink: De-Identified Patient Health Record Linkage

Authors: Zhenbang Wu, Cao Xiao, Jimeng Sun

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 2672 - 2682

https://doi.org/10.1145/3580305.3599427

Published: 04 August 2023

Abstract

A comprehensive patient health history is essential for patient care and healthcare research. However, because healthcare services are distributed, patient health records are often scattered across multiple systems. Existing record linkage approaches rely primarily on patient identifiers, which have inherent limitations such as privacy invasion and identifier discrepancies. To tackle this problem, we propose linking de-identified patient health records by matching health patterns without strictly relying on sensitive patient identifiers. Our model, MedLink, addresses two challenges in the patient linkage task: (1) identifying the same patient from data collected over different time periods, where disease progression makes record matching difficult, and (2) identifying distinctive health patterns, since common medical codes dominate health records and overshadow the more informative low-prevalence codes. To address these challenges, MedLink uses bi-directional health prediction, predicting future codes in the forward direction and past codes in the backward direction, and thus accounts for health progression. MedLink also has a prevalence-aware retrieval design that focuses more on low-prevalence but informative codes during learning. MedLink can be trained end-to-end and is lightweight, enabling efficient inference on large patient databases. We evaluate MedLink against leading baselines on real-world patient datasets, including the critical care dataset MIMIC-III and a large health claims dataset. Results show that MedLink outperforms the best baseline by 4% in top-1 accuracy with only 8% memory cost. Additionally, when combined with existing identifier-based linkage approaches, MedLink can improve their performance by up to 15%.
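
The intuition behind prevalence-aware matching can be illustrated with a small sketch. This is not MedLink's learned model: it swaps in a simple inverse-prevalence (IDF-style) weighting as a stand-in, so that codes shared by almost every record contribute little to the match score and rare codes contribute most. The function names, the toy records, and the choice of ICD-style code strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_prevalence_weights(records):
    """Weight each code by log(N / number of records that contain it).

    Hypothetical helper: a hand-crafted stand-in for a learned,
    prevalence-aware weighting. A code present in every record gets weight 0.
    """
    n = len(records)
    doc_freq = Counter(code for rec in records for code in set(rec))
    return {code: math.log(n / df) for code, df in doc_freq.items()}


def score(query, candidate, weights):
    """Sum the weights of the codes shared by the two records."""
    shared = set(query) & set(candidate)
    return sum(weights.get(code, 0.0) for code in shared)


def link(query, database, weights):
    """Return the index of the highest-scoring candidate record."""
    return max(range(len(database)),
               key=lambda i: score(query, database[i], weights))


# Toy database (assumed data): every record shares two common codes and
# carries one rare code of its own.
database = [
    ["I10", "E11", "J45"],
    ["I10", "E11", "G35"],
    ["I10", "E11", "N18"],
]
weights = inverse_prevalence_weights(database)

# The query shares the common code "I10" with all records, but the rare
# code "G35" with only one of them.
query = ["I10", "G35"]
best = link(query, database, weights)
```

In this toy setup the common codes receive zero weight, so the match is decided entirely by the rare code. A learned model would presumably weight codes less bluntly, and this sketch does not attempt to model disease progression between the two records.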

Supplementary Material

MP4 File (rtfp0941-2min-promo.mp4)

Presentation video - short version

