Evaluation of the Synthetic Electronic Health Records

Muller, Emily; Zheng, Xu; Hayes, Jer

arXiv:2210.08655 (cs)

[Submitted on 16 Oct 2022]

Abstract: Generative models have been found effective for data synthesis because they can capture complex underlying data distributions. The quality of data generated by these models is commonly evaluated by visual inspection for image datasets or by downstream analytical tasks for tabular datasets. These evaluation methods neither measure the implicit data distribution nor consider data privacy, and how to compare and rank different generative models remains an open question. Medical data can be sensitive, so it is important to address patients' privacy concerns while maintaining the utility of the synthetic dataset. Beyond utility evaluation, this work outlines two metrics, Similarity and Uniqueness, for sample-wise assessment of synthetic datasets. We demonstrate the proposed metrics by using several state-of-the-art generative models to synthesise Cystic Fibrosis (CF) patients' electronic health records (EHRs), and observe that the metrics are suitable for synthetic data evaluation and generative model comparison.

Comments:	arXiv admin note: substantial text overlap with arXiv:2201.05400
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.08655 [cs.LG]
	(or arXiv:2210.08655v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.08655

Submission history

From: Xu Zheng
[v1] Sun, 16 Oct 2022 22:46:08 UTC (1,678 KB)
