Algorithmic encoding of protected characteristics in image-based models for disease detection

Glocker, Ben; Jones, Charles; Bernhardt, Melanie; Winzeck, Stefan

arXiv:2110.14755 (cs)

[Submitted on 27 Oct 2021 (v1), last revised 21 Jul 2022 (this version, v4)]

Abstract: It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring new methodology for subgroup analysis in image-based disease detection models. We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test-set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. We confirm subgroup disparities in terms of shifted true and false positive rates, which are partially removed after correcting for population and prevalence shifts in the test sets. We further find a previously used transfer learning method to be insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable new insights about the way protected characteristics are encoded in the feature representations of deep neural networks.
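To make the test-set resampling idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' code. The function names (`subgroup_rates`, `resample_balanced`), the toy arrays, and the choice to sample with replacement are all assumptions made for illustration. The sketch computes per-subgroup true and false positive rates, then redraws the test set so that every subgroup has the same size and the same disease prevalence, which is one plausible way to correct for population and prevalence shifts before comparing subgroups.

```python
import numpy as np


def subgroup_rates(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels and binary predictions."""
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        pos = in_group & (y_true == 1)
        neg = in_group & (y_true == 0)
        # Mean of binary predictions over positives is the TPR,
        # over negatives it is the FPR.
        tpr = float(y_pred[pos].mean()) if pos.any() else float("nan")
        fpr = float(y_pred[neg].mean()) if neg.any() else float("nan")
        rates[g.item()] = (tpr, fpr)
    return rates


def resample_balanced(y_true, groups, n_per_group, prevalence, rng):
    """Draw test indices so each subgroup has equal size and equal prevalence.

    Assumption: sampling is done with replacement, and every subgroup
    contains at least one positive and one negative case.
    """
    n_pos = int(round(n_per_group * prevalence))
    n_neg = n_per_group - n_pos
    parts = []
    for g in np.unique(groups):
        pos_idx = np.flatnonzero((groups == g) & (y_true == 1))
        neg_idx = np.flatnonzero((groups == g) & (y_true == 0))
        parts.append(rng.choice(pos_idx, size=n_pos, replace=True))
        parts.append(rng.choice(neg_idx, size=n_neg, replace=True))
    return np.concatenate(parts)


# Hypothetical toy data: two subgroups with different disease prevalence.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rng = np.random.default_rng(0)
idx = resample_balanced(y_true, groups, n_per_group=10, prevalence=0.3, rng=rng)
balanced_rates = subgroup_rates(y_true[idx], y_pred[idx], groups[idx])
```

Comparing `subgroup_rates` on the original arrays with `balanced_rates` shows how much of an apparent disparity remains once subgroup size and prevalence are equalized. In practice the rates would be averaged over many resampling draws rather than read from a single one.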

Comments:	Code available on this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2110.14755 [cs.LG]
	(or arXiv:2110.14755v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2110.14755

Submission history

From: Ben Glocker
[v1] Wed, 27 Oct 2021 20:30:57 UTC (8,939 KB)
[v2] Wed, 22 Dec 2021 08:56:05 UTC (8,941 KB)
[v3] Tue, 18 Jan 2022 16:55:50 UTC (29,243 KB)
[v4] Thu, 21 Jul 2022 15:33:21 UTC (2,027 KB)
