Multivariate Analysis on Performance Gaps of Artificial Intelligence Models in Screening Mammography

Zhang, Linglin; Brown-Mulry, Beatrice; Nalla, Vineela; Hwang, InChan; Gichoya, Judy Wawira; Gastounioti, Aimilia; Banerjee, Imon; Seyyed-Kalantari, Laleh; Woo, MinJae; Trivedi, Hari

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2305.04422 (eess)

[Submitted on 8 May 2023 (v1), last revised 19 Oct 2023 (this version, v3)]


Abstract: Although deep learning models for abnormality classification can perform well in screening mammography, the demographic, imaging, and clinical characteristics associated with increased risk of model failure remain unclear. This retrospective study uses the Emory BrEast Imaging Dataset (EMBED), which contains mammograms from 115,931 patients imaged at Emory Healthcare between 2013 and 2020, together with BI-RADS assessments, region-of-interest coordinates for abnormalities, imaging features, pathologic outcomes, and patient demographics. Multiple deep learning models were trained to distinguish abnormal tissue patches from randomly selected normal tissue patches extracted from screening mammograms. We assessed model performance across subgroups defined by age, race, pathologic outcome, tissue density, and imaging characteristics, and investigated their associations with false negatives (FN) and false positives (FP). We also performed multivariate logistic regression to control for confounding between subgroups. The top-performing model, ResNet152V2, achieved an accuracy of 92.6% (95% CI: 92.0-93.2%) and an AUC of 0.975 (95% CI: 0.972-0.978). Before controlling for confounding, nearly all subgroups showed statistically significant differences in model performance. However, after controlling for confounding, lower FN risk was associated with Other race (RR=0.828; p=.050), biopsy-proven benign lesions (RR=0.927; p=.011), and mass (RR=0.921; p=.010) or asymmetry (RR=0.854; p=.040) findings, while higher FN risk was associated with architectural distortion (RR=1.037; p<.001). Higher FP risk was associated with BI-RADS density C (RR=1.891; p<.001) and D (RR=2.486; p<.001). Our results demonstrate that subgroup analysis is important when evaluating mammogram classifier performance, and that controlling for confounding between subgroups elucidates the true associations between variables and model failure. These results can help guide the development of future breast cancer detection models.
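Why a crude subgroup difference can shrink once confounding is controlled can be illustrated with a small sketch. It assumes invented FN counts for one hypothetical subgroup, split across two strata of a single confounder (for example, dense vs. non-dense tissue). It also swaps in Mantel-Haenszel stratification as a simpler stand-in for the multivariate logistic regression the study actually used, so it is not the authors' analysis code. The crude relative risk (with a Wald 95% CI on the log scale) pools the strata, while the Mantel-Haenszel estimate compares the subgroup only within each stratum. The names `Stratum`, `crude_rr`, and `mantel_haenszel_rr` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """FN counts within one level of a confounder (hypothetical example data).

    a:  false negatives among positive cases inside the subgroup
    n1: positive cases inside the subgroup
    c:  false negatives among positive cases outside the subgroup
    n0: positive cases outside the subgroup
    """
    a: int
    n1: int
    c: int
    n0: int


def crude_rr(strata):
    """Unadjusted FN relative risk with a Wald 95% CI, pooling all strata."""
    a = sum(s.a for s in strata)
    n1 = sum(s.n1 for s in strata)
    c = sum(s.c for s in strata)
    n0 = sum(s.n0 for s in strata)
    rr = (a / n1) / (c / n0)
    # Standard error of log(RR) for a single 2x2 table.
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, (lo, hi)


def mantel_haenszel_rr(strata):
    """FN relative risk adjusted for the stratifying confounder (Mantel-Haenszel)."""
    num = sum(s.a * s.n0 / (s.n1 + s.n0) for s in strata)
    den = sum(s.c * s.n1 / (s.n1 + s.n0) for s in strata)
    return num / den


if __name__ == "__main__":
    # Invented counts: the subgroup is over-represented in the dense stratum,
    # where FN rates are higher for everyone (20% vs. 5% in non-dense tissue).
    strata = [
        Stratum(a=16, n1=80, c=4, n0=20),  # dense tissue
        Stratum(a=1, n1=20, c=4, n0=80),   # non-dense tissue
    ]
    rr, (lo, hi) = crude_rr(strata)
    print(f"crude RR = {rr:.3f} (95% CI {lo:.3f}-{hi:.3f})")
    print(f"adjusted RR = {mantel_haenszel_rr(strata):.3f}")
```

With these toy counts the FN rate is identical inside and outside the subgroup within each stratum, so the crude RR of about 2.1 falls to an adjusted RR of 1.0 once the confounder is accounted for. A regression model generalizes this idea to many confounders at once.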

Comments:	29 pages, 6 tables, 7 figures, 2 supplemental tables
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2305.04422 [eess.IV]
	(or arXiv:2305.04422v3 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2305.04422

Submission history

From: Linglin Zhang
[v1] Mon, 8 May 2023 02:28:45 UTC (1,792 KB)
[v2] Mon, 17 Jul 2023 20:18:38 UTC (1,766 KB)
[v3] Thu, 19 Oct 2023 18:03:11 UTC (3,643 KB)
