1 Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain (email: richard.osuala@ub.edu)
2 Helmholtz Center Munich, Munich, Germany
3 Technical University of Munich, Munich, Germany
4 Imperial College London, London, United Kingdom
5 Computer Vision Center, Bellaterra, Spain
6 Kings College London, London, UK
7 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data

Richard Osuala 1,2,3    Daniel M. Lang 2,3    Anneliese Riess 2,3    Georgios Kaissis 2,3,4    Zuzanna Szafranowska 1    Grzegorz Skorupko 1    Oliver Diaz 1,5    Julia A. Schnabel 2,3,6    Karim Lekadir 1,7
Abstract

Deep learning holds immense promise for aiding radiologists in breast cancer detection. However, achieving optimal model performance is hampered by limitations in the availability and sharing of data, commonly associated with patient privacy concerns. Such concerns are further exacerbated, as traditional deep learning models can inadvertently leak sensitive training information. This work addresses these challenges by exploring and quantifying the utility of privacy-preserving deep learning techniques, concretely, (i) differentially private stochastic gradient descent (DP-SGD) and (ii) fully synthetic training data generated by our proposed malignancy-conditioned generative adversarial network. We assess these methods via downstream malignancy classification of mammography masses using a transformer model. Our experimental results show that synthetic data augmentation can improve privacy-utility tradeoffs in differentially private model training. Further, model pretraining on synthetic data achieves remarkable performance, which can be further increased with DP-SGD fine-tuning across all privacy guarantees. With this first in-depth exploration of privacy-preserving deep learning in breast imaging, we address current and emerging clinical privacy requirements and pave the way towards the adoption of private high-utility deep diagnostic models. Our reproducible codebase is publicly available at https://github.com/RichardObi/mammo_dp.

Keywords: Breast Imaging · Differential Privacy · Generative Models
Figure 1: Overview of our privacy-preserving deep learning pipeline and malignancy-conditioned generative adversarial network (MCGAN).

1 Introduction

Breast cancer accounts for an estimated 684,000 deaths and 2.26 million new cases worldwide per year [11]. Part of this burden could be reduced through earlier detection and timely treatment. Screening mammography is a cornerstone of early detection and is associated with a reduction in breast cancer mortality [20]. Recent literature emphasizes the potential of deep learning-based computer-aided diagnosis (CAD) [29, 23, 15, 21], e.g., demonstrating that a symbiosis of deep learning models with radiologist assessment yields the highest breast cancer detection performance [20]. However, training deep learning models on patient data poses a risk of leakage of sensitive person-specific information during and after training [23], as models have the capacity to memorise sufficient information to allow for high-fidelity image reconstruction [3, 13]. To avoid such leakage of private patient information, data needs to be protected during model training, in particular when the objective is to develop models to be used in clinical practice or shared among entities. Furthermore, international data protection regulations grant patients the right to request the removal of their information from data holders. For instance, point (b) of article 17(1) of the EU General Data Protection Regulation (GDPR) [9] stipulates that data subjects have a “right to be forgotten”. Given, for instance, the proven possibility of reconstructing training data from a model’s weights [3, 13], these rights can extend to the removal of patient-specific information from already trained deep learning models [28]. However, it is known to be difficult to “reliably” and “provably” remove patient information (present in only one or a few specific training data points) from already trained model weights [28]. A generic and verifiable alternative is to remove the patient’s data point from the training data and retrain the respective model on the remainder of the dataset. 
This procedure is not only likely to reduce the performance of algorithms, but also deters hospitals from adopting deep learning models, due to the extensive economic, organisational, and environmental costs of retraining. Anticipating patient consent withdrawals, costly retraining can be avoided by demonstrating that deep learning model weights do not include personally identifiable information (PII) about any specific patient. To this end, a powerful technique to ensure privacy during model training is Differentially Private Stochastic Gradient Descent (DP-SGD) [1], which quantifiably reduces the effect each single training sample can have on the resulting model weights. Furthermore, privacy preservation can also be achieved by diagnostic models trained exclusively on synthetic data, which is not (unambiguously) attributable to any specific patient but rather contains anonymous samples representing the essence of the dataset [12, 23]. The caveat of both DP-SGD and synthetic data strategies is, however, that they generally lead to a reduction in model performance, known as the privacy-utility trade-off. Investigating this trade-off in the realm of breast imaging, our core contributions are summarised as follows:

  • We design and validate a transformer model, achieving promising performance as a backbone for privacy-preserving breast mass malignancy classification.

  • We propose and validate a conditional generative adversarial network capable of differentiating between benign and malignant breast mass generation.

  • We empirically quantify privacy-utility-tradeoffs in mass malignancy classification, assessing various differential privacy guarantees, and further combine and compare them with training on synthetic data.

2 Methods and Materials

Datasets and Preprocessing

We use the open-access Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) dataset [16], which consists of 891 scanned film mammography cases with segmented masses and biopsy-proven malignancy status. After extracting mass images from craniocaudal (CC) and mediolateral oblique (MLO) views, we follow the predefined per-patient train-test split [16], allocating 1296 mass images to training and 402 (245 benign, 157 malignant) mass images to testing. We further divide this training set randomly per patient into a training set (1104 mass images, 525 malignant) and a validation set (192 mass images, 102 malignant). As an external test set, we further adopt the publicly available BCDR cohort [19], which comprises 1010 patients, totalling 1493 lesions (639 masses) with biopsy information from both digital mammograms (BCDR-DM) and film mammograms (BCDR-FM). Our final BCDR test set contains 1106 mass images extracted from CC and MLO views, 486 of which are malignant and 620 benign. To obtain mass patches, the lesion contour information is used to extract bounding boxes from the mammograms. We then create a square patch with a minimum size of 128x128 pixels around this bounding box, ensuring a margin of 60 pixels in each direction. For classification, the mass patches are resized to 224x224 pixels using inter-area interpolation, maintaining image ratios, and stacked to 3 channels. Models were trained on either a single 8GB NVIDIA RTX 2080 Super or a 48GB RTX A6000 GPU using PyTorch and Opacus [30] for DP-SGD.
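The patch geometry described above can be sketched as follows. This is a minimal illustration and not the released preprocessing code: the function name `square_patch_box`, the reading of the margin as 60 pixels added on each side of the longer bounding-box edge, and the choice to shift (rather than pad) patches that cross an image border are all assumptions.

```python
def square_patch_box(bbox, image_size, margin=60, min_size=128):
    """Return (left, top, right, bottom) of a square crop around a lesion bounding box.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    image_size: (width, height) of the mammogram.
    """
    x_min, y_min, x_max, y_max = bbox
    width, height = image_size
    # Side length: the longer bounding-box edge plus the margin on both sides,
    # but never smaller than the minimum patch size.
    side = max(x_max - x_min, y_max - y_min) + 2 * margin
    side = max(side, min_size)
    # Assumption: a patch cannot be larger than the image itself.
    side = min(side, width, height)
    # Centre the square on the centre of the bounding box.
    cx = (x_min + x_max) // 2
    cy = (y_min + y_max) // 2
    left = cx - side // 2
    top = cy - side // 2
    # Assumption: shift the square back inside the image if it crosses a border.
    left = min(max(left, 0), width - side)
    top = min(max(top, 0), height - side)
    return left, top, left + side, top + side
```

For a 40x30 pixel bounding box, the sketch yields a 160x160 crop, while very small lesions fall back to the 128x128 minimum. The crop would then be resized to 224x224 and replicated to 3 channels before classification.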

Cancer Classification Transformer Model

Given its reported high performance on classifying the presence of a lesion in mammography patches [29] and its shifted window mechanism, which allows it to effectively attend to shapes of varying sizes, we adopt a Swin Transformer (Swin-T) [17] as cancer classification model to distinguish between benign and malignant masses. We initialize the network with ImageNet-pretrained [6] weights and, following the Swin-T hyperparameter setup [17] (stride, window size), replace the last fully-connected layer of the Swin Transformer with a newly initialized layer with two output nodes, each outputting the logit for one of our classes (i.e., malignant or benign). We only set the parameters of the adjusted fully-connected layer as trainable and apply a learning rate of 1e-5. A weight decay of 1e-8 is used following the fine-tuning experiment described in [17]. Furthermore, an AdamW optimizer, label smoothing of 0.1, and a batch size of 128 are used. During training, random horizontal and vertical flips are applied as data augmentation and a cross-entropy loss is backpropagated. After training for 300 epochs, the model from the epoch with the highest area under the precision-recall curve (AUPRC) on the validation set is selected for testing.
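To make the role of label smoothing in the loss concrete, the following standard-library sketch computes a label-smoothed cross-entropy for a single sample. It assumes the convention in which the target distribution is a mixture of the one-hot label and a uniform distribution over the K classes (the convention of PyTorch's `CrossEntropyLoss(label_smoothing=...)`); whether the training code uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy of one sample against a label-smoothed target distribution."""
    num_classes = len(logits)
    # Numerically stable log-softmax over the class logits.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    log_probs = [v - log_norm for v in logits]
    loss = 0.0
    for k in range(num_classes):
        # Smoothed target: (1 - s) on the true class plus s / K on every class.
        one_hot = 1.0 if k == target else 0.0
        q = (1.0 - smoothing) * one_hot + smoothing / num_classes
        loss -= q * log_probs[k]
    return loss
```

With two classes and a smoothing of 0.1, the true class receives a target probability of 0.95 and the other class 0.05, which penalises overconfident logits on the small training set.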

Malignancy-Conditioned Generative Adversarial Network

Going beyond unconditional mass synthesis in the literature [29, 2], we propose a malignancy-conditioned generative adversarial network (MCGAN) to control the generation of either benign or malignant synthetic breast masses. In general, GANs consist of a generator (G) and a discriminator (D) network, which engage in a two-player zero-sum game, where G generates synthetic samples that D strives to distinguish from real ones [12]. We design G and D as deep convolutional neural networks [26] and, as shown in Fig. 1, integrate class-conditional information [22]. To this end, we extract the histopathology report’s biopsy information for each mass from the metadata and convert it into a discrete malignancy label. Then, we transform this label into a multi-dimensional embedding vector before passing it through a fully-connected layer, yielding a representation with the dimensionality required to concatenate it to the generator input (100-dimensional noise vector) and to the discriminator input (128x128 input image). As D learns to associate class labels with patterns in the input images, it has to learn whether or not a given class corresponds to a given synthetic sample. Furthermore, as the discriminator loss is backpropagated into the generator, G is forced to synthesize samples corresponding to the provided class condition. This results in G learning a conditional distribution based on the value function

\[
\min_{G}\max_{D}V(D,G)=\min_{G}\max_{D}\left[\mathbb{E}_{x\sim p_{\mathrm{data}}}[\log D(x|y)]+\mathbb{E}_{z\sim p_{z}}[\log(1-D(G(z|y)))]\right].
\]

Optimizing the discriminator via binary cross-entropy [12], we define its loss in a class-conditional setup as

\[
L_{D_{\mathrm{MCGAN}}}=-\mathbb{E}_{x\sim p_{\mathrm{data}}}[\log D(x|y)]-\mathbb{E}_{z\sim p_{z}}[\log(1-D(G(z|y)))].
\]

We train our MCGAN on the CBIS-DDSM training data, applying random horizontal (p=0.5) and vertical (p=0.5) flipping as well as random cropping with resizing, where the resize scale ranges from 0.9 to 1.1 and the aspect ratio from 0.95 to 1.1. We further include one-sided label smoothing [26] in a range of [0.7, 1.2]. Following [2], we employ a discriminator convolutional kernel size of 6 and a generator kernel size of 4. We observe that this reduces checkerboard artefacts, as D’s field-of-view now requires G to create realistic transitions between the kernel-sized patches in the image. MCGAN is trained for 10k epochs with a batch size of 16. Based on the best quality-diversity tradeoff, we select the model from epoch 1.4k after qualitative visual assessment of generated samples.
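The binary cross-entropy discriminator objective with one-sided label smoothing can be illustrated with the following standard-library sketch. It is not the MCGAN training code: the function name, the uniform sampling of the soft "real" target from the stated range, and the averaging over the batch are assumptions made for illustration.

```python
import math
import random


def discriminator_loss(d_real, d_fake, real_label_range=(0.7, 1.2), rng=None):
    """Binary cross-entropy discriminator loss for one batch.

    d_real: discriminator outputs D(x|y) in (0, 1) for real images.
    d_fake: discriminator outputs D(G(z|y)) in (0, 1) for generated images.
    """
    rng = rng or random.Random()
    eps = 1e-12  # guards against log(0)
    low, high = real_label_range
    real_terms = []
    for p in d_real:
        # Assumption: the soft target for real images is drawn uniformly from the range.
        t = rng.uniform(low, high)
        real_terms.append(-(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)))
    # One-sided smoothing: the target for generated images stays at exactly 0.
    fake_terms = [-math.log(1.0 - p + eps) for p in d_fake]
    return sum(real_terms) / len(real_terms) + sum(fake_terms) / len(fake_terms)
```

Setting `real_label_range=(1.0, 1.0)` disables the smoothing and recovers the plain binary cross-entropy loss. Smoothing only the real targets discourages D from becoming overconfident on real samples while leaving the gradient signal for generated samples unchanged.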

Patient Privacy Preservation Framework

Privacy protection is an ethical norm and legal obligation, e.g. granting patients the right to have their data (retrospectively) removed from databases [9]. Since (biomedical) deep learning models are vulnerable to leakage of information such as sensitive patient attributes [28, 3, 13], they can be affected by such (and future) regulations. However, privacy-preserving techniques can be integrated into deep learning frameworks and, to some extent, avoid compromising confidential data. Examples include (i) model training with DP-SGD [1] and (ii) training exclusively on synthetic data.

From a legal perspective, models trained on only synthetic data remain unaffected by patient consent withdrawal if “relatedness” between the data and the data subject cannot be established, or if “personal data has been rendered synthetic in such a manner that the data subject is no longer identifiable” [18], e.g., according to article 4(1) and recital 26 of the GDPR [9]. It is to be noted that in the “acceptable-risk” legal interpretation, a data subject’s re-identification risk is reduced to an “acceptable” level rather than fully eradicated [18]. Hence, this interpretation enables approaches such as synthetic data and/or Differential Privacy (DP) model training to be used as legally compliant privacy preservation methods despite not guaranteeing a “zero-risk” of patient re-identification.

DP is a mathematical framework that allows practitioners to provide (worst-case scenario) theoretical privacy guarantees for an individual sharing their data to train a deep learning model. Consider two databases (e.g., containing image-label pairs). We call them adjacent if they differ in a single data point, i.e., one image is present in one database but not in the other. Then, a randomised mechanism $\mathcal{M}\colon\mathcal{D}\rightarrow\mathcal{R}$ with domain $\mathcal{D}$ and range $\mathcal{R}$ is said to satisfy $(\varepsilon,\delta)$-differential privacy if, for any two adjacent databases $d,d^{\prime}\in\mathcal{D}$ and for any subset of outputs $S\subseteq\mathcal{R}$, $\Pr[\mathcal{M}(d)\in S]\leq e^{\varepsilon}\Pr[\mathcal{M}(d^{\prime})\in S]+\delta$ holds. $\varepsilon$ and $\delta$ bound a single data point’s influence on a model’s output (e.g. the model’s weights or predictions). Thus, the smaller the value of these parameters, the higher the model’s privacy and the harder it is for an attacker to retrieve information about any training data point. DP-SGD [1] is the DP variant of the well-known SGD algorithm, and facilitates the training of a model under DP conditions. In particular, a model trained under $(\varepsilon,\delta)$-DP is robust to post-processing, meaning that any further computation using only its output also satisfies $(\varepsilon,\delta)$-DP.
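The core update of DP-SGD [1] can be sketched with the standard library as follows: each per-sample gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The function and parameter names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are chosen for illustration and are not taken from the released code; in practice a library such as Opacus [30] performs these steps and uses a privacy accountant to translate the noise level, sampling rate, and number of steps into a pair of privacy parameters.

```python
import math
import random


def dp_sgd_step(weights, per_sample_grads, lr, clip_norm, noise_multiplier, rng=None):
    """One DP-SGD update: clip each per-sample gradient, sum, add Gaussian noise, average."""
    rng = rng or random.Random()
    dim = len(weights)
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down so that its L2 norm is at most clip_norm.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for i in range(dim):
            summed[i] += grad[i] * scale
    batch_size = len(per_sample_grads)
    new_weights = []
    for i in range(dim):
        # Gaussian noise with standard deviation noise_multiplier * clip_norm.
        noisy = summed[i] + rng.gauss(0.0, noise_multiplier * clip_norm)
        new_weights.append(weights[i] - lr * noisy / batch_size)
    return new_weights
```

Clipping bounds how much any single image can move the weights, and the noise masks the remaining contribution. A larger noise multiplier yields smaller privacy parameters at the cost of noisier updates, which is the source of the privacy-utility trade-off.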
Moreover, the choice of these parameters is application-dependent and normative [5] and varies strongly across real-world deployments [7]. In the case of mammography, multiple lesions of the same patient are available in the datasets, i.e. one from the CC view and one from the MLO view. Therefore, to preserve the privacy of one patient it is necessary to protect all their data points (i.e. all images). In such a case, DP group privacy is used to estimate a patient’s DP privacy guarantee. However, for simplicity, in our subsequent experiments, we provide image-level privacy guarantees rather than per patient.
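As an illustration of how an image-level guarantee relates to a patient-level one, the generic group privacy bound can be computed as below. Applying the DP inequality once per differing record shows that an (ε, δ)-DP mechanism is (kε, δ(1 + e^ε + ... + e^((k-1)ε)))-DP for groups of k records. This is the generic bound only, accountant-specific analyses may be tighter, and the function name and the group size in the usage note are hypothetical.

```python
import math


def group_privacy(epsilon, delta, group_size):
    """Group-level (epsilon, delta) implied by a per-record (epsilon, delta)-DP guarantee."""
    group_epsilon = group_size * epsilon
    # delta accumulates as delta * (1 + e^eps + ... + e^((k - 1) * eps)).
    group_delta = delta * sum(math.exp(i * epsilon) for i in range(group_size))
    return group_epsilon, group_delta
```

For a patient contributing two images (e.g., one CC and one MLO view), an image-level guarantee of ε=1 and δ=1e-4 would translate to roughly ε=2 and δ≈3.7e-4 at patient level under this bound.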

3 Experiments and Results

3.0.1 Synthetic Data Evaluation

| Dataset 1 | Dataset 2 | FIDImg ↓ | FIDRad ↓ | FRD ↓ |
|---|---|---|---|---|
| SynMCGAN | RealDDSM | 58.00±0.72 | 0.81±.013 | 18.12±1.01 |
| RealDDSM | RealDDSM | 29.25±0.82 | 0.31±.019 | 3.48±.352 |
| SynMCGAN | SynMCGAN | 20.90±0.16 | 0.32±.012 | 0.57±.094 |
| RealDDSM | RealBCDR | 156.43±14.3 | 3.88±.351 | 277.63±39.0 |

Figure 2: Qualitative and quantitative synthesis results: Images are randomly selected malignant and benign real (CBIS-DDSM [16]) and MCGAN-generated masses. ImageNet [6] and RadImageNet [25, 21] based FID [14] and FRD [24] scores are reported as mean ± standard deviation based on 3 subsets randomly sampled per patient ($N_{\mathrm{real}}\approx 360$, $N_{\mathrm{syn}}\approx 3240$). Row 4 indicates a BCDR-based [19] upper bound for comparison with the synthetic data metrics in row 1.
Table 1: Results for within-domain (CBIS-DDSM [16]) and out-of-domain (BCDR [19]) breast cancer malignancy classification of masses extracted from mammograms. Syn indicates 3k synthetic images being part of the fine-tuning training data, while SynPre represents pretraining all trainable model params with those 3k synthetic images (without DP guarantee), before fine-tuning the last two layers on real data with DP guarantee (RealFT). AUROC and AUPRC are reported as mean ± std based on 3 random seed runs. Best results in bold.

| Model | ε | δ | CBIS-DDSM [16] AUROC ↑ | CBIS-DDSM [16] AUPRC ↑ | BCDR [19] AUROC ↑ | BCDR [19] AUPRC ↑ |
|---|---|---|---|---|---|---|
| SwinTReal | ∞ | ∞ | 0.778±.001 | 0.85±.001 | 0.695±.002 | 0.726±.003 |
| SwinTSyn | ∞ | ∞ | 0.597±.011 | 0.696±.011 | 0.566±.064 | 0.602±.048 |
| SwinTSynPre | ∞ | ∞ | 0.639±.016 | 0.733±.001 | 0.622±.032 | 0.660±.017 |
| SwinTReal | 1 | 1e-4 | 0.525±.043 | 0.640±.030 | 0.487±.020 | 0.549±.020 |
| SwinTReal+Syn | 1 | 1e-4 | 0.553±.040 | 0.665±.025 | 0.521±.023 | 0.573±.024 |
| SwinTSynPre+RealFT | ∞\|1 | ∞\|1e-4 | 0.661±.018 | 0.741±.007 | 0.637±.026 | 0.67±0013 |
| SwinTReal | 6 | 1e-4 | 0.572±.031 | 0.679±.019 | 0.532±.031 | 0.579±.029 |
| SwinTReal+Syn | 6 | 1e-4 | 0.617±.013 | 0.708±.015 | 0.609±.027 | 0.647±.024 |
| SwinTSynPre+RealFT | ∞\|6 | ∞\|1e-4 | 0.677±.014 | 0.752±.009 | 0.647±.022 | 0.679±.009 |
| SwinTReal | 12 | 1e-4 | 0.596±.023 | 0.702±.013 | 0.559±.033 | 0.600±.030 |
| SwinTReal+Syn | 12 | 1e-4 | 0.624±.010 | 0.704±.012 | 0.625±.020 | 0.663±.012 |
| SwinTSynPre+RealFT | ∞\|12 | ∞\|1e-4 | 0.688±.012 | 0.758±.011 | 0.654±.019 | 0.685±.007 |
| SwinTReal | 20 | 1e-4 | 0.611±.018 | 0.715±.012 | 0.581±.028 | 0.618±.026 |
| SwinTReal+Syn | 20 | 1e-4 | 0.630±.003 | 0.699±.008 | 0.641±.018 | 0.685±.012 |
| SwinTSynPre+RealFT | ∞\|20 | ∞\|1e-4 | 0.697±.012 | 0.763±.012 | 0.659±.017 | 0.689±.006 |
| SwinTReal | 60 | 1e-4 | 0.622±.014 | 0.721±.110 | 0.605±.019 | 0.640±.017 |
| SwinTReal+Syn | 60 | 1e-4 | 0.629±.002 | 0.694±.005 | 0.650±.013 | 0.696±.007 |
| SwinTSynPre+RealFT | ∞\|60 | ∞\|1e-4 | 0.712±.013 | 0.776±.013 | 0.671±.014 | 0.697±.004 |

Qualitatively assessing the synthetic images in Fig. 2, it is not readily possible to distinguish synthetic from real masses in terms of image fidelity or diversity. We note the absence of clear visual indicators to distinguish between malignant and benign images for both real and synthetic images. This is in line with the difficulty of determining the malignancy of a mammographic lesion, shown by high clinical error rates and inter-observer variability [8]. However, results for training our malignancy classification model on only synthetic data (see Syn and SynPre in Table 1) show that the synthetic data captures the conditional distribution, effectively generating either malignant or benign masses. Both the vanilla ImageNet-based Fréchet Inception Distance (FID) [14, 6] and the radiology domain-specific RadImageNet-based FID [25, 21] concur that the synthetic data (FIDImg=58.00±0.72) is substantially closer to the real CBIS-DDSM [16] distribution than BCDR [19] is (FIDImg=156.43±14.3). This is even more pronounced when comparing the variation of extracted radiomics features for CBIS-DDSM to synthetic (FRD=18.12) and BCDR (FRD=277.63) images using the Fréchet Radiomics Distance (FRD) [24]. While this indicates desirable synthetic data fidelity, we also observe good diversity. The latter is shown by comparing subsets of the same datasets with each other, where the variation within the synthetic data (e.g., FIDRad=0.32±.012) closely resembles the variation within the real CBIS-DDSM dataset (e.g., FIDRad=0.31±.019). Notwithstanding less variation in radiomics imaging biomarkers within the synthetic data (FRDSyn=0.57 vs. FRDReal=3.48), this overall points to a valid coverage of the distribution and an absence of mode collapse.
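For reference, the Fréchet distance underlying the FID [14] compares two Gaussians fitted to feature embeddings of the two image sets. The expression below is the standard definition, given as background; the symbols $\mu_1,\Sigma_1$ and $\mu_2,\Sigma_2$ (feature mean and covariance of the first and second image set) are introduced here for illustration only:

\[
d^{2}\big((\mu_1,\Sigma_1),(\mu_2,\Sigma_2)\big)=\lVert\mu_1-\mu_2\rVert_2^{2}+\mathrm{Tr}\left(\Sigma_1+\Sigma_2-2\,(\Sigma_1\Sigma_2)^{1/2}\right).
\]

The distance is zero only when both means and covariances coincide, so lower values indicate closer feature distributions. The ImageNet-based and RadImageNet-based variants differ in the network used to extract the features.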

3.0.2 Mass Malignancy Classification

As shown in Table 1, we conduct experiments with and without formal privacy guarantees. For scenarios where a formal privacy guarantee is not strictly required and, thus, synthetic data suffices as privacy mechanism, we compare the results of training SwinT on synthetic data (Syn) and on real data (Real) with DP-SGD. Kaissis et al. [15] defined ε=6 as a suitable privacy budget for their medical imaging dataset. Compared to DP-SGD with ε=6, synthetic data achieves better AUPRCs for within-domain tests on CBIS-DDSM (SwinTSyn=0.696 vs SwinTReal(ε=6)=0.679) and is on par for out-of-domain (ood) tests on BCDR (SwinTSyn=0.602 vs SwinTReal(ε=6)=0.600). However, training all SwinT layers using synthetic data (SynPre) achieves substantially better performance, only approximated by DP results for ε=60 for within-domain (SwinTSynPre=0.733 vs SwinTReal(ε=60)=0.721) and ood (SwinTSynPre=0.66 vs SwinTReal(ε=60)=0.64) tests. Further fine-tuning SwinTSynPre on real data using DP-SGD results in additional improvement across all privacy parameters for within-domain and ood testing. For instance, training SwinTSynPre+RealFT with ε=1 results in an AUPRC of 0.74 and 0.67 for CBIS-DDSM and BCDR, respectively. To assess scenarios where a formal guarantee is required, we further compare DP-SGD training of SwinT on real data (Real) with DP-SGD training on a mix of real and synthetic data (Real+Syn). To this end, our experiments show that such synthetic data augmentation can improve the privacy-utility tradeoff. This is exemplified by SwinTReal+Syn(ε=6) accomplishing an AUPRC of 0.708 within-domain and 0.647 ood, while SwinTReal(ε=6) achieved 0.679 and 0.579, respectively. We further observe the trend that stricter privacy budgets (i.e., smaller ε) can be associated with more added performance of synthetic data as additional classification model training data.
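The two metrics reported in Table 1 can be illustrated with a minimal standard-library sketch. It assumes that AUPRC is computed as average precision (the step-wise area under the precision-recall curve) and that labels are encoded as 1 for malignant and 0 for benign. Both are assumptions, as the exact metric implementation is not specified here, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    num_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each newly recalled positive
    return total / num_pos
```

Unlike AUROC, the chance level of average precision equals the fraction of positives in the test set, which should be kept in mind when comparing AUPRC values between CBIS-DDSM and BCDR.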

4 Discussion and Conclusion

We introduce a privacy preservation framework based on differential privacy (DP) and synthetic data and apply it to the diagnostic task of classifying the malignancy of breast masses extracted from screening mammograms. We further propose, train, and evaluate a malignancy-conditioned generative adversarial network to generate a dataset of benign and malignant synthetic breast masses. Next, we train a swin transformer model on mass malignancy classification and assess, compare and combine training under DP and training on synthetic data. This analysis revealed that when training with DP, synthetic data augmentation can notably improve classification performance for within-domain and out-of-domain test cases. Apart from that, we show, across privacy mechanisms and across domains, that the performance of models pretrained on synthetic data can be further improved by DP fine-tuning on real data.

This finding is particularly important considering that synthetic data, if not directly attributable to any specific patient, can become a valid, legally compliant alternative to strict DP guarantees in clinical practice. Consequently, it is to be further investigated where and when deterministic mechanisms without formal DP guarantees can suffice to shield against different privacy attacks [4]. In particular, we motivate future work to analyse the extent to which the inherent properties of synthetic data generation algorithms can provide empirical protection against attacks. A methodological alternative to our approach is to assess privacy-utility tradeoffs when training the generative model itself using DP-SGD [10, 23], resulting in formal privacy guarantees of the generated synthetic datasets. Thus, a further avenue to explore then lies within the question whether randomness inherent in randomised data synthesis algorithms (e.g., based on the noise in diffusion models [27] or GANs [12]) can be used to amplify the privacy of the DP versions of such synthesis algorithms, thereby potentially further enhancing privacy-utility tradeoffs. To this end, our study constitutes a crucial first step leading towards the clinical adoption of diagnostic deep learning models, enabling practical privacy-utility tradeoffs all while anticipating respective legal obligations and clinical requirements.


4.0.1 Acknowledgements

This study has received funding from the European Union’s Horizon research and innovation programme under grant agreement No 952103 (EuCanImage) and No 101057699 (RadioVal). It was further partially supported by the project FUTURE-ES (PID2021-126724OB-I00) from the Ministry of Science and Innovation of Spain. RO acknowledges a research stay grant from the Helmholtz Information and Data Science Academy (HIDA).

4.0.2 Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

References

  • [1] Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. pp. 308–318 (2016)
  • [2] Alyafi, B., Diaz, O., Marti, R.: DCGANs for realistic breast mass augmentation in x-ray mammography. In: Medical Imaging 2020: Computer-Aided Diagnosis. vol. 11314, p. 1131420. International Society for Optics and Photonics (2020)
  • [3] Balle, B., Cherubin, G., Hayes, J.: Reconstructing training data with informed adversaries. In: 2022 IEEE Symposium on Security and Privacy (SP). pp. 1138–1156. IEEE (2022)
  • [4] Cohen, A., Nissim, K.: Towards formalizing the gdpr’s notion of singling out. Proceedings of the National Academy of Sciences 117(15), 8344–8352 (2020)
  • [5] De, S., Berrada, L., Hayes, J., Smith, S.L., Balle, B.: Unlocking high-accuracy differentially private image classification through scale. arXiv preprint arXiv:2204.13650 (2022)
  • [6] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. IEEE (2009)
  • [7] Dwork, C., Kohli, N., Mulligan, D.: Differential privacy in practice: Expose your epsilons! Journal of Privacy and Confidentiality 9(2) (2019)
  • [8] Ekpo, E.U., Alakhras, M., Brennan, P.: Errors in mammography cannot be solved through technology alone. Asian Pacific journal of cancer prevention: APJCP 19(2),  291 (2018)
  • [9] European Parliament and Council of European Union: General Data Protection Regulation (GDPR), REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL. Online at https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679/ (2018)
  • [10] Ghalebikesabi, S., Berrada, L., Gowal, S., Ktena, I., Stanforth, R., Hayes, J., De, S., Smith, S.L., Wiles, O., Balle, B.: Differentially private diffusion models generate useful synthetic images. arXiv preprint arXiv:2302.13861 (2023)
  • [11] Global Cancer Observatory: The global cancer observatory (GCO) is an interactive web-based platform presenting global cancer statistics to inform cancer control and research. https://gco.iarc.fr/ (2023), accessed on 2023-01-17
  • [12] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. pp. 2672–2680 (2014)
  • [13] Haim, N., Vardi, G., Yehudai, G., Shamir, O., Irani, M.: Reconstructing training data from trained neural networks. arXiv preprint arXiv:2206.07758 (2022)
  • [14] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. arXiv preprint arXiv:1706.08500 (2017)
  • [15] Kaissis, G., Ziller, A., Passerat-Palmbach, J., Ryffel, T., Usynin, D., Trask, A., Lima Jr, I., Mancuso, J., Jungmann, F., Steinborn, M.M., et al.: End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nature Machine Intelligence 3(6), 473–484 (2021)
  • [16] Lee, R.S., Gimenez, F., Hoogi, A., Miyake, K.K., Gorovoy, M., Rubin, D.L.: A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific data 4(1), 1–9 (2017)
  • [17] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 10012–10022 (2021)
  • [18] López, C.A.F.: On the legal nature of synthetic data. In: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research (2022)
  • [19] Lopez, M.G., Posada, N., Moura, D.C., Pollán, R.R., Valiente, J.M.F., Ortega, C.S., Solar, M., Diaz-Herrero, G., Ramos, I., Loureiro, J., et al.: BCDR: a breast cancer digital repository. In: 15th International conference on experimental mechanics. vol. 1215 (2012)
  • [20] McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G.S., Darzi, A., et al.: International evaluation of an AI system for breast cancer screening. Nature 577(7788), 89–94 (2020)
  • [21] Mei, X., Liu, Z., Robson, P.M., Marinelli, B., Huang, M., Doshi, A., Jacobi, A., Cao, C., Link, K.E., Yang, T., et al.: RadImageNet: An Open Radiologic Deep Learning Research Dataset for Effective Transfer Learning. Radiology: Artificial Intelligence p. e210315 (2022)
  • [22] Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
  • [23] Osuala, R., Kushibar, K., Garrucho, L., Linardos, A., Szafranowska, Z., Klein, S., Glocker, B., Diaz, O., Lekadir, K.: Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging. Medical Image Analysis 84, 102704 (2023)
  • [24] Osuala, R., Lang, D., Verma, P., Joshi, S., Tsirikoglou, A., Skorupko, G., Kushibar, K., Garrucho, L., Pinaya, W.H., Diaz, O., et al.: Towards learning contrast kinetics with multi-condition latent diffusion models. arXiv preprint arXiv:2403.13890 (2024)
  • [25] Osuala, R., Skorupko, G., Lazrak, N., Garrucho, L., García, E., Joshi, S., Jouide, S., Rutherford, M., Prior, F., Kushibar, K., et al.: medigan: a Python library of pretrained generative models for medical image synthesis. Journal of Medical Imaging 10(6), 061403 (2023)
  • [26] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
  • [27] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning. pp. 2256–2265. PMLR (2015)
  • [28] Su, R., Liu, X., Tsaftaris, S.A.: Why patient data cannot be easily forgotten? In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part VIII. pp. 632–641. Springer (2022)
  • [29] Szafranowska, Z., Osuala, R., Breier, B., Kushibar, K., Lekadir, K., Diaz, O.: Sharing generative models instead of private data: a simulation study on mammography patch classification. In: 16th International Workshop on Breast Imaging (IWBI2022). vol. 12286, pp. 169–177. SPIE (2022)
  • [30] Yousefpour, A., Shilov, I., Sablayrolles, A., Testuggine, D., Prasad, K., Malek, M., Mironov, I.: Opacus: User-friendly differential privacy library in PyTorch. arXiv preprint arXiv:2109.12298 (2021)