1 Introduction
Deep learning (DL) is the state of the art in machine learning, used to solve complex tasks in fields such as computer vision and natural language processing (NLP), with applications in domains ranging from healthcare to finance. As the performance of these models relies on large amounts of training data, the rise of DL has been accompanied by a growing interest in collecting and analyzing ever more data. However, data often includes personal or confidential information, making privacy a pressing concern. This development is also reflected in legislation recently put in place to protect personal information and prevent the identification of individuals—for example, the General Data Protection Regulation (GDPR) in the European Union or the California Consumer Privacy Act (CCPA).
In response, privacy-enhancing technologies are growing in popularity. Cryptographic techniques [
108] like homomorphic encryption and secure multi-party computation are used to protect against direct information leakage during data analysis. However, sensitive information can still be leaked indirectly via the output of the analysis and compromise privacy. For example, a machine learning model trained to diagnose a disease might make it possible to reconstruct specific information about individuals in the training dataset, even when the computation is encrypted. To avoid this kind of privacy leak, output privacy-preserving techniques can be applied. One common approach is to use anonymization techniques like
k-anonymity [
116],
l-diversity [
86] or
t-closeness [
79]. However, it is now well known that anonymization often cannot prevent re-identification [
19,
60,
94] and the privacy risk is not quantifiable. Hence,
differential privacy (DP) [
50] was proposed to provide output privacy guarantees. DP is a probabilistic mathematical definition that makes it possible to hide information about each single datapoint (e.g., about each individual) while still allowing inquiries about the dataset as a whole (e.g., a population) by adding carefully calibrated noise. While DP offers a good utility-privacy tradeoff in many cases (especially for statistical queries and traditional machine learning methods like logistic regression or support vector machines), the combination of DP and DL poses many challenges arising from the high dimensionality, the large number of training steps, and the non-convex objective functions in DL. In response, considerable progress has been made in the past few years addressing the difficulties and opportunities of
differentially private deep learning (DP-DL).
1.1 Contributions of This Survey
This survey provides a comprehensive analysis of recent advances in DP-DL, focusing on centralized DL. Distributed, federated, and collaborative DL methods and applications have their unique characteristics and challenges, and deserve a separate survey. We also specifically focus on methods that do not presume convex objective functions.
Our main contributions are as follows:
(1)
Thorough systematic literature review of DP-DL.
(2)
Identification of the research focuses of the past years (2019–2023): (1) DP-DL for specific applications (Section
4), (2) differentially private generative models (Section
5), (3) evaluating and auditing DP-DL models (Section
6), (4) DP-DL against threats other than membership and attribute inference (Section
7), and (5) improving the privacy-utility tradeoff of DP-DL (Section
8).
(3)
Analysis and contextualization of the different research trends including their potential future development (Section
9).
Previous reviews of DL and DP (Table
1) primarily cover the privacy threats specific to DL and the most common DP methods used for protection. In contrast, this survey goes beyond the basic methods and systematically investigates advances and new paths explored in the field since 2019. We provide the reader with an advanced understanding of the state-of-the-art methods, challenges, and opportunities of DP-DL.
Additional to the reviews mentioned in Table
1, a range of broader reviews exist, such as those about differentially private machine learning (without focusing on DL) [
22,
64,
151], privacy-preserving DL (without focusing on DP) [
24,
32,
68,
89,
117,
124], or privacy-preserving machine learning in general [
82,
135]. These works give a good overview, whereas our survey follows a more detailed approach focusing on recent developments of DP in centralized DL. There are also surveys on DP for specific applications or data types such as continuous [
110] or unstructured [
148] data (i.e., image, video, audio, and text data). While these surveys give more insight into the challenges of DP for the specific data type, they do not provide a comprehensive overview of the whole field of DP-DL. In contrast, our strict survey methodology leads to an application- and data type-independent analysis of the research focuses of the recent years.
2 Survey Methodology
We conducted a systematic literature search on Scopus using a predefined search query, and subsequent automatic and manual assessment according to specific inclusion/exclusion criteria. The full process is depicted in Figure
1.
The Scopus query filtered for documents with the keywords “differential privacy,” “differentially private,” or “differential private” combined with “neural network,” “deep learning,” or “machine learning,” but without mention of “federated learning,” “collaborative,” “distributed,” or “edge” in their title or abstract. The documents were restricted to articles, conference papers, reviews, or book chapters published in journals, proceedings, or books in computer science or engineering. Only documents published in English between 2019 and March 2023 were considered.
Additionally, documents were only included if they were cited at least 10 times or were in the top 10% of most cited papers of the corresponding year. This inclusion criterion allowed manual review of the most influential works in the field. The second part of the criterion was added to mitigate the bias toward older papers.
In the last step, the remaining documents were manually reviewed. Whether a document was included in this survey was decided based on the following criteria. First, DP must be a key topic of the study. Second, the methods must include neural networks with results relevant for DL. Third, we only included works about centralized learning in contrast to distributed learning. Fourth, we excluded reinforcement learning and focused on the supervised and unsupervised learning paradigms typical for centralized DL. Last, one document was excluded because of retraction.
3 Preliminaries
3.1 Differential Privacy
DP [
50] is a mathematical definition of privacy that aims at protecting details about individuals while still allowing general learnings from their data. A randomized algorithm
\(\mathcal {M}:\mathcal {D}\rightarrow \mathcal {R}\) with domain
\(\mathcal {D}\) and range
\(\mathcal {R}\) is
\(\epsilon\)-differentially private if for any two datasets
\(x, y \in \mathcal {D}\) differing on at most one datapoint, and any subset of outputs
\(\mathcal {S} \subseteq \mathcal {R}\), it holds that
\[Pr[\mathcal {M}(x) \in \mathcal {S}] \le e^{\epsilon }\, Pr[\mathcal {M}(y) \in \mathcal {S}], \quad (1)\]
where
\(Pr[]\) denotes the probability and
\(\epsilon\) is the privacy risk (also referred to as the privacy budget). Simply put, a single datapoint (usually corresponding to an individual) has only a limited impact on the output of the analysis.
\(\epsilon\) quantifies the maximum amount of information the output of the algorithm can disclose about the datapoint. Therefore, a lower
\(\epsilon\) results in stronger privacy.
DP is achieved by computing the sensitivity of the algorithm in question to the absence/presence of a single datapoint and adding random noise calibrated accordingly. The most common noise distributions used for DP are the Laplace, Gaussian, and exponential distributions. The amount of added noise relative to the sensitivity determines the privacy risk.
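To make the interplay of sensitivity, noise scale, and \(\epsilon\) concrete, the following minimal Python sketch implements the Laplace mechanism for a simple counting query; the function and parameter names are illustrative and not taken from any particular library.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release `value` with epsilon-DP by adding Laplace noise.

    The noise scale is sensitivity / epsilon: the larger the possible
    influence of a single record on `value`, or the smaller the privacy
    budget epsilon, the more noise is added.
    """
    rng = rng or np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query has sensitivity 1 (adding or removing one individual
# changes the count by at most 1), so a count of 42 released with
# epsilon = 0.5 receives Laplace noise of scale 2.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
```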
DP is not only quantifiable but also has a number of other benefits. For one, it is independent of both the input data and auxiliary information such as publicly available data or future knowledge. Moreover, it is immune to post-processing and is composable—that is, the sequential composition of two \(\epsilon\)-differentially private algorithms satisfies \(2\epsilon\)-DP, and parallel composition on disjoint subsets of the data still satisfies \(\epsilon\)-DP. Advanced composition theorems can prove even tighter overall privacy bounds. Another advantage of DP is its flexibility. The noise can be applied at different points in the dataflow, such as on the input data, on the output of the algorithm, or somewhere in between (e.g., during training of a DL model). It can also be combined with other privacy-enhancing technologies like homomorphic encryption, secure multi-party computation, or federated learning.
While these advantages lead to the widespread acceptance of DP as the gold standard for privacy protection, it also comes with its challenges. First, the unitless and probabilistic privacy parameter
\(\epsilon\) is difficult to interpret, and thus choosing an appropriate value is challenging. The interested reader is referred to the work of Desfontaines [
45] for a list of values used in real-world applications.
Another challenge of DP is its influence on the algorithm’s outputs and, consequently, its properties. The dominant issue is the privacy-utility tradeoff, but DP can also affect the performance, usability, fairness, or robustness of the algorithm.
Additionally, DP is not easy to understand for laypeople. First, it is important to note that DP does not offer perfect privacy. DP provides a specific interpretation of privacy, but depending on the context, other interpretations might be expected or more relevant. Besides, the privacy protection depends both on the privacy parameter
\(\epsilon\) and the unit/granularity of privacy (i.e., on what is considered one record). That is to say, merely stating that an algorithm is DP does not ensure a substantial guarantee. Second, as DP is a mathematical definition that only defines the constraints but does not mandate an implementation, there exist many different algorithms that satisfy DP, from simple statistical analyses to machine learning algorithms. Some assume a central trusted party that has access to all the data (global DP), whereas others apply noise before data aggregation (local DP). Third, there exists a wide range of DP variants and extensions. Between 2008 and 2021 alone, more than 250 new notions were introduced [
46]. They differ, for example, in how they quantify privacy loss, which properties are protected, and what capabilities the attacker is assumed to have. The most common extension is approximate DP (often simply referred to as
\((\epsilon ,\delta)\)-DP), where the algorithm is
\(\epsilon\)-DP with a probability of
\(1-\delta\). The failure probability
\(\delta\) is typically set to less than the inverse of the dataset size. Other common relaxations include
zero-concentrated DP (zCDP) [
51] and Rényi DP [
90].
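As a brief illustration of how the failure probability \(\delta\) enters an actual mechanism, the following sketch applies the classical Gaussian mechanism, a standard way to satisfy \((\epsilon ,\delta)\)-DP for \(0 \lt \epsilon \lt 1\); the function name is ours, and the noise calibration follows the well-known textbook bound.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Release `value` with (epsilon, delta)-DP via the classical
    Gaussian mechanism (valid for 0 < epsilon < 1)."""
    rng = rng or np.random.default_rng()
    # Standard calibration: sigma >= sensitivity * sqrt(2 ln(1.25/delta)) / epsilon.
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return np.asarray(value) + rng.normal(scale=sigma, size=np.shape(value))

# delta is typically chosen below 1/n for a dataset of n records,
# e.g. delta = 1e-5 for roughly 60,000 records.
noisy_mean = gaussian_mechanism([0.3, 0.7], l2_sensitivity=0.1,
                                epsilon=0.8, delta=1e-5)
```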
3.2 Deep Neural Networks and Stochastic Gradient Descent
Deep neural networks consist of connected nodes organized in layers, namely an input layer, multiple hidden layers, and an output layer. Each node applies a set of weights to its inputs and passes the weighted sum through a non-linear activation function. The network can learn to perform different tasks, such as classification on complex data, by minimizing the loss (i.e., the difference between the predictions of the model and the desired outputs). As the objective functions of deep neural networks are often non-convex, it is generally not feasible to solve this optimization problem analytically. Instead, iterative optimization algorithms are applied, most commonly variants of
stochastic gradient descent (SGD): At each timestep
j, the model weights
w are updated according to the gradients of the objective function
\(\mathcal {L}\) computed on a randomly selected subset of the training dataset (=batch)
\(\mathcal {B}\):
\[w_{j+1} = w_j - \frac{\eta }{|\mathcal {B}|} \sum _{x_i \in \mathcal {B}} \nabla _w \mathcal {L}(w_j; x_i, y_i) \quad (2)\]
In Equation (2),
\(x_i\) denotes a record from the training set with the corresponding target output
\(y_i\),
\(\eta\) the learning rate, and
\(|\mathcal {B}|\) the batch size.
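As a concrete illustration of Equation (2), the following minimal numpy sketch performs one minibatch SGD update for a toy linear least-squares model; all names are illustrative.

```python
import numpy as np

def sgd_step(w, batch_x, batch_y, grad_loss, lr):
    """One minibatch SGD update as in Equation (2): average the
    per-example gradients over the batch and step against them."""
    grads = [grad_loss(w, x, y) for x, y in zip(batch_x, batch_y)]
    return w - lr * np.mean(grads, axis=0)

# Toy example: squared loss for a linear model y ~ w . x.
def grad_loss(w, x, y):
    return 2.0 * (np.dot(w, x) - y) * x

w = np.zeros(2)
batch_x = np.array([[1.0, 2.0], [0.5, -1.0]])
batch_y = np.array([3.0, 0.0])
w = sgd_step(w, batch_x, batch_y, grad_loss, lr=0.1)
```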
After training the model (e.g., with SGD), neural networks can be used to make predictions for previously unseen data. Models that perform well on new data are well generalized, whereas models that perform well only on the training data are considered overfitted.
DL models are usually trained with a supervised or an unsupervised learning paradigm. In supervised learning, the training set is labeled, whereas in unsupervised learning, the model learns to identify patterns without access to a ground truth. Apart from the fully connected layers described earlier, neural networks often include other types of architectures and layers. A popular example is the convolutional layer, which uses filters to extract local patterns, such as edges in images.
3.3 Privacy Threats for DL Models
Privacy attacks on DL models target either the training data or the model itself. They can take place in the training or the inference phase. During training, the adversary can not only be a passive observer but can also actively change the training process. Moreover, the attacker can have access to the whole model—that is, its architecture, parameters, gradients, and outputs (white-box access), or only the model’s outputs (black-box access). Typical privacy attacks are presented next.
Membership Inference Attack (MIA) . In a MIA, the adversary attempts to infer whether a specific record was part of the training dataset. This type of attack exists both in the white- and black-box setting, where in the latter case one distinguishes further between access to the prediction vector and access to the label. The first (and still widely used) MIA on machine learning models was proposed by Shokri et al. [
113]. They train shadow models that imitate the target model, and based on their outputs on training and non-training data, a classification model learns to identify which records are members of the training data. The attack performs best on overfitted models [
113,
140].
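The intuition behind such attacks can be illustrated with a deliberately simplified sketch: instead of Shokri et al.’s learned shadow-model classifier, the toy attack below merely thresholds the target model’s confidence on the true label, exploiting the same member/non-member gap caused by overfitting.

```python
import numpy as np

def confidence_threshold_mia(true_label_confidences, threshold=0.9):
    """Toy membership test: flag records on which the target model is
    very confident as likely training members.

    This replaces the learned attack model of shadow-model MIAs with a
    fixed threshold, but relies on the same signal: overfitted models
    behave differently on members than on non-members.
    """
    return np.asarray(true_label_confidences) >= threshold

# The model assigns probability 0.99 to a training record's true label
# but only 0.60 to an unseen record -> only the former is flagged.
guesses = confidence_threshold_mia([0.99, 0.60, 0.95])
```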
Model Inversion Attack . In a model inversion attack, the attacker tries to infer sensitive information about the training data, such as an attribute of a specific datapoint or features that characterize a certain class. For example, they might infer genetic information of a specific patient from a drug dosage prediction model [
58], reconstruct the face of an individual from a face recognition model [
57], or extract private text sequences from a generative language model [
29,
30].
Property Inference Attack . In a property inference attack, the adversary seeks to extract information about the training data that is unrelated to the training task, such as statistical properties of the training dataset. Demonstrations in the literature include, for instance, the proportion of women in the training data of an income prediction model, or the proportion of older faces from an image classification model for gender classification [
59].
Model Extraction Attack . In a model extraction attack, the adversary learns a model that approximates the target model. While this threat only targets the model and not the training data, it can still increase the privacy risk as a successful model extraction can facilitate follow-up attacks like model inversion.
It is important to note that, especially in the case of model inversion and property inference, the terms are not used consistently. For example, De Cristofaro [
43] uses the term
property inference for inferring features of a class, whereas others [
100,
109,
146] (including us) refer to this as a type of model inversion. In addition, there exist specialized privacy attacks tailored to specific settings that are related to, yet distinct from, these attack classes. Notable examples include authorship inference attacks, which are directed at language models to uncover the identities of the individuals who authored (parts of) the training data, and author attribute attacks, which attempt to determine specific characteristics or demographics of the author(s), such as their gender or age.
Furthermore, there are attacks not specific to privacy that could still be a threat to DL models, such as those presented next.
Adversarial Attack . In an adversarial attack, during the inference phase, the attacker manipulates inputs purposefully so that the model behaves in unintended ways. For example, small perturbations to images invisible to the human eye can lead to misclassifications [
65]. In the context of generative language models, a type of attack related to adversarial attacks is prompt injection, where prompts are deliberately crafted to cause the model to generate harmful, misleading, or sensitive information.
Poisoning Attack . In a poisoning attack, the adversary perturbs training samples to manipulate the model, with the goal of reducing its accuracy or triggering specific misclassifications. Poisoning attacks pose a critical threat in environments where data is gathered from uncontrolled sources, such as crowdsourcing. The perturbations can be introduced to the training data itself or to its labels [
118]. Targeted poisoning attacks, also known as backdoor attacks, insert a hidden pattern (known as a trigger) into the model during training, so that it behaves normally on regular inputs but produces incorrect predictions when the trigger is present. For instance, the presence of specific types of glasses can cause misclassifications in a face recognition model [
40].
For a more detailed description, we refer the reader to surveys that focus on privacy attacks [
43,
109,
111]. A selection of attack implementations can, for example, be found in the Adversarial Robustness Toolbox [
96]. The use of unstructured data such as text presents additional challenges—for example, it is more difficult to identify what parts should be considered private. For a comprehensive discussion on privacy concerns related to language models, we refer the reader to the work of Brown et al. [
26].
3.4 DP Algorithms for DL
The most popular algorithm for DP-DL is
differentially private stochastic gradient descent (DP-SGD) by Abadi et al. [
11]. DP-SGD adapts classical SGD by perturbing the gradients with noise. Abadi et al. [11] built upon prior research on gradient perturbation [
21,
114] to tailor it for DL, particularly by proposing a tighter privacy accounting method. As the gradient norms are unbounded, they are clipped to ensure a finite sensitivity before Gaussian noise is added. Equation (2) thus becomes Equation (3):
\[w_{j+1} = w_j - \frac{\eta }{|\mathcal {B}|} \left(\sum _{x_i \in \mathcal {B}} clip\left[\nabla _w \mathcal {L}(w_j; x_i, y_i)\right] + \sigma \chi \right) \quad (3)\]
where
\(clip[]\) denotes the clipping function that clips the per-example gradients so that their norm does not exceed the clipping norm
C,
\(\chi\) is a random vector drawn from a standard Gaussian distribution, and
\(\sigma\) is the standard deviation of the added Gaussian noise.
While the privacy-utility tradeoff inherent to DP still applies (i.e., information is lost), DP-SGD can improve generalization and thus accuracy on the validation dataset.
It is important to note that the per-example gradient clipping in DP-SGD, used to bound each record’s sensitivity, differs from the batch-wise gradient clipping sometimes used to improve stability and convergence of the training process [
144].
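The following numpy sketch shows one DP-SGD update in the spirit of Equation (3) and highlights the per-example nature of the clipping; it is a simplified illustration (names are ours), not the implementation of Abadi et al., and omits the subsampling and privacy accounting.

```python
import numpy as np

def dp_sgd_step(w, batch_x, batch_y, grad_loss, lr, clip_norm, sigma, rng=None):
    """One DP-SGD update: clip each per-example gradient to L2 norm at
    most `clip_norm` (bounding the sensitivity), add Gaussian noise with
    standard deviation `sigma` to the clipped sum, then average."""
    rng = rng or np.random.default_rng()
    clipped = []
    for x, y in zip(batch_x, batch_y):
        g = grad_loss(w, x, y)
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # per-example clipping
        clipped.append(g)
    noisy_sum = np.sum(clipped, axis=0) + sigma * rng.normal(size=np.shape(w))
    return w - lr * noisy_sum / len(batch_x)

# In Abadi et al.'s parameterization, sigma corresponds to the noise
# multiplier times the clipping norm C.
```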
Instead of perturbing the gradients, it is also possible to either perturb the model parameters after non-private training or the objective function [
34]. Both of these methods, however, have serious limitations in DL as they rely on strong convexity assumptions that do not hold for deep neural networks. While non-convex objective functions can be approximated with convex polynomial functions [
104,
105], these methods still considerably constrain the model’s learning capabilities [
71].
An alternative to rendering the DL model itself differentially private is the addition of noise at the prediction level [
49]. However, with this method, the privacy budget increases with every prediction made, making it necessary to limit the number of inferences that can be made with the model. For an example of prediction level perturbation in DL, see the work of Ye et al. [
139].
Another popular method for training a DL model in a differentially private manner is PATE (Private Aggregation of Teacher Ensembles) [
98]. PATE is a semi-supervised approach that needs public data. However, in contrast to DP-SGD, PATE can be applied to every machine learning technique. First, several teacher models are trained on separate private datasets. Next, the public data is labeled based on the differentially private aggregation of the predictions of the teacher models. Finally, a student model is trained on the noisily labeled public data, which can be made public afterward.
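A minimal sketch of the noisy-vote aggregation step in PATE is shown below; the Laplace noise scale is purely illustrative, and the actual privacy accounting of Papernot et al. is considerably more refined.

```python
import numpy as np

def pate_noisy_label(teacher_votes, num_classes, gamma, rng=None):
    """Label one public record by noisy-argmax aggregation of the
    teacher models' predicted class labels.

    Laplace noise added to the vote counts hides the influence of any
    single teacher (and hence of its private data shard).
    """
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes)
    noisy_counts = counts + rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_counts))

# Ten teachers vote on a 3-class problem; the noisy majority label is
# then used to train the public student model.
label = pate_noisy_label([0, 0, 1, 0, 2, 0, 0, 1, 0, 0], num_classes=3, gamma=0.5)
```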
7 Beyond Membership and Attribute Inference: Applying DP-DL to Protect against Other Threats
DP is usually applied to protect against re-identification. In the context of DL, this mainly includes MIAs and model inversion attacks. This section reviews cases in which DP can protect against other threats that DL models face. Table
6 lists the discussed threats and literature.
One threat against which DP can be applied, even though it was not originally intended for that purpose, is the model extraction attack. Model extraction is primarily a security and confidentiality issue, but it can also compromise privacy as it facilitates membership and attribute inference.
Zheng et al. [
149,
150] observed that most model extraction attacks infer the decision boundary of the target model via nearby inputs. They introduced the notion of
boundary differential privacy (BDP) and proposed to append a BDP layer to the machine learning model that (1) determines which outputs are close to the decision boundary (a question not straightforward for DL models, as the decision boundary has no closed form) and (2) adds noise to them. Previous input-output pairs are cached to ensure that the same input results in the same noisy output. This method guarantees that the attacker cannot learn the decision boundary with more than a predetermined level of precision (controlled by
\(\epsilon\)).
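A purely schematic Python sketch of this idea follows; the real BDP layer of Zheng et al. uses a principled, \(\epsilon\)-calibrated perturbation and a different boundary test, whereas the snippet below only illustrates the two ingredients described above: perturbing outputs near the decision boundary and caching answers so that repeated queries cannot average the noise away.

```python
import numpy as np

class NoisyBoundaryLayer:
    """Schematic output layer: perturb predictions whose top-two class
    scores are close (i.e., near the decision boundary) and cache the
    released answers so identical queries receive identical outputs."""

    def __init__(self, margin=0.1, noise_scale=0.05, rng=None):
        self.margin = margin
        self.noise_scale = noise_scale
        self.rng = rng or np.random.default_rng()
        self.cache = {}

    def release(self, x, scores):
        key = np.asarray(x).tobytes()
        if key in self.cache:            # same input -> same (noisy) output
            return self.cache[key]
        scores = np.asarray(scores, dtype=float)
        top_two = np.sort(scores)[-2:]
        if top_two[1] - top_two[0] < self.margin:   # near the decision boundary
            scores = scores + self.rng.normal(scale=self.noise_scale,
                                              size=scores.shape)
        self.cache[key] = scores
        return scores
```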
A subsequent study by Yan et al. [
136] showed that without caching, the BDP layer cannot protect against their novel model extraction attack. They proposed an alternative to caching: monitoring the privacy leakage and adapting the privacy budget accordingly.
DP can also be adapted to protect DL models against adversarial attacks. While originally DP is defined on a record level, feature-level DP can achieve robustness against adversarial examples. For example, Lecuyer et al. [
77] proposed PixelDP, a pixel-level DP layer that can be added to any type of DL model. As this approach only guarantees robustness but not DP in the original sense, Phan et al. [
103] developed the method further by combining it with DP-SGD. To deal with the tradeoff between robustness, privacy, and utility, they relied on heterogeneous noise: more noise is added to more vulnerable coordinates.
Another line of research is the relationship between interpretability/explainability and privacy. Even though interpretable/explainable AI methods are not a threat scenario on their own, they can facilitate privacy attacks. Harder et al. [
69] looked at how models can both be interpretable and guarantee DP. Models trained with DP-SGD are not vulnerable to gradient-based interpretable AI methods due to the post-processing property of DP. However, gradient-based methods can only provide local explanations (i.e., about how relevant a specific input is for the model’s decision), but Harder et al. [
69] were specifically interested in methods that can also give global explanations (i.e., about how the model works overall). They introduced DP-LLM (differentially private locally linear maps), which can approximate DL models and are inherently interpretable.
Chen et al. [
38] investigated the privacy risk of machine unlearning. Machine unlearning [
28,
125] is the process of removing the impact one or more datapoints have on a trained model. Common methods are retraining from scratch and SISA (Sharded, Isolated, Sliced, and Aggregated) [
25], where the original model consists of
k models each trained on a subset of the training set and therefore only one submodel is retrained for unlearning. While the main idea is to be able to comply with privacy regulations like the Right to Be Forgotten in the European General Data Protection Regulation (GDPR), the unlearning can disclose additional information about the removed datapoint(s). Chen et al. [
38] proposed a new black-box MIA that exploits both the original and the unlearned model. Their attack was more powerful than classical MIAs, and also worked for well-generalized models, when several datapoints were removed, when the attacker missed several intermediate unlearned models, and when the model was updated with new inputs. They showed that DP-SGD is an effective defense against the privacy risk of machine unlearning.
Instead of protecting the model directly, DP can also be used to improve the detection of attacks [
48,
137]. This approach can be viewed as a kind of anomaly detection, where the attack scenario is the outlier/novelty. Anomaly detection with DL models (e.g., autoencoders, CNNs) is based on the model’s tendency to underfit on underrepresented subgroups (i.e., the model’s error is expected to be higher for atypical inputs). Training the model with DP amplifies this effect (see Section
6.2). While this leads to negative consequences in the context of fairness and bias, here it can be used to improve the performance of anomaly detection.
Du et al. [
48] applied this approach to crowdsourcing data—that is, data stemming from many individuals. These individuals could launch a backdoor poisoning attack by maliciously adapting their contributed samples. To protect the target model, the poisoned samples need to be identified and removed from the training set. They showed that DP can improve the performance of anomaly detection in this context. Based on the same reasoning, Yang et al. [
137] proposed Griffin, a network intrusion detection system.
8 Improving the Privacy-Utility Tradeoff of DP-DL
The main challenge of DP-DL is that setting meaningful privacy guarantees often strongly deteriorates utility. In recent years, many improvements to DP-DL (mostly DP-SGD) have been proposed that can increase accuracy at the same privacy level. Table
7 gives an overview of the proposed approaches. An alternative method would be to provide tighter theoretical bounds for the privacy loss without influencing the learning algorithm. An example of such an approach evaluated on DL can be found in the work of Ding et al. [
47]. However, as we already established in Section
6, this line of research seems to have reached its limit.
One approach to increase the accuracy of DP-SGD at the same privacy is to adapt the architecture of the DL model to better suit differentially private learning. Papernot et al. [
99] observed that rendering SGD differentially private as proposed by Abadi et al. [
11] leads to exploding gradients. The larger the gradients, the more information is lost during clipping, which in turn hurts the model’s accuracy. To mitigate this effect, Papernot et al. [
99] proposed to use bounded activation functions instead of the unbounded ones commonly used in non-private training (e.g., ReLU). They introduced tempered sigmoid activation functions:
\[\phi _{s,T,o}(x) = \frac{s}{1+e^{-Tx}} - o,\]
where the parameter
s controls the scale of the activation, the inverse temperature
T regulates the gradient norms, and
o is the offset (Figure
2). The setting [
\(s=2\),
\(T=2\),
\(o=1\)] results in the hyperbolic tangent (tanh) function.
Papernot et al. [
99] showed that tempered sigmoids can increase the model’s accuracy. For the MNIST and Fashion-MNIST datasets, the tanh function performed best.
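A minimal numpy sketch of the tempered sigmoid family, together with a check that the default setting reduces to tanh, is given below.

```python
import numpy as np

def tempered_sigmoid(x, s=2.0, T=2.0, o=1.0):
    """Tempered sigmoid activation: s / (1 + exp(-T * x)) - o.

    The scale s bounds the output range, the inverse temperature T
    controls the steepness (and thereby the gradient norms), and o
    shifts the output."""
    return s / (1.0 + np.exp(-T * x)) - o

x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(tempered_sigmoid(x), np.tanh(x))  # s=2, T=2, o=1 is tanh
```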
Another important aspect when tuning DL models for improved performance is hyperparameter selection (e.g., choosing the learning rate). Even though it might be tempting to transfer the choice of hyperparameters from the non-private to the private model, Papernot et al. [
99] showed that not only the model’s architecture but also the hyperparameters should be chosen specifically for the private model rather than reusing what worked well in the non-private setting. As hyperparameter tuning itself can leak additional private information, one should consider private selection of hyperparameters, as, for example, proposed by Liu and Talwar [
83].
Similar to non-private DL, DP-SGD can benefit from feature engineering, additional data, and transfer learning. Tramèr and Boneh [
120] showed that handcrafted features can significantly improve the private model’s utility compared to end-to-end learning. A comparable increase in accuracy for private end-to-end learning can be achieved by using an order of magnitude more training data or by transferring features learned from public data.
While the preceding techniques are already known from non-private DL, DP-SGD introduces two new steps that offer opportunities for improvement: gradient clipping and noise addition.
Chen et al. [
41] found that the bias introduced by gradient clipping can cause convergence issues. They discovered a relationship between the symmetry of the gradient distribution and convergence: symmetric gradient distributions lead to convergence even if a large fraction of gradients are heavily scaled down. Based on this finding, Chen et al. [
41] proposed to introduce additional noise before clipping when the gradient distribution is non-symmetric. It is important to note that this approach may lead to better but slower convergence due to the additional variance, and therefore only improves the privacy-utility tradeoff in specific use cases.
Another observation specific to DP-SGD is that the privacy-utility tradeoff worsens with growing model size. A higher number of model parameters results in a higher gradient norm, meaning that clipping the gradients to the same norm discards more information. If the clipping norm is instead increased, more noise has to be added to achieve the same privacy guarantee. This effect can be mitigated by either reducing the number of model parameters (parameter pruning) [
15,
62] or compressing the gradients (gradient pruning) [
15,
107].
The work by Gondara et al. [
62] is based on the lottery ticket hypothesis [
55,
56], which says that there exist subnetworks in large neural networks that when trained separately achieve comparable accuracy as the full network. The term
lottery ticket refers to the pruned networks and comes from the idea that finding a well-performing subnetwork is like winning the lottery. Gondara et al. [
62] altered the original lottery ticket hypothesis to comply with DP. First, the lottery tickets are created non-privately using a public dataset. Next, the accuracy of each lottery ticket is evaluated on a private validation set, and the best subnetwork is selected while preserving DP via the Exponential Mechanism. Finally, the winning ticket is trained using DP-SGD.
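The private selection step can be illustrated with a generic sketch of the Exponential Mechanism (not Gondara et al.’s exact procedure); candidates with higher validation accuracy are exponentially more likely to be chosen, while the randomness bounds the leakage about the private validation set.

```python
import numpy as np

def exponential_mechanism_select(utilities, epsilon, sensitivity, rng=None):
    """Privately select the index of the candidate (e.g., a lottery
    ticket) with the highest utility, sampling index i with probability
    proportional to exp(epsilon * utilities[i] / (2 * sensitivity))."""
    rng = rng or np.random.default_rng()
    scores = epsilon * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    scores -= scores.max()                       # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return int(rng.choice(len(probs), p=probs))

# Validation accuracies of three candidate subnetworks; accuracy computed
# on n = 1,000 private records changes by at most 1/n when one record changes.
winner = exponential_mechanism_select([0.81, 0.84, 0.79],
                                      epsilon=1.0, sensitivity=1.0 / 1000)
```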
Adamczewski and Park [
15] proposed DP-SSGD (differentially private sparse stochastic gradient descent), which also relies on model pruning. They experimented with both parameter freezing, where only a subset of the parameters is trained, and parameter selection, where a different subset of parameters is updated in each iteration. The updated parameters were either chosen randomly or based on their magnitude.
Both of these model pruning approaches rely on publicly available data that should be as similar as possible to the private data. In contrast, Phong and Phuong [
107] proposed a gradient pruning method that works without public data. In addition to making the gradients sparse, they use memorization to maintain the direction of the gradient descent.
A further line of research deals with adapting the noise that is added to the differentially private model during training. While the original DP-SGD algorithm adds the same amount of noise to each gradient coordinate independent of the training progress, this line of work adds noise either based on the learning progress (e.g., number of executed epochs) [
142] or based on the coordinates’ impact on the model [
16,
63,
130,
134].
Yu et al. [
142] argue that with training progress and therefore convergence to the local optimum, the model profits more from smaller noise. This “dynamic privacy budget allocation” is similar to the idea behind adaptive learning rates, which is a common technique in non-private learning and can also be applied in private learning (see, e.g., [
133,
134]). Yu et al. [142] compared different variants of dynamic schemes to allocate privacy budgets, including predefined decay schedules like exponential decay, or noise scaling based on the validation accuracy on a public dataset. They showed that all dynamic schemes outperform the uniform noise allocation to a similar extent.
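As a minimal illustration of such a predefined decay schedule (names and parameterization are ours, and the overall privacy cost still has to be accounted across the varying noise levels), consider:

```python
import numpy as np

def exponential_noise_schedule(sigma_start, decay_rate, num_epochs):
    """Noise multipliers for each epoch under exponential decay: start
    with strong noise and reduce it as training converges, so that late
    fine-tuning updates are perturbed less."""
    return [sigma_start * float(np.exp(-decay_rate * t)) for t in range(num_epochs)]

# Noise multipliers for 10 epochs, decaying from 2.0.
schedule = exponential_noise_schedule(sigma_start=2.0, decay_rate=0.1, num_epochs=10)
```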
Xiang et al. [
130] treated the privacy-utility tradeoff as an optimization problem, namely minimizing the accuracy loss while satisfying the privacy constraints. Consequently, less noise is added to those gradient coordinates that have a high impact on the model’s output. While the model’s utility was improved for a range of privacy budgets and model architectures, the method is computationally expensive due to the high dimensionality of the optimization problem.
Xu et al. [
134] advanced the approach further, not only reducing the computational demand but also improving convergence (and therefore decreasing the privacy budget). The improved version called
AdaDP (
adaptive and fast convergent approach to
differentially
private deep learning) replaces the computationally expensive optimization with a heuristic approach to compute the gradient coordinates’ impact on the model’s output. The added noise is not only adaptive with regard to the coordinates’ sensitivity but also decreases with the number of training iterations. Faster convergence is achieved by incorporating an adaptive learning rate that is larger for less frequently updated coordinates.
A related research direction is the usage of explainable AI methods to calibrate the noise. Both Gong et al. [
63] and Adesuyi and Kim [
16] use
layer-wise relevance propagation (LRP) [
91] to determine the importance of the different parameters. Gong et al. [
63] proposed the ADPPL (adaptive differential privacy preserving learning) framework that adds adaptive Laplace noise [
106] to the gradient coordinates according to their relevance. In contrast, the approach by Adesuyi and Kim [
16] is based on loss function perturbation. As DL models have non-convex loss functions, the polynomial approximation of the loss function is computed before adding Laplace noise. LRP is used to classify the parameters either as high or low relevance, adding small and large noise accordingly.
Explainable AI-based approaches can also be applied in the local DP setting—for example, Wang et al. [
127] use feature importance to decide how much noise to add to the training data.
A list of examples for results that the discussed works reported can be found in Table
8. For comparison, the results for the original DP-SGD by Abadi et al. [
11] were included as well. The accuracies reported for the three privacy levels in the original DP-SGD paper clearly show the privacy-utility tradeoff typical of differentially private algorithms. Direct comparison between the methods is difficult due to the differing network architectures, hyperparameters, evaluation datasets, and privacy levels measured according to different DP notions (e.g.,
\(\epsilon\)-DP,
\((\epsilon ,\delta)\)-DP,
\(\rho\)-zCDP).
9 Discussion and Future Directions
This study reviews the latest developments on DP in centralized DL. The main research focuses of the past years were (1) the application of DP to specific domains, (2) differentially private generative models, (3) auditing and evaluation of DP models, (4) applications of DP-DL to protect against threats other than membership and attribute inference, and (5) improvements of the privacy-utility tradeoff. For each subtopic, we provided a comprehensive summary of recent advances. In this last section, we discuss the key points, interconnections, and expected future directions of the respective topics and differentially private centralized DL in general.
Our survey demonstrated, using DL models as an example, how diverse the methods and applications of DP can be. While this flexibility is one of the strengths of DP, it can also hinder broad deployment due to insufficient understanding. The efforts to make DP more accessible to a wider audience and to promote its (correct) application should continue. This includes not only discussions about how to choose the method, the unit of privacy, and the privacy budget but also about how to verify that implementations are correct (see the work of Kifer et al. [
74] and references therein for more information). An example where an implementation error led to privacy issues even though DP was proven theoretically can be found in the work of Tramer et al. [
122].
In addition to the properties inherent to DP that make it hard to understand for laypeople (see Section
3.1), names that are used ambiguously in the research field can add to the confusion. For example, model inversion can refer to inferring (1) the attribute of a single record or (2) the attribute of a class. While the first obviously implies a privacy concern, the second is primarily problematic if a class consists of instances of one individual (e.g., as is the case in face recognition). As a result of this ambiguous meaning, contradicting conclusions emerged about whether DP protects against model inversion attacks. Some (e.g., [
140]) used the first interpretation and concluded that DP naturally also protects against model inversion. Others (e.g., [
146]) used the second interpretation and showed that DP does not always do so. Interestingly, Park et al. [
100] also used the second interpretation but concluded that DP mitigates model inversion attacks. This may be because DP decreases the accuracy of the target model, and, as Zhang et al. [
146] argued, predictive power and vulnerability to model inversion go hand in hand. Future research should take care to define the terms it uses accurately, and, ideally, the research community should agree on a coherent taxonomy.
The increasing awareness of privacy concerns in combination with the many open questions regarding ethical, legal, and methodical aspects make using synthetic data a tempting alternative to applying privacy-enhancing technologies to private data. However, it is important to spread the knowledge that synthetic data alone is not by default privacy preserving [
115]. Additional protection might be necessary. Even though differentially private synthetic data can be a viable solution, future research is needed, for example, to find good ways to evaluate the usefulness of the data.
Auditing and evaluating DL models is a key research topic not only but especially for differentially private models. We expect the trend of novel attacks to continue, analyzing new threat scenarios and improving our understanding of which aspects influence the attack success. An important element of evaluation is the used dataset. Interestingly, many commonly used datasets for auditing DP-DL (e.g., MNIST, CIFAR-10 or CIFAR-100; see Table
5) are not “relevant to the privacy problem” [
42]. There is a need for more realistic benchmark datasets that include private features. It is also noticeable that attack-based evaluation has not been extensively studied on differentially private language models. While there exist many studies proposing privacy attacks on language models (e.g., [
30,
88]) and DP is increasingly applied on language models (see Section
4), the works reviewed in this survey that use attack-based evaluation to compare the resulting empirical lower bound with the upper bound provided by the DP guarantee have focused mostly on tabular and image datasets. Another point to consider regarding evaluation is that, with the rise of continual learning, auditing is not relevant only once before deployment but should be carried out repeatedly whenever the training set changes [
42]. This is also the case when machine unlearning is applied (as discussed in Section
7).
The accuracy disparity between subgroups is one of many biases that are studied in the field of fair AI—with the goal of avoiding discriminatory behavior of machine learning models. Its amplification by DP, the underlying causes, and possible mitigation strategies are actively researched. For example, de Oliviera et al. [
44] suggested that such disparities can be prevented by better hyperparameter selection.
In addition to empirical assessment, theoretical privacy analysis might be able to provide more realistic upper bounds by including additional assumptions (e.g., about the attacker’s capabilities) or features (e.g., the clipping norm or initial randomness [
70]). This could also improve the privacy-utility tradeoff.
Section
7 showed that the concept of DP can be beneficial in diverse threat scenarios. Especially, the connection to robustness and explainability might gain importance through the growing interest in trustworthy AI.
We also anticipate further research on novel strategies or advancement on existing approaches to improve the privacy-utility tradeoff. Some of the mentioned methods could be combined in the future. For example, the differentially private lottery ticket hypothesis approach by Gondara et al. [
62] can be combined with tempered sigmoid activation functions [
99]. Additional effort should be made to compare different methods to identify the best-performing method(s). Simply summarizing the reported results, as we did in Table
8, cannot provide sufficient insight. Fair comparison would require testing the methods on the same model (i.e., same architecture and hyperparameters) with the same evaluation dataset for the same privacy level.
When comparing the different approaches, it is also important to note that some rely on public data [
15,
62,
120,
142]. On the one hand, public data might not be available and therefore prevent the application of those methods. On the other hand, it is debatable whether public availability justifies disregarding all privacy considerations (see other works [
42,
121] for further information).
With the rise of DL in real-world scenarios, new challenges arise. For example, real-world datasets often contain various dependencies whose correct interpretation requires domain knowledge. Causal models [
102] may help capture and model this domain knowledge and additionally improve interpretability [
92], and act as a guide to avoid biases [
87]. However, they may have new implications for privacy. Growing interest is coming not only from the research and industrial communities—the public is also actively engaging in discussions about the impact of AI applications on society. Most recently, large language models like ChatGPT [
2] are in the spotlight, among other things due to privacy concerns. The future will show whether DP will be part of the next generation of DL deployments.
All in all, DP-DL achieved significant progress in recent years, but open questions are still numerous. We expect the interest in the topic to increase further, especially as new standards and legal frameworks arise. On the way to trustworthy AI, we need not only technical innovations but also legal and ethical discussions about what privacy preservation means in the digital age.
10 Conclusion
This survey provided a comprehensive overview of recent trends and developments of DP in centralized DL. Throughout the article, we highlighted the different research focuses of the past years, including auditing and evaluating differentially private models, improving the tradeoff between privacy and utility, applying DP methods to threats beyond membership and attribute inference, generating private synthetic data, and applying DP-DL models to different application fields. A total of six insights have been derived from the literature. First, there is a need for more realistic benchmark datasets with private features. Second, there is a necessity for repeated auditing. Third, more realistic upper privacy bounds would be possible by including additional attack assumptions and model features. Fourth, privacy-utility tradeoffs can be improved by better comparison of existing methods and a possible combination of the best approaches. Fifth, by default, synthetic data is not privacy preserving, and differentially private synthetic data requires more research. Sixth, ambiguously used terms lead to confusion in the research field, and a coherent taxonomy is needed.
In summary, we explored the advancements, remaining challenges, and future prospects of integrating mathematical privacy guarantees into DL models. By shedding light on the current state of the field and emphasizing its potential, it is our hope to inspire further research and real-world applications of DP-DL.