1 Introduction

Machine learning models are vulnerable (e.g. [3, 4, 9]) at test time to specially learned, mild noise in the input space, commonly known as adversarial perturbations. Data samples created by adding these perturbations to clean samples are known as adversarial samples. Lately, Deep Neural Network (DNN) based object classifiers have also been observed [7, 11, 14, 28] to be drastically affected by adversarial attacks with quasi-imperceptible perturbations. Further, it is observed (e.g. [28]) that these adversarial perturbations exhibit cross-model generalizability (transferability): the same adversarial sample is often incorrectly classified by multiple models despite their different architectures and disjoint training datasets. This enables attackers to launch simple black-box attacks [12, 21] on deployed models without any knowledge of their architecture or parameters.

However, most of the existing works (e.g. [14, 28]) craft input-specific perturbations, i.e., perturbations that are functions of the input and may not transfer across data samples. In other words, a perturbation crafted for one data sample most often fails to fool the model when used to corrupt other clean data samples. In contrast, recent findings by Moosavi-Dezfooli et al. [13] and Mopuri et al. [15, 17] demonstrated that there exist input-agnostic (or image-agnostic) perturbations that, when added to most data samples, fool the target classifier. Such perturbations are known as “Universal Adversarial Perturbations (UAP)”, since a single noise can adversarially perturb samples from multiple categories. Furthermore, it is observed that, similar to image-specific perturbations, UAPs also exhibit cross-model generalizability, enabling easy black-box attacks. Thus, UAPs pose a severe threat to the deployment of vision models and require meticulous study. Especially for applications which involve safety (e.g. autonomous driving) or user privacy (e.g. access granting), it is indispensable to develop models robust against such adversarial attacks.

Fig. 1. Overview of the proposed approach. Stage-I, “Ask and Acquire”, generates the “class impressions” to mimic the effect of actual data samples. Stage-II, “Attack”, learns a neural network based generative model G which crafts UAPs from random vectors z sampled from a latent space.

Approaches that craft UAPs can be broadly categorized into two classes: (i) data-driven, and (ii) data-free approaches. Data-driven approaches such as [13] require access to samples of the underlying data distribution and craft UAPs using a fooling objective (e.g. confidence reduction as in Eq. (2)). Thus, UAPs crafted via data-driven approaches typically achieve a higher success rate (or fooling rate), i.e., they fool the models more often. Note that data-driven approaches have access to the data samples and the model architecture along with the parameters. Further, the performance of the crafted UAPs is observed [15, 17] to be proportional to the number of data samples available during crafting. Data-free approaches (e.g. FFF [17]), on the other hand, aim to understand the true stability of the models and craft UAPs indirectly (e.g. via the activation loss of FFF [17]) instead of using a direct fooling objective. Note that data-free approaches have access only to the model architecture and parameters but not to any data samples. Crafting UAPs in the data-free scenario is therefore a challenging problem, and the success rate of these UAPs is typically lower than that achieved by data-driven ones.

In spite of being difficult, data-free approaches have important advantages:

  • When compared to their data-driven counterparts, data-free approaches reveal the true vulnerability of the learned representations, and in turn of the models. Success rates reported by data-driven approaches, on the other hand, act as upper bounds on the achievable rates. Also, it is observed [15, 17] that their performance is proportional to the amount of data available for crafting UAPs.

  • Because of the strong association of data-driven UAPs with the target data, they transfer poorly across datasets. Data-free UAPs, on the other hand, transfer better across datasets [15, 17].

  • Data-free approaches are typically faster [17] to craft UAPs.

Thus, in this paper, we attempt to achieve the best of both worlds, i.e., the effectiveness of data-driven objectives and the efficiency and transferability of data-free approaches. We present a novel approach for the data-free scenario that emulates the effect of actual data samples with “class impressions” of the model and crafts UAPs by learning a feed-forward neural network. Class impressions are images reconstructed from the model’s memory, which is the set of learned parameters. In other words, they are generic representations of the object categories in the input space (as shown in Fig. 2). In the first part of our approach, we acquire class impressions via a simple optimization (Sect. 3.2) so that they can serve as representative samples from the underlying data distribution. After acquiring multiple class impressions for each of the categories, we perform the second part, which is learning a generative model (a feed-forward neural network) that efficiently generates UAPs. Thus, unlike the existing works [13, 17] that solve complex optimizations to generate UAPs, our approach crafts them via a simple feed-forward operation through the learned neural network. The major contributions of our work are:

  • We propose a novel approach to handle the absence of data (via class impressions, Sect. 3.2) for crafting UAPs and achieve state-of-the-art success (fooling) rates.

  • We present a generative network (Sect. 3.3) that learns to efficiently generate UAPs utilizing the class impressions.

The paper is organized as follows: Sect. 2 describes the relevant existing works, Sect. 3 presents the proposed framework in detail, Sect. 4 reports a comprehensive experimental evaluation of our approach, and finally Sect. 5 concludes the paper.

2 Related Works

Adversarial perturbations (e.g. [7, 14, 28]) reveal the vulnerability of learning models to specific noise. Further, these perturbations can be input-agnostic [13, 17], called “Universal Adversarial Perturbations (UAP)”, and pose a severe threat to the deployability of these models. Existing approaches that craft UAPs [13, 15, 17] perform a complex optimization every time a UAP is to be crafted. Differing from the previous works, we present a neural network that readily crafts UAPs. The only similar work, by Baluja et al. [2], presents a neural network that transforms a clean image into an adversarial sample by passing it through a series of layers. In contrast, we learn a generative model which maps a latent space to that of UAPs. A concurrent work by Mopuri et al. [18] presents a similar generative model approach to craft perturbations, but for the data-driven case.

Also, the existing data-free method [17] to craft UAPs achieves significantly lower success rates compared to data-driven methods such as UAP [13] and NAG [18]. In this paper, we attempt to reduce the gap between them by emulating the effect of data with the proposed class impressions. Our class impressions are obtained via a simple optimization similar to visualization works such as [26, 27]. Feature visualizations [16, 25, 26, 27, 29, 30, 31] were introduced to (i) understand what input patterns each neuron responds to, and (ii) gain intuitions about neural networks in order to alleviate their black-box nature. Two slightly different approaches exist for feature visualization. In the first approach, a random input is optimized in order to maximize the activation of a chosen neuron (or set of neurons) in the architecture. This enables generating visualizations of a given neuron (as in [26]) in the input space.

In other approaches such as Deep Dream [19], instead of choosing a neuron to activate, an arbitrary natural image is passed as input, and the network enhances the activations that it detects. This way of visualization finds subtle patterns in the input and amplifies them. Since our task is to generate class impressions that emulate the behaviour of real samples, we follow the former approach.

Since the objective is to generate class impressions that can be used to craft UAPs with the fooling objective, the softmax probability neuron seems like the obvious choice to activate. However, this intuition is misleading: [20, 26] have shown that optimizing at the softmax directly increases the class probability by reducing the pre-softmax logits of the other classes. Moreover, it often does not increase the pre-softmax value of the desired class, thus giving poor visualizations. In order to make the desired class more likely, we optimize the pre-softmax logits, and our observations are in agreement with those of [20, 26].

3 Proposed Approach

In this section we present the proposed approach to craft effective UAPs for data-free scenarios. It is understood [13, 17, 18] that, because of data availability and a more direct optimization, data-driven approaches can craft UAPs that are effective in fooling. On the other hand, data-free approaches can quickly craft generalizable UAPs by solving relatively simple and indirect optimizations. In this paper we aim to achieve the effectiveness of the data-driven approaches in the data-free setup. For this, we first create representative data samples, called class impressions (Fig. 2), to mimic the actual data samples of the underlying distribution. Later, we learn a neural network based generative model to craft UAPs using the generated class impressions and a direct fooling objective (Eq. (2)). Figure 1 shows the overview of our approach. Stage-I, “Ask and Acquire”, generates class impressions from the target CNN model, and Stage-II, “Attack”, trains the generative model that learns to craft UAPs using the class impressions obtained in the first stage. In the following subsections, we discuss these two stages in detail.

3.1 Notation

We first define the notation used throughout this paper:

  • f: target classifier (TC) under attack, which is a trained model with frozen parameters

  • \(f^i_k\): \(k^{th}\) activation in \(i^{th}\) layer of the target classifier

  • \(f^{ps/m}\): output of the pre-softmax layer

  • \(f^{s/m}\): output of the softmax (probability) layer

  • v: additive universal adversarial perturbation (UAP)

  • x: clean input to the target classifier, typically either data sample or class impression

  • \(\xi \): max-norm \((l_\infty )\) constraint on the UAPs, i.e., the maximum allowed strength of perturbation that can be added or subtracted at each pixel in the image.

3.2 Ask and Acquire the Class Impressions

Availability of actual data samples enables solving a direct fooling objective and thus crafting UAPs that achieve high success rates [13]. Hence, in the data-free scenario we generate samples that act as a proxy for data. Note that the attacker has access only to the model architecture and the learned parameters of the target classifier (CNN). The learned parameters are a function of the training data and procedure. They can be treated as the model’s memory, in which the essence of training has been encoded and saved. The objective of our first stage, “Ask and Acquire”, is to tap the model’s memory and acquire representative samples of the training data. We can then use only these representative samples to craft UAPs that fool the target classifier.

Note that we do not aim to generate natural looking data samples. Instead, our approach creates samples for which the target classifier predicts strong confidence. That is, we create samples such that the target classifier strongly believes them to be actual samples belonging to categories in the underlying data distribution. In other words, these are impressions of the actual training data that we reconstruct from the model’s memory. Therefore we name them Class Impressions. The motivation to generate these class impressions is that, for the purpose of optimizing a fooling objective (e.g. Eq. (2)), it is sufficient to have samples that behave like natural data samples, i.e., that are predicted with high confidence. Thus, the ability of the learned UAPs to act as adversarial noise on these samples with respect to the target classifier generalizes to the actual samples.

The top panel of Fig. 1 shows the first stage of our approach to generate the class impressions. We begin with a random noisy image sampled from \(\mathcal {U}[0,255]\) and update it till the target classifier predicts a chosen category with high confidence. We achieve this via the optimization shown in Eq. (1). Note that we can create an impression \((CI_c)\) for any chosen class (c) by maximizing the confidence predicted for that class. In other words, we modify the random (noisy) image till the target network believes it to be an input from the chosen class c with high confidence. We consider the activations in the pre-softmax layer \(f_c^{ps/m}\) (before the softmax non-linearity is applied) and maximize the model’s confidence.

$$\begin{aligned} CI_c = \mathop {\text {argmax}}\limits _{x} \,\,\, f_{c}^{ps/m}(x) \end{aligned}$$
(1)
Fig. 2. Sample class impressions generated for the VGG-F [5] model. The names of the corresponding categories are mentioned below the images. Note that the impressions have several natural looking patterns located in various spatial locations and in multiple orientations.

While learning the class impressions, we perform typical data augmentations such as (i) random rotation in \([-5^{\circ }, 5^{\circ }]\), (ii) scaling by a factor randomly selected from \(\{0.95, 0.975, 1.0, 1.025\}\), (iii) RGB jittering, and (iv) random cropping. Along with the above typical augmentations, we also add random uniform noise in \(\mathcal {U}[-10,10]\). The purpose of this augmentation is to generate robust impressions that behave similar to natural samples with respect to augmentations and random noise. We can generate multiple impressions for a single category by varying the initialization, i.e., multiple initializations result in multiple class impressions. Note that the dimensions of the generated impressions are the same as those required by the model’s input (e.g., \(224 \times 224 \times 3\)). We have implemented the optimization given in Eq. (1) in the TensorFlow [1] framework. We used the Adam [10] optimizer with a learning rate of 0.1 and other parameters set to their default values. In order to mimic the variety in terms of the difficulty of recognition (from easy to difficult samples), we have devised a stopping criterion for the optimization. We presume that the difficulty is inversely related to the confidence predicted by the classifier. Before starting the optimization in Eq. (1), we randomly sample a confidence value uniformly in the [0.55, 0.99] range and stop the optimization once the confidence predicted by the target classifier reaches that value. Thus, the generated class impressions contain samples of varied difficulty. A minimal sketch of this procedure is given below.
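The following is a minimal sketch of the class impression generation of Eq. (1) in PyTorch (the authors' implementation uses TensorFlow [1]); it includes only a subset of the augmentations, omits input preprocessing, and the `target_model` interface, the pixel clamping, and the iteration budget are assumptions.

```python
import random
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def generate_class_impression(target_model, class_idx, img_size=224,
                              lr=0.1, max_iters=1000, device="cpu"):
    """Optimize a random image until the frozen target classifier predicts
    `class_idx` with a randomly sampled target confidence (Eq. (1))."""
    target_model.eval()
    # start from uniform random noise in [0, 255]
    x = torch.empty(1, 3, img_size, img_size, device=device).uniform_(0, 255)
    x.requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)

    # stopping criterion: confidence sampled uniformly from [0.55, 0.99]
    target_conf = random.uniform(0.55, 0.99)

    for _ in range(max_iters):
        # a subset of the augmentations: random rotation, scaling, noise
        angle = random.uniform(-5.0, 5.0)
        scale = random.choice([0.95, 0.975, 1.0, 1.025])
        x_aug = TF.affine(x, angle=angle, translate=[0, 0],
                          scale=scale, shear=[0.0])
        x_aug = x_aug + torch.empty_like(x_aug).uniform_(-10, 10)

        logits = target_model(x_aug)          # pre-softmax activations f^{ps/m}
        loss = -logits[0, class_idx]          # maximize the chosen class logit
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        x.data.clamp_(0, 255)                 # keep the image in a valid range

        with torch.no_grad():
            conf = F.softmax(target_model(x), dim=1)[0, class_idx].item()
        if conf >= target_conf:               # reached the sampled confidence
            break
    return x.detach()
```

Multiple impressions per category are obtained simply by calling this routine with different random initializations.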

Figure 2 shows sample class impressions generated for the VGG-F [5] model. The corresponding category labels are mentioned below the impressions. Note that the generated class impressions clearly show several natural looking patterns located in various spatial locations and in multiple orientations. Figure 3 shows multiple class impressions generated by our method starting from different initializations for the “Squirrel Monkey” category. Note that the impressions have different visual patterns relevant to the chosen category. We have generated 10 class impressions for each of the 1000 categories in the ILSVRC dataset, resulting in a total of 10,000 class impressions. These samples are used to learn a neural network based generative model that can craft UAPs through a feed-forward operation.

Fig. 3. Multiple class impressions for the “Squirrel Monkey” category generated from different initializations for the VGG-F [5] target classifier.

3.3 Attack: Craft the Data-Free Perturbations

After generating the class impressions in the first stage of our approach, we treat them as training data for learning a generator that crafts UAPs. The bottom panel of Fig. 1 shows an overview of our generative model. In the following subsections we present the architecture of our model along with the objectives that drive the learning.

3.4 Fooling Loss

We learn a neural network (G) similar to the generator part of a Generative Adversarial Network (GAN) [6]. G takes a random vector z whose components are sampled from a simple distribution (e.g. \(\mathcal {U}[-1,1]\)) and transforms it into a UAP via a series of deconvolution layers. Note that in practice a mini-batch of vectors is processed. We train G to generate UAPs that can fool the target classifier over the underlying data distribution. To be specific, we train with a fooling loss computed over the generated class impressions (from Stage-I, Sect. 3.2) as the training data. Let us denote the label predicted on a clean sample (x) as the ‘clean label’ and that on a perturbed sample \((x+v)\) as the ‘perturbed label’. The objective is to make the ‘clean’ and ‘perturbed’ labels differ. To ensure this, our training loss reduces the confidence predicted for the ‘clean label’ on the perturbed sample. Because of the softmax nonlinearity, the confidence predicted for some other label increases and eventually causes a label flip, i.e., it fools the target classifier. Hence, we formulate our fooling loss as

$$\begin{aligned} L_f = -log(1-f^{s/m}_c(x+v)) \end{aligned}$$
(2)

where c is the clean label predicted on x and \(f^{s/m}_c\) is the probability (softmax output) predicted for category c. Note that this objective is similar in spirit to most adversarial attacking methods (e.g. FGSM [7, 21]). A sketch of this loss is shown below.
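A minimal PyTorch sketch of the fooling loss of Eq. (2) follows (the authors implement in TensorFlow [1]); the small constant inside the logarithm for numerical stability and the `target_model` interface are assumptions.

```python
import torch
import torch.nn.functional as F

def fooling_loss(target_model, x, v):
    """Fooling loss of Eq. (2): reduce the confidence assigned to the
    clean label on the perturbed sample x + v."""
    with torch.no_grad():
        clean_label = target_model(x).argmax(dim=1)        # c in Eq. (2)
    probs = F.softmax(target_model(x + v), dim=1)          # f^{s/m}(x + v)
    p_clean = probs.gather(1, clean_label.unsqueeze(1)).squeeze(1)
    # -log(1 - f_c(x + v)), averaged over the mini-batch of class impressions
    return -torch.log(1.0 - p_clean + 1e-12).mean()
```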

3.5 Diversity Loss

The fooling loss \(L_f\) (Eq. (2)) only trains G to learn UAPs that can fool the target classifier. In order to avoid learning a degenerate G which can only generate a single strong UAP, we enforce diversity in the generated UAPs. We require that the crafted UAPs within a mini-batch are diverse by maximizing the pairwise distance between their embeddings \(f^l(x+v_i)\) and \(f^l(x+v_j)\), where \(v_i\) and \(v_j\) belong to generations within a mini-batch. We consider the layers of the target CNN for projecting \((x+v)\). Thus our training objective comprises a diversity loss given by

$$\begin{aligned} L_d = -\sum _{i,j=1, i\ne j }^K d( f^l(x+v_i) , f^l(x+v_j) ) \end{aligned}$$
(3)

where K is the mini-batch size, and d is a suitable distance metric (e.g., Euclidean or cosine distance) computed between the features extracted from a pair of adversarial samples. Note that the class impression x present in the two embeddings \(f^l(x+v_i)\) and \(f^l(x+v_j)\) is the same. Therefore, pushing them apart by minimizing \(L_d\) makes the UAPs \(v_i\) and \(v_j\) dissimilar. A sketch of this loss is given below.
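A minimal PyTorch sketch of the diversity loss of Eq. (3), using the softmax layer as the embedding \(f^l\) and cosine distance as d (as described in Sect. 4.1); pairing every pair of perturbations with a single class impression is a simplification of the mini-batch pairing scheme.

```python
import torch
import torch.nn.functional as F

def diversity_loss(target_model, x, perturbations):
    """Diversity loss of Eq. (3): push apart the embeddings of the same
    class impression x perturbed by different UAPs."""
    embeddings = [F.softmax(target_model(x + v), dim=1) for v in perturbations]
    loss = torch.zeros((), device=x.device)
    for i in range(len(embeddings)):
        for j in range(len(embeddings)):
            if i != j:
                # cosine distance = 1 - cosine similarity; negated to maximize
                cos = F.cosine_similarity(embeddings[i], embeddings[j], dim=1)
                loss = loss - (1.0 - cos).mean()
    return loss
```

The two terms are then combined as in Eq. (4).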

Therefore the loss we optimize for training our generative model for crafting UAPs is given by

$$\begin{aligned} Loss = L_f + \lambda L_d \end{aligned}$$
(4)

Note that this objective is similar in spirit to that presented in the concurrent work [18].

4 Experiments

In this section we present our experimental setup and the effectiveness of the proposed method in terms of the success rates achieved by the crafted UAPs. For all our experiments we consider the ILSVRC [23] dataset and recognition models trained on it as the target CNNs. Note that, since we consider the data-free scenario, we extract class impressions to serve as data samples. Similar to the existing data-driven approach [13] that uses 10 data samples per class, we also extract 10 impressions for each class, which results in a training set of 10,000 samples.

4.1 Implementation Details

The dimension of the latent space is chosen as 10, i.e., z is a random 10-D vector sampled from \(\mathcal {U}[-1,1]\). We have experimented with other dimensions (e.g. 50, 100) for the latent space and found that 10 is efficient with respect to the number of parameters, though the success rates are not very different. We use a mini-batch size of 32. All our experiments are implemented in TensorFlow [1] using the Adam optimizer, and the implementations are made available at https://github.com/val-iisc/aaa. The generator part (G) of the network maps the latent space Z to the UAPs for a given target classifier. The architecture of our generator consists of 5 deconv layers. The final deconv layer is followed by a tanh non-linearity and scaling by \(\xi \). Doing so limits the perturbations to \(\bigl [-\xi ,\, \xi \bigr ]\). Similar to [13, 17], the value of \(\xi \) is chosen to be 10 in order to add negligible adversarial noise. The architecture of G is adapted from [24]. We experimented on a variety of CNN architectures trained to perform object recognition on the ILSVRC [23] dataset. The generator (G) architecture is unchanged across different target CNN architectures and is learned separately with the corresponding class impressions. A sketch of such a generator is shown below.
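The sketch below shows a generator of the described shape in PyTorch (5 deconv layers, tanh, scaling by ξ); the channel widths, kernel sizes, and normalization layers are illustrative DCGAN-style assumptions, not the exact configuration adapted from [24].

```python
import torch
import torch.nn as nn

class UAPGenerator(nn.Module):
    """Maps a latent vector z to a UAP bounded in [-xi, xi] per pixel."""
    def __init__(self, z_dim=10, xi=10.0):
        super().__init__()
        self.xi = xi
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 512, kernel_size=7, stride=1),  # 1 -> 7
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),     # 7 -> 14
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),     # 14 -> 28
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),      # 28 -> 56
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=4),                   # 56 -> 224
            nn.Tanh(),                                                # (-1, 1)
        )

    def forward(self, z):
        v = self.net(z.view(z.size(0), -1, 1, 1))
        return self.xi * v    # scale to the max-norm budget

# usage: a mini-batch of 32 UAPs from z ~ U[-1, 1]
# v = UAPGenerator()(torch.rand(32, 10) * 2 - 1)
```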

While computing the diversity loss (Eq. (3)), for each class impression in the mini-batch (x), we select a pair of generated UAPs \((v_1 \text { and } v_2)\) and compute the distance between \(f^l(x+v_1)\) and \(f^l(x+v_2)\). The diversity loss is the sum of all such distances computed over the mini-batch members. We typically consider the softmax layer of the target CNN for extracting the embeddings. Also, since the embeddings are probability vectors, we use the cosine distance between them. Note that any other intermediate layer could be used for the embedding, with Euclidean distance measuring the separation.

Since our objective is to generate diverse UAPs that can fool effectively, we give equal weight to both the components of the loss, i.e., we keep \(\lambda =1\) in Eq. (4).

Table 1. Success rates of the perturbations modelled by our generative network, compared against the data-free approach FFF [17]. Rows indicate the target net for which perturbations are modelled and columns indicate the net under attack. Note that, in each row, entry where the target CNN matches with the network under attack represents white-box attack and the rest represent the black-box attacks. The mean fooling rate achieved by the Generator (G) trained for each of the target CNNs is shown in the rightmost column.
Fig. 4. Sample universal adversarial perturbations (UAPs) learned by the proposed framework for different networks; the corresponding target CNN is mentioned below each UAP. Note that the images shown are one sample for each of the target networks, and across different samplings the perturbations vary visually, as shown in Fig. 6.

4.2 UAPs and the Success Rates

Similar to [13, 15, 17, 18], we measure the effectiveness of the crafted UAPs in terms of their “success rate”: the percentage of data samples (x) for which the target CNN predicts a different label upon adding the UAP (v). Note that we compute the success rates over the 50,000 validation images from the ILSVRC dataset; a sketch of this evaluation is shown below. Table 1 reports the success rates of the UAPs crafted by our generative model G on various networks. Each row denotes the target model for which we train G and the columns indicate the model we attack. Thus, we also report the transfer rates on unseen models, which is referred to as “black-box attacking” (off-diagonal entries). Similarly, when the target CNN over which we learn G matches the model under attack, it is referred to as “white-box attacking” (diagonal entries). Note that the rightmost column shows the mean success rates achieved by the individual generator networks (G) across all the 6 CNN models. The proposed method crafts UAPs that achieve, on average, a \(20.18\%\) higher mean success rate compared to the existing data-free method for crafting UAPs (FFF [17]).
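A minimal PyTorch sketch of the success (fooling) rate computation over a validation loader; the loader and device handling are assumptions, and the UAP v is assumed to be in the same pixel scale as the inputs.

```python
import torch

@torch.no_grad()
def success_rate(target_model, data_loader, v, device="cpu"):
    """Percentage of samples whose predicted label flips when the UAP v
    is added (the 'success rate' reported in Table 1)."""
    target_model.eval()
    flipped, total = 0, 0
    for x, _ in data_loader:                    # ground-truth labels unused
        x = x.to(device)
        clean_pred = target_model(x).argmax(dim=1)
        adv_pred = target_model(x + v).argmax(dim=1)
        flipped += (clean_pred != adv_pred).sum().item()
        total += x.size(0)
    return 100.0 * flipped / total
```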

Figure 4 shows example UAPs learned by our approach for different target CNN models. Note that the pixel values in those perturbations lie in \([-10,10]\). Also, the UAPs for different models look different. Figure 5 shows a clean sample and the corresponding perturbed samples after adding UAPs learned for different target CNNs. Note that each of the target CNNs misclassifies them differently.

For the sake of completeness, we also compare our approach with its data-driven counterpart. Table 2 presents the white-box success rates for both data-free and data-driven methods for crafting UAPs. We also show the fooling ability of random noise sampled in \([-10,10]\) as a baseline. Note that the success rate obtained by random noise is far lower than that of the learned UAPs. Thus the adversarial perturbations are highly structured and far more effective than random noise as a perturbation.

On the other hand, the proposed method of acquiring class impressions from the target model’s memory increases the mean success rate by an absolute \(20\%\) over that of the current state-of-the-art data-free approach (FFF [17]). Also, note that our approach performs close to the data-driven approach UAP [13], with a gap of \(8\%\). These observations suggest that the class impressions effectively serve the purpose of actual data samples in the context of learning to craft UAPs.

Table 2. Effectiveness of the proposed approach to handle the data absence. We compare the success rates against the data-driven approach UAP [13], data-free approach FFF [17] and random noise baseline.
Fig. 5. Clean image (leftmost) of the class “Sand Viper”, followed by adversarial images generated by adding UAPs crafted for various target CNNs. Note that the perturbations, while remaining imperceptible, lead to different misclassifications.

4.3 Comparison with Data Dependent Approaches

Table 3 presents the transfer rates achieved by the image-agnostic perturbations crafted by the proposed approach. Each row denotes the target model on which the generative model (G) is learned and the columns denote the models under attack. Hence, diagonal entries denote white-box adversarial attacks and the off-diagonal entries denote black-box attacks. Note that Table 2 presents only the white-box success rates; for completeness we present both here. Also note that, in spite of being a data-free approach, the mean SR (rightmost column) obtained by our method is very close to that achieved by the state-of-the-art data-driven approach for crafting UAPs.

Table 3. Success rates (SR) for the perturbations crafted by the proposed approach compared against the state-of-the-art data-driven approach for crafting UAPs.

4.4 Diversity

The objective of having the diversity component \((L_d)\) in the loss is to avoid learning a single UAP and instead learn a generative model that can generate a diverse set of UAPs for a given target CNN. We examine the distribution of predicted labels after adding the generated UAPs; this can reveal whether there is a set of sink labels that attract most of the predictions. We consider the G learned to fool the VGG-F model and the 50,000 samples of the ILSVRC validation set. We randomly select 10 UAPs generated by G and compute the mean histogram of predicted labels, as sketched below. After sorting the histogram, most of the predicted labels \((95\%)\) for the proposed approach spread over 212 of the 1000 target labels, whereas the corresponding number for UAP [13] is 173. The observed \(22.5\%\) higher diversity is attributed to our diversity component \((L_d)\).
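A minimal PyTorch sketch of this diversity measurement; the 1000-class setting and the 95% coverage threshold follow the text, while the loader interface is an assumption.

```python
import torch

@torch.no_grad()
def label_spread(target_model, data_loader, uaps, coverage=0.95, device="cpu"):
    """Number of (sorted) predicted labels needed to account for `coverage`
    of the predictions, using the mean histogram over the given UAPs."""
    num_classes = 1000
    hist = torch.zeros(num_classes)
    for v in uaps:
        for x, _ in data_loader:
            preds = target_model(x.to(device) + v).argmax(dim=1).cpu()
            hist += torch.bincount(preds, minlength=num_classes).float()
    probs = hist / hist.sum()                      # mean histogram, normalized
    sorted_probs, _ = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=0)
    # smallest number of labels covering the requested fraction of predictions
    return int((cumulative < coverage).sum().item()) + 1
```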

4.5 Simultaneous Targets

The ability of adversarial perturbations to generalize across multiple models has been observed for both image-specific [7, 28] and image-agnostic perturbations [13, 17]. It is an important issue to investigate since it makes simple black-box attacks possible via transferring the perturbations to unknown models. In this subsection we investigate learning a single G that can craft UAPs to simultaneously fool multiple target CNNs.

We replace the single target CNN with an ensemble of three models, CaffeNet, VGG-16 and ResNet-152, and learn \(G_E\) using the fooling and diversity losses. Note that, since the class impressions vary from model to model, for this experiment we generate class impressions from multiple CNNs. In particular, we simultaneously maximize the pre-softmax activation (Eq. (1)) of the desired class across the individual target CNNs by optimizing their mean, as sketched below. We then investigate the generalizability of the generated perturbations. Table 4 presents the mean black-box success rate (MBBSR) for the UAPs generated by \(G_E\) on the remaining 3 models. For comparison, we present the MBBSR of the generators learned on the individual models. Because of the ensemble of target CNNs, \(G_E\) learns to craft more general UAPs and therefore achieves higher success rates than the individual generators.
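A small PyTorch sketch of the ensemble objective used to generate class impressions for \(G_E\): the mean of the desired class's pre-softmax activation across the ensemble replaces \(f_c^{ps/m}(x)\) in Eq. (1). Per-model input resizing and normalization are omitted and assumed to be handled by the caller.

```python
import torch

def ensemble_class_logit(models, x, class_idx):
    """Mean pre-softmax activation of class `class_idx` over the ensemble,
    maximized in place of the single-model logit of Eq. (1)."""
    logits = [m(x)[:, class_idx] for m in models]   # one logit per model
    return torch.stack(logits, dim=0).mean(dim=0)
```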

Table 4. Generalizability of the UAPs crafted by the ensemble generator \(G_E\) learned on three target CNNs: CaffeNet, VGG-16 and ResNet-152. Note that because of the ensemble of the target CNNs, \(G_E\) learns to craft perturbations that have higher mean black-box success rates (MBBSR) compared to that of the individual generators.

4.6 Interpolating in the Latent Space

Our generator network (G) is similar to that of a typical GAN [6, 22]. It maps the latent space to the space of UAPs for the given target classifier(s). In the case of GANs, interpolating in the latent space can reveal signs of memorization: smooth semantic changes in the generations while traversing the latent space indicate that the model has learned relevant representations. In our case, since we generate UAPs, we investigate whether the interpolation shows smooth visual changes and whether the intermediate UAPs also fool the target CNN coherently.

Figure 6 shows the results of interpolating in the latent space for ResNet-152 as the target CNN. We sample a pair of points \((z_1\text { and }z_2)\) in the latent space and consider 5 intermediate points on the line joining them. We generate the UAPs corresponding to all these points by passing them through the learned generator G, as sketched below. Figure 6 shows the generated UAPs and the corresponding success rates in fooling the target CNN. Note that the UAPs change smoothly between any pair of points while the success rate remains unchanged. This indicates that the learned representations are relevant and interesting.
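A minimal PyTorch sketch of the latent-space interpolation; the number of intermediate points follows the text, and the latent vectors are assumed to be batched as the generator expects.

```python
import torch

@torch.no_grad()
def interpolate_uaps(generator, z1, z2, steps=5):
    """UAPs for `steps` points on the line joining z1 and z2 (Fig. 6)."""
    alphas = torch.linspace(0.0, 1.0, steps + 2)[1:-1]   # interior points only
    return [generator((1.0 - a) * z1 + a * z2) for a in alphas]

# usage (assuming a trained generator mapping 10-D latent vectors to UAPs):
# z1, z2 = torch.rand(1, 10) * 2 - 1, torch.rand(1, 10) * 2 - 1
# intermediate_uaps = interpolate_uaps(trained_generator, z1, z2)
```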

Fig. 6. Interpolation between a pair of points in the Z space shows that the mapping learned by our generator has smooth transitions. The figure shows the perturbations corresponding to 5 points on the line joining a pair of points \((z_1 \text { and } z_2)\) in the latent space. Note that these perturbations are learned to fool the ResNet-152 [8] architecture. Below each perturbation, the corresponding success rate obtained over 50,000 images from the ILSVRC 2014 validation set is mentioned. This shows that the fooling capability of these intermediate perturbations is also high and remains the same at different locations.

4.7 Adversarial Training

We have performed adversarial training of the target CNN with a \(50\%\) mixture of clean and adversarial samples crafted using the learned generator (G). After 2 epochs, the success rate of G drops from 75.28 to 62.51. Note that the improvement is minor and the target CNN is still vulnerable. We then retrain the generator against the finetuned network; the resulting generator fools the finetuned network with an increased success rate of 68.72. After repeating this for multiple iterations, we observe that adversarial training does not make the target CNN significantly robust.

5 Discussion and Conclusions

In this paper we have presented a novel approach to mitigate the absence of data for crafting Universal Adversarial Perturbations (UAPs). Class impressions are representative images that are easy to obtain via a simple optimization on the target model. Using class impressions, our method drastically reduces the performance gap between the data-driven and data-free approaches for crafting UAPs. Success rates close to those of data-driven UAPs demonstrate the effectiveness of class impressions in the context of crafting UAPs.

Another way to look at this observation is that it would be possible to extract useful information about the training data from the model parameters in a task specific manner. In this paper, we have extracted the class impressions as proxy data samples to train a generative model that can craft UAPs for the given target CNN classifier. It would be interesting to explore such feasibility for other applications as well.

The generative model presented in our approach is an efficient way to craft UAPs. Unlike the existing methods that perform complex optimizations, our approach constructs UAPs through a simple feed-forward operation. Significant success rates and surprising cross-model generalizability, even in the absence of data, reveal severe susceptibilities of current deep learning models.