VMIFGSM

Enhance Stealthiness and Transferability of
Adversarial Attacks with Class Activation Mapping

Ensemble Attack
Hui Xia∗ Rui Zhang Zi Kang

Ocean University of China Ocean University of China Ocean University of China
xiahui@ouc.edu.cn zhangrui0504@stu.ouc.edu.cn kangzi@stu.ouc.edu.cn
Shuliang Jiang Shuo Xu

Ocean University of China Ocean University of China
jiangshuliang@stu.ouc.edu.cn xushuo@stu.ouc.edu.cn
Abstract—Although there has been extensive research on the an increasingly important role in many fields [4]. In au-
transferability of adversarial attacks, existing methods for gener- tonomous driving, DNNs accurately recognize and understand
ating adversarial examples suffer from two significant drawbacks: road and traffic conditions by analyzing images and data
poor stealthiness and low attack efficacy under low-round attacks. from cameras and radar sensors. In the medical area [5]–[8],
To address the above issues, we creatively propose an adversarial DNNs automatically identify and segment lesions in medical
example generation method that ensembles the class activation
images, helping doctors diagnose and treat diseases more
maps of multiple models, called class activation mapping ensem-
ble attack. We first use the class activation mapping method quickly and accurately. In the security field [9]–[12], DNNs
to discover the relationship between the decision of the Deep achieve automatic alarm and tracking of security monitoring
Neural Network and the image region. Then we calculate the systems by recognizing and tracking objects such as faces and
class activation score for each pixel and use it as the weight for vehicles. In addition to the above application areas, DNNs
perturbation to enhance the stealthiness of adversarial examples have wide applications in many other fields. For example,
and improve attack performance under low attack rounds. DNNs are used for natural language processing for machine
In the optimization process, we also ensemble class activation translation, sentiment analysis, and text generation tasks [13],
maps of multiple models to ensure the transferability of the [14]. In recommendation systems, DNNs can use to recom-
adversarial attack algorithm. Experimental results show that our mend products, music, movies, and other content. In industrial
method generates adversarial examples with high perceptibility,
control, DNNs can use to predict machine failures and optimize
transferability, attack performance under low-round attacks, and
evasiveness. Specifically, when our attack capability is comparable production lines. However, as DNNs are increasingly applied
to the most potent attack (VMIFGSM), our perceptibility is close in more fields, the threat of adversarial attacks is becoming
to the best-performing attack (TPGD). For non-targeted attacks, more serious [15]–[20].
our method outperforms the VMIFGSM by an average of 11.69%
in attack capability against 13 target models and outperforms the An adversarial attack is a malicious attack against machine
TPGD by an average of 37.15%. For targeted attacks, our method learning models to deceive the model and cause it to make
achieves the fastest convergence, the most potent attack efficacy, incorrect predictions. Adversarial attacks may have a serious
and significantly outperforms the eight baseline methods in low- impact on the security and privacy of a model. For example,
round attacks. Furthermore, our method can evade defenses and when the adversarial attack is successfully executed against
be used to assess the robustness of models1 . an image classification model, a harmless image may be
incorrectly classified as a completely different object, leading
I. I NTRODUCTION to serious safety issues. In autonomous vehicles, attackers can
Deep Neural Networks (DNNs) have achieved remarkable deceive the vehicle’s cameras by adding specific patterns and
achievements in image classification [1]–[3]. They are playing noise, resulting in traffic accidents. In facial recognition tasks,
attackers may trick the system by adding invisible noise to
1 Corresponding author is Xia Hui, e-mail: xiahui@ouc.edu.cn. Hui human eyes, thus accessing sensitive personal information. The
Xia and Rui Zhang contributed equally to this work. Our code is above examples demonstrate that we must consider the risk of
available at https://github.com/DreamyRainforest/Class Activation Mapping adversarial attacks when developing and deploying DNNs.
Ensemble Attack/tree/main
Researchers have proposed many methods to defend
against adversarial attacks [21], [22], including adversarial
Network and Distributed System Security (NDSS) Symposium 2024
training [23], defensive regularization [24], and adversarial
26 February - 1 March 2024, San Diego, CA, USA example detection [25], [26]. Adversarial training [27] is
ISBN 1-891562-93-2 a commonly used method that enhances the robustness of
https://dx.doi.org/10.14722/ndss.2024.23164 DNNs by adding adversarial examples to the training data and
www.ndss-symposium.org alternating between adversarial and benign examples during
training. Defensive regularization methods limit the mapping • As far as we know, this is the first black-box adver-
space between model input and output by introducing reg- sarial attack method that considers both attack trans-
ularization terms into the loss function, thereby reducing the ferability and perturbation weighting simultaneously.
impact of adversarial examples. Adversarial example detection
methods attempt to detect adversarial examples from input • We use a strategy different from traditional meth-
data to prevent them from entering the model. Although these
ods to improve the stealthiness and transferability
methods can effectively improve the robustness of DNNs,
of adversarial attack algorithms. Traditional methods
adversarial attacks continue to evolve and pose significant
directly integrate the outputs of multiple models into
challenges. Therefore, more efforts and exploration are still the objective function, while we incorporate the CAM
needed to research the security and robustness of DNNs. into the search process for minimum perturbation. This
Understanding the principles and methods of adversarial method can avoid adding excessive perturbations in
attacks, exploring possible attack methods, and mitigating unimportant regions, thereby preserving the stealthi-
potential risks are essential to enhance the security and ro- ness of the adversarial examples. It can quickly change
bustness of DNNs. Various adversarial attack methods have the decision region of benign images to enable the at-
been proposed, including optimization-based and gradient- tacking algorithm to exhibit good attack performance
based methods, which can achieve high success rates in with low attack rounds. We also ensemble the CAM
the white-box setting [28]–[32]. However, in the black-box of multiple models to ensure the transferability of
setting, the attack efficacy is lower due to the inability to adversarial examples, thereby improving the success
access the target model’s internal details. Some transferability- rate and robustness of the attack.
enhancing attack methods have been proposed to solve the
above problem, including gradient optimization attack, input • Experimental results show that our method produces
transformation attack, and model ensemble attack [33]–[38]. adversarial examples with good perceptibility, attack
However, these methods add indiscriminate perturbations to all ability, transferability, convergence, and evasiveness.
pixel locations in the image, resulting in generated adversarial Specifically, when our attack ability is comparable to
examples with poor stealthiness and low attack capability for the most vigorous VMIFGSM attack, our perceptibil-
low-round attacks. In particular, compared to the two types of ity is close to the best-performing TPGD. Compared
methods, gradient optimization attack and input transformation to VMIFGSM, L2 is reduced by 24.08, Low fre by
attack, the model ensemble attack is an efficient attack method 13.91, SSIM by 0.04, and PSNR by 0.69 in our
and widely used to improve black-box attack performance. method. Under non-targeted attack mode, compared to
Therefore, in this work, we still focus on model ensemble the VMIFGSM, our method improves the average at-
attacks and attempt to perturb important locations to enhance tack success rate against 13 target models by 11.69%,
the stealthiness and convergence speed of the adversarial and compared to the TPGD, the average attack success
examples’ attack capability under low-round attacks. rate is enhanced by 37.15%. Under targeted attack
Severi et al. [39] utilized machine learning interpretabil- mode, our method converges fastest and has the most
ity tools, such as Shapley Additive exPlanations, to gener- substantial attack ability, and the attack ability is
ate malware samples that would be misclassified as benign significantly better than the eight baseline methods
samples by a classifier. Inspired by their work, we attempt under low attack rounds. The attack success rate of
to employ the Class Activation Mapping (CAM) method our method has been at least 10% higher than that of
to generate adversarial examples. The CAM method can VMIFGSM since the fourth round. Also, our method
reveal the connection between the decisions of DNNs and can bypass defense methods, making it helpful in
the regions in an image, resulting in adversarial examples evaluating the robustness of models.
with solid attack capabilities. However, the targeted nature
of CAM scores may result in adversarial examples applying
only to specific target models, leading to poor transferability. In the following chapters, we will briefly introduce our
To address this problem, we incorporate CAM scores as research method’s main content and contributions. Specifically,
weights for adding perturbations to each pixel, enhancing this paper will be divided into the following sections: Section
the attack capabilities for low attack epochs and stealthiness. II will review existing research related to adversarial attacks,
Additionally, we improve the transferability of adversarial including gradient-based attacks, input transformation attacks,
examples by ensembling CAM scores from multiple models. and model ensemble attacks. Section III will briefly introduce
Thus, we propose a class activation mapping ensemble at- seven adversarial attack methods to help readers understand
tack. We validate our attack method on the ILSVRC 2012 our attack strategy. Section IV will elaborate on our research
validation set and compare it with ten baseline methods method, a class activation mapping ensemble attack for gen-
regarding perceptibility and attack ability for two attack modes erating adversarial examples. Section V will introduce the
(targeted and non-targeted). Experimental results show that datasets, experimental settings, and evaluation metrics used
our attack method generates adversarial examples with good in this study, as well as the detailed implementation of our
stealthiness and has good attack and transferability abilities for method. Also, we will present the experimental results and
13 models, including AlexNet [40], VGG16 [41], Efficient- compare the performance of different methods regarding attack
Net b0 [42], ResNet18/34/50 [43], WideResNet50/101 [44], effectiveness, transferability, and evasiveness. Finally, in Sec-
Inception v2 [45], MobileNet v2 [46], ConvNeXt [47], tion VI, we will summarize the contributions and limitations
ViT [48], RegNet [49]. Our contributions are summarized as of our method and present future research directions and
follows: recommendations.
2
II. R ELATED W ORK of adversarial examples to some extent, the improvement is
insignificant.
Research on adversarial attacks can be roughly divided
into five categories: gradient-based attacks, optimization-based
attacks, score-based attacks, decision-based attacks, and trans- C. Model Ensemble Attacks
ferable attacks. We focus on transferable attacks, and in The model ensemble attack improves the transferability
this section, we introduce existing transferability-enhancing of the attack by attacking multiple models. This method can
methods from three aspects: gradient-based attacks, input effectively bypass the defense mechanisms of different models,
transformation attacks, and model ensemble attacks. improving the success rate and transferability of the attack.
Liu et al. [58] first proposed ensemble attacks by averaging
A. Gradient Optimization Attack the predictions (probability) of multiple models and using
The gradient-based attack method is a standard adversarial existing adversarial attack methods (such as FGSM and PGD)
attack method that utilizes the gradient information of the to improve the transferability of adversarial examples. Dong
target model to generate adversarial examples. The most et al. [56] proposed two variants of multi-model ensemble
famous way among them is the Fast Gradient Sign Method attacks, which generate adversarial examples by integrating
(FGSM) [50], which obtains a perturbation vector by multiply- multiple models’ logit outputs and losses to improve the trans-
ing the gradient direction of each pixel with the perturbation ferability of adversarial examples. However, these methods
and then adds it to the benign image to generate an adversarial uniformly integrate all models’ outputs and use stochastic
example. Although FGSM has a high success rate in attacking gradient descent for optimization can easily get trapped in local
a single model, it cannot guarantee the transferability of the optima. Therefore, Xiong et al. [59] proposed a Stochastic
attack effect to other models. Kurakin et al. [51] proposed the Variance-Reduced Ensemble attack (SVRE), but the efficiency
Basic Iterative Method (BIM) by setting multiple smaller steps of generating adversarial examples is slow, and the attack
to construct more accurate adversarial examples. Madry et capability of adversarial examples is not significant.
al. [52] searched for perturbations within the L-norm ball range Although existing methods have shown some improvement
of the data point and proposed Projected Gradient Descent in the transferability of adversarial examples, the effect is
(PGD). Although PGD has good attack ability in white-box insignificant. Moreover, these methods indiscriminately add
settings, it is prone to overfitting the target model in black-box perturbations to all pixel positions in the image, resulting
settings, leading to poor transferability of adversarial examples. in poor perceptual quality and low attack capability at low
To enhance the transferability of adversarial examples, Dong iteration counts. In summary, the current methods could be
et al. [53] proposed a Momentum Iterative Fast Gradient improved in improving adversarial examples’ transferability
Sign Method (MIFGSM), which stabilizes the update direction and stealthiness.
during iterations and avoids poor local maxima. Lin et al. [54]
introduced Nesterov Accelerated Gradient into gradient-based Inspired by the above approach, we are still committed to
attacks by effectively looking ahead to improve the transfer- using model ensemble attacks to enhance the transferability of
ability of adversarial examples. Wang et al. [55] introduced adversarial examples. However, unlike the previous method,
variance tuning in gradient-based iterative attacks to enhance we do not ensemble multiple target models’ logit outputs,
the transferability of attacks. Although these methods improve predictions, and losses, but rather the CAMs of numerous
the transferability of adversarial examples, the effects are models. By integrating the CAMs of various models, we
insignificant. can ensure the transferability of adversarial examples and
improve the attack capability of adversarial examples at low
B. Input Transform Attacks attack rounds while improving the stealthiness of adversarial
examples to some extent.
The input transformation-based attack method enhances
the attack’s transferability by utilizing the image’s invariance. III. TECHNICAL BACKGROUND
This type of attack method mainly includes methods such as
rotation, translation, and scaling to transform the image. Xie et We briefly introduce seven adversarial attack methods:
al. [56] proposed the Diverse Input Method (DIFGSM), which FGSM, PGD, TPGD, NIFGSM, MIFGSM, VMIFGSM, and
increases the transferability of adversarial examples by using SVRE. This is to help readers understand our attack strategy.
input diversity such as randomly adjusting size and padding
to create different input patterns. Dong et al. [53] generated A. Fast Gradient Sign Method (FGSM)
adversarial examples by optimizing perturbations on a set of
translated images and proposed the Translation-Invariant At- The FGSM [50] is a standard adversarial attack that can
tack Method (TIFGSM). Lin et al. [54] demonstrated the scale deceive DNNs into producing incorrect outputs. FGSM can be
invariance of deep learning models and proposed the Scale- described as follows: assuming that x is the benign image, y
Invariant Method (SINIFGSM) to improve the transferability is the true label, and L(x, y; θ) is the loss function, θ is the
of adversarial examples by optimizing adversarial perturba- model parameters. The process of generating an adversarial
tions on scaled copies of the input image. Wang et al. [57] example with FGSM is as follows:
proposed the Admix attack, which computes the gradients
on the input image and mixes them with a small portion x0 = x + α · sign(∇x L(x, y; θ)) (1)
of each additional module image while using the original
0
label of the input to generate more transferable adversarial where x is the perturbed image, α is the magnitude of the
examples. Although these methods improve the transferability perturbation, ∇x L(x, y; θ) is the gradient of the loss function
3
L(x, y; θ) with respect to the input image, sign(∇x L(x, y; θ)) where gt is the gradient computed using Nesterov accelerated
is the sign of the gradient element-wise. This method is gradient method. Specifically, this method uses the Nesterov
computationally efficient because it only requires one forward Accelerated Gradient Method to calculate the adversarial gra-
and backward propagation to generate an adversarial example. dient quickly and uses scale invariance to ensure the stability of
However, the choice of the perturbation magnitude may affect the attack effect. The Nesterov Accelerated Gradient Method is
the attack effectiveness, and it needs to be selected experimen- a momentum-based optimization algorithm that can accelerate
tally or by other methods. the convergence speed of gradient descent. Its mathematical
expression is as follows:
B. Projected Gradient Descent (PGD)
vt+1 = µ · vt − η · ∇x F(xt + µ · vt ) (6)
The main idea of PGD [52] is to iteratively add a certain
amount of perturbation within a specific range to the input
image to cause misclassification. The optimization objective xt+1 = xt + vt+1 (7)
is:
where µ is the momentum parameter, η is the learning rate.
0 0
min
0
{L(F(x ), y; θ)} s.t., kx − xk∞ ≤ (2) Scale invariance means that the attack effect should remain
x
stable for different scales of input data. Scale invariance
F is the target model, is the maximum perturbation range, k· is achieved by dividing the perturbation into two parts: a
kp represents the Lp norm. To solve the optimization problem, global scaling and a local perturbation. Specifically, the attack
PGD iteratively updates the perturbation using the gradient perturbation can be represented as:
descent method, i.e.,
kxk2
xt+1 = Πx+ (xt + α · sign(∇xt L(F(xt ), y; θ))) (3) d = α · sign(gt ) (8)
kgt k2
where y is the true label, xt is the sample obtained after
the t-th iteration, α is the learning rate, sign(·) represents the where represents element-wise multiplication, kxk2 and
sign function, and Πx+ (·) represents the projection operation. kgt k2 represent the L2 norms of the input data and the
Overall, PGD is a powerful and flexible iterative adversarial adversarial gradient. Combining Nesterov’s accelerated gradi-
attack algorithm that gradually approaches the optimal solution ent method and scale invariance achieves good performance
by updating the perturbation multiple times, generating more in both adversarial attack effectiveness and computational
effective adversarial examples. efficiency.
C. Theoretically Grounded Approach (TPGD)

E. Moment Iterative Fast Gradient Sign Method (MIFGSM)
The author discusses the trade-off between adversarial
robustness and accuracy in DNNs [60]. The authors propose The MIFGSM [53] is a commonly targeted adversarial
a theoretically grounded approach that balances the trade- attack algorithm against DNNs, based on the FGSM and the
off by modifying the training data. Specifically, the authors Momentum Gradient Descent algorithm. The basic idea of
define a loss function that combines adversarial robustness and MIFGSM is to add a momentum term to FGSM to speed up the
accuracy, called TPGD. The form of the loss function is: attack and increase the attack’s success rate. Its mathematical
expression is:
L(θ) = E(x,y)∼D [max LCE (F(x + δ), y) − ξR(δ)] (4)
δ∈S
xt+1 = clip(xt + α · sign(gt ))
D is the distribution of the training data, LCE is the cross- ∇x L(xt , y; θ) (9)
entropy loss function, S is the set of adversarial examples, gt = µ · gt−1 +
R(δ) is the robustness evaluation of adversarial perturbation ||∇x L(xt , y; θ)||1
δ, ξ is a balancing factor. The meaning of this loss function
where µ is the momentum factor used to control the smooth-
is to maximize the cross-entropy loss function for correctly
ness of the perturbation, the clip function is used to truncate
predicted examples among all adversarial examples, while
xt to ensure it is within the range of [x − , x + ]. MIFGSM
simultaneously minimizing the robustness evaluation of the
algorithm introduces a momentum term, enabling the attacker
adversarial examples to achieve a balance between adversarial
to find the path to a successful attack faster and avoid getting
robustness and accuracy. The authors prove that this loss func-
stuck in local minima. Additionally, MIFGSM can increase the
tion can be minimized under certain conditions and propose
attack success rate and amplitude by increasing the number of
an optimization algorithm to solve this minimization problem.
epochs, but it also increases attack time and cost.
D. Nesterov Accelerated Gradient Fast Gradient Sign Method
(NIFGSM) F. Variance-based Moment Iterative Fast Gradient Sign
The authors propose an adversarial attack method based Method (VMIFGSM)
on Nesterov accelerated gradient and scale invariance, called
NIFGSM [54]. Mathematically, this method can be represented VMIFGSM [55] is an improved targeted attack algorithm
by the following equation: for DNNs. It introduces variance information based on the
MIFGSM algorithm to better explore the model’s gradient in-
x0 = x + α · sign(gt ) (5) formation and accelerate the attack process. The mathematical
4
expression of the VMIFGSM algorithm is as follows: δ such that the 0 output
0
of x is significantly different from
0 0 the output of x , x = x + δ, while the magnitude of the
xt+1 = xt + α · sign(gt+1 ) perturbation δ is small enough to be imperceptible to humans.
0
ĝt+1 = ∇x0 L(xt , y; θ) Where is the perturbation constraint. This can be formulated
ĝt+1 + vt (10) as the following constrained minimization problem:
gt+1 = µ · gt + 0
||ĝt+1 + vt ||1 min d(x , x)
0 δ (11)
vt+1 = V (xt ) 0
s.t., F(x; θ) 6= F(x ; θ)
where vt+1 represents the gradient variance of the N samples, 0 0
V (·) is the approximate of gradient variance, i.e., V (x) = where d(x , x) represents the distance between x and x ,
1
N which can be Euclidean distance, Lp norm distance, etc.
∇xi L(xi , y) − ∇x L(x, y), N represents the number of
P
N
i=1 For the non-targeted attack, the purpose of a non-targeted
sampled examples. The dynamic step size setting can help attack is to deceive a DNN by causing it to predict an incor-
attackers better explore the gradient information, accelerate the rect label by maximizing the prediction error. The objective
attack process, and improve the attack success rate. function for a non-targeted attack can be formulated as the
following optimization problem:
G. Stochastic Variance Reduced Ensemble Attack (SVRE) 0 0
x = arg max L(x , y; θ) (12)
The authors propose an adversarial attack method based 0
||x−x ||<
on an ensemble with random variance reduction to improve
the transferability of adversarial examples, called SVRE [59]. For the targeted attack, the goal of the attack is to find
0
This method includes three key components: random pertur- the closest x to the input image x, such that the perturbed
bation, variance reduction, and ensemble. Specifically, random 0
image x is classified by the model F as the targeted label y0 .
0
perturbation is used to increase the diversity of attack samples, Assuming the input image x, the specified targeted label y ,
variance reduction is used to reduce the impact of noise and and the model’s classification function F, the targeted attack
improve the efficiency and stability of the attack, and the can be formulated as the following optimization problem:
ensemble combines the prediction results of multiple models
to improve the success rate and transferability of the attack. Algorithm 1 Class Activation Mapping Ensemble Attack
The goal of this method is to minimize a loss function on
the attack samples, which consists of the distance between the Require: Target model F, loss function L, input image x, true
output of the original model and the adversarial model and a label y, perturbation constraint , attack epochs T, decay
regularization term. factor µ. 0
Ensure: Adversarial example x .
To improve attack efficiency, the authors use a stochastic 1: α = ε/T ;
gradient descent-based optimization method with a mini-batch 2: g0 = 0;
technique to randomly select a subset of data for an update. 3: v0 = 0;
In addition, a new ensemble strategy, weighted averaging, is 0
4: x0 = x;
introduced to balance the contributions of different models. 5: for t = 0 → T − 1 do
Finally, experiments on multiple datasets show that compared 6: Compute gradient with Eq. (17);
with other adversarial attack methods, this method has a higher 7: Update gradient with Eq. (16);
attack success rate and better transferability. 8:
0
Update xt+1 with Eq. (15);
9: end for
0 0
IV. M ETHODOLOGY 10: x = xT ;
0
This section overviews the proposed method from prob- 11: Return x .
lem definition and class activation mapping ensemble attack.
Firstly, the problem is defined by presenting the objective 0 0 0
functions for targeted and non-targeted attacks, which intro- x = arg max L(x , y ; θ) (13)
||x−x0 ||<
duces the problem that needs to be addressed. Secondly, the
class activation mapping is introduced to enhance the method’s
attack performance and integrate the CAM into finding the To improve the stealthiness of the adversarial example, at-
minimum perturbation that avoids adding excessive perturba- tackers often need to add some constraints, such as limiting the
tions in unimportant regions, thus ensuring the stealthiness of size of the perturbation, to ensure that humans do not observe
adversarial examples. the perturbation. Different adversarial attack algorithms use
other loss functions and constraints. For example, gradient-
based adversarial attack algorithms such as the FGSM and
A. Problem Definition
PGD use cross-entropy loss and norm constraints to maximize
Adversarial attacks can be viewed as an optimization the error and limit the perturbation size. However, their attack
problem. The goal is to deceive a DNN by applying small capability is defined by gradient information. On the other
perturbations to the input image, resulting in a significantly hand, evolutionary algorithms-based adversarial attack algo-
different output than the original. Specifically, adversarial rithms use the objective function to guide the search process to
attacks can be defined as follows: given a target model F with find the most aggressive perturbation, but their computational
parameters θ and an input image x, find a small perturbation costs are often high. Although the aforementioned methods
5
have made great progress in improving the perception of the generated adversarial examples will be. Based on prior
adversarial examples, the adversarial examples generated by knowledge, we choose a gradient substitute model, which is
the above method have relatively weak attack ability in a black- used to determine the sign direction of perturbation, and select
box setting. the model with larger heatmap regions as the CAMs substitute
model, which calculates the weights of the perturbations.
Researchers have proposed various methods to enhance
the performance and transferability of adversarial attack algo- Non-targeted attack: Given the target 0
model F and the
rithms. One commonly used way is integrating multiple models input image x, find a perturbed
0
image x such that the output
to improve the model’s robustness and attack capability. For of x and the output of x differ significantly, while the
0
example, various models can be trained simultaneously in perturbation in x is small enough not to draw human attention,
adversarial training, and then their outputs are integrated to and without specifying a specific deceived output. Specifically,
obtain stronger robustness. In adversarial attacks, the model the non-targeted attack can be represented as the following
ensemble attack method can enhance attack capability. This minimization-constrained problem:
method produces adversarial examples by weighted averaging 0 0 0
the outputs of multiple models, thereby improving attack x = arg max L(x , y; θ) + ||x̃ − x ||2 (14)
0
||x−x ||<
performance. Although these methods have some improvement
in attack transferability, they all have the drawbacks of poor where x̃ is the maximum of ensemble class activation maps,
low iteration attack capability, slow convergence speed, and
poor stealthiness. x̃ = max{x1cam , x2cam , · · · , xncam }
The CAM can discover the relationship between a DNN’s xcam is the score of the class activation mapping, n is the
decision and the image region and visualize the model’s output number of CAM-based substitute models.
based on its feature map, thereby providing attackers with The first term is used to improve the attack capability
more information to construct adversarial examples. Therefore, of adversarial examples, while the second term is used to
we use the CAM to create adversarial examples, which can improve their transferability among different models. We use
effectively enhance the attack capability of adversarial attack the gradient variance optimization method to optimize the ob-
methods. We still focus on model ensemble attacks to improve jective function. However, differently from the method in [53],
the transferability of adversarial examples. However, unlike we incorporate CAMs [61] as the weight of the perturbation
existing research on model ensemble attack methods (such as to further enhance the optimization effect. Specifically, we
integrating multiple models’ logit outputs, prediction probabil- multiply the CAM with the perturbation to obtain a new
ities, and losses), we solve the problem of poor transferability perturbation vector. Then we use the sign of the gradient
by integrating multiple models’ CAMs to construct adversarial variance of the gradient substitute model as the direction for
examples. updating the perturbation, thereby improving the adversarial
attack ability of the adversarial examples. That is,
B. Class Activation Mapping Ensemble Attack 0 0
xt+1 = xt + (λ · α) · (w · Mc ) · sign(gt+1 ) (15)
Overview: In the black-box attack setting, attackers can ĝt+1 + vt
only access the input and output of the target model without gt+1 = µ · gt + (16)
||ĝt+1 + vt ||1
obtaining its internal structure and parameter information.
0 0
Therefore, we need to construct substitute models to gen- ĝt+1 = ∇x0 L(x , y; θ) + ||x̃ − x ||2 (17)
erate adversarial examples to deceive targeted DNNs. We t
first train substitute models (gradient substitute model and 1 X

N
CAMs substitute model) and use the CAM method to calculate vt+1 = ∇ i L(xi , y; θ)−∇x L(x, y; θ) (18)
the class activation score of each pixel, which is used as N i=1 x
the weight of the perturbation to ensure the stealthiness of where λ represents the perturbation step size factor, w rep-
the adversarial examples and enhance the attack performance resents the perturbation magnitude factor used to constrain
under low attack epochs. In the optimization process, we the magnitude of perturbation, Mc represents the perturbation
integrate the CAMs [61] of multiple CAM-based substitute weight of pixels, i.e., the score of class activation map of
models to ensure the transferability of the adversarial attack al- CAMs substitute model,
gorithm. Compared to traditional methods, using CAM scores
as perturbation weights can effectively avoid adding exces- Mc (a, b) = max{m1c (a, b), m2c (a, b), · · · , mnc (a, b)} (19)
sive perturbations in unimportant regions, thereby ensuring K
the stealthiness of the adversarial examples. Meanwhile, this mc (a, b) =
X
(αck Flk (a, b)) (20)
method can add perturbations in a targeted manner and quickly
k=1
change the decision region of benign images, enabling the
adversarial attack method to exhibit good attack performance where c represents the class of interest in addition to the current
even under low iteration times. class,
Substitution model: When selecting a substitute model,
X Flk (a, b) ∂Sc (Fl ) lk
αck = (P lk lk
F (a, b)) (21)
various factors, such as the complexity of the model, the a,b a,b ∂F (a, b) ∂F (a, b)
similarity to the target model, and the availability of training
data, need to be considered. Generally, the more similar the where l represents the index of the target layer, Fl represents
substitute model is to the target model, the more effective the response of the target layer, Flk represents the response of
6
TABLE I: Non-targeted Attack: Perceptive measure (Gradient substitute model: ResNet50, CAMs substitute models: WideRes-
Net101, Inception, and ResNet34, Target model: ResNet50, = 16/255, α = 1/255, λ = 0.75, w = 2).
Attack
TPGD [60] PGD [52] TIFGSM [53] DIFGSM [56] NIFGSM [54] MIFGSM [53] SINIFGSM [54] VMIFGSM [55] VNIFGSM [55] OUR
Metric
PSNR 31.54 29.60 30.75 29.60 29.32 29.39 29.20 29.41 29.31 30.10
MSE 0.0008 0.0012 0.0009 0.0012 0.0012 0.0012 0.0013 0.0012 0.0012 0.0011
L2 120.02 175.88 139.00 175.58 185.30 182.62 190.08 182.64 186.75 158.56
L∞ 0.2751 0.2813 0.2765 0.2828 0.278 0.2848 0.2816 0.2828 0.2842 0.2795
Low fre 47.27 69.90 72.98 70.91 78.79 78.27 82.14 82.16 83.45 68.25
SSIM 0.8900 0.8100 0.8900 0.8100 0.7900 0.8000 0.7900 0.8100 0.8022 0.8500
AASR 37% 49% 53% 60% 62% 63% 69% 71% 71% 71%
the kth feature map in layer l, Flk (a, b) represents the response V. E XPERIMENT
at position (a, b) in feature map Flk , and Sc (Fl ) represents
the score of the interested class, We validate our method mainly through five parts: ex-
perimental settings, perceptual evaluation, attack performance
K X
l
X ∂Sc (Fl ) lk analysis, robustness analysis, and ablation studies. In the
Sc (F ) = ( F (a, b)) + Φ(Fl ) (22) experimental settings, we introduce the dataset, evaluation
∂Flk (a, b)
k=1 a,b metrics, baseline methods, and defense methods. In the percep-
XL X ∂Sc (Fl ) tual evaluation section, we employ various visual perceptual
Φ(Fl ) = btj (23) metrics to analyze the perceptual differences between adver-
t=l+1 j ∂µtj sarial examples and benign images. The attack performance
where µtj represents the unit in the t-th layer, and btj represents analysis section primarily assesses the attack capability of the
the offset of the unit in the t-th layer. The pseudo-code for proposed method in non-targeted and targeted attacks. In the
the class activation mapping ensemble attack is shown in robustness analysis section, we evaluate the effectiveness of
Algorithm 1. the proposed attack against various defense methods. In the
ablation studies section, we analyze the rationality of using
0
Targeted attack: Give the input image x, the target label CAMs as adversarial perturbation weights and investigate the
y , and the model F, the objective function for targeted attack impact of different modules and parameter values on the attack
can be derived as follows: performance of our method.
0 0 0 0
x = arg max −L(x , y ; θ) − ||x̃tar − x ||2 (24)
||x−x ||<0 A. Experimental Settings
where x̃tar is the integrated class activation map of the target Dataset: We validate the proposed method on the ILSVRC
class, 2012 validation set and the data subsets provided in refer-
ences [53]–[55] while ensuring that each developed model
x̃tar = max{x1tar 2
cam , xtar cam , · · · , xntar cam } correctly classifies all selected test images.
xtar cam is the score of the targeted class activation mapping. Models: To differentiate the roles of different models,
It should be noted that we categorize the models into the following three types:
0 0
xt+1 = xt + (λ · α) · (w · Mtc ) · sign(gt+1 ) CAMs substitute models, which are employed to compute
perturbation weights; gradient substitute models, which are
Mtc (a, b) = max{m1tc (a, b), m2tc (a, b), · · · , mntc (a, b)} used to calculate gradient signs when crafting adversarial
examples; and target models, solely employ to evaluate the
mtc is the activation map score for the specified class. Where
attack performance of the proposed method, without access
K
X to any information about the model. It is worth emphasizing
k lk
mtc (a, b) = (αtc F (a, b)) that all the models we used, including CAMs substitute,
k=1 gradient substitute, and target models, are pre-trained models
lk specifically designed for the Imagenet dataset and provided
k
X F (a, b) ∂Stc (Fl ) lk
αtc = (P F (a, b)) within the PyTorch library. We employ WideResNet101 [44],
lk lk
a,b a,b ∂F (a, b) ∂F (a, b) Inception v2 (Inception) [45], and ResNet34 [43] as CAMs
substitute models, and ResNet50 [43] as the gradient sub-
K X
X ∂Stc (Fl ) lk stitute model. We also regard models such as AlexNet [40],
Stc (Fl ) = ( lk F (a, b)) + Φ(Fl ) VGG16 [41], EfficientNet b0 (EfficientNet) [42], WideRes-
∂F (a, b)
k=1 a,b Net50 [44], MobileNet v2 (MobileNet) [46], ResNet18 [43],
XL X ∂Stc (Fl ) ConvNeXt [47], ViT [48], and RegNet [49] as target models.
Φ(Fl ) = btj
t=l+1 j ∂µtj Metrics: The perceptual metrics include PSNR, Mean
Squared Error (MSE), SSIM, Low fre, CIEDE2000, L2 norm,
7
Fig. 1: Adversarial example generated by our method and baseline methods under non-targeted attack. L2 , SSIM, and PSNR
represent perceptual metrics, and ‘ASR’ represents the success rate of attack against the target model (Target model: ResNet50,
Attack mode: non-targeted attack, Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101, Inception,
and ResNet34, = 16/255, α = 1/255, epoch = 10, λ = 0.75, w = 2).
Fig. 2: Adversarial examples generated by our method and baseline methods under targeted attack. ‘Ori-label’ represents the
original label of the image, ‘Tar-label’ represents the targeted label we specified, ‘Epoch’ represents the iteration round of the
attack, and ‘Adv-label’ represents the predicted label of ResNet50 (Target model: ResNet50, Gradient substitute model: ResNet50,
CAMs substitute models: WideResNet101, Inception, and ResNet34, = 16/255, α = 1/500, λ = 0.75, w = 4).
and L∞ norm. The attack capability metrics include the Attack B. Perceptual Evaluation
Success Rate (ASR) against one target model and the Average
Attack Success Rate (AASR) against 13 target classifiers. Fig. 1 and Fig. 2 depict the adversarial examples generated
by our method and baseline methods under non-targeted and
targeted attacks. Please refer to Fig. 1 in the Appendix for
Baseline: We evaluate ten baseline methods in our
the results of the baseline methods converging to the target
study, including PGD [52], TPGD [60], DIFGSM [56],
label. To validate the perceptibility of our method, we employ
TIFGSM [53], MFGSM [53], NIFGSM [54], SINIFGSM [54],
six distance metrics to analyze the perceptibility of adversarial
VMIFGSM [55], VNIFGSM [55], SVRE [59]. The ensemble
examples under non-targeted and targeted attacks, as shown
model of SVRE are Inception v3, Inception v4, InceptionRes-
in Table I and Table II. Table I presents the perceptibility
net v2, and ResNet101.
of the adversarial examples generated by ten attack methods
with similar attack capabilities under non-targeted attacks.
Defense: We test seven defense methods, including Feature Specifically, when our attack capability is comparable to
Squeezing (Fea. Squ.) [62], Label smoothing(Lab. Smo.) [63], VMIFGSM and VNIFGSM. Compared with VMIFGSM, our
Engstrom [64], Salman [65], Singh [66], Liu [67], and perceptibility drops by 24.08 for L2 , decreases by 13.91 for
Shan [68]. The Fea. Squ., Engstrom, and Lab. Smo. methods Low fre, increases by 0.04 for SSIM, and increases by 0.69
utilize ResNet50 as the target model. Salman’s method uses for PSNR. Table II demonstrates the perceptibility of the
WideResNet50 as the target model. Singh utilizes ViT, and Liu adversarial examples generated by nine attack methods with
employs ConvNeXt as the target model. Shan’s target model similar attack capabilities under targeted attack mode. This
is the same as work [68]. table shows that our method achieves the best perceptibility
among the eight baseline methods with comparable attack
Parameter: We set the perturbation constraint for all attack capabilities, while SINIFGSM performs the worst. Notably,
methods, α = 1/500 or 1/255, = 16/255, µ = 1, N = 5, compared to VNIFGSM, our method improves the PSNR
w = 2 or 4. metric by 2.12 and reduces the SSIM by 0.088 and Low fre
8
TABLE II: Targeted Attack: Perceptive measure (Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101,
Inception, and ResNet34, Target model: ResNet50, = 16/255, α = 1/500, λ = 0.75, w = 4).
Attack
PGD [60] TIFGSM [53] DIFGSM [56] NIFGSM [54] MIFGSM [53] SINIFGSM [54] VMIFGSM [55] VNIFGSM [55] OUR
Metric
PSNR 29.53 28.46 29.28 25.40 25.28 25.25 28.44 28.26 30.38
MSE 0.0012 0.0015 0.0012 0.0029 0.0030 0.0030 0.0015 0.0016 0.0010
L2 178.01 226.04 187.76 438.01 450.13 453.71 224.41 233.84 149.55
L∞ 0.2797 0.2865 0.2818 0.3070 0.3064 0.3081 0.2875 0.2860 0.2794
Low fre 70.72 156.72 79.58 189.33 196.84 213.69 111.75 116.93 64.48
SSIM 0.8067 0.8514 0.8017 0.6092 0.6029 0.6195 0.7826 0.7737 0.8616
AASR 39% 57% 58% 64% 67% 65% 65% 65% 65%
(a) AASR (b) CIEDE2000 (c) Low fre (d) L2
(e) L∞ (f) MSE (g) PSNR (h) SSIM
Fig. 3: Non-targetrd Attack: Perceptive measure (Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101,
Inception, and ResNet34, = 16/255, α = 1/255, λ = 0.75, w = 2).
by 52.45. Compared to SINIFGSM, our method improves the the four metrics of MSE, PSNR, SSIM, and L2 , the visual
PSNR metric by 5.13 and reduces the L2 by 304.16 and perceptibility of SINIFGSM is the worst, and the perturbation
Low fre by 149.21. is the largest, while our perceptibility is between the above-
mentioned methods. Due to the computational complexity
To further analyze the perceptibility of adversarial exam- involved in achieving comparable attack capability to the most
ples under non-targeted attacks, we control the attack ca- vigorous baseline attack under targeted attacks, we do not
pability of our method to be similar to the most vigorous constrain our method to have a similar attack capability in
VMIFGSM by adjusting the value of λ and evaluate the showcasing the changes in the perceptibility of adversarial
perceptibility of adversarial examples at different epochs, as examples. For the changes in the perceptibility of adversarial
shown in Fig. 3. From Fig. 3(a), it can be seen that the examples under targeted attacks, please refer to the Fig. 2 in
VMIFGSM and our method have the most vigorous attack the appendix.
capability, while the PGD and TPGD have weaker attack
capabilities. By analyzing Fig. 3(b) to 3(h), it can be observed In conclusion, our method exhibits good stealthiness under
that the perceptibility of the ten methods gradually decreases as the non-targeted and targeted attacks compared to the baseline
the number of epochs increases and the perturbation increases. methods when they have similar attack effectiveness. This is
Among the nine baseline methods, the TPGD generates adver- attributed to our utilization CAM is the weighting factor for
sarial examples with the best visual perceptibility and lowest perturbations, which avoids adding perturbations in unimpor-
perturbation, followed by the PGD and DIFGSM. Regarding tant regions.
9
(a) AlexNet (b) EfficientNet (c) Inception (d) MobileNet (e) ResNet18 (f) ConvNeXt
(g) ResNet34 (h) ResNet50 (i) VGG16 (j) WideResNet50 (k) WideResNet101 (l) RegNet
Fig. 4: Non-targeted attack: attack success rate (Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101,
(a) AlexNet (b) EfficientNet (c) Inception (d) MobileNet (e) ResNet18 (f) ConvNeXt
(g) ResNet34 (h) ResNet50 (i) VGG16 (j) WideResNet50 (k) WideResNet101 (l) RegNet
Fig. 5: Targeted attack: the attack success rate that can mislead the gradient substitution model into classifying as a specified
label and can also mislead the targeted model (CAMs substitute models: WideResNet101, Inception, and ResNet34, = 16/255,
α = 1/500, λ = 0.75, w = 4).
C. Attack Performance Analysis To demonstrate this ability clearly, we show the attack
ability of adversarial examples generated at the 10th epoch
Fig. 4 demonstrates the impact of adversarial examples in non-targeted attack, as shown in Table III. From the table,
generated at different epochs on the performance of the target it can be seen that our method has a significantly better attack
model under non-targeted attacks. From Fig. 4(a) to 4(l), ability than nine baseline methods. Compared with the most
it can be seen that with the increase of attack epochs, the substantial attack method VMIFGSM, our method improves
attack ability of adversarial examples generated by ten attack the average attack ability against 13 target models by 11.69%,
methods will gradually increase until convergence. Compared and compared with the most perceptible attack method TPGD,
with the nine baseline methods, our method has the most it improves the average attack ability by 37.15%. Compared to
vigorous attack ability, followed by VMIFGSM. Specifically, VMIFGSM, our method achieves 17% and 15% higher attack
our method has a faster convergence rate. Its attack ability success rates against the EfficientNet and RegNet.
can converge to the level of the most powerful attack method
around the 20th epoch, reducing about ten epochs compared Fig. 5 illustrates the impact of adversarial examples gen-
with the baseline methods. At the same time, our method erated at different epochs on the performance of the target
has a significantly better attack ability than the nine baseline model under targeted attacks. As the number of attack epochs
methods in the first 20 epochs. This effect is practical for all increases, the attack capabilities of the nine attack methods
12 target models. gradually increase until convergence. Our method consistently
10
(a) Robust: Non-targeted (b) Robust: Targeted (c) Ave. robust: Non-targeted (d) Ave. robust: Targeted
Fig. 6: Non-targeted attack and targeted attack: robustness analysis (Gradient substitute model: ResNet50, CAMs substitute
models: WideResNet101, Inception, and ResNet34, Target model: ResNet50, Non-targeted attack: w=2, Targeted attack: w=4).
(a) w = 3: Non-targeted (b) λ=0.75: Non-targeted (c) w = 3: Targeted (d) λ=0.75: Targeted
Fig. 7: Non-targeted attack and targeted attack: the influence of λ and w on the attack capability of adversarial examples (Gradient
substitute model: ResNet50, CAMs substitute models: WideResNet101, Inception, and ResNet34, = 16/255, α = 1/255).
exhibits significant advantages over the eight baseline meth- targeted attacks, as shown in Table V. The table shows that
ods in attack capability, convergence speed, and low-iteration compared to the baseline methods, our method has the most
attack capability in targeted attack. The attack capabilities of significant impact on the robustness under non-targeted and
the baseline methods do not increase significantly in the first targeted attacks. Notably, robustness is most severely affected
20 epochs. Our method significantly eight baseline methods for defense methods such as Fea. Squ. [62] and Lab. Smo. [63],
regarding attack capability against the AlexNet, EfficientNet, followed by Shan [67]’s adversarial training method, while
VGG16, ConvNeXt, ResNet18, WideResNet101, and Mo- the impact on Engstrom [64]’s defense is relatively weaker.
bileNet. Although our method exhibits a decrease in attack capability
compared to the undefended scenario, it still outperforms
To demonstrate this ability clearly, we show the attack the baseline methods. While specific baseline methods show
ability of adversarial examples generated at the 11th epoch increased attack capability against defense methods compared
in the targeted attack, as shown in Table IV. Table IV presents to the undefended model, they have already reached their
the attack capabilities of our method and the baseline methods limits. Thus, it can be inferred that our method exhibits a
in targeted attack mode. As the number of attack epochs certain level of evasion capability against defense methods.
increases, the attack capabilities of the nine methods gradually
increase until convergence. Similar to non-targeted attacks, our
method exhibits the fastest convergence and the most vigorous Fig. 6 illustrates the relationship between perceptibility
attack capability, significantly outperforming the eight baseline changes and the model’s robustness with defense methods. We
methods at low attack epochs. control the transformation parameter λ to achieve variations in
the perceptibility of adversarial examples, where larger values
In summary, our method outperforms the baseline methods result in more significant perturbations and poorer image
with the fastest convergence speed and most vital attack capa- perceptibility. Fig. 6(a) and 6(b) depict the changes in the
bility, significantly surpassing the baseline methods, especially robustness of the model with defense methods for non-targeted
at low attack epochs. and targeted attacks. These subplots show that as image
perceptibility decreases, the robustness of models with defense
D. Robustness Analysis methods also gradually decreases. Fig. 6(c) and 6(d) present
the average robustness of various defense methods, where
To evaluate the evasion capability of our method against image perceptibility is evaluated using λ and the SSIM. These
different defense methods, we present the impact of various subplots demonstrate that as image perceptibility decreases,
attack methods on the robustness under the non-targeted and the robustness of models with defense methods also decreases.
11
TABLE III: Non-targeted Attack: the success rate of non-targeted attacks in the 10th epoch (Gradient substitute model: ResNet50,
CAMs substitute models: WideResNet101, Inception, and ResNet34, = 16/255, α = 1/500, epoch = 10, λ = 0.75, w = 2).
Attack
SVRE PGD TPGD DIFGSM MIFGSM NIFGSM TIFGSM SINIFGSM VNIFGSM VMIFGSM OUR
Model
ResNet18 0.15 0.44 0.26 0.50 0.47 0.52 0.38 0.50 0.55 0.57 0.73
ResNet34 0.10 0.37 0.20 0.51 0.48 0.52 0.43 0.48 0.63 0.63 0.77
ResNet50 0.16 1.00 0.63 1.00 1.00 1.00 0.99 1.00 1.00 1.00 1.00
AlexNet 0.08 0.39 0.28 0.42 0.33 0.29 0.28 0.36 0.31 0.33 0.41
MobileNet 0.18 0.50 0.36 0.53 0.51 0.47 0.42 0.57 0.58 0.63 0.72
WideResNet50 0.14 0.48 0.33 0.54 0.50 0.51 0.52 0.57 0.68 0.70 0.83
WideResNet101 0.08 0.30 0.25 0.38 0.42 0.41 0.33 0.41 0.53 0.56 0.73
VGG16 0.43 0.16 0.33 0.45 0.42 0.43 0.41 0.49 0.59 0.61 0.74
Inception 0.23 0.10 0.19 0.28 0.26 0.28 0.25 0.26 0.37 0.40 0.48
EfficientNet 0.35 0.11 0.30 0.38 0.39 0.40 0.33 0.40 0.49 0.51 0.68
ConvNeXt 0.28 0.16 0.25 0.32 0.29 0.28 0.27 0.28 0.33 0.36 0.48
ViT 0.15 0.05 0.11 0.14 0.14 0.17 0.17 0.15 0.23 0.23 0.33
RegNet 0.42 0.18 0.33 0.51 0.53 0.51 0.41 0.48 0.62 0.60 0.75
TABLE IV: Targeted Attack: the attack success rate in the 11th epoch that can mislead the gradient substitution model into
classifying as a specified label and can also mislead the targeted model (Gradient substitute model: ResNet50, CAMs substitute
models: WideResNet101, Inception, and ResNet34, = 16/255, α = 1/500, epoch = 11, λ = 0.75, w = 4).
Attack
PGD [52] DIFGSM [56] MIFGSM [53] NIFGSM [54] TIFGSM [53] SINIFGSM [54] VNIFGSM [55] VMIFGSM [55] OUR
Model
ResNet18 0.3667 0.4333 0.3750 0.3500 0.2750 0.3833 0.4417 0.4583 0.7000
ResNet34 0.2750 0.2667 0.3417 0.3583 0.2750 0.3250 0.4417 0.4583 0.6333
ResNet50 0.5750 0.7333 0.9167 0.9000 0.7583 0.8167 0.9667 0.9417 0.9750
AlexNet 0.3583 0.4000 0.3083 0.3000 0.2833 0.3167 0.2917 0.2750 0.6500
MobileNet 0.4167 0.4417 0.3750 0.4083 0.3833 0.3917 0.5167 0.4917 0.7250
WideResNet50 0.2667 0.3917 0.3833 0.3917 0.3750 0.4000 0.5333 0.4750 0.6667
WideResNet101 0.2583 0.3167 0.2917 0.3417 0.2583 0.3083 0.3083 0.3417 0.5333
VGG16 0.4083 0.4417 0.4333 0.4417 0.3917 0.4000 0.5750 0.5167 0.9917
Inception 0.2500 0.2667 0.2167 0.2417 0.2333 0.2250 0.3417 0.3250 0.4500
EfficientNet 0.3417 0.3000 0.3667 0.3417 0.2917 0.3083 0.3917 0.3750 0.5917
ConvNeXt 0.0000 0.0700 0.0900 0.10 0.0600 0.1000 0.1000 0.1300 0.3700
ViT 0.4800 0.2600 0.5600 0.5300 0.0900 0.4500 0.4100 0.4600 0.6800
RegNet 0.1400 0.1800 0.5100 0.4600 0.1100 0.2400 0.2800 0.3000 0.6500
Overall, as the perceptual quality of images deteriorates, the capability of generating adversarial examples increases with
robustness of models with defense methods decreases for both the increase of both parameters, and the change in the value
non-targeted and targeted attacks. of w has a more significant effect on the attack capability of
generating adversarial examples. Moreover, compared with the
In conclusion, our method exhibits a certain level of non-targeted attack mode, the parameter change significantly
evasion against defense methods in non-targeted and targeted impacts the attack capability of generating adversarial exam-
attacks, making it valuable for evaluating the robustness of ples in the targeted attack.
models.
To validate the rationality of using CAM as perturbation
E. Ablation Study weights, we demonstrate the attack capabilities of generating
adversarial examples using different perturbation weighting
Fig. 7 analyzes the influence of two parameters, λ and methods, as shown in Table VI. Fix indicates adding a fixed
w, on our method’s attack capability of generating adversarial perturbation value to all pixels. Uniform represents adding per-
examples. It can be seen from the figure that the attack turbations that follow a uniform distribution. Gauss represents
12
TABLE V: Robustness analysis (accuracy of models with defense methods) (Gradient substitute model: ResNet50, CAMs
substitute models: WideResNet101, Inception, and ResNet34, Non-targeted attack: w=2, Targeted attack: w=4).
Non-targeted Attack Targeted Attack

Method
Attack Fea. Squ. Lab. Smo. Engstrom Salman Singh Liu Shan Fea. Squ. Lab. Smo. Engstrom Salman Singh Liu Shan
BENIGN —— —— 61% 65% 68% 73% —— —— —— 61% 65% 68% 73% ——
DIFGSM 37% 47% 59% 59% 64% 67% 58% 45% 54% 59% 59% 64% 67% 52%
MIFGSM 32% 28% 61% 60% 64% 67% 58% 40% 41% 58% 59% 64% 65% 52%
NIFGSM 36% 27% 61% 60% 64% 68% 47% 41% 38% 58% 59% 64% 66% 41%
PGD 47% 52% 60% 59% 64% 68% 66% 51% 63% 58% 59% 65% 66% 42%
SINIFGSM 28% 41% 59% 59% 64% 66% 64% 45% 49% 58% 60% 63% 66% 52%
TIFGSM 43% 53% 62% 59% 64% 68% 64% 44% 58% 58% 59% 63% 66% 50%
TPGD 46% 61% 60% 60% 64% 68% 55% ——
VNIFGSM 21% 23% 61% 60% 64% 67% 60% 31% 27% 58% 61% 63% 67% 46%
VMIFGSM 23% 22% 61% 60% 64% 66% 45% 33% 24% 59% 61% 63% 67% 48%
OUR 16% 12% 61% 59% 64% 66% 43% 25% 21% 57% 61% 63% 65% 39%
TABLE VI: CAM effectiveness analysis (Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101,
Inception, and ResNet34, = 16/255, α = 1/500, λ = 0.75, Non-targeted attack: w=2, Targeted attack: w=4).
Non-targeted Attack
Model
ResNet18 ResNet34 ResNet50 Alexnet MobileNet WideResNet50 WideResNet101 VGG16 Inception EfficientNet ConvNeXt ViT RegNet
Method
Uniform 0.43 0.50 1.00 0.00 0.00 0.00 0.00 0.45 0.00 0.00 0.29 0.16 0.50
Gauss 0.23 0.17 0.25 0.00 0.00 0.00 0.00 0.31 0.00 0.00 0.24 0.08 0.28
Clam [69] 0.51 0.52 0.98 0.00 0.01 0.00 0.00 0.52 0.00 0.00 0.31 0.19 0.47
Fix [54] 0.55 0.63 1.00 0.31 0.58 0.68 0.53 0.59 0.37 0.49 0.33 0.23 0.62
OUR 0.73 0.77 1.00 0.41 0.72 0.83 0.73 0.74 0.48 0.68 0.48 0.33 0.75
Targeted Attack
Model
Resnet18 Resnet34 Resnet50 Alexnet MobileNet WideResNet50 WideResNet101 VGG16 Inception EfficientNet ConvNeXt ViT RegNet
Method
Uniform 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Gauss 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00
CALM [69] 0.67 0.28 0.29 0.58 0.28 0.20 0.08 0.27 0.11 0.18 0.08 0.29 0.19
Fix [54] 0.44 0.44 0.97 0.29 0.52 0.53 0.31 0.58 0.34 0.39 0.10 0.41 0.28
OUR 0.70 0.63 0.98 0.65 0.73 0.67 0.53 0.99 0.45 0.59 0.37 0.68 0.65
TABLE VII: Attack Success Ratio of different modules (Target adversarial examples compared to the four perturbation-adding
model: WideResNet101, = 16/255, α = 1/500, epoch = 7, methods. The Fix method follows closely, while Gauss and
λ = 0.75, Non-targeted attack: w=2, Targeted attack: w=4). Uniform exhibit the weakest attack capability and transfer-
ability. The adversarial example generated by the CALM [69]
Module method possesses certain attack capabilities but weaker trans-
Module 1 Module 2 Module 3 ALL ferability. This validates the rationality of using CAM as
Mode
Non-targeted attack 0.5333 0.525 0.6583 0.7000
perturbation weights.
Targeted attack 0.4583 0.3667 0.5250 0.5333 To validate the effectiveness of our method, we analyze
the attack capability of adversarial examples under different
modules, as shown in Table VII. Module 1 only includes
the first term of the loss function, Module 2 includes both
adding perturbations that follow the Gaussian distribution. the first and second terms of the loss function, Module 3
CALM [69] represents the CALM method based on attribute includes the first term of the loss function and uses class
attribution. The table shows that our method exhibits the activation mapping scores as perturbation weights, and ALL
strongest attack capability and transferability for generating includes all modules. The table shows that when generating
13
adversarial examples under both targeted and non-targeted [5] Y. Shen, A. Sowmya, Y. Luo, X. Liang, D. Shen, and J. Ke, “A
attack modes, the attack capability is the weakest when only federated learning system for histopathology image analysis with an
using Module 1, the strongest when using all modules, and orchestral stain-normalization GAN,” IEEE Transactions on Medical
Imaging (TMI), vol. 42, no. 7, pp. 1969–1981, 2023.
the second strongest when only using Module 3. Comparing
[6] R. Yasrab, Z. Fu, H. Zhao, L. H. Lee, H. Sharma, L. Drukker, A. T.
the performance of Module 1 and Module 3, it can be seen Papageorgiou, and J. A. Noble, “A machine learning method for au-
that the addition of Module 3 significantly improves the attack tomated description and workflow analysis of first trimester ultrasound
capability of adversarial examples. The combination of Module scans,” IEEE Transactions on Medical Imaging (TMI), vol. 42, no. 5,
2 and Module 3 further enhances the attack capability of pp. 1301–1313, 2023.
adversarial examples. [7] Y. Ding, Q. Li, Z. Li, and Y. Li, “Multi-modal medical image segmen-
tation with deep learning: a review,” IEEE Transactions on Medical
Regarding the analysis of computational costs, please refer Imaging (TMI), vol. 41, no. 1, pp. 20–36, 2022.
to Appendix Table III. [8] S. Chang, Y. Gao, M. J. Pomeroy, T. Bai, H. Zhang, S. Lu, P. J.
Pickhardt, A. Gupta, M. Reiter, E. S. Gould, and Z. Liang, “Exploring
dual-energy CT spectral information for machine learning-driven lesion
VI. C ONCLUSION diagnosis in pre-log domain,” IEEE Transactions on Medical Imaging
(TMI), vol. 42, no. 6, pp. 1835–1845, 2023.
Although many studies have focused on the issue of model [9] S. Zhang, Y. Li, D. Yang, Q. Wang, and Y. Liu, “Deep learning-based
transferability in attacks, there still exist problems of poor malware detection with improved robustness,” in Proceedings of the
2022 ACM SIGSAC Conference on Computer and Communications
stealthiness and low attack effectiveness in generating adver- Security (CCS), 2022, pp. 1021–1036.
sarial examples. To address these issues, we propose a class
[10] X. Liu, Z. Zhang, X. Wang, B. Liu, F. Li, and X. Xie, “Detecting
activation mapping ensemble attack. This method considers stealthy adversarial examples with deep learning models,” in Pro-
each pixel feature’s role in the image, using the CAMs to im- ceedings of the 2023 ACM SIGMOD International Conference on
prove attack performance. We attack the ResNet50 model and Management of Data (SIGMOD), 2023, pp. 1125–1140.
test the transferability of adversarial examples on models such [11] X. Wu, S. Liu, X. Hu, J. Xu, and W. Wang, “Robust deep learning
as WideResNet101, Inception, and ResNet34. Experimental for intrusion detection: A review,” IEEE Transactions on Information
results show that our method can generate adversarial examples Forensics and Security (TIFS), vol. 17, no. 9, pp. 2292–2307, 2022.
with better transferability and perform better under low-round [12] H. Chen, Q. Zhang, Y. Liu, Y. Liu, X. Yin, and W. Wang, “Adversarial
training of deep learning models for malware detection: A case study,”
attacks. ACM Transactions on Privacy and Security (TOPS), vol. 26, no. 1, pp.
1–26, 2023.
In the future, we will continue to focus on research on the
[13] M. Gupta and P. Agrawal, “Compression of deep learning models for
transferability of model ensemble attacks, aiming to improve text: A survey,” in Proceedings of the ACM Transactions on Knowledge
the transferability of attack methods significantly. We will also Discovery from Data (TKDD), 2022, pp. 61:1–61:55.
try more attack methods and techniques, including combining [14] W. Zhang, J. Liu, X. Li, and H. Chen, “A survey of deep learning
reinforcement learning and meta-learning, to improve attack techniques in natural language processing,” in Proceedings of the IEEE
performance. Transactions on Neural Networks and Learning Systems (TNNLS),
2023, pp. 1–15.
[15] H. Tran, D. Lu, and G. Zhang, “Exploiting the local parabolic
ACKNOWLEDGMENT landscapes of adversarial losses to accelerate black-box adversarial
attack,” in Proceedings of the European Conference on Computer Vision
This research is supported by the National Natural Sci- (ECCV), 2022, pp. 317–334.
ence Foundation of China (NSFC) [grant numbers 62172377, [16] Y. Yang, P. Liu, X. Zhang, and Q. Huang, “Universal adversarial training
61872205], the Shandong Provincial Natural Science Founda- via wasserstein distance,” in Proceedings of the IEEE/CVF Conference
tion [grant number ZR2019MF018], and the Startup Research on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 13 702–
13 711.
Foundation for Distinguished Scholars No. 202112016. This
[17] B. Chen, Y. Feng, T. Dai, J. Bai, Y. Jiang, S. Xia, and X. Wang, “Adver-
should be a simple paragraph before the References to thank sarial examples generation for deep product quantization networks on
those individuals and institutions who have supported your image retrieval,” IEEE Transactions on Pattern Analysis and Machine
work on this article. Intelligence (TPAMI), vol. 45, no. 2, pp. 1388–1404, 2023.
[18] G. Dhillon, J. Rajasegaran, P. K. Mookiah, C. P. Lim, and S. Ra-
man, “Robustnessand adversarial training for object detection via self-
supervised learning,” in Proceedings of the 30th ACM International
Conference on Multimedia (ICM), 2022, pp. 1627–1635.
R EFERENCES [19] A. Banerjee, U. Bhattacharya, and A. Bera, “Learning unseen emotions
[1] Z. Deng, J. Shi, and J. Zhu, “Neuralef: Deconstructing kernels by deep from gestures via semantically-conditioned zero-shot perception with
neural networks,” in Proceedings of the International Conference on adversarial autoencoders,” in Proceedings of the AAAI Conference on
Machine Learning (ICML), 2022, pp. 4976–4992. Artificial Intelligence (AAAI), 2022, pp. 3–10.
[2] Z. Huang, Y. Wang, C. Li, and H. He, “Going deeper into permutation- [20] Z. Wei, J. Chen, Z. Wu, and Y. Jiang, “Boosting the transferability of
sensitive graph neural networks,” in Proceedings of the International video adversarial examples via temporal translation,” in Proceedings of
Conference on Machine Learning (ICML), 2022, pp. 9377–9409. the AAAI Conference on Artificial Intelligence (AAAI), 2022, pp. 2659–
2667.
[3] F. Brau, G. Rossolini, A. Biondi, and G. C. Buttazzo, “On the min-
imal adversarial perturbation for deep neural networks with provable [21] K. Li, Y. Liu, X. Ao, and Q. He, “Revisiting graph adversarial attack
estimation error,” IEEE Transactions on Pattern Analysis and Machine and defense from a data distribution perspective,” in Proceedings of the
Intelligence (TPAMI), vol. 45, no. 4, pp. 5038–5052, 2023. International Conference on Learning Representations (ICLR), 2023.
[4] S. Yang, E. Yang, B. Han, Y. Liu, M. Xu, G. Niu, and T. Liu, [22] L. Pan, C. Hang, A. Sil, and S. Potdar, “Improved text classification via
“Estimating instance-dependent bayes-label transition matrix using a contrastive adversarial training,” in Proceedings of the AAAI Conference
deep neural network,” in Proceedings of the International Conference on Artificial Intelligence (AAAI), 2022, pp. 11 130–11 138.
on Machine Learning (ICML), 2022, pp. 25 302–25 312. [23] B. Wang, L. Zhang, D. Zhou, Y. Cao, and J. Ding, “Neural topic
14
modeling based on cycle adversarial training and contrastive learning,” vances in Neural Information Processing Systems (NeurIPS), 2012, pp.
in Proceedings of the Annual Meeting of the Association for Computa- 1097–1105.
tional Linguistics (ACL), 2023, pp. 9720–9731. [41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
[24] Y. Li, Y. Sun, Z. Xu, J. Cao, Y. Li, R. Li, H. Chen, S. Cheung, Y. Liu, large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
and Y. Xiao, “Regexscalpel: Regular expression denial of service (redos) [42] M. Tan and Q. Le, “Efficientnetv2: Smaller models and faster training,”
defense by localize-and-fix,” in Proceedings of the USENIX Security in Proceedings of the International Conference on MachineLearning
Symposium (USENIX Security), 2022, pp. 4183–4200. (ICML), 2021, pp. 10 096–10 106.
[25] L. Chen, Y. Zhang, Y. Song, L. Liu, and J. Wang, “Self-supervised [43] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
learning of adversarial example: Towards good generalizations for recognition,” in Proceedings of the IEEE/CVF Conference on Computer
deepfake detection,” in Proceedings of the IEEE/CVF Conference on Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
Computer Vision and Pattern Recognition (CVPR), 2022, pp. 18 689–
18 698. [44] S. Zagoruyko and N. Komodakis, “Wide residual networks,” in Proceed-
ings of the British Machine Vision Conference 2016 (BMVC), 2016, pp.
[26] H. C. Moon, S. R. Joty, and X. Chi, “Gradmask: Gradient-guided token 1–12.
masking for textual adversarial example detection,” in Proceedings of
the ACM Knowledge Discovery and Data Mining (SIGKDD), 2022, pp. [45] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Rethinking the
3603–3613. inception architecture for computer vision,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2016,
[27] E. Tavan and M. Najafi, “Marsan at semeval-2023 task 10: Can pp. 2818–2826.
adversarial training with help of a graph convolutional network detect
[46] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mo-
explainable sexism?” in Proceedings of the Annual Meeting of the
bilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of
Association for Computational Linguistics (ACL), 2023, pp. 1011–1020.
the IEEE/CVF Conference on Computer Vision and Pattern Recognition
[28] V. Bhat, P. Jyothi, and P. Bhattacharyya, “Adversarial training for low- (CVPR), 2018, pp. 4510–4520.
resource disfluency correction,” in Proceedings of the Annual Meeting of
[47] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A
the Association for Computational Linguistics (ACL), 2023, pp. 8112–
convnet for the 2020s,” in Proceedings of the IEEE/CVF Conference on
8122.
Computer Vision and Pattern Recognition (CVPR), 2022, pp. 11 976–
[29] E. Altinisik, H. Sajjad, H. T. Sencar, S. Messaoud, and S. Chawla, 11 986.
“Impact of adversarial training on robustness and generalizability of
[48] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai,
language models,” in Proceedings of the Annual Meeting of the Asso-
T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al.,
ciation for Computational Linguistics (ACL), 2023, pp. 7828–7840.
“An image is worth 16x16 words: Transformers for image recognition
[30] Y. Li, Z. Li, Y. Gao, and C. Liu, “White-box multi-objective adversarial at scale,” arXiv preprint arXiv:2010.11929, 2020.
attack on dialogue generation,” in Proceedings of the Annual Meeting of [49] I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollár,
the Association for Computational Linguistics (ACL), 2023, pp. 1778– “Designing network design spaces,” in Proceedings of the IEEE/CVF
1792. Conference on Computer Vision and Pattern Recognition (CVPR), 2020,
[31] S. Shan, W. Ding, E. Wenger, H. Zheng, and B. Y. Zhao, “Post-breach pp. 10 428–10 436.
recovery: protection against white-box adversarial examples for leaked [50] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing
DNN models,” in Proceedings of the ACM Conference on Computer adversarial examples,” in Proceedings of the 3rd International Confer-
and Communications Security (CCS), 2022, pp. 2611–2625. ence on Learning Representations (ICLR), 2015.
[32] C. Zhang, P. Benz, A. Karjauv, J. Cho, K. Zhang, and I. S. Kweon, [51] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in
“Investigating top-k white-box and transferable black-box attack,” in the physical world,” in Proceedings of the 5th International Conference
Proceedings of the IEEE/CVF Conference on Computer Vision and on Learning Representations (ICLR), 2017.
Pattern Recognition (CVPR), 2022, pp. 15 064–15 073.
[52] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards
[33] D. Lee, S. Moon, J. Lee, and H. O. Song, “Query-efficient and deep learning models resistant to adversarial attacks,” in Proceedings of
scalable black-box adversarial attacks on discrete sequential data via the 6th International Conference on Learning Representations (ICLR),
bayesian optimization,” in Proceedings of the International Conference 2018.
on Machine Learning (ICML), 2022, pp. 12 478–12 497.
[53] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting
[34] Y. Dai, H. Luo, and L. Chen, “Follow-the-perturbed-leader for adver- adversarial attacks with momentum,” in Proceedings of the IEEE/CVF
sarial markov decision processes with bandit feedback,” in Proceedings Conference on Computer Vision and Pattern Recognition (CVPR), 2018,
of the Advances in Neural Information Processing Systems (NeurIPS), pp. 9185–9193.
2022.
[54] J. Lin, C. Song, K. He, L. Wang, and J. E. Hopcroft, “Nesterov
[35] Y. Xue and U. Roshan, “Accuracy of white box and black box accelerated gradient and scale invariance for adversarial attacks,” in
adversarial attacks on a sign activation 01 loss neural network en- Proceedings of the 8th International Conference on Learning Repre-
semble,” in Proceedings of the International Conference on Learning sentations (ICLR), 2020.
Representations (ICLR), 2023.
[55] X. Wang and K. He, “Enhancing the transferability of adversarial attacks
[36] J. Liu, Y. Kang, D. Tang, K. Song, C. Sun, X. Wang, W. Lu, and through variance tuning,” in Proceedings of the IEEE/CVF Conference
X. Liu, “Order-disorder: Imitation adversarial attacks for black-box on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1924–
neural ranking models,” in Proceedings of the ACM Conference on 1933.
Computer and Communications Security (CCS), 2022, pp. 2025–2039.
[56] C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, and A. L. Yuille,
[37] N. Aafaq, N. Akhtar, W. Liu, M. Shah, and A. Mian, “Language “Improving transferability of adversarial examples with input diversity,”
model agnostic gray-box adversarial attack on image captioning,” IEEE in Proceedings of the IEEE/CVF Conference on Computer Vision and
Transactions on Information Forensics and Security (TIFS), vol. 18, pp. Pattern Recognition (CVPR), 2019, pp. 2730–2739.
626–638, 2023.
[57] X. Wang, X. He, J. Wang, and K. He, “Admix: Enhancing the trans-
[38] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and ferability of adversarial attacks,” in Proceedings of the IEEE/CVF In-
P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” ternational Conference on Computer Vision (CVPR), 2021, pp. 16 158–
arXiv preprint arXiv:1705.07204, 2017. 16 167.
[39] G. Severi, J. Meyer, S. Coull, and A. Oprea, “{Explanation-Guided} [58] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable
backdoor poisoning attacks against malware classifiers,” in Proceedings adversarial examples and black-box attacks,” in Proceedings of the 5th
of the USENIX security symposium (USENIX security), 2021, pp. 1487– International Conference on Learning Representations (ICLR), 2017.
1504. [59] Y. Xiong, J. Lin, M. Zhang, J. E. Hopcroft, and K. He, “Stochastic
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification variance reduced ensemble adversarial attack for boosting the adver-
with deep convolutional neural networks,” in Proceedings of the Ad- sarial transferability,” in Proceedings of the IEEE/CVF Conference on
15
TABLE I: Non-targeted Attack: attack success rate at different epoch (Gradient substitute model: ResNet50, CAMs substitute
models: WideResNet101, Inception, and ResNet34, Target model: ResNet50, = 16/255, α = 1/500, λ = 0.75, w = 2).
Attack
PGD TPGD DIFGSM MIFGSM NIFGSM TIFGSM SINIFGSM VNIFGSM VMIFGSM OUR
Epoch
1 0.25833 0.24167 0.25833 0.22500 0.23333 0.25000 0.25000 0.23333 0.22500 0.27500
3 0.40833 0.31667 0.38333 0.32500 0.31667 0.27500 0.33333 0.34167 0.32500 0.49167
5 0.43333 0.28333 0.45833 0.41667 0.38333 0.34167 0.39167 0.44167 0.46667 0.57500
7 0.45833 0.35000 0.50000 0.45833 0.45000 0.42500 0.49167 0.53333 0.51667 0.70000
9 0.47500 0.32500 0.56667 0.50000 0.48333 0.49167 0.51667 0.56667 0.59167 0.80000
10 0.48333 0.32500 0.54167 0.50000 0.50833 0.51667 0.56667 0.67500 0.70000 0.83333
12 0.45833 0.34167 0.56667 0.56667 0.57500 0.57500 0.64167 0.73333 0.76667 0.84167
14 0.52500 0.41667 0.58333 0.60833 0.64167 0.57500 0.68333 0.78333 0.78333 0.85833
16 0.52500 0.40000 0.63333 0.63333 0.68333 0.60000 0.75000 0.79167 0.80000 0.86667
18 0.50000 0.36667 0.63333 0.70000 0.71667 0.59167 0.81667 0.85000 0.83333 0.88333
20 0.52500 0.40000 0.70000 0.73333 0.74167 0.62500 0.85833 0.85833 0.86667 0.86667
22 0.55000 0.42500 0.63333 0.74167 0.75000 0.62500 0.85833 0.87500 0.85833 0.88333
24 0.53333 0.39167 0.65833 0.75833 0.75000 0.66667 0.88333 0.86667 0.88333 0.88333
26 0.57500 0.38333 0.73333 0.77500 0.77500 0.68333 0.89167 0.88333 0.88333 0.89167
Computer Vision and Pattern Recognition (CVPR), 2022, pp. 14 983– examples in the target attack mode converging to the target
14 992. label. Due to the weaker attack capability of the PGD method,
[60] H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, and M. Jordan, it did not converge to the target label even with an increased
“Theoretically principled trade-off between robustness and accuracy,” number of epochs.
in Proceedings of the International conference on machine learning
(ICML), 2019, pp. 7472–7482. Analyzing the perceptibility of nine attack methods under
[61] R. Fu, Q. Hu, X. Dong, Y. Guo, Y. Gao, and B. Li, “Axiom-based grad- targeted attack mode at different epochs with the same attack
cam: Towards accurate visualization and explanation of cnns,” arXiv
preprint arXiv:2008.02312, 2020.
capability involves a significant computational burden. To
facilitate computation, when analyzing the perceptibility of
[62] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: Detecting adversarial
examples in deep neural networks,” arXiv preprint arXiv:1704.01155, adversarial examples at different epochs, we did not con-
2017. strain the attack effectiveness of the baseline method under
[63] C. Zhang, P. Jiang, Q. Hou, and Y. Wei, “Delving deep into label the targeted attack mode. Fig. 2 illustrates the variations in
smoothing,” IEEE Transactions on Medical Imaging (TMI), vol. 30, image perceptibility, measured by six distance metrics under
pp. 5984–5996, 2021. targeted attack at different epochs. Fig. 2(a) presents the attack
[64] L. Engstrom, A. Ilyas, and H. Salmans, “Robustness (python library),” capability of adversarial examples at different epochs. From
2019. [Online]. Available: https://github.com/MadryLab/robustness this figure, it is evident that the attack capability of our method
[65] H. Salman, A. Ilyas, and Engstrom, “Do adversarially robust imagenet gradually increases with the increase in epochs until it con-
models transfer better?” Proceedings of the Advances in Neural Infor-
mation Processing Systems (NeurIPS), vol. 33, pp. 3533–3545, 2020. verges. Moreover, our approach exhibits significantly higher
[66] N. D. Singh, F. Croce, and M. Hein, “Revisiting adversarial training
attack capability at low epochs than the baseline method.
for imagenet: architectures, training and generalization across threat Fig. 2(b) to 2(h) demonstrate the perceptibility of adversarial
models,” arXiv preprint arXiv:2303.01870, 2023. examples evaluated using different perceptual metrics. From
[67] C. Liu, Y. Dong, W. Xiang, X. Yang, H. Su, J. Zhu, Y. Chen, Y. He, these subfigures, it can be observed that the perceptibility of
H. Xue, and S. Zheng, “A comprehensive study on robustness of image adversarial examples generated by all attack methods gradually
classification models: Benchmarking and rethinking,” arXiv preprint converges as the number of epochs increases. Specifically,
arXiv:2302.14301, 2023.
before the attack effectiveness of adversarial examples con-
[68] S. Shan, E. Wenger, and B. Wang, “Gotta catch’em all: Using honeypots
to catch adversarial attacks on neural networks,” in Proceedings of
verges, our method exhibits lower perceptibility than baseline
the 2022 ACM SIGSAC Conference on Computer and Communications methods. This is because, at low epochs, even though our
Security (CCS), 2020, pp. 67–83. method focuses on adding perturbations in critical regions with
[69] J. M. Kim, J. Choe, Z. Akata, and S. J. Oh, “Keep calm and improve vi- weighted perturbations, the weighted perturbations are more
sual feature attribution,” in Proceedings of the IEEE/CVF International significant than the unweighted perturbations. As a result, our
Conference on Computer Vision (ICCV), 2021, pp. 8350–8360. method shows poor perceptibility compared to the baseline
methods in fewer epochs. Nevertheless, the attack capability
A PPENDIX of the baseline method is significantly lower than ours. As the
P ERCEPTUAL E VALUATION number of epochs increases, the baseline method adds minor
perturbations across the entire image. At lower epochs, there is
Fig. 1 extends DIFGSM, SINIFGSM, VMIFGSM, and not much change in perceptibility. However, the perceptibility
VNIFGSM, four baseline methods, to generate adversarial starts to deteriorate when the number of epochs reaches a
16
TABLE II: Targeted attack: attack success rate at different epoch (Gradient substitute model: ResNet50, CAMs substitute models:
WideResNet101, Inception, and ResNet34, Target model: ResNet50, = 16/255, α = 1/500, λ = 0.75, w = 4).
Attack
PGD [52] DIFGSM [56] MIFGSM [53] NIFGSM [54] TIFGSM [53] SINIFGSM [54] VNIFGSM [55] VMIFGSM [55] OUR
Epoch
1 0.0000 0.0083 0.0083 0.0083 0.0000 0.0083 0.0083 0.0083 0.0167
2 0.0167 0.0000 0.0250 0.0250 0.0167 0.0250 0.0250 0.0250 0.0250
3 0.0250 0.0167 0.0250 0.0250 0.0000 0.0167 0.0250 0.0250 0.0583
4 0.0417 0.0333 0.1250 0.0917 0.0333 0.0500 0.0500 0.0667 0.1667
5 0.0750 0.0750 0.1333 0.0917 0.0250 0.0417 0.0500 0.0583 0.1917
6 0.1000 0.0750 0.2667 0.2083 0.0500 0.1667 0.1333 0.1417 0.3000
7 0.1333 0.1083 0.2833 0.2083 0.0500 0.1333 0.1417 0.1417 0.3083
8 0.1250 0.1917 0.4417 0.3583 0.1250 0.2917 0.3083 0.3000 0.4167
9 0.1417 0.225 0.4333 0.3500 0.1167 0.2667 0.3083 0.3000 0.4750
10 0.1917 0.2333 0.5333 0.4917 0.2083 0.3833 0.3583 0.3667 0.6083
11 0.2333 0.2750 0.5250 0.4917 0.1833 0.3667 0.3667 0.3500 0.6417
13 0.2500 0.3250 0.7250 0.6333 0.2500 0.4167 0.4417 0.4333 0.7333
17 0.3250 0.4833 0.8750 0.8667 0.3833 0.6667 0.7083 0.7083 0.8833
19 0.3667 0.5167 0.9083 0.9167 0.4250 0.7083 0.7917 0.8167 0.9250
22 0.3333 0.6167 0.9583 0.9750 0.6167 0.8250 0.8833 0.8917 0.9667
25 0.3583 0.6750 0.9750 0.9833 0.6583 0.8583 0.9500 0.9417 0.9833
TABLE III: Average time cost (Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101, Inception, and
ResNet34, = 16/255, α = 1/500, epoch = 11, λ = 0.75, Non-targeted attack: w=2, Targeted attack: w=4).
Model
DIFGSM [56] MIFGSM [53] NIFGSM [54] TIFGSM [53] SINIFGSM [54] VNIFGSM [55] VMIFGSM [55] OUR
Mode
Non-targeted Attack 0.942011s 0.540739s 0.505559s 1.055500s 1.464435s 1.648041s 1.642374s 1.393054s
Targeted Attack 1.197577s 0.765150s 0.808836s 1.273745s 3.828091s 2.452281s 2.695090s 1.629163s
TABLE IV: The rationale behind using separate substitute models for gradient and CAM computation under non-targeted attack.
Target model
ResNet18 ResNet34 ResNet50 Alexnet MobileNet WideResNet50 WideResNet101 VGG16 Inception Efficientnet ConvNeXt ViT RegNet
Method
Same substitute model 0.4750 0.5000 1.0000 0.0000 0.0000 0.0000 0.0000 0.4750 0.0000 0.0000 0.2917 0.1250 0.5167
Multiple substitute models 0.7300 0.7700 1.0000 0.4100 0.7200 0.8300 0.7300 0.7400 0.4800 0.6800 0.4800 0.3300 0.7500
certain threshold. Additionally, by analyzing the results after from the 3rd epoch, our attack capability surpasses the most
the 40th epoch, it can be concluded that our method exhibits robust method, VMIFGSM, by at least 10%. In particular, at
attack effectiveness comparable to the most vigorous attack, the 9th iteration, our method’s attack capability is improved
VMIFGSM. At the same time, the perceptibility falls between by 20.83%.
the optimal perceptual method and the most vigorous attack
method. This conclusion aligns with the findings in non- Table II shows the attack capabilities of our method and
targeted attacks. baseline methods in targeted attacks. As the number of attack
epochs increases, the attack capabilities of the nine attack
ATTACK P ERFORMANCE A NALYSIS methods gradually increase until convergence. Similar to non-
targeted attacks, our method converges the fastest and has
Table I displays the attack capabilities of our method and the strongest attack capability, significantly outperforming the
baseline methods in the non-targeted attack. As the number eight baseline methods at low attack epochs. Starting from the
of attack epochs increases, the attack capabilities of the nine 4th iteration, our attack capability surpasses VMIFGSM, the
attack methods gradually strengthen until convergence. Com- strongest method, by at least 10%. Particularly, at the 10th
pared to the baseline methods, our method exhibits the fastest iteration, our method’s attack capability increases by 24.16%
convergence and vigorous attack capability, significantly out- compared to VMIFGSM, and at the 11th iteration, it increases
performing the baseline methods at low attack epochs. Starting by 29.17%.
17
Fig. 1: Adversarial examples generated by our method and baseline methods under targeted attack (Target model: ResNet50,
Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101, Inception, and ResNet34, = 16/255, α =
1/500, λ = 0.75, w = 4).
(a) AASR (b) CIEDE2000 (c) Low fre (d) L2
(e) L∞ (f) MSE (g) PSNR (h) SSIM
Fig. 2: Targetrd Attack: Perceptive measure (Gradient substitute model: ResNet50, CAMs substitute models: WideResNet101,
A BLATION S TUDY Table IV demonstrates the rationale for using different

substitute models for computing gradients and heatmaps. From
this table, it is evident that adversarial examples generated
To analyze the computational cost of generating adversarial using different substitute models exhibit stronger transfer-
examples using our method, we present the average time cost ability than adversarial examples generated using the same
for generating a single adversarial example under non-targeted substitute model. This validates the effectiveness of using
and targeted attack modes when different attack methods different substitute models for generating adversarial examples.
achieve comparable attack performance. The results are shown
in Table III. The table shows that generating adversarial ex-
amples for targeted attacks incurs higher time costs than non-
targeted attacks. This is because targeted attacks are more chal-
lenging and require multiple epochs. In both non-targeted and
targeted attack modes, the average time taken by our method
to generate each image falls between the fastest method,
MIFGSM, and the slowest method, SINIFGSM. Therefore,
although some additional time is involved in computing the
perturbation weights, it does not significantly exacerbate the
overall time cost. Our method exhibits higher time costs than
less performant attack methods such as DIFGSM, MIFGSM,
and NIFGSM. However, it is significantly more efficient than
most power attack methods like VNIFGSM and VMIFGSM.
18

VMIFGSM

Uploaded by

Copyright:

Available Formats

VMIFGSM

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

VMIFGSM

Uploaded by

Copyright:

Available Formats

Enhance Stealthiness and Transferability of

Adversarial Attacks with Class Activation Mapping

Hui Xia∗ Rui Zhang Zi Kang

Shuliang Jiang Shuo Xu

C. Theoretically Grounded Approach (TPGD)

first train substitute models (gradient substitute model and 1 X

(a) AASR (b) CIEDE2000 (c) Low fre (d) L2

(e) L∞ (f) MSE (g) PSNR (h) SSIM

Non-targeted Attack Targeted Attack

(a) AASR (b) CIEDE2000 (c) Low fre (d) L2

(e) L∞ (f) MSE (g) PSNR (h) SSIM

A BLATION S TUDY Table IV demonstrates the rationale for using different

You might also like