Diffusion Policy Attacker: Crafting Adversarial Attacks For Diffusion-Based Policies
Yongxin Chen
Georgia Institute of Technology
yongchen@gatech.edu
Abstract
Diffusion models (DMs) have emerged as a promising approach for behavior
cloning (BC). Diffusion policies (DP) based on DMs have elevated BC perfor-
mance to new heights, demonstrating robust efficacy across diverse tasks, coupled
with their inherent flexibility and ease of implementation. Despite the increas-
ing adoption of DP as a foundation for policy generation, the critical issue of
safety remains largely unexplored. While previous attacks have targeted conventional deep
policy networks, DP uses a diffusion model as its policy network, whose chained structure
and injected randomness render those attacks ineffective. In this paper, we undertake a comprehensive exami-
nation of DP safety concerns by introducing adversarial scenarios, encompass-
ing offline and online attacks, and global and patch-based attacks. We propose
DP-Attacker, a suite of algorithms that can craft effective adversarial attacks
across all aforementioned scenarios. We conduct attacks on pre-trained diffusion
policies across various manipulation tasks. Through extensive experiments, we
demonstrate that DP-Attacker has the capability to significantly decrease the suc-
cess rate of DP for all scenarios. Particularly in offline scenarios, DP-Attacker
can generate highly transferable perturbations applicable to all frames. Further-
more, we illustrate the creation of adversarial physical patches that, when ap-
plied to the environment, effectively deceive the model. Video results are available at:
https://sites.google.com/view/diffusion-policy-attacker.
1 Introduction
Behavior Cloning (BC) [39] is a pivotal area in robot learning: given an expert demonstration dataset,
it aims to train a policy network in a supervised manner. Recently, diffusion models [16, 46]
have become dominant in BC, primarily due to their strong capability in modeling multi-modal
distributions. The resulting policy learner, termed Diffusion Policy (DP) [9, 18], can generate the
action trajectory from a pure Gaussian noise conditioned on the input image(s). An increasing
number of works are adopting DP as an action decoder for BC across various domains such as robot
manipulation [12, 54, 7], long-horizon planning [34, 26] and autonomous driving [28].
Adversarial attack [30, 14] has been haunting deep neural networks (DNN) for a long time: a
small perturbation on the input image will fool the DNN into making wrong decisions. Despite the
remarkable success of diffusion policies in BC, their robustness under adversarial attacks [30, 14]
remains largely unexplored, posing a potential barrier and risk to their broader application. While it is
∗ indicates equal contribution. Correspondence to: ychen3302@gatech.edu
(a) Diffusion Policy Attacker (b) Adversarial Perturbation for DP (c) Adversarial Patch for DP
Figure 1: Adversarial Attacks against Diffusion Policy: We aim to attack robots controlled with
visual-based DP, unveiling hidden threats to the safe application of diffusion-based policies. (a) By
hacking the visual inputs, we can fool the diffusion process into generating wrong actions τ (in red).
We propose Diffusion Policy Attacker (DP-Attacker), which can effectively attack the DP by (b)
hacking the global camera inputs I using small visual perturbations under both online and offline
settings, or (c) attaching an adversarial patch to the environment. The online setting uses the current
visual inputs I^t at the t-th timestep to generate time-variant perturbations δ^t, while the offline
setting uses only offline data I^D to generate a time-invariant perturbation δ.
straightforward to attack an end-to-end DNN by applying gradient ascent over the loss function [30,
14], it is non-trivial to craft attacks against a DP, due to its concatenated denoising structure and high
randomness. Prior research [25, 24, 52, 51, 43] has focused on attacking the diffusion process of the
text-to-image (T2I) diffusion models [40]. However, there are distinct differences between attacking
a T2I diffusion model and attacking a Diffusion Policy. Firstly, they concentrate on attacking the
diffused value while we aim at attacking the conditional image. In addition, they try to fool the editing
process over the clean images (e.g. SDEdit [33]), while we are trying to fool the entire denoising
process starting from pure Gaussian noise.
In this paper, we focus on crafting adversarial attacks against DP. Specifically, we propose Diffusion
Policy Attacker (DP-Attacker), the first suite of white-box-attack algorithms that can effectively
deceive the visual-based diffusion policies. We investigate two hacking scenarios as illustrated
in Figure 1: (1) hacking the scene camera, which means that we can add imperceptible digital
perturbations to the visual inputs of DP, and (2) hacking the scene by attaching small adversarial
patches [4] to the environment (e.g. a table). Furthermore, we consider both offline and online settings:
in the online setting, we can generate time-variant perturbations based on the current visual inputs; in
contrast, in the offline setting we can only add one fixed perturbation across all frames.
We conducted extensive experiments on DP pre-trained on six robotic manipulation tasks and
demonstrated that DP-Attacker can effectively craft adversarial attacks against DP. For digital
attacks, DP-Attacker can generate both online and offline attacks that significantly degrade the DP
system’s performance. For physical attacks, DP-Attacker is capable of creating adversarial patches
tailored for each task, which can be put into the physical environment to disrupt the system. Also, we
reveal that the vulnerability of the visual encoder is what makes DP easy to attack.
2 Related Works
Diffusion-based Policy Generation Diffusion models [46, 16, 45] exhibit superior performance
in high-fidelity image generation [40, 38, 42]. Due to their strong expressiveness in modeling multi-
modal distributions, diffusion models have also been successfully applied to robot learning areas
such as reinforcement learning [50, 2], imitation learning [9, 54, 20, 37], and motion planning [41,
29, 18]. Among them, Diffusion policy (DP) [9, 54, 23] has gained significant attention due to its
straightforward training methodology and consistent, reliable performance. In this paper, we focus
on crafting adversarial attacks against visual-based DP, a technology already integrated into various
indoor robot prototypes like Mobile Aloha [12].
Adversarial Examples for Deep Systems Adversarial attacks have been widely studied for deep
neural networks (DNNs): given a small perturbation, the DNN will be fooled to make wrong
predictions [48, 14]. For DNN-based visual recognition models, crafting adversarial samples is
a relatively easy task using gradient-based budget-limited attacks [30, 53, 14, 5, 10, 3]. However,
attacking diffusion models, which consist of a cascade of DNNs injected with noise, poses a more complex
challenge. Recent studies have demonstrated the feasibility of effectively crafting adversarial samples
for latent diffusion models using meticulously designed surrogate losses [25, 55, 24, 44, 43, 52, 6].
However, these efforts have primarily focused on image editing or imitation tasks and are limited to
working solely in latent space [51]. Here we hope to explore the adversarial attacks against DP under
various settings.
Adversarial Threats against Robot Learning Previous research has highlighted adversarial
attacks as a significant threat to robot learning systems [8], where small perturbations can cause chaos
in applications such as deep reinforcement learning [22, 13, 27, 36, 47, 35], imitation learning [15],
robot navigation [21], robot manipulation [19, 32], and multi-agent robot swarms [1]. Despite the
rising popularity of policies generated by diffusion models, to the best of our knowledge, there have
been no prior efforts aimed at attacking these models.
3 Preliminaries
3.1 Diffusion Models for Behaviour Cloning
Diffusion models [46, 16] are one type of generative model that can fit a distribution q(x0 ), using a
diffusion process and a denoising process. Starting from xK , a pure Gaussian noise, the denoising
process can generate samples from the target distribution by K iterations of denoising steps:
\[ x_k = \alpha_k \big( x_{k+1} - \lambda_k\, \epsilon_\theta(x_{k+1}, k+1) + \mathcal{N}(0, \sigma_k^2 I) \big), \qquad k = 0, 1, \dots, K-1 \tag{1} \]
where αk , λk , σk are hyper-parameters for the noise scheduler. ϵθ is a learned denoiser parameterized
by θ, which can be trained by optimizing the denoising loss termed L = Ex,k ∥ϵθ (x + ϵk , k) − ϵk ∥2 .
We define the reverse process in Equation 1 as xk = Rkθ (xk+1 ) for simplicity.
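To make Equation 1 and the denoising loss concrete, here is a minimal PyTorch-style sketch of one reverse step and the ε-prediction objective. The denoiser eps_model and the scheduler coefficients alpha, lam, sigma are placeholder assumptions, and real DDPM implementations scale the noised input with schedule-dependent coefficients rather than adding ϵ_k directly as written above.

```python
import torch

def reverse_step(eps_model, x_next, k, alpha, lam, sigma):
    """One denoising step of Eq. 1: x_k = alpha_k * (x_{k+1} - lam_k * eps_theta(x_{k+1}, k+1) + N(0, sigma_k^2 I))."""
    eps_pred = eps_model(x_next, k + 1)              # predicted noise for the current noisy sample
    noise = sigma[k] * torch.randn_like(x_next)      # injected Gaussian noise
    return alpha[k] * (x_next - lam[k] * eps_pred + noise)

def denoising_loss(eps_model, x0, K):
    """Epsilon-prediction loss L = E_{x,k} ||eps_theta(x + eps_k, k) - eps_k||^2, as written in Sec. 3.1."""
    k = torch.randint(0, K, (x0.shape[0],))          # random diffusion step per sample
    eps = torch.randn_like(x0)                       # target noise eps_k
    return ((eps_model(x0 + eps, k) - eps) ** 2).mean()
```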
Diffusion policies [18, 9], denoted πθ, apply the diffusion models mentioned above, resulting in τ t ∼
πθ (st ), where τ t ∈ RDa ×La is the planned action sequences at timestep t in the continuous space, st
is the current states, and Da , La are the action dimension and action length respectively. Accordingly,
the learnable denoiser becomes ϵθ (τk , k, s), and the denoised diffusion process remains the same.
For visual DP, the states st are usually images captured by the scene or wrist cameras, so we use I t
throughout to represent the visual inputs at timestep t. Finally, the policy can be formulated as
\[ \tau^t \sim \pi_\theta(I^t) = R^0_\theta\big( R^1_\theta( \cdots R^{K-2}_\theta( R^{K-1}_\theta(x_K, I^t), I^t) \cdots, I^t), I^t \big). \tag{2} \]
The equation above shows that the predicted action τ t is the output of chained denoiser models
residually conditioned on the current observation I t . In practice, while DP outputs a long sequence
of actions τ , we only execute the first few actions of it in a receding horizon manner to improve
temporal action consistency [9].
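Equation 2 simply chains these reverse steps while conditioning every call on the current observation. The sketch below illustrates this together with the receding-horizon execution; the conditional denoiser eps_model(tau, k, obs), the scheduler coefficients, and n_exec are assumptions rather than the exact interface of any released DP implementation.

```python
import torch

@torch.no_grad()
def sample_action(eps_model, obs, K, alpha, lam, sigma, Da, La, n_exec=8):
    """tau^t ~ pi_theta(I^t): K conditional denoising steps starting from pure Gaussian noise (Eq. 2)."""
    tau = torch.randn(1, Da, La)                               # x_K: pure noise of action shape
    for k in reversed(range(K)):                               # k = K-1, ..., 0
        eps_pred = eps_model(tau, k + 1, obs)                  # denoiser conditioned on the observation I^t
        noise = sigma[k] * torch.randn_like(tau) if k > 0 else 0.0
        tau = alpha[k] * (tau - lam[k] * eps_pred + noise)     # R^k_theta(x_{k+1}, I^t)
    return tau[..., :n_exec]                                   # receding horizon: execute only the first few actions
```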
Adversarial samples [14, 30, 5] have been widely studied as a threat to AI systems: for a DNN-
based image classifier y = fθ(x), one can easily craft an imperceptible perturbation P to fool the
classifier into making wrong predictions on P(x). In digital attack settings [48, 14], the perturbation
should be small and invisible to humans, which can be formulated with the ℓ∞-norm as
∥P(x) − x∥∞ < σ, where σ is a small value (e.g. 8/255 for pixel values). Methods like FGSM [14]
and PGD [30] can be easily applied to craft such kinds of adversarial attacks. For physical-world
adversarial patches [4, 11, 53, 17], P(x) is always crafted as attaching a small adversarial patch to
the environments, and the patch should be robust to physical-world transformations such as position,
camera view, and lighting conditions.
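For reference, a standard ℓ∞-bounded PGD loop against such a classifier looks roughly as follows; model, loss_fn, and the budget/step values are illustrative.

```python
import torch

def pgd_attack(model, loss_fn, x, y, sigma=8 / 255, alpha=2 / 255, steps=10):
    """l_inf PGD: ascend the loss, keep the perturbation within [-sigma, sigma] and the image within [0, 1]."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient ascent on the classification loss
            delta.clamp_(-sigma, sigma)          # project back into the l_inf ball
            delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()
```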
Recent works [25, 43] show that it is also possible to craft such adversarial examples to fool
latent diffusion models [40] with an encoder E and a decoder D: by adding a small perturbation to a clean
image, the denoising process will be fooled into generating bad editing or imitation results. The following
Monte-Carlo-based adversarial loss is used to attack a latent diffusion model:

\[ \mathcal{L}_{adv}(x) = \mathbb{E}_k \big\| \epsilon_\theta(\mathcal{E}(x) + \epsilon_k, k) - \epsilon_k \big\|_2^2 \tag{3} \]
The mechanism behind attacking latent diffusion models [52] turns out to be the vulnerability of the
autoencoder, and such attacks work only for diffusion models operating in the latent space [51]. Moreover,
the setting above differs from ours: we attack the conditional image of a DP, without a ground-truth
clean action to form the diffused input of ϵθ in Equation 3. In the following section, we show that we
can still effectively craft different kinds of adversarial samples based on Equation 3 with some
modifications.
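For concreteness, a Monte-Carlo estimate of Equation 3 can be sketched as below; encoder, eps_model, and the number of sampled timesteps n_mc are assumptions. Note that here the perturbed image itself is encoded and diffused, whereas in our DP setting the perturbation only enters through the conditioning image.

```python
import torch

def latent_adv_loss(encoder, eps_model, x_adv, K, n_mc=4):
    """Monte-Carlo estimate of L_adv(x) = E_k ||eps_theta(E(x) + eps_k, k) - eps_k||_2^2 (Eq. 3)."""
    z = encoder(x_adv)                                 # latent of the (perturbed) image
    loss = 0.0
    for _ in range(n_mc):
        k = torch.randint(0, K, (z.shape[0],))
        eps = torch.randn_like(z)
        loss = loss + ((eps_model(z + eps, k) - eps) ** 2).mean()
    return loss / n_mc
```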
4 Methods
4.1 Problem Settings

In this paper, we assume that we have access to some trained diffusion policy network. Given this
trained network, we wish to find adversarial perturbations that, when added to the observation I,
will cause the trained diffusion policy to generate unwanted actions (either random or targeted)
that impede task completion.

Figure 2: Design Space of DP-Attacker: the tree above shows the design space of DP-Attacker,
which can be adapted to various kinds of attack scenarios, including global attacks (hacking the
cameras) vs. patched attacks (hacking the physical environment); offline vs. online; targeted vs.
untargeted.

The most straightforward way to measure the quality of the attack is to use the difference between
the generated actions and the original actions in an end-to-end manner:

\[ \mathcal{L}^{untar}_{end2end}(I, t) = -\big\| \pi_\theta(\mathcal{P}(I)) - \tau^{t,*} \big\|^2 \tag{3} \]

where τ^{t,*} is a known good solution sampled by πθ given the observation image I, and P(·) is
some perturbation of the observation image. It could be generated either from the trained policy for
online attacks or from the training dataset for offline attacks. One can minimize the negative L2
distance between a generated action and a good action for untargeted attacks. For targeted attacks,
the action loss becomes

\[ \mathcal{L}^{tar}_{end2end}(I, t) = \big\| \pi_\theta(\mathcal{P}(I)) - \tau^{t}_{target} \big\|^2 \tag{4} \]

where τ^t_{target} is some target bad action we wish the policy to execute (e.g. always move to the left). We
can use PGD [30] to optimize for the best perturbation that minimizes this loss. However, due to the
inherent long-denoising chain of the diffusion policy πθ , the calculation of this gradient could be
quite costly [43].
In practice, running the end-to-end attacks above is not effective, especially when the model is large
and when we need to hack the camera at a high frequency. Instead, borrowing ideas from recent
works [25, 24, 52] on adversarial samples for diffusion models, we propose to use the following
optimization objectives:

\[ \mathcal{L}^{untar}_{adv}(I, t) = -\mathbb{E}_k \big\| \epsilon_\theta(\tau^{t,*} + \epsilon_k, k, \mathcal{P}(I)) - \epsilon_k \big\|^2 \tag{5} \]
where k is the timestep of the diffusion process and t is the timestep of the action runner. We add
noise to the good solution τ t,∗ and then calculate the L-2 distance between the predicted noise of the
denoise network and the added noise. Minimizing this loss leads to inaccurate noise prediction of the
denoising network and, in turn, leads to bad generated action of the diffusion policy. For targeted
attacks, the noise prediction loss is:
\[ \mathcal{L}^{tar}_{adv}(I, t) = \mathbb{E}_k \big\| \epsilon_\theta(\tau^{t}_{target} + \epsilon_k, k, \mathcal{P}(I)) - \epsilon_k \big\|^2 \tag{6} \]
Minimizing this loss would allow the denoising net to favor the generation of the target action. The
gradient of the noise prediction loss is easier to calculate compared to the action loss because of the
short one-step chain. This makes it more favorable for conducting attacks.
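A sketch of the two objectives in Equations 5 and 6 is given below. Each Monte-Carlo sample needs only a single pass through the denoiser, which is why the gradient is much cheaper than back-propagating through the full denoising chain behind Equations 3 and 4. The conditional denoiser eps_model(tau, k, obs) and the sample count n_mc are assumptions.

```python
import torch

def noise_pred_loss(eps_model, tau_ref, obs_adv, K, targeted=False, n_mc=4):
    """Eq. 5 (untargeted, tau_ref = good action tau^{t,*}) / Eq. 6 (targeted, tau_ref = tau^t_target).
    The loss is minimized with respect to the perturbation hidden inside obs_adv."""
    loss = 0.0
    for _ in range(n_mc):
        k = torch.randint(0, K, (tau_ref.shape[0],))
        eps = torch.randn_like(tau_ref)
        err = ((eps_model(tau_ref + eps, k, obs_adv) - eps) ** 2).mean()
        loss = loss + (err if targeted else -err)   # targeted: pull predictions toward the target; untargeted: push them away
    return loss / n_mc
```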
4.2 Global Attacks
A global attack injects adversarial perturbation δ into the observation image I by adding it on top
of the observation image, i.e. P(I) = I + δ. The adversarial noise δ is of the same shape as the
original image. To make the attack imperceptible, the adversarial noise’s absolute value is limited
by some σ. To find such an adversarial noise, we use PGD [30], an optimization-based attack
method. The adversarial noise can be constructed online during inference
or offline using the training dataset. The algorithm for conducting an online global attack is shown
in Algorithm 1. The algorithm optimizes for loss in Equation 5 or Equation 6. The algorithm can
be modified easily to construct an offline attack. Given the training dataset $D_T = \{(\tau^t, I^t) \mid t \in T\}$,
we can optimize the loss $\mathcal{L}^{untar}_{adv} = -\mathbb{E}_{k,(\tau^t, I^t)} \|\epsilon_\theta(\tau^t + \epsilon_k, k, \mathcal{P}(I^t)) - \epsilon_k\|^2$ or
$\mathcal{L}^{tar}_{adv} = \mathbb{E}_{k,(\tau^t, I^t)} \|\epsilon_\theta(\tau^t_{target} + \epsilon_k, k, \mathcal{P}(I^t)) - \epsilon_k\|^2$. This algorithm is provided in the appendix.
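Algorithm 1 itself is given in the paper; the sketch below is one plausible PGD realization of the online global attack, reusing the noise_pred_loss sketch above and the budget values from our experiments. It is illustrative, not the released implementation.

```python
import torch

def online_global_attack(eps_model, obs, tau_ref, K, targeted=False,
                         sigma=0.03, alpha=0.001875, steps=50):
    """Online global attack (cf. Algorithm 1): PGD over delta added to the current observation I^t."""
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        loss = noise_pred_loss(eps_model, tau_ref, obs + delta, K, targeted)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # minimize the adversarial loss
            delta.clamp_(-sigma, sigma)          # keep the perturbation within the l_inf budget
            delta.grad.zero_()
    return (obs + delta).clamp(0, 1).detach()    # hacked observation fed to the diffusion policy
```

The offline variant would instead sample (τ^t, I^t) pairs from the training dataset while sharing one δ across all frames.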
Figure 3: Global Attack (Online): We visualize the global attacks in Algorithm 1 within both the
PushT and Can environments. Specifically, we present action rollouts for four types of observations:
clean observations, observations perturbed with random Gaussian noise, and our optimized perturba-
tions (both untargeted and targeted). While the DPs show robustness to random perturbations, they
are vulnerable to adversarial samples generated using DP-Attacker.
4.3 Patched Attacks
A patched attack directly puts a specifically designed image patch x ∈ R^{c×h×w} into the environment.
The camera later captures it, causing the diffusion policy to produce undesirable motions. The patch
should remain effective under different scales, orientations, and observation views. During training, we
apply some random affine transform (shift, rotation, scale, and shear) T ∈ T. The affine transform
uses the center of the image as the origin of the coordinate system. The transformed patch then replaces
the corresponding region of the observation image via the replacement operator replace(I, x), again
using the image's center as the origin of the coordinate system. To search for such a patch, we use the training dataset
and optimize for the best patch using PGD. The algorithm is illustrated in Algorithm 2.
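Algorithm 2 is provided in the paper; as one possible realization of the random affine transform T and the replace(I, x) operator described above, the sketch below pastes the transformed patch into the observation using torchvision. The helper name, parameter ranges, and the masking strategy are illustrative assumptions.

```python
import random
import torch
import torchvision.transforms.functional as TF

def apply_patch(obs, patch):
    """replace(I, x): paste a randomly affine-transformed patch into the observation (centered coordinates)."""
    _, H, W = obs.shape
    _, h, w = patch.shape
    canvas = torch.zeros_like(obs)
    mask = torch.zeros(1, H, W, device=obs.device)
    top, left = (H - h) // 2, (W - w) // 2
    canvas[:, top:top + h, left:left + w] = patch        # patch placed at the image center
    mask[:, top:top + h, left:left + w] = 1.0
    # random affine T: shift, rotation, scale, shear (ranges are illustrative, cf. Table 5 in the appendix)
    angle = random.uniform(-45, 45)
    tx, ty = random.randint(-int(0.4 * W), int(0.4 * W)), random.randint(-int(0.4 * H), int(0.4 * H))
    shear = [random.uniform(-50, 50), random.uniform(-50, 50)]
    canvas = TF.affine(canvas, angle=angle, translate=[tx, ty], scale=1.0, shear=shear)
    mask = TF.affine(mask, angle=angle, translate=[tx, ty], scale=1.0, shear=shear)
    return obs * (1 - mask) + canvas * mask              # composite the patch onto the observation
```

The patch pixels themselves are then optimized with a PGD-style update as in the global attack, presumably clipped to the valid pixel range rather than an ℓ∞ ball around a clean image.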
5 Experiments
We test the effectiveness of DP-Attacker with various strengths and configurations on different
diffusion policies. Our target models are vision-based diffusion policy models introduced by Chi et
al. [9]. We aim to manipulate the visual input so that the generated trajectory will not lead to task
completion. We quantitatively evaluate the effectiveness of our attack methods by recording the resulting
task completion scores/success rates. We also provide scores without attacks for reference and
random noise attacks (adding Gaussian noise to the observation images) as a baseline attack
method. We focus on the models released by Chi et al. [9]; however, our attack algorithm applies
to other variants of diffusion policies as well.

Figure 4: Physical Adversarial Patches: we show the patches optimized by Algorithm 2 for Can,
Square, and Toolhang; attaching them to the physical scene effectively lowers the success rate of the
target diffusion policy.

Table 1: Quantitative Results on Global Attacks: The table includes the attack results for all
transformer-based diffusion policy networks. Our DP-Attacker can significantly lower the performance
of the diffusion policies.
Environment Setup Our benchmark contains 6 tasks: PushT, Can, Lift, Square, Transport, and
Toolhang. These tasks are illustrated in Figure 6 in the Appendix. Robosuite provides all the
simulations of these tasks except PushT [49, 31, 56]. For evaluation, we attack the released
checkpoints of diffusion policies trained by Chi et al. [9]. For tasks Can, Lift, Square, and Transport,
each has two demonstration datasets: Multi-Human (MH) and Proficient Human (PH). The other
two tasks (PushT and Toolhang) each have only one PH dataset. This gives us a total of 10
datasets. In [9], each dataset is used to train two diffusion policies with different diffusion backbone
architectures: CNN-based and Transformer-based. We take the best-performing checkpoints for these
20 different scenarios released by Chi et al. [9] as our attack targets. For each attack method, we run
50 rollouts and collect the average score or calculate the success rate of the tasks. The rollout length
uses the same length as the demonstration dataset [9, 31]. Besides our attack methods, we also run
the rollout using clean images for reference and with random noise added as a baseline attack method.
The evaluation is done using a single machine with an RTX 3090 GPU and AMD Ryzen 9 5950X to
calculate rollouts and run our attack algorithms.
5.1 Global Attacks

We first present the results of global attacks. We evaluate both our online attack algorithm (creating
adversarial noise on the fly per inference) and offline algorithm (pre-generating a fixed noise that is
used for every inference).
Table 2: Quantitative Results on Patched Attacks (success rate; format: CNN / Transformer backbone).

Method               Can          Lift         Square       Toolhang
Clean                0.98 / 0.92  1 / 1        0.94 / 0.92  0.8 / 0.86
Random Noise Patch   0.9 / 0.94   1 / 0.9      0.8 / 0.54   0.56 / 0.12
Untargeted-Offline   0.16 / 0.44  1 / 0.82     0.72 / 0.34  0.48 / 0.02
Online Attack For online attacks, we use attack parameters σ = 0.03, α = 0.001875, N = 50.
For targeted attacks, we use a normalized target action vector of all ones. We report the performance
of the transformer-based models before and after the attack in Table 1. The results of global attacks
on all models are given in the appendix. Example rollouts and images used in the rollouts are shown
in Figure 3.
Offline Attack For offline global attacks, we train on the training dataset with batch size 64,
α = 0.0001, σ = 0.03 for 10 epochs. The resulting trained adversarial noise is added to the input
image for every inference. The results are shown in Table 1. Examples of rollouts and images used in
the attack can be found on our website.
We find that diffusion policy is not robust to the noise introduced by our DP-Attacker: the performance
of diffusion policies is significantly reduced after running global attacks, and a disturbance of less than
3% of the pixel range is able to decrease the performance from 100% to 0%. The success of offline
global attacks also shows that attacks can be constructed cheaply and pose a significant threat to the
safety of using diffusion policies in the real world.
Figure 5: Violin plots of encoder feature distances (panels: Can PH CNN Online Global Attack; Lift PH CNN Online Global Attack).

5.2 Patched Attack
Results We construct a patch whose size covers around 5% of the observation image using
Algorithm 2. The details of the training can be found in the appendix. We evaluate the effectiveness
of our patch attack algorithm on a total of 8 checkpoints, covering the PH dataset across four tabletop
manipulation tasks (Can, Lift, Square, and Toolhang) using both CNN and Transformer diffusion
backbones. The resulting success rate (SR) is shown in Table 2, and example rollouts are shown in
Figure 4. Simpler tasks such as Can and Lift are quite robust to a random noise patch. Our DP-Attacker
produces adversarial patches that degrade the diffusion policy's performance far more than random
noise.
6 Ablation Study
Attack Parameters To investigate the effectiveness of our attack method, we evaluate how the
attack parameters affect DP-Attacker. First, we investigate the effect of the number of PGD
steps N: we keep σ = 0.03 and set α = 2σ/N. Second, we investigate the effect of the noise scale σ:
we keep N = 50 and set α = 2σ/N. We evaluate all six attack settings on the transformer-backbone
DP trained on the Lift PH dataset. The results are summarized in Table 3.

Table 3: Different Parameters for DP-Attacker: We ablate the parameters σ and N. Smaller step
counts and budgets are not enough to fool a DP, while larger budgets dramatically decrease the
Success Rate (SR).

Parameters (σ = 0.03)   N = 10    N = 20    N = 50
Success Rate            0.94      0.8       0.66

Parameters (N = 50)     σ = 0.01  σ = 0.03  σ = 0.05
Success Rate            1         0.68      0.32

Table 4: Comparison with End-to-End Attacks: DP-Attacker runs significantly faster than the
end-to-end attacks, even when they are accelerated with DDIM, and also provides better attack results.

Method                            Attack Time   Model Success Rate
Clean                             -             1
Random Noise                      -             1
End-to-End DDPM (Untargeted)      ~70s          1
End-to-End DDPM (Targeted)        ~67s          0.52
End-to-End DDIM-8 (Untargeted)    ~6.5s         0.9
End-to-End DDIM-8 (Targeted)      ~5.8s         0.24
DP-Attacker (Untargeted)          ~1.8s         0.62
DP-Attacker (Targeted)            ~1.3s         0.02
End-to-End Loss vs. Noise Prediction Loss We perform a comparison with the end-to-end action
loss (Equation 3). We evaluate both methods with the same attack parameters (σ = 0.03, α = 0.001875,
N = 50) on the best-performing transformer backbone trained on the PH dataset of the Lift task. Again,
we evaluate 50 randomly initialized environments. Computing the end-to-end loss with the DDPM [16]
scheduler is too slow to be feasible for online attacks. In addition, we provide results where we replace
the loss-calculating noise scheduler with an 8-step DDIM scheduler [45], which speeds up calculating
the end-to-end loss. The resulting SR after the attack and the average time used to perform the online
attacks are shown in Table 4. The end-to-end attack is significantly slower than our attack algorithm
and does not provide better results. We suspect that since diffusion models introduce randomness
during the sampling of a trajectory, it is better to attack through the noise prediction loss rather than
the end-to-end action loss.
What Is Being Attacked Is the Encoder We further investigate what exactly is being attacked by
DP-Attacker. The literature on text-to-image diffusion models shows that the encoder is the
component being attacked [43, 52], and we suspect the same is happening for diffusion policy. To
investigate this, we calculate the L2 distance between the encoded feature vectors of clean and attacked
images for a random noise attack, for unsuccessful attack parameters, and for successful attack
parameters, respectively. The details of the calculation are in the appendix. We do this for 1000 images
in the training dataset and plot the distribution of the distances using violin plots in Figure 5. The
significant difference shows that our attack method drastically changes the representation of the
conditional visual feature, which in turn affects the downstream conditional noise prediction network,
causing it to make inaccurate noise predictions.
7 Conclusion

Despite the deep structure of diffusion-based policy generation, it remains vulnerable to adversarial attacks. We
emphasize the need for future research to focus on enhancing the robustness of DP to ensure its
reliability in real-world applications. There are also some limitations for this paper: our experiments
were conducted exclusively within a simulation environment, and we did not extend our testing to
real-world scenarios. Additionally, we did not develop or implement any defensive strategies for the
proposed tasks, which remains an area for future research and exploration.
References
[1] M. Abouelyazid. Adversarial deep reinforcement learning to mitigate sensor and communication attacks
for secure swarm robotics. Journal of Intelligent Connectivity and Emerging Technologies, 8(3):94–112,
2023.
[2] A. Ajay, Y. Du, A. Gupta, J. Tenenbaum, T. Jaakkola, and P. Agrawal. Is conditional generative modeling
all you need for decision-making? arXiv preprint arXiv:2211.15657, 2022.
[3] A. Arnab, O. Miksik, and P. H. Torr. On the robustness of semantic segmentation models to adversarial
attacks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 888–897,
2018.
[4] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. arXiv preprint
arXiv:1712.09665, 2017.
[5] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 ieee symposium
on security and privacy (sp), pages 39–57. Ieee, 2017.
[6] J. Chen, J. Dong, and X. Xie. Exploring adversarial attacks against latent diffusion model from the
perspective of adversarial transferability. arXiv preprint arXiv:2401.07087, 2024.
[7] L. Chen, S. Bahl, and D. Pathak. Playfusion: Skill acquisition via diffusion from language-annotated play.
In Conference on Robot Learning, pages 2012–2029. PMLR, 2023.
[8] T. Chen, J. Liu, Y. Xiang, W. Niu, E. Tong, and Z. Han. Adversarial attack and defense in reinforcement
learning-from ai security view. Cybersecurity, 2:1–22, 2019.
[9] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song. Diffusion policy: Visuomotor
policy learning via action diffusion. arXiv preprint arXiv:2303.04137, 2023.
[10] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. Boosting adversarial attacks with momentum. In
Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9185–9193, 2018.
[11] R. Duan, X. Ma, Y. Wang, J. Bailey, A. K. Qin, and Y. Yang. Adversarial camouflage: Hiding physical-
world attacks with natural styles. In Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition, pages 1000–1008, 2020.
[12] Z. Fu, T. Z. Zhao, and C. Finn. Mobile aloha: Learning bimanual mobile manipulation with low-cost
whole-body teleoperation. arXiv preprint arXiv:2401.02117, 2024.
[13] A. Gleave, M. Dennis, C. Wild, N. Kant, S. Levine, and S. Russell. Adversarial policies: Attacking deep
reinforcement learning. arXiv preprint arXiv:1905.10615, 2019.
[14] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv
preprint arXiv:1412.6572, 2014.
[15] G. Hall, A. Das, J. Quarles, and P. Rad. Studying adversarial attacks on behavioral cloning dynamics. In
2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), pages 452–459.
IEEE, 2020.
[16] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information
Processing Systems, 33:6840–6851, 2020.
[17] Z. Hu, S. Huang, X. Zhu, F. Sun, B. Zhang, and X. Hu. Adversarial texture for fooling person detectors
in the physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 13307–13316, 2022.
[18] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis.
arXiv preprint arXiv:2205.09991, 2022.
[19] Y. Jia, C. M. Poskitt, J. Sun, and S. Chattopadhyay. Physical adversarial attack on a robotic arm. IEEE
Robotics and Automation Letters, 7(4):9334–9341, 2022.
[20] T.-W. Ke, N. Gkanatsios, and K. Fragkiadaki. 3d diffuser actor: Policy diffusion with 3d scene representa-
tions. arXiv preprint arXiv:2402.10885, 2024.
[21] M. I. Khedher and M. Rezzoug. Analyzing adversarial attacks against deep learning for robot navigation.
In ICAART (2), pages 1114–1121, 2021.
[22] M. Lechner, A. Amini, D. Rus, and T. A. Henzinger. Revisiting the adversarial robustness-accuracy
tradeoff in robot learning. IEEE Robotics and Automation Letters, 8(3):1595–1602, 2023.
[23] X. Li, V. Belagali, J. Shang, and M. S. Ryoo. Crossway diffusion: Improving diffusion-based visuomotor
policy via self-supervised learning. arXiv preprint arXiv:2307.01849, 2023.
[24] C. Liang and X. Wu. Mist: Towards improved adversarial examples for diffusion models. arXiv preprint
arXiv:2305.12683, 2023.
[25] C. Liang, X. Wu, Y. Hua, J. Zhang, Y. Xue, T. Song, Z. Xue, R. Ma, and H. Guan. Adversarial example
does good: Preventing painting imitation from diffusion models via adversarial examples. In International
Conference on Machine Learning, pages 20763–20786. PMLR, 2023.
[26] Z. Liang, Y. Mu, H. Ma, M. Tomizuka, M. Ding, and P. Luo. Skilldiffuser: Interpretable hierarchical
planning via skill abstractions in diffusion-based task execution. arXiv preprint arXiv:2312.11598, 2023.
[27] Y.-C. Lin, Z.-W. Hong, Y.-H. Liao, M.-L. Shih, M.-Y. Liu, and M. Sun. Tactics of adversarial attack on
deep reinforcement learning agents. arXiv preprint arXiv:1703.06748, 2017.
[28] J. Liu, P. Hang, X. Zhao, J. Wang, and J. Sun. Ddm-lag: A diffusion-based decision-making model for
autonomous vehicles with lagrangian safety enhancement. arXiv preprint arXiv:2401.03629, 2024.
[29] Y. Luo, C. Sun, J. B. Tenenbaum, and Y. Du. Potential based diffusion motion planning. 2023.
[30] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to
adversarial attacks. In International Conference on Learning Representations, 2018.
[31] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and
R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In
arXiv preprint arXiv:2108.03298, 2021.
[32] M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli. Is deep learning safe for robot vision?
adversarial examples against the icub humanoid. In Proceedings of the IEEE international conference on
computer vision workshops, pages 751–759, 2017.
[33] C. Meng, Y. He, Y. Song, J. Song, J. Wu, J.-Y. Zhu, and S. Ermon. Sdedit: Guided image synthesis and
editing with stochastic differential equations. In International Conference on Learning Representations,
2021.
[34] U. A. Mishra, S. Xue, Y. Chen, and D. Xu. Generative skill chaining: Long-horizon skill planning with
diffusion models. In Conference on Robot Learning, pages 2905–2925. PMLR, 2023.
[35] K. Mo, W. Tang, J. Li, and X. Yuan. Attacking deep reinforcement learning with decoupled adversarial
policy. IEEE Transactions on Dependable and Secure Computing, 20(1):758–768, 2022.
[36] A. Pattanaik, Z. Tang, S. Liu, G. Bommannan, and G. Chowdhary. Robust deep reinforcement learning
with adversarial attacks. arXiv preprint arXiv:1712.03632, 2017.
[37] T. Pearce, T. Rashid, A. Kanervisto, D. Bignell, M. Sun, R. Georgescu, S. V. Macua, S. Z. Tan, I. Momenne-
jad, K. Hofmann, et al. Imitating human behaviour with diffusion models. arXiv preprint arXiv:2301.10677,
2023.
[38] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach. Sdxl:
Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952,
2023.
[39] D. A. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. Advances in neural information
processing systems, 1, 1988.
[40] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with
latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pages 10684–10695, 2022.
[41] K. Saha, V. Mandadi, J. Reddy, A. Srikanth, A. Agarwal, B. Sen, A. Singh, and M. Krishna. Edmp:
Ensemble-of-costs-guided diffusion for motion planning. arXiv preprint arXiv:2309.11414, 2023.
[42] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes,
B. Karagol Ayan, T. Salimans, et al. Photorealistic text-to-image diffusion models with deep language
understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
[43] H. Salman, A. Khaddaj, G. Leclerc, A. Ilyas, and A. Madry. Raising the cost of malicious ai-powered
image editing. arXiv preprint arXiv:2302.06588, 2023.
[44] S. Shan, J. Cryan, E. Wenger, H. Zheng, R. Hanocka, and B. Y. Zhao. Glaze: Protecting artists from style
mimicry by text-to-image models. arXiv preprint arXiv:2302.04222, 2023.
[45] J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models. In International Conference on
Learning Representations, 2021.
[46] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative
modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
[47] J. Sun, T. Zhang, X. Xie, L. Ma, Y. Zheng, K. Chen, and Y. Liu. Stealthy and efficient adversarial attacks
against deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence,
volume 34, pages 5883–5891, 2020.
[48] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing
properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[49] E. Todorov, T. Erez, and Y. Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ
international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012.
[50] Z. Wang, J. J. Hunt, and M. Zhou. Diffusion policies as an expressive policy class for offline reinforcement
learning. arXiv preprint arXiv:2208.06193, 2022.
[51] H. Xue and Y. Chen. Pixel is a barrier: Diffusion models are more adversarially robust than we think.
arXiv preprint arXiv:2404.13320, 2024.
[52] H. Xue, C. Liang, X. Wu, and Y. Chen. Toward effective protection against diffusion-based mimicry
through score distillation. In The Twelfth International Conference on Learning Representations, 2023.
[53] H. Xue, A. Araujo, B. Hu, and Y. Chen. Diffusion-based adversarial sample generation for improved
stealthiness and controllability. Advances in Neural Information Processing Systems, 36, 2024.
[54] Y. Ze, G. Zhang, K. Zhang, C. Hu, M. Wang, and H. Xu. 3d diffusion policy. arXiv preprint
arXiv:2403.03954, 2024.
[55] B. Zheng, C. Liang, X. Wu, and Y. Liu. Understanding and improving adversarial attacks on latent diffusion
model. arXiv preprint arXiv:2310.04687, 2023.
[56] Y. Zhu, J. Wong, A. Mandlekar, R. Martín-Martín, A. Joshi, S. Nasiriany, and Y. Zhu. robosuite: A modular
simulation framework and benchmark for robot learning. In arXiv preprint arXiv:2009.12293, 2020.
Appendix
We put more video results, including rollouts of the DP-based robots under various attacks crafted by
DP-Attacker, at the following anonymous link:
https://sites.google.com/view/dp-attacker-videos/.
A Broader Impact
Diffusion-based policies (DPs) have emerged as promising candidates for integrating real robots into
our daily lives. Even with just a few collected demonstrations, DPs exhibit strong performance across
various tasks [12, 54]. However, despite their utilization of diffusion models, which distinguish
them from other policy generators, our research highlights their vulnerability to adversarial attacks.
We demonstrate practical attacks on DP-based systems, such as hacking cameras to introduce fixed
perturbations across all frames (global offline attack) and incorporating patterns into the scene
(physical patched attack). It is critical to consider these threats, and we urge future research to
prioritize the development of more robust DPs before their widespread application in the real world.
B Algorithms
We also provide the algorithm for training the offline global attacks.
C Experimental Details
We investigate a total of six different tasks: PushT, Can, Lift, Square, Transport, and Tool hang. The
tasks are illustrated in Figure 6. Here are the descriptions of each task:
• PushT: The simulation happens in 2D. The agent controls a rod (blue circle) to push the
grey T block into the targeted green area. The score calculated is the maximum percent of
coverage of the green area by the grey T block during a rollout.
Figure 6: Illustration of the six tasks: PushT, Can, Lift, Square, Transport, and Tool hang.
• Can: The simulation environment is provided by robosuite [56]. The agent controls the
6-DoF end-effector position and gripper close or open. The goal is to move the randomly
positioned can from the left bin into the corresponding compartment (lower right) of the right bin.
• Lift: The simulation environment is provided by robosuite. The agent controls the 6-DoF
end-effector position and gripper close or open. The goal is to lift up the randomly positioned
red block.
• Square: The simulation environment is provided by robosuite. The agent controls the 6-DoF
end-effector position and gripper close or open. The goal is to put the randomly positioned
square nut around the square peg.
• Transport: The simulation environment is provided by robosuite. The agent controls 2 6-DoF
end-effector positions and grippers close or open. The goal is to transport the hammer inside
the box on one side to the box on the other side.
• Tool hang: The simulation environment is provided by robosuite. The agent controls the
6-DoF end-effector position and gripper close or open. The goal is to construct the tool
tower by first inserting an L-shaped bar into the base and later hanging the second tool on
the tip of the bar.
Our goal is to construct noises for the observation images. The image encoder uses multiple views
when constructing the conditional image feature vector. Below are details of how we construct the
adversarial noises.
Table 5: Ranges of the random affine transforms T used for patch training.

Parameter   Range
x shift     [−0.4, 0.4] × image size, in centered coordinates
y shift     [−0.4, 0.4] × image size, in centered coordinates
rotation    [−45°, 45°]
scale       [1, 1]
shear x     [−50°, 50°]
shear y     [−50°, 50°]
The adversarial patch is placed in the simulation environment for evaluation and can be observed from
multiple perspectives. For the tasks Can, Lift, and Square, the observation image size is 84 × 84, and we
choose a training patch size of 17 × 17 that covers around 4% of the observation. For the Toolhang task,
where the observation image size is 240 × 240, we choose a training patch size of 50 × 50 that covers
around 4.3% of the image. The set of transforms T is summarized in Table 5. The training parameters
are epochs = 10, batch size = 64, α = 0.0001. For evaluation, we make patch objects of size
0.06 m × 0.06 m (2.36 in × 2.36 in) and put them onto the table. The rotation angle is drawn from
[−45°, 45°]. For tasks Can, Lift, and Square, the position of the patch can be anywhere on the table. For
Toolhang, the position of the patch is constrained to the top left of the table so it can be captured by the
camera. The size is about the same, and we provide a comparison in Figure 7.
D More Results
Full Table for Global Attacks We provide the full table of global attack results in Table 6 as an
extension of Table 1. CNN-based models are harder to attack; nevertheless, the resulting scores still
decrease significantly.
Targeted Attacks With a larger attack budget, we can manipulate the robot's actions quite well. For
this experiment, we increase the online global attack budget σ to 0.05. With this increased budget,
we can steer the generated actions of the DP toward the target, which shows that the targeted noise
prediction loss proposed in DP-Attacker is effective. See our website for details.
What Is Being Attacked Is the Encoder To investigate whether it is the encoder that is being
attacked by DP-Attacker, we perform the following comparison. For a given image, we compute the
encoded feature vector of the clean image E(x), the clean image plus random noise E(x + δrand), and
the clean image plus adversarial noise E(x + δadv) crafted by our DP-Attacker. Next, we calculate
the L2 distance between the encoded clean image and the encoded random-noise-attacked image,
∥E(x) − E(x + δrand)∥₂², and the L2 distance between the encoded clean image and the encoded
DP-Attacker-attacked image, ∥E(x) − E(x + δadv)∥₂². We collect these two distances for 1000 images
in the training dataset and plot the distributions of the two sets using violin plots (Figure 5). The attacks
we use are a random noise attack with σ = 0.03 and an online targeted global attack with σ = 0.03,
N = 50, α = 0.001875, ntarget = 1. We do this for two datasets: the Can PH dataset with a CNN
backbone, where our DP-Attacker successfully performs the attack, and the Lift PH dataset with a CNN
backbone, where our DP-Attacker fails to construct successful attacks (see Table 6). The difference in
the distributions shows that attack success is correlated with successfully disturbing the encoder.
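The distance computation itself is only a few lines; encoder and the image batches below are placeholders.

```python
import torch

@torch.no_grad()
def encoder_shift(encoder, x_clean, x_attacked):
    """Per-image squared L2 distance ||E(x) - E(x + delta)||_2^2 between clean and attacked observations."""
    f_clean = encoder(x_clean)
    f_attacked = encoder(x_attacked)
    return ((f_clean - f_attacked) ** 2).flatten(1).sum(dim=1)   # one distance per image, for the violin plots
```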
Table 6: Quantitative Results on Global Attacks: The table includes the attack results for both CNN-
and transformer-based diffusion policy networks (format: transformer/CNN). Our DP-Attacker can
significantly lower the performance of the diffusion models.

Method               PushT-PH   Can-PH     Can-MH     Lift-PH   Lift-MH    Square-PH  Square-MH  Transport-PH  Transport-MH  Toolhang-PH
Clean                0.75/0.83  0.92/0.98  0.92/0.98  1/1       1/1        0.92/0.94  0.72/0.84  0.86/0.88     0.46/0.82     0.86/0.8
Random Noise         0.66/0.87  0.88/1     0.98/0.98  1/1       1/1        0.82/0.94  0.74/0.76  0.84/0.84     0.48/0.68     0.82/0.72
Targeted-Offline     0.46/0.68  0.08/0.2   0.08/0.16  0.94/1    0.7/1      0/0.9      0/0.66     0/0.66        0.02/0.64     0/0
Untargeted-Offline   0.39/0.73  0.1/0      0.46/0.34  0.8/1     0.62/0.98  0.04/0.62  0/0.68     0/0           0/0           0/0
Targeted-Online      0.10/0.45  0/0        0/0        0.02/1    0/1        0/0.54     0/0.08     0/0           0/0           0/0
Untargeted-Online    0.19/0.48  0.02/0.02  0.02/0.02  0.62/1    0.62/1     0/0.38     0/0.08     0/0.04        0/0.04        0/0
Table 7: Comparison with End-to-End Attacks: DP-Attacker runs significantly faster than the
end-to-end attacks, even when they are accelerated with DDIM, and also provides better attack
results.
Speed and Effectiveness Comparison with End-to-End Loss We perform another comparison with
the end-to-end loss to show both the speed benefit and the effectiveness of our DP-Attacker.
We conduct online targeted attacks on the Transformer-based DP for the PushT task. The PGD
parameters for the end-to-end attacks are N = 50, σ = 0.03, α = 0.001875. The resulting average
model score over 50 simulations and the attack time are shown in Table 7. The evaluation is done on a
machine with an RTX 4080 mobile GPU and an Intel i9-13900HX CPU.