Adversarial Training and Robustness for Multiple Perturbations

Tramèr, Florian; Boneh, Dan

arXiv:1904.13000 (cs)

[Submitted on 30 Apr 2019 (v1), last revised 18 Oct 2019 (this version, v2)]

Abstract:Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise). For other perturbations, these defenses offer no guarantees and, at times, even increase the model's vulnerability. Our aim is to understand the reasons underlying this robustness trade-off, and to train models that are simultaneously robust to multiple perturbation types. We prove that a trade-off in robustness to different types of $\ell_p$-bounded and spatial perturbations must exist in a natural and simple statistical setting. We corroborate our formal analysis by demonstrating similar robustness trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation adversarial training schemes, and a novel efficient attack for finding $\ell_1$-bounded adversarial examples, we show that no model trained against multiple attacks achieves robustness competitive with that of models trained on each attack individually. In particular, we uncover a pernicious gradient-masking phenomenon on MNIST, which causes adversarial training with first-order $\ell_\infty, \ell_1$ and $\ell_2$ adversaries to achieve merely $50\%$ accuracy. Our results question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.

Comments:	Accepted at NeurIPS 2019, 23 pages
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Cite as:	arXiv:1904.13000 [cs.LG]
	(or arXiv:1904.13000v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1904.13000

Submission history

From: Florian Tramèr
[v1] Tue, 30 Apr 2019 00:22:29 UTC (2,215 KB)
[v2] Fri, 18 Oct 2019 01:53:18 UTC (2,215 KB)
