The Manifold Assumption and Defenses Against Adversarial Perturbations

Wu, Xi; Jang, Uyeong; Chen, Lingjiao; Jha, Somesh

Computer Science > Machine Learning

arXiv:1711.08001v2 (cs)

[Submitted on 21 Nov 2017 (v1), revised 1 Jan 2018 (this version, v2), latest version 8 Jun 2018 (v3)]

Abstract: In the adversarial-perturbation problem for neural networks, an adversary starts with a neural network model $F$ and a point ${\bf x}$ that $F$ classifies correctly, and applies a \emph{small perturbation} to ${\bf x}$ to produce another point ${\bf x}'$ that $F$ classifies \emph{incorrectly}. In this paper, we propose taking into account \emph{the inherent confidence information} produced by models when studying adversarial perturbations, where a natural measure of "confidence" is $\|F({\bf x})\|_\infty$ (i.e., how confident $F$ is about its prediction). Motivated by a thought experiment based on the manifold assumption, we propose a "goodness property" of models, which states that \emph{confident regions of a good model should be well separated}. We give formalizations of this property and examine existing robust training objectives in light of them. Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies our formal version of the goodness property well, but only weakly controls points that are wrong but have low confidence. However, if Madry et al.'s model is indeed a good solution to their objective, then good and bad points become distinguishable, and we can try to embed uncertain points back into the closest confident region to get (hopefully) correct predictions. We thus propose embedding objectives and algorithms, and perform an empirical study using this method. Our experimental results are encouraging: Madry et al.'s model wrapped with our embedding procedure achieves an almost perfect success rate in defending against attacks that the base model fails on, while retaining good generalization behavior.
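The confidence measure $\|F({\bf x})\|_\infty$ mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes that $F({\bf x})$ is a softmax probability vector, and the helper names (`softmax`, `confidence`, `is_confident`) and the threshold of 0.9 are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def confidence(probs):
    """Infinity norm of each output vector F(x).

    For a probability vector (all entries nonnegative) this is simply
    the largest class probability.
    """
    return np.max(np.abs(probs), axis=-1)

def is_confident(probs, threshold=0.9):
    """Flag points whose confidence meets a threshold.

    The threshold value is an assumption for illustration only;
    the abstract does not specify one.
    """
    return confidence(probs) >= threshold

# Toy logits standing in for the outputs of a model F on two inputs.
logits = np.array([[8.0, 0.5, 0.1],   # one class clearly dominates
                   [1.0, 0.9, 0.8]])  # nearly uniform output
probs = softmax(logits)
conf = confidence(probs)
mask = is_confident(probs)
```

Under these assumptions, points where `mask` is false would be the "uncertain" points that the abstract proposes embedding back into the closest confident region; the embedding step itself is not sketched here because the abstract does not specify it.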

Comments:	Rewrite of the previous draft. Add both theoretical and empirical evidence to support our claims
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Cite as:	arXiv:1711.08001 [cs.LG]
	(or arXiv:1711.08001v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1711.08001

Submission history

From: Xi Wu
[v1] Tue, 21 Nov 2017 19:15:05 UTC (75 KB)
[v2] Mon, 1 Jan 2018 20:12:55 UTC (77 KB)
[v3] Fri, 8 Jun 2018 13:46:51 UTC (96 KB)