Computer Science > Machine Learning
[Submitted on 21 Nov 2017 (v1), revised 1 Jan 2018 (this version, v2), latest version 8 Jun 2018 (v3)]
Title:The Manifold Assumption and Defenses Against Adversarial Perturbations
View PDFAbstract:In the adversarial-perturbation problem of neural networks, an adversary starts with a neural network model $F$ and a point ${\bf x}$ that $F$ classifies correctly, and applies a \emph{small perturbation} to $\bf x$ to produce another point ${\bf x}'$ that $F$ classifies \emph{incorrectly}. In this paper, we propose taking into account \emph{the inherent confidence information} produced by models when studying adversarial perturbations, where a natural measure of "confidence" is $\|F({\bf x})\|_\infty$ (i.e. how confident $F$ is about its prediction?). Motivated by a thought experiment based on the manifold assumption, we propose a "goodness property" of models which states that \emph{confident regions of a good model should be well separated}. We give formalizations of this property and examine existing robust training objectives in view of them. Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies well our formal version of the goodness property, but has a weak control of points that are wrong but with low confidence. However, if Madry et al.'s model is indeed a good solution to their objective, then good and bad points are now distinguishable and we can try to embed uncertain points back to the closest confident region to get (hopefully) correct predictions. We thus propose embedding objectives and algorithms, and perform an empirical study using this method. Our experimental results are encouraging: Madry et al.'s model wrapped with our embedding procedure achieves almost perfect success rate in defending against attacks that the base model fails on, while retaining good generalization behavior.
Submission history
From: Xi Wu [view email][v1] Tue, 21 Nov 2017 19:15:05 UTC (75 KB)
[v2] Mon, 1 Jan 2018 20:12:55 UTC (77 KB)
[v3] Fri, 8 Jun 2018 13:46:51 UTC (96 KB)
Current browse context:
cs.LG
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
IArxiv Recommender
(What is IArxiv?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.