Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

Stutz, David; Hein, Matthias; Schiele, Bernt

Computer Science > Machine Learning

arXiv:1910.06259 (cs)

[Submitted on 14 Oct 2019 (v1), last revised 30 Jun 2020 (this version, v4)]

Abstract: Adversarial training yields robust models against a specific threat model, e.g., $L_\infty$ adversarial examples. Typically, robustness does not generalize to previously unseen threat models, e.g., other $L_p$ norms or larger perturbations. Our confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low-confidence predictions on adversarial examples. By allowing the model to reject low-confidence examples, robustness generalizes beyond the threat model employed during training. CCAT, trained only on $L_\infty$ adversarial examples, increases robustness against larger $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks, adversarial frames, distal adversarial examples and corrupted examples, and yields better clean accuracy than adversarial training. For a thorough evaluation, we developed novel white- and black-box attacks that directly target CCAT by maximizing confidence. For each threat model, we use $7$ attacks with up to $50$ restarts and $5000$ iterations, and report the worst-case robust test error across all attacks, extended to our confidence-thresholded setting.
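The calibration idea can be illustrated with a small sketch of the training target. Assumption, based on our reading of the full paper rather than on this abstract: for an adversarial input $x + \delta$, CCAT trains with cross-entropy against a target that interpolates between the one-hot label and the uniform distribution over $K$ classes, $\tilde{y} = \lambda(\delta)\,\mathrm{onehot}(y) + (1 - \lambda(\delta))\,\frac{1}{K}$, with a power transition $\lambda(\delta) = (1 - \min(1, \|\delta\|_\infty / \epsilon))^\rho$. The function names and the default $\rho = 10$ below are illustrative choices, not the authors' code.

```python
import math


def power_transition(delta_norm: float, epsilon: float, rho: float = 10.0) -> float:
    """Assumed CCAT transition: lambda = (1 - min(1, ||delta||_inf / epsilon)) ** rho."""
    return (1.0 - min(1.0, delta_norm / epsilon)) ** rho


def ccat_target(label: int, num_classes: int, delta_norm: float,
                epsilon: float, rho: float = 10.0) -> list[float]:
    """Mix the one-hot label with the uniform distribution according to lambda."""
    lam = power_transition(delta_norm, epsilon, rho)
    uniform = 1.0 / num_classes
    return [lam * (1.0 if k == label else 0.0) + (1.0 - lam) * uniform
            for k in range(num_classes)]


def cross_entropy(target: list[float], logits: list[float]) -> float:
    """Cross-entropy between a soft target and softmax(logits), via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))
```

With $\|\delta\|_\infty = 0$ the target is the one-hot label; for $\|\delta\|_\infty \geq \epsilon$ it is uniform, and a large $\rho$ makes the target decay towards uniform quickly as the perturbation grows. The model is thus pushed towards low confidence on perturbed inputs, which is what makes confidence-based rejection possible at test time.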
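The attacks used to evaluate CCAT maximize confidence rather than, as in standard attacks, only the classification loss. A minimal white-box sketch of that idea, assuming a toy linear softmax classifier (so the input gradient has a closed form) and a plain $L_\infty$ projected gradient ascent (PGD) loop: each step moves the input to increase the log-probability of the currently most likely wrong class, then projects back onto the $\epsilon$-ball around $x$ and onto the $[0, 1]$ input range. This is not the authors' attack; per the abstract, their evaluation also uses restarts and many more iterations, which this sketch omits. `W`, `b`, `step` and `iters` are hypothetical parameters.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear_logits(W, b, x):
    # Toy linear classifier: logits_k = W[k] . x + b[k]
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def max_wrong_confidence(W, b, x, y):
    """Highest softmax probability on any class other than the true label y, and that class."""
    p = softmax(linear_logits(W, b, x))
    k = max((j for j in range(len(p)) if j != y), key=lambda j: p[j])
    return p[k], k


def confidence_pgd(W, b, x, y, epsilon, step, iters):
    """L_inf PGD maximizing the top wrong-class confidence (inputs assumed in [0, 1])."""
    x_adv = list(x)
    for _ in range(iters):
        p = softmax(linear_logits(W, b, x_adv))
        k = max((j for j in range(len(p)) if j != y), key=lambda j: p[j])
        # Gradient of log p_k w.r.t. the input for a linear model: W[k] - sum_j p_j W[j].
        # log p_k is monotone in p_k, so ascending it also increases the confidence p_k.
        grad = [W[k][i] - sum(p[j] * W[j][i] for j in range(len(W)))
                for i in range(len(x))]
        for i, g in enumerate(grad):
            stepped = x_adv[i] + step * (math.copysign(1.0, g) if g != 0 else 0.0)
            # Project onto the epsilon-ball around x, then onto the valid range [0, 1]
            in_ball = min(x[i] + epsilon, max(x[i] - epsilon, stepped))
            x_adv[i] = min(1.0, max(0.0, in_ball))
    return x_adv
```

Because a confidence-calibrated model is trained to be uncertain on perturbed inputs, an attack that merely flips the label may still yield a low-confidence, rejectable example. Optimizing confidence directly makes the evaluation adaptive to the rejection mechanism.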

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1910.06259 [cs.LG]
	(or arXiv:1910.06259v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.06259

Submission history

From: David Stutz
[v1] Mon, 14 Oct 2019 16:38:03 UTC (1,452 KB)
[v2] Mon, 25 Nov 2019 16:34:42 UTC (2,248 KB)
[v3] Tue, 25 Feb 2020 16:15:44 UTC (2,265 KB)
[v4] Tue, 30 Jun 2020 12:03:44 UTC (2,504 KB)