Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

Wang, Hang; Xiang, Zhen; Miller, David J.; Kesidis, George

arXiv:2308.04617 (cs)

[Submitted on 8 Aug 2023]

Abstract: Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance compared with peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks and X2X attacks, and across different datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code for our method is available online.
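
As a rough illustration of the ideas named in the abstract, the sketch below clips the hidden activations of a tiny hand-written ReLU network and flags an input when the original and clipped networks disagree. It is not the authors' implementation: the weights, the inputs, the single shared `BOUND`, and the helper names (`relu_layer`, `logits`, `margin`, `flag_and_correct`) are all hypothetical, and the bound is fixed by hand here, whereas the paper learns its bounds from a small set of clean samples so as to limit classification margins.

```python
# Illustrative sketch only: a toy ReLU network with hand-picked weights,
# not the model or the bound-learning procedure from the paper.

def relu_layer(x, weights, biases, bound=None):
    """Affine map followed by ReLU; optionally clip each activation at `bound`."""
    out = []
    for w_row, b in zip(weights, biases):
        a = max(0.0, sum(w * xi for w, xi in zip(w_row, x)) + b)
        if bound is not None:
            a = min(a, bound)  # activation clipping
        out.append(a)
    return out

def logits(x, params, bound=None):
    """Forward pass: one hidden ReLU layer (optionally clipped), then a linear output."""
    w1, b1, w2, b2 = params
    h = relu_layer(x, w1, b1, bound)
    return [sum(w * hi for w, hi in zip(w_row, h)) + b for w_row, b in zip(w2, b2)]

def margin(z):
    """Classification margin: largest logit minus the runner-up."""
    top, second = sorted(z, reverse=True)[:2]
    return top - second

def predict(z):
    """Index of the largest logit."""
    return max(range(len(z)), key=lambda k: z[k])

def flag_and_correct(x, params, bound):
    """Flag an input when the original and clipped networks disagree.

    Returns (flagged, prediction of the clipped network).
    """
    original = predict(logits(x, params))
    clipped = predict(logits(x, params, bound))
    return original != clipped, clipped

# Hypothetical weights: hidden unit 2 reacts strongly to input feature 2,
# which plays the role of a backdoor trigger pushing the output to class 1.
PARAMS = (
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 10.0]],  # w1
    [0.0, 0.0, 0.0],                                        # b1
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]],                     # w2 (class 0, class 1)
    [0.0, 0.0],                                             # b2
)
BOUND = 2.0  # assumed value, set by hand for this toy example

clean = [2.0, 0.5, 0.0]      # no trigger feature
triggered = [2.0, 0.5, 1.0]  # same input with the trigger feature switched on

print(flag_and_correct(clean, PARAMS, BOUND))      # (False, 0)
print(flag_and_correct(triggered, PARAMS, BOUND))  # (True, 0)
```

In this toy setting the trigger produces one abnormally large hidden activation and a margin of 3.5 in favor of class 1; clipping that activation at `BOUND` restores class 0 with a margin of 0.5, and the disagreement between the two networks is what raises the flag. The clean input stays within the bound, so its prediction is unchanged.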

Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Cite as:	arXiv:2308.04617 [cs.LG]
	(or arXiv:2308.04617v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2308.04617

Submission history

From: George Kesidis
[v1] Tue, 8 Aug 2023 22:47:39 UTC (748 KB)
