research-article

Hiding Needles in a Haystack: Towards Constructing Neural Networks that Evade Verification

Authors:

Árpád Berta,

Gábor Danner,

István Hegedus,

Mark JelasityAuthors Info & Claims

IH&MMSec '22: Proceedings of the 2022 ACM Workshop on Information Hiding and Multimedia Security

Pages 51 - 62

https://doi.org/10.1145/3531536.3532966

Published: 23 June 2022 Publication History

Get Access

Abstract

Machine learning models are vulnerable to adversarial attacks, where a small, invisible, malicious perturbation of the input changes the predicted label. A large area of research is concerned with verification techniques that attempt to decide whether a given model has adversarial inputs close to a given benign input. Here, we show that current approaches to verification have a key vulnerability: we construct a model that is not robust but passes current verifiers. The idea is to insert artificial adversarial perturbations by adding a backdoor to a robust neural network model. In our construction, the adversarial input subspace that triggers the backdoor has a very small volume, and outside this subspace the gradient of the model is identical to that of the clean model. In other words, we seek to create a "needle in a haystack" search problem. For practical purposes, we also require that the adversarial samples be robust to JPEG compression. Large "needle in the haystack" problems are practically impossible to solve with any search algorithm. Formal verifiers can handle this in principle, but they do not scale up to real-world networks at the moment, and achieving this is a challenge because the verification problem is NP-complete. Our construction is based on training a hiding and a revealing network using deep steganography. Using the revealing network, we create a separate backdoor network and integrate it into the target network. We train our deep steganography networks over the CIFAR-10 dataset. We then evaluate our construction using state-of-the-art adversarial attacks and backdoor detectors over the CIFAR-10 and the ImageNet datasets. We made the code and models publicly available at https://github.com/szegedai/hiding-needles-in-a-haystack.

References

[1]

Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. 2019. Square Attack: a query-efficient black-box adversarial attack via random search. CoRR abs/1912.00049 (2019). arXiv:1912.00049 http://arxiv.org/abs/1912.00049

Abstract

References

Cited By

Index Terms

Recommendations

An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

STRIP: a defence against trojan attacks on deep neural networks

Universal backdoor attack on deep neural networks for malware detection

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Get Access

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations