Backdoor smoothing: Demystifying backdoor attacks on deep neural networks

Published: 01 September 2022

Abstract

Backdoor attacks mislead machine-learning models into outputting an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data to compromise the learning algorithm, e.g., by injecting poisoning samples containing the trigger into the training set, along with the desired class label. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue by unveiling that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon which we refer to as backdoor smoothing. To quantify backdoor smoothing, we define a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks. We also provide preliminary evidence that backdoor triggers are not the only smoothing-inducing patterns, but that other artificial patterns can also be detected by our approach, paving the way towards understanding the limitations of current defenses and designing novel ones.
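
The smoothness measure itself is only described at a high level in this abstract. As a rough illustration of the idea, the sketch below estimates prediction uncertainty around an input by averaging the classifier's softmax outputs over random perturbations and computing the entropy of that average; lower entropy around a triggered input would be consistent with backdoor smoothing. The model, the trigger-application function, and the noise parameters are hypothetical placeholders, not the authors' exact formulation.

```python
# Hypothetical sketch (not the paper's exact measure): estimate prediction
# uncertainty in the neighbourhood of an input via randomized perturbations.
import torch
import torch.nn.functional as F

def prediction_entropy(model, x, noise_std=0.1, n_samples=100):
    """Entropy of the softmax output averaged over Gaussian perturbations of x.

    Lower entropy suggests a smoother, more confident region around x.
    """
    model.eval()
    with torch.no_grad():
        # x has shape (C, H, W); build n_samples noisy copies of it
        noisy = x.unsqueeze(0) + noise_std * torch.randn(n_samples, *x.shape)
        probs = F.softmax(model(noisy), dim=1).mean(dim=0)  # averaged class distribution
        return -(probs * probs.clamp_min(1e-12).log()).sum().item()

def apply_trigger(x, trigger, mask):
    """Paste a (hypothetical) patch trigger onto the image wherever mask == 1."""
    return x * (1 - mask) + trigger * mask

# Usage sketch: compare uncertainty with and without the trigger.
# `model`, `image`, `trigger`, and `mask` are assumed to be supplied by the user.
#   h_clean = prediction_entropy(model, image)
#   h_trig  = prediction_entropy(model, apply_trigger(image, trigger, mask))
# Backdoor smoothing would manifest as h_trig being much smaller than h_clean.
```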


Cited By

  • (2023) Algorithmic collective action in machine learning. Proceedings of the 40th International Conference on Machine Learning, pp. 12570-12586. https://doi.org/10.5555/3618408.3618918. Online publication date: 23-Jul-2023.
  • (2023) Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning. ACM Computing Surveys 55(13s), pp. 1-39. https://doi.org/10.1145/3585385. Online publication date: 13-Jul-2023.



        Published In

        Computers and Security, Volume 120, Issue C
        Sep 2022
        680 pages

        Publisher

        Elsevier Advanced Technology Publications

        United Kingdom


        Author Tags

        1. ML security
        2. Deep learning backdoors
        3. ML poisoning
        4. Training time attacks
        5. Training time defenses

