Backdoor smoothing: Demystifying backdoor attacks on deep neural networks

Published: 01 September 2022

Abstract

Backdoor attacks mislead machine-learning models into outputting an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data to compromise the learning algorithm, e.g., by injecting poisoning samples containing the trigger into the training set, along with the desired class label. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue by unveiling that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon which we refer to as backdoor smoothing. To quantify backdoor smoothing, we define a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks. We also provide preliminary evidence that backdoor triggers are not the only smoothing-inducing patterns, but that other artificial patterns can also be detected by our approach, paving the way towards understanding the limitations of current defenses and designing novel ones.
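
The smoothness measure itself is only described at a high level in this abstract. As a rough illustration of the idea, the sketch below estimates prediction uncertainty around an input by averaging the classifier's softmax outputs over random perturbations and computing the entropy of that average; lower entropy around a triggered input would be consistent with backdoor smoothing. The model, the trigger-application function, and the noise parameters are hypothetical placeholders, not the authors' exact formulation.

```python
# Hypothetical sketch (not the paper's exact measure): estimate prediction
# uncertainty in the neighbourhood of an input via randomized perturbations.
import torch
import torch.nn.functional as F

def prediction_entropy(model, x, noise_std=0.1, n_samples=100):
    """Entropy of the softmax output averaged over Gaussian perturbations of x.

    Lower entropy suggests a smoother, more confident region around x.
    """
    model.eval()
    with torch.no_grad():
        # x has shape (C, H, W); build n_samples noisy copies of it
        noisy = x.unsqueeze(0) + noise_std * torch.randn(n_samples, *x.shape)
        probs = F.softmax(model(noisy), dim=1).mean(dim=0)  # averaged class distribution
        return -(probs * probs.clamp_min(1e-12).log()).sum().item()

def apply_trigger(x, trigger, mask):
    """Paste a (hypothetical) patch trigger onto the image wherever mask == 1."""
    return x * (1 - mask) + trigger * mask

# Usage sketch: compare uncertainty with and without the trigger.
# `model`, `image`, `trigger`, and `mask` are assumed to be supplied by the user.
#   h_clean = prediction_entropy(model, image)
#   h_trig  = prediction_entropy(model, apply_trigger(image, trigger, mask))
# Backdoor smoothing would manifest as h_trig being much smaller than h_clean.
```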


Cited By

  • (2023) Algorithmic collective action in machine learning. Proceedings of the 40th International Conference on Machine Learning, pp. 12570-12586. https://doi.org/10.5555/3618408.3618918. Online publication date: 23-Jul-2023.
  • (2023) Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning. ACM Computing Surveys 55(13s), pp. 1-39. https://doi.org/10.1145/3585385. Online publication date: 13-Jul-2023.



        Published In

        Computers and Security, Volume 120, Issue C
        Sep 2022
        680 pages

        Publisher

        Elsevier Advanced Technology Publications

        United Kingdom


        Author Tags

        1. ML security
        2. Deep learning backdoors
        3. ML poisoning
        4. Training time attacks
        5. Training time defenses

