Abstract
When a deep learning-based model is attacked by backdoor attacks, it behaves normally for clean inputs, whereas outputs unexpected results for inputs with specific triggers. This causes serious threats to deep learning-based applications. Many backdoor detection methods have been proposed to address these threats. However, these defenses can only work on the backdoored models attacked by static trigger(s). Recently, some backdoor attacks with dynamic and invisible triggers have been developed, and existing detection methods cannot defend against these attacks. To address this new threat, in this paper, we propose a new defense mechanism that can detect and mitigate backdoor attacks with dynamic and invisible triggers. We reverse engineer generators that transform clean images into backdoor images for each label. The generated images by the generator can help to detect the existence of a backdoor and further remove it. To the best of our knowledge, our work is the first work to defend against backdoor attacks with dynamic and invisible triggers. Experiments on multiple datasets show that the proposed method can effectively detect and mitigate the backdoor with dynamic and invisible triggers in deep learning-based models.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Brown, T., et al.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)
Chou, E., Tramèr, F., Pellegrino, G.: Sentinet: detecting localized universal attacks against deep learning systems. In: IEEE S &P Workshops, pp. 48–54 (2020)
Dong, Y., et al.: Black-box detection of backdoor attacks with limited information and data. In: ICCV, pp. 16482–16491 (2021)
Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: Strip: a defence against trojan attacks on deep neural networks. In: ACSAC, pp. 113–125 (2019)
Goodfellow, I.J., et al.: Generative adversarial nets. In: NeurIPS, pp. 1–9 (2014)
Gu, T., Liu, K., Dolan-Gavitt, B., Garg, S.: Badnets: evaluating backdooring attacks on deep neural networks. IEEE Access 7, 47230–47244 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. University of Toronto, Technical Report (2009)
LeCun, Y., et al.: Learning algorithms for classification: a comparison on handwritten digit recognition. Neural Netw. Stat. Mech. Perspect. 261(276), 2 (1995)
Li, S., Xue, M., Zhao, B.Z.H., Zhu, H., Zhang, X.: Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secure Comput. 18(5), 2088–2105 (2021)
Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., Ma, X.: Neural attention distillation: erasing backdoor triggers from deep neural networks. In: ICLR, pp. 1–12 (2021)
Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: defending against backdooring attacks on deep neural networks. In: Research in Attacks, Intrusions, and Defenses, pp. 273–294 (2018)
Nguyen, T.A., Tran, A.: Input-aware dynamic backdoor attack. In: NeurIPS, pp. 3454–3464 (2020)
Nguyen, T.A., Tran, A.T.: Wanet - imperceptible warping-based backdoor attack. In: ICLR, pp. 1–11 (2021)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241 (2015)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv 2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR, pp. 1–14 (2015)
Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 32, 323–332 (2012)
Wang, B., et al.: Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In: S &P, pp. 707–723 (2019)
Zeng, Y., Park, W., Mao, Z.M., Jia, R.: Rethinking the backdoor attacks’ triggers: a frequency perspective. In: ICCV, pp. 16473–16481 (2021)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR, pp. 586–595 (2018)
Zhu, L., Ning, R., Wang, C., Xin, C., Wu, H.: Gangsweep: sweep out neural backdoors by gan. In: ACM MM, pp. 3173–3181 (2020)
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 62071142, by the Guangdong Basic and Applied Basic Research Foundation under Grants 2021A1515011406, by the Shenzhen College Stability Support Plan under Grant GXWD20201230155427003-20200824210638001, by the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies under Grant 2022B1212010005.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, Z., Hua, Z., Zhang, L.Y. (2023). Detecting and Mitigating Backdoor Attacks with Dynamic and Invisible Triggers. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_19
Download citation
DOI: https://doi.org/10.1007/978-3-031-30111-7_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer ScienceComputer Science (R0)