Abstract
Backdoor attacks, which manipulate model outputs, have drawn considerable attention from researchers. However, some existing word-level backdoor attacks on NLP models are difficult to defend against effectively because of their stealthiness and diversity. These covert attacks exploit pairs of words that look identical to the naked eye yet are mapped to different word vectors by the NLP model, thereby bypassing existing defenses. To address this issue, we propose incorporating triplet metric learning into the standard training phase of NLP models to defend against existing word-level backdoor attacks. Specifically, metric learning is used to minimize the distance between the vectors of visually similar words while maximizing their distance from the vectors of other words. In addition, because metric learning may reduce a model's sensitivity to the semantic changes caused by subtle perturbations, we add contrastive learning after the model's standard training. Experimental results demonstrate that our method performs well against the two most stealthy existing word-level backdoor attacks.
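The triplet-style metric-learning objective described above can be sketched as follows. This is a minimal PyTorch illustration, not the authors' released code: the embedding table, the word indices, and the margin are placeholder assumptions, and in the actual method the loss would be computed on the embedding layer of the NLP model (e.g. BERT) and combined with the task loss and the subsequent contrastive-learning stage.

# Minimal sketch (assumptions noted above) of triplet metric learning on word embeddings:
# pull a visually similar word variant toward its clean counterpart, push it away from
# an unrelated word.
import torch
import torch.nn as nn

vocab_size, dim = 10_000, 128
embedding = nn.Embedding(vocab_size, dim)           # stands in for the model's word-embedding layer
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

# Hypothetical indices: anchor = clean word, positive = visually similar variant,
# negative = an unrelated vocabulary word.
anchor_ids   = torch.tensor([120, 431])
positive_ids = torch.tensor([7802, 9115])
negative_ids = torch.tensor([55, 2048])

loss = triplet_loss(embedding(anchor_ids),
                    embedding(positive_ids),
                    embedding(negative_ids))
loss.backward()    # in training, this term would be added to the classification loss
print(loss.item())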
Acknowledgements
This work was supported in part by the Jiangsu Province Modern Education Technology Research 2021 Smart Campus Special Project “Research and Practice of Data Security System in College Smart Campus Construction” (2021-R-96776); the Jiangsu Province University Philosophy and Social Science Research Major Project “Research on the Construction of Ideological and Political Selective Compulsory Courses in Higher Vocational Colleges” (2022SJZDSZ011); the Research Project of Nanjing Industrial Vocational and Technical College (2020SKYJ03); and the 2022 project “Research on the Teaching Reform of High-quality Public Courses in Jiangsu Province Colleges and Universities”: theoretical and practical research on new-era labor education courses in Jiangsu higher vocational colleges.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, S., Li, Q., Lian, Z., Wang, P., Hou, J. (2024). MIC: An Effective Defense Against Word-Level Textual Backdoor Attacks. In: Luo, B., Cheng, L., Wu, Z.G., Li, H., Li, C. (eds.) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol. 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_1
DOI: https://doi.org/10.1007/978-981-99-8076-5_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8075-8
Online ISBN: 978-981-99-8076-5
eBook Packages: Computer Science, Computer Science (R0)