Abstract
Backdoor attacks, which manipulate model outputs, have drawn considerable attention from researchers. However, some existing word-level backdoor attacks on NLP models are difficult to defend against effectively because of their stealthiness and diversity. These covert attacks exploit pairs of words that look identical to the naked eye yet are mapped to different word vectors by the NLP model, thereby bypassing existing defenses. To address this issue, we propose incorporating triplet metric learning into the standard training phase of NLP models to defend against existing word-level backdoor attacks. Specifically, metric learning is used to minimize the distance between the vectors of visually similar words while maximizing their distance from the vectors of other words. In addition, because metric learning may reduce a model's sensitivity to the semantic changes caused by subtle perturbations, we add contrastive learning after the model's standard training. Experimental results demonstrate that our method performs well against the two most stealthy existing word-level backdoor attacks.
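The triplet-style metric-learning objective described above can be sketched as follows. This is a minimal PyTorch illustration, not the authors' released code: the embedding table, the word indices, and the margin are placeholder assumptions, and in the actual method the loss would be computed on the embedding layer of the NLP model (e.g. BERT) and combined with the task loss and the subsequent contrastive-learning stage.

# Minimal sketch (assumptions noted above) of triplet metric learning on word embeddings:
# pull a visually similar word variant toward its clean counterpart, push it away from
# an unrelated word.
import torch
import torch.nn as nn

vocab_size, dim = 10_000, 128
embedding = nn.Embedding(vocab_size, dim)           # stands in for the model's word-embedding layer
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

# Hypothetical indices: anchor = clean word, positive = visually similar variant,
# negative = an unrelated vocabulary word.
anchor_ids   = torch.tensor([120, 431])
positive_ids = torch.tensor([7802, 9115])
negative_ids = torch.tensor([55, 2048])

loss = triplet_loss(embedding(anchor_ids),
                    embedding(positive_ids),
                    embedding(negative_ids))
loss.backward()    # in training, this term would be added to the classification loss
print(loss.item())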
Acknowledgements
This work was supported in part by the Jiangsu Province Modern Education Technology Research 2021 Smart Campus Special Project “Research and Practice of Data Security System in College Smart Campus Construction” (2021-R-96776); the Jiangsu Province University Philosophy and Social Science Research Major Project “Research on the Construction of Ideological and Political Selective Compulsory Courses in Higher Vocational Colleges” (2022SJZDSZ011); the Research Project of Nanjing Industrial Vocational and Technical College (2020SKYJ03); and the 2022 project “Research on the Teaching Reform of High-quality Public Courses in Jiangsu Province Colleges and Universities”: theoretical and practical research on new-era labor education courses in Jiangsu higher vocational colleges.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, S., Li, Q., Lian, Z., Wang, P., Hou, J. (2024). MIC: An Effective Defense Against Word-Level Textual Backdoor Attacks. In: Luo, B., Cheng, L., Wu, Z.G., Li, H., Li, C. (eds.) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol. 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_1
DOI: https://doi.org/10.1007/978-981-99-8076-5_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8075-8
Online ISBN: 978-981-99-8076-5
eBook Packages: Computer Science, Computer Science (R0)