DOI: 10.1145/3534678.3539357
Research article · Public Access

LeapAttack: Hard-Label Adversarial Attack on Text via Gradient-Based Optimization

Published: 14 August 2022

Abstract

Generating text adversarial examples in the hard-label setting is a more realistic but more challenging black-box attack problem: because word replacements are discrete, gradients cannot be computed directly, and the effectiveness of existing gradient-based methods therefore leaves room for improvement. In this paper, we propose LeapAttack, a gradient-based optimization method for crafting high-quality text adversarial examples in the hard-label setting. Specifically, LeapAttack uses the word embedding space to characterize the semantic deviation of each perturbed substitution as the difference vector between the two words involved. Building on this representation, LeapAttack gradually updates the perturbation direction and constructs adversarial examples in an iterative round trip: first, after moving the current adversarial example near the decision boundary, it estimates the gradient by converting randomly sampled candidate words into continuous difference vectors; second, it maps the estimated gradient back to a new substitution word via the cosine similarity metric. Extensive experiments show that, in the general case, LeapAttack efficiently generates high-quality text adversarial examples with the highest semantic similarity and the lowest perturbation rate in the hard-label setting.
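The round trip described above can be sketched in a toy form. Everything below (the tiny vocabulary, random embeddings, and the stand-in hard-label classifier `victim_label`) is hypothetical and not from the paper; the sketch only illustrates the two steps the abstract names: a Monte Carlo gradient estimate built from signed difference vectors, followed by a cosine-similarity "leap" back to a discrete word.

```python
import numpy as np

# Toy setup: a small embedding table and a hard-label "victim" that
# returns only a predicted class, never scores or gradients.
rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "awful", "poor", "okay", "nice"]
emb = {w: rng.normal(size=8) for w in vocab}

def victim_label(words):
    # Stand-in classifier: positive iff the summed embedding projects
    # positively onto a fixed direction (unknown to the attacker).
    w_secret = np.ones(8)
    return int(sum(emb[w] for w in words) @ w_secret > 0)

def estimate_gradient(words, pos, n_samples=20):
    """Step 1 (sketch): sample candidate words at one position, express
    each substitution as a difference vector in embedding space, and
    average the vectors signed by whether the substitution changes the
    victim's hard label."""
    base = emb[words[pos]]
    base_label = victim_label(words)
    grad = np.zeros_like(base)
    for cand in rng.choice(vocab, size=n_samples):
        trial = list(words)
        trial[pos] = cand
        sign = 1.0 if victim_label(trial) != base_label else -1.0
        grad += sign * (emb[cand] - base)
    return grad / n_samples

def leap_to_word(grad, exclude):
    # Step 2 (sketch): map the continuous gradient estimate back to a
    # discrete word by cosine similarity.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max((w for w in vocab if w != exclude), key=lambda w: cos(grad, emb[w]))

sentence = ["good", "nice", "fine"]
g = estimate_gradient(sentence, pos=0)
print(leap_to_word(g, exclude=sentence[0]))
```

A real implementation would additionally move the current example near the decision boundary before estimating, operate in counter-fitted embedding space, and enforce semantic-similarity constraints; none of that machinery is reproduced here.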

Supplemental Material

MP4 File
Presentation video for LeapAttack, the gradient-based method proposed for the hard-label text adversarial attack task.



      Published In

      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
      ISBN:9781450393850
      DOI:10.1145/3534678
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. gradient-based optimization
      2. hard-label text adversarial attack


      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Article Metrics

      • Downloads (Last 12 months)310
      • Downloads (Last 6 weeks)36
      Reflects downloads up to 03 Oct 2024

Cited By

  • (2024) FastTextDodger: Decision-Based Adversarial Attack Against Black-Box NLP Models With Extremely High Efficiency. IEEE Transactions on Information Forensics and Security 19, 2398–2411. DOI: 10.1109/TIFS.2024.3350376
  • (2024) TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting. IEEE Transactions on Dependable and Secure Computing 21, 4, 3901–3916. DOI: 10.1109/TDSC.2023.3339802
  • (2024) Adaptive Gradient-based Word Saliency for adversarial text attacks. Neurocomputing 590. DOI: 10.1016/j.neucom.2024.127667
  • (2023) HQA-attack. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 51347–51358. DOI: 10.5555/3666122.3668357
  • (2023) UniT. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 22351–22368. DOI: 10.5555/3666122.3667103
  • (2023) Character as pixels. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 983–990. DOI: 10.24963/ijcai.2023/109
  • (2023) PAT: Geometry-Aware Hard-Label Black-Box Adversarial Attacks on Text. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3093–3104. DOI: 10.1145/3580305.3599461
