DOI: 10.1145/3534678.3539357
Research article · Public Access

LeapAttack: Hard-Label Adversarial Attack on Text via Gradient-Based Optimization

Published: 14 August 2022

Abstract

Generating text adversarial examples in the hard-label setting is a more realistic but more challenging black-box attack problem: because word replacements are discrete, gradients cannot be computed directly, and the effectiveness of existing gradient-based methods therefore leaves room for improvement. In this paper, we propose LeapAttack, a gradient-based optimization method for crafting high-quality text adversarial examples in the hard-label setting. Specifically, LeapAttack uses the word embedding space to characterize the semantic deviation of each perturbed substitution as the difference vector between the two words involved. Building on this representation, LeapAttack gradually updates the perturbation direction and constructs adversarial examples in an iterative round trip: first, after moving the current adversarial example near the decision boundary, it estimates the gradient by converting randomly sampled candidate words into continuous difference vectors; second, it maps the estimated gradient back to a new substitution word via the cosine similarity metric. Extensive experiments show that, in the general case, LeapAttack efficiently generates high-quality text adversarial examples with the highest semantic similarity and the lowest perturbation rate in the hard-label setting.
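The round trip described above can be sketched in a toy form. Everything below (the tiny vocabulary, random embeddings, and the stand-in hard-label classifier `victim_label`) is hypothetical and not from the paper; the sketch only illustrates the two steps the abstract names: a Monte Carlo gradient estimate built from signed difference vectors, followed by a cosine-similarity "leap" back to a discrete word.

```python
import numpy as np

# Toy setup: a small embedding table and a hard-label "victim" that
# returns only a predicted class, never scores or gradients.
rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "awful", "poor", "okay", "nice"]
emb = {w: rng.normal(size=8) for w in vocab}

def victim_label(words):
    # Stand-in classifier: positive iff the summed embedding projects
    # positively onto a fixed direction (unknown to the attacker).
    w_secret = np.ones(8)
    return int(sum(emb[w] for w in words) @ w_secret > 0)

def estimate_gradient(words, pos, n_samples=20):
    """Step 1 (sketch): sample candidate words at one position, express
    each substitution as a difference vector in embedding space, and
    average the vectors signed by whether the substitution changes the
    victim's hard label."""
    base = emb[words[pos]]
    base_label = victim_label(words)
    grad = np.zeros_like(base)
    for cand in rng.choice(vocab, size=n_samples):
        trial = list(words)
        trial[pos] = cand
        sign = 1.0 if victim_label(trial) != base_label else -1.0
        grad += sign * (emb[cand] - base)
    return grad / n_samples

def leap_to_word(grad, exclude):
    # Step 2 (sketch): map the continuous gradient estimate back to a
    # discrete word by cosine similarity.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max((w for w in vocab if w != exclude), key=lambda w: cos(grad, emb[w]))

sentence = ["good", "nice", "fine"]
g = estimate_gradient(sentence, pos=0)
print(leap_to_word(g, exclude=sentence[0]))
```

A real implementation would additionally move the current example near the decision boundary before estimating, operate in counter-fitted embedding space, and enforce semantic-similarity constraints; none of that machinery is reproduced here.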

Supplemental Material

MP4 File
Presentation video for LeapAttack, the gradient-based method proposed for the hard-label text adversarial attack task.



      Published In

      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
      ISBN:9781450393850
      DOI:10.1145/3534678
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. gradient-based optimization
      2. hard-label text adversarial attack


      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Article Metrics

      • Downloads (Last 12 months)310
      • Downloads (Last 6 weeks)36
      Reflects downloads up to 03 Oct 2024

Cited By

  • (2024) FastTextDodger: Decision-Based Adversarial Attack Against Black-Box NLP Models With Extremely High Efficiency. IEEE Transactions on Information Forensics and Security 19, 2398–2411. DOI: 10.1109/TIFS.2024.3350376
  • (2024) TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting. IEEE Transactions on Dependable and Secure Computing 21, 4, 3901–3916. DOI: 10.1109/TDSC.2023.3339802
  • (2024) Adaptive Gradient-based Word Saliency for adversarial text attacks. Neurocomputing 590. DOI: 10.1016/j.neucom.2024.127667
  • (2023) HQA-attack. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 51347–51358. DOI: 10.5555/3666122.3668357
  • (2023) UniT. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 22351–22368. DOI: 10.5555/3666122.3667103
  • (2023) Character as pixels. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 983–990. DOI: 10.24963/ijcai.2023/109
  • (2023) PAT: Geometry-Aware Hard-Label Black-Box Adversarial Attacks on Text. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3093–3104. DOI: 10.1145/3580305.3599461
