research-article

Unicode Evil: Evading NLP Systems Using Visual Similarities of Text Characters

Authors:

Antreas Dionysiou,

Elias Athanasopoulos

AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security

Pages 1 - 12

https://doi.org/10.1145/3474369.3486871

Published: 15 November 2021

Abstract

Adversarial Text Generation Frameworks (ATGFs) aim to cause a Natural Language Processing (NLP) machine to misbehave, i.e., to misclassify a given input. In this paper, we propose EvilText, a general ATGF that successfully evades some of the most popular NLP machines by (efficiently) perturbing a given legitimate text while preserving both the original text's semantics and its human readability. Perturbations are based on classes of visually similar characters in the Unicode set. NLP service operators can use EvilText to evaluate the security and robustness of their systems. Furthermore, EvilText outperforms state-of-the-art ATGFs in terms of (a) effectiveness, (b) efficiency, and (c) preservation of the original text's semantics and human readability. We evaluate EvilText on some of the most popular NLP systems used for sentiment analysis and toxic content detection. We further examine the generality and transferability of our ATGF and explore possible countermeasures against our attacks. Surprisingly, naive defence mechanisms fail to mitigate our attacks; the only promising one is restricting the use of Unicode characters. However, we argue that such a restriction imposes a significant trade-off between security and usability, as almost all websites rely heavily on Unicode support.
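The abstract describes perturbations drawn from classes of visually similar Unicode characters. The sketch below illustrates that general idea under stated assumptions: a small, hand-picked Latin-to-Cyrillic look-alike table (hypothetical; the abstract does not list EvilText's actual character classes), a per-word swap budget `max_swaps` (also hypothetical), and an ASCII-only check standing in for one crude form of "restricting Unicode characters". It does not query any NLP model, so it shows only how text can be perturbed, not how EvilText decides which words to perturb.

```python
import unicodedata

# Hypothetical look-alike table (Latin -> Cyrillic). Illustrative only;
# this is NOT EvilText's character-class mapping.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def perturb(text: str, max_swaps: int = 1) -> str:
    """Swap up to max_swaps characters per word for visually similar code points."""
    out_words = []
    for word in text.split(" "):
        chars = list(word)
        swaps = 0
        for i, ch in enumerate(chars):
            if swaps >= max_swaps:
                break
            if ch in HOMOGLYPHS:
                chars[i] = HOMOGLYPHS[ch]
                swaps += 1
        out_words.append("".join(chars))
    return " ".join(out_words)


def only_ascii(text: str) -> bool:
    """Naive 'restrict Unicode' filter (assumption): accept ASCII-only input."""
    return text.isascii()


if __name__ == "__main__":
    original = "a good movie"
    adversarial = perturb(original)
    print(adversarial)                      # renders almost identically to the original
    print(adversarial == original)          # False: the code points differ
    # NFKC normalization does not map Cyrillic letters back to Latin ones.
    print(unicodedata.normalize("NFKC", adversarial) == adversarial)
    print(only_ascii(original), only_ascii(adversarial))
```

In this sketch, `perturb("a good movie")` keeps the string length and looks the same in most fonts, yet a model that tokenizes by code point sees different tokens. Unicode normalization leaves the Cyrillic letters in place, while the ASCII-only check does flag the text, at the cost of rejecting every legitimate non-ASCII input, which mirrors the security/usability trade-off the abstract describes.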

Supplementary Material

MP4 File (EvilText.mp4)

EvilText is an Adversarial Text Generation Framework (ATGF) used to effectively and efficiently evade popular NLP systems.



Cited By

Roth T, Gao Y, Abuadbba A, Nepal S, Liu W. 2024. Token-modification adversarial attacks for natural language processing: A survey. AI Communications, 1-22. Online publication date: 2 Apr 2024. https://doi.org/10.3233/AIC-230279

Index Terms

Unicode Evil: Evading NLP Systems Using Visual Similarities of Text Characters
1. Computing methodologies
  1. Machine learning
2. Security and privacy



Information

Published In


AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security

November 2021

210 pages

ISBN:9781450386579

DOI:10.1145/3474369

Program Chairs: Nicholas Carlini (Google Brain), Ambra Demontis (University of Cagliari), Yizheng Chen (University of California, Berkeley)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2021

Qualifiers

Research-article

Funding Sources

Horizon 2020 research and innovation programme
Marie Skłodowska-Curie grant

Conference

CCS '21

Sponsor:

SIGSAC

CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security

November 15, 2021

Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 94 of 231 submissions, 41%



Bibliometrics

Total Citations: 1
Total Downloads: 313
Downloads (last 12 months): 26
Downloads (last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

