DOI: 10.1145/3474369.3486871

Unicode Evil: Evading NLP Systems Using Visual Similarities of Text Characters

Published: 15 November 2021

Abstract

Adversarial Text Generation Frameworks (ATGFs) aim to cause a Natural Language Processing (NLP) machine to misbehave, i.e., to misclassify a given input. In this paper, we propose EvilText, a general ATGF that successfully evades some of the most popular NLP machines by efficiently perturbing a given legitimate text while preserving the original text's semantics and human readability. Perturbations are based on classes of visually similar characters in the Unicode set. Operators of NLP services can use EvilText to evaluate the security and robustness of their systems. Furthermore, EvilText outperforms state-of-the-art ATGFs in terms of (a) effectiveness, (b) efficiency, and (c) preservation of the original text's semantics and human readability. We evaluate EvilText on some of the most popular NLP systems used for sentiment analysis and toxic content detection. We further expand on the generality and transferability of our ATGF and explore possible countermeasures for defending against our attacks. Surprisingly, naive defence mechanisms fail to mitigate our attacks; the only promising one is restricting the use of Unicode characters. However, we argue that such a restriction imposes a significant trade-off between security and usability, as almost all websites rely heavily on Unicode support.
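
To make the idea of visually similar character perturbations concrete, the following is a minimal Python sketch, not EvilText itself. The `HOMOGLYPHS` table, the `perturb` function, its `max_swaps` parameter, and the `contains_non_ascii` check are all hypothetical names chosen for illustration. The abstract does not specify the character classes or the selection strategy the framework actually uses, so a small hand-picked Latin-to-Cyrillic mapping is assumed here.

```python
# Hypothetical lookalike table (Latin -> Cyrillic). This is an assumption
# for illustration only; it is NOT the character classes used by EvilText.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "i": "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def perturb(text: str, max_swaps: int = 1) -> str:
    """Replace up to `max_swaps` characters in each space-separated word
    with a visually similar character from HOMOGLYPHS."""
    words = []
    for word in text.split(" "):
        chars = list(word)
        swaps = 0
        for idx, ch in enumerate(chars):
            if swaps >= max_swaps:
                break
            if ch in HOMOGLYPHS:
                chars[idx] = HOMOGLYPHS[ch]
                swaps += 1
        words.append("".join(chars))
    return " ".join(words)


def contains_non_ascii(text: str) -> bool:
    """Naive 'restrict Unicode' check: flag any code point above 127."""
    return any(ord(ch) > 127 for ch in text)


if __name__ == "__main__":
    original = "bad movie"
    adversarial = perturb(original)
    # The two strings look alike on screen but differ at the code-point level.
    print(original == adversarial)
    print(contains_non_ascii(original), contains_non_ascii(adversarial))
```

A model that tokenizes on code points may treat the perturbed words as unseen tokens even though a human reads them unchanged. The `contains_non_ascii` check also hints at the usability cost discussed in the abstract: it would reject any legitimate non-English text along with the adversarial input.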

Supplementary Material

MP4 File (EvilText.mp4)
EvilText is an Adversarial Text Generation Framework (ATGF) used to effectively and efficiently evade popular NLP systems.

      Published In

      AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
      November 2021
      210 pages
      ISBN:9781450386579
      DOI:10.1145/3474369

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. adversarial machine learning
      2. adversarial text generation
      3. natural language processing

      Qualifiers

      • Research-article

      Funding Sources

      • Horizon 2020 research and innovation programme
      • Marie Skłodowska-Curie grant

      Conference

      CCS '21

      Acceptance Rates

      Overall Acceptance Rate 94 of 231 submissions, 41%
