DOI: 10.1145/3474369.3486871

Unicode Evil: Evading NLP Systems Using Visual Similarities of Text Characters

Published: 15 November 2021
    Abstract

    Adversarial Text Generation Frameworks (ATGFs) aim at causing a Natural Language Processing (NLP) machine to misbehave, i.e., to misclassify a given input. In this paper, we propose EvilText, a general ATGF that successfully evades some of the most popular NLP machines by efficiently perturbing a given legitimate text while preserving the original text's semantics and human readability. The perturbations are based on classes of visually similar characters in the Unicode set. EvilText can be used by NLP service operators to evaluate the security and robustness of their systems. Furthermore, EvilText outperforms state-of-the-art ATGFs in terms of (a) effectiveness, (b) efficiency, and (c) preservation of the original text's semantics and human readability. We evaluate EvilText on some of the most popular NLP systems used for sentiment analysis and toxic content detection. We further expand on the generality and transferability of our ATGF, and explore possible countermeasures for defending against our attacks. Surprisingly, naive defence mechanisms fail to mitigate our attacks; the only promising one is restricting the use of Unicode characters. However, we argue that such a restriction imposes a significant trade-off between security and usability, since almost all websites rely heavily on Unicode support.
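
    The general idea behind this style of attack can be illustrated with a minimal Python sketch. This is an assumption-laden illustration of homoglyph substitution in general, not the paper's EvilText algorithm or its actual character classes; the HOMOGLYPHS table and the perturb and looks_ascii_only helpers are hypothetical names introduced here. The last helper mimics the Unicode-restriction defence mentioned above: a filter that rejects non-ASCII input would flag the perturbed text, at the cost of refusing legitimate Unicode content.

        # Illustrative sketch only: swap a few Latin letters for visually similar
        # Unicode code points so the text stays readable to humans but no longer
        # matches the character/word sequences an NLP classifier was trained on.
        # The mapping below is a small hand-picked assumption, not the paper's
        # visually similar character classes.
        HOMOGLYPHS = {
            "a": "\u0430",  # Cyrillic small a
            "c": "\u0441",  # Cyrillic small es
            "e": "\u0435",  # Cyrillic small ie
            "i": "\u0456",  # Cyrillic/Ukrainian small i
            "o": "\u043e",  # Cyrillic small o
        }

        def perturb(text: str, budget: int = 3) -> str:
            """Replace up to `budget` characters with homoglyphs, left to right."""
            out, swapped = [], 0
            for ch in text:
                if swapped < budget and ch in HOMOGLYPHS:
                    out.append(HOMOGLYPHS[ch])
                    swapped += 1
                else:
                    out.append(ch)
            return "".join(out)

        def looks_ascii_only(text: str) -> bool:
            """Naive defence: restrict input to ASCII (high usability cost)."""
            return text.isascii()

        if __name__ == "__main__":
            original = "this movie was awful and toxic"
            adversarial = perturb(original)
            print(adversarial)                    # renders almost identically
            print(original == adversarial)        # False: different code points
            print(looks_ascii_only(adversarial))  # False: a Unicode filter catches it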

    Supplementary Material

    MP4 File (EvilText.mp4)
    EvilText is an Adversarial Text Generation Framework (ATGF) used to effectively and efficiently evade popular NLP systems.


    Cited By

    • (2024) Token-modification adversarial attacks for natural language processing: A survey. AI Communications, 1-22. https://doi.org/10.3233/AIC-230279. Online publication date: 2 April 2024.


        Published In

        AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
        November 2021
        210 pages
        ISBN: 9781450386579
        DOI: 10.1145/3474369


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 15 November 2021


        Author Tags

        1. adversarial machine learning
        2. adversarial text generation
        3. natural language processing

        Qualifiers

        • Research-article

        Funding Sources

        • Horizon 2020 research and innovation programme
        • Marie Skłodowska-Curie grant

        Conference

        CCS '21

        Acceptance Rates

        Overall Acceptance Rate 94 of 231 submissions, 41%
