HotFlip: White-Box Adversarial Examples for NLP

Ebrahimi, Javid; Rao, Anyi; Lowd, Daniel; Dou, Dejing

arXiv:1712.06751v1 (cs)

[Submitted on 19 Dec 2017 (this version), latest version 24 May 2018 (v2)]

Abstract: Adversarial examples expose vulnerabilities of machine learning models. We propose an efficient method to generate white-box adversarial examples that trick character-level and word-level neural models. Our method, HotFlip, relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. In experiments on text classification and machine translation, we find that only a few manipulations are needed to greatly increase the error rates. We analyze the properties of these examples, and show that employing these adversarial examples in training can improve test-time accuracy on clean examples, as well as defend the models against adversarial examples.
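
The flip operation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear scorer with a logistic loss in place of the neural models the paper attacks, and every name in it (`VOCAB`, `WEIGHTS`, `encode`, `loss_and_grad`, `best_flip`) is hypothetical. It shows one way to read "based on the gradients of the one-hot input vectors": for each position, the first-order change in loss from replacing the current token `a` with a token `b` is estimated as the gradient entry for `b` minus the gradient entry for `a`, and the flip with the largest estimate is chosen.

```python
import math

# Hypothetical toy setup: a 3-character vocabulary and a linear scorer with
# one weight per (position, character). These values are illustrative only.
VOCAB = "abc"
WEIGHTS = [[1.0, -0.5, 0.0],
           [0.2, 0.8, -1.0]]


def encode(text):
    """Turn a string into a list of one-hot rows over VOCAB."""
    return [[1 if v == ch else 0 for v in VOCAB] for ch in text]


def loss_and_grad(one_hot, weights, label):
    """Logistic loss of the toy scorer and its gradient w.r.t. the one-hot input.

    label is +1 or -1. The score z is the sum of the weights selected by the
    one-hot rows, so d(loss)/d(one_hot[i][v]) = d(loss)/dz * weights[i][v].
    """
    z = sum(w * x
            for w_row, x_row in zip(weights, one_hot)
            for w, x in zip(w_row, x_row))
    loss = math.log1p(math.exp(-label * z))
    dloss_dz = -label / (1.0 + math.exp(label * z))
    grad = [[dloss_dz * w for w in w_row] for w_row in weights]
    return loss, grad


def best_flip(one_hot, grad):
    """Return (estimated_gain, position, new_token_index) for the single flip
    with the largest first-order increase in loss."""
    best = None
    for i, (x_row, g_row) in enumerate(zip(one_hot, grad)):
        a = x_row.index(1)  # index of the token currently at position i
        for b, g in enumerate(g_row):
            if b == a:
                continue
            gain = g - g_row[a]  # first-order estimate of the loss change
            if best is None or gain > best[0]:
                best = (gain, i, b)
    return best


# Usage: find the most damaging single-character flip for the input "ab".
x = encode("ab")
loss_before, grad = loss_and_grad(x, WEIGHTS, label=+1)
gain, position, new_index = best_flip(x, grad)
flipped = "ab"[:position] + VOCAB[new_index] + "ab"[position + 1:]
loss_after, _ = loss_and_grad(encode(flipped), WEIGHTS, label=+1)
print(flipped, loss_before, loss_after)
```

One backward pass yields the gradient for every position and every candidate token at once, so all single flips can be ranked without re-running the model for each candidate. For a nonlinear model the ranking is only a first-order approximation of the true loss change.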

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1712.06751 [cs.CL]
	(or arXiv:1712.06751v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1712.06751

Submission history

From: Javid Ebrahimi
[v1] Tue, 19 Dec 2017 02:15:19 UTC (93 KB)
[v2] Thu, 24 May 2018 16:43:45 UTC (51 KB)