All You Need is "Love": Evading Hate-speech Detection

Gröndahl, Tommi; Pajola, Luca; Juuti, Mika; Conti, Mauro; Asokan, N.

arXiv:1808.09115 (cs)

[Submitted on 28 Aug 2018 (v1), last revised 5 Nov 2018 (this version, v3)]

Abstract: With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective -- a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features.
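The three text perturbations named in the abstract (typos, changed word boundaries, appended innocuous words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the adjacent-character swap used as the typo rule, the choice to remove all whitespace, and the default appended word are assumptions made for illustration only.

```python
import random


def insert_typo(word: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters (assumed typo rule, not the paper's)."""
    if len(word) < 4:
        return word  # too short to perturb without touching the first/last letter
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def add_typos(text: str, seed: int = 0) -> str:
    """Apply insert_typo to every whitespace-separated word."""
    rng = random.Random(seed)
    return " ".join(insert_typo(w, rng) for w in text.split())


def remove_word_boundaries(text: str) -> str:
    """Change word boundaries by deleting all whitespace (one possible variant)."""
    return "".join(text.split())


def append_innocuous_word(text: str, word: str = "love") -> str:
    """Append an innocuous word; the default word is a hypothetical choice."""
    return f"{text} {word}"


if __name__ == "__main__":
    sample = "some example sentence"
    print(add_typos(sample))
    print(remove_word_boundaries(sample))  # someexamplesentence
    print(append_innocuous_word(remove_word_boundaries(sample)))
```

A word-level model sees each perturbed token as unknown or unrelated to the original, whereas character-level features still overlap heavily with the unperturbed text, which is consistent with the abstract's observation that character-level models are more attack-resistant.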

Comments:	11 pages, Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security (AISec) 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1808.09115 [cs.CL]
	(or arXiv:1808.09115v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1808.09115

Submission history

From: Tommi Gröndahl
[v1] Tue, 28 Aug 2018 04:49:54 UTC (35 KB)
[v2] Fri, 31 Aug 2018 15:48:23 UTC (35 KB)
[v3] Mon, 5 Nov 2018 16:06:57 UTC (35 KB)
