The risk of racial bias in hate speech detection

M Sap, D Card, S Gabriel, Y Choi, NA Smith

Proceedings of the 57th annual meeting of the association for …, 2019 • aclanthology.org

Abstract

We investigate how annotators’ insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. We first uncover unexpected correlations between surface markers of African American English (AAE) and ratings of toxicity in several widely-used hate speech datasets. Then, we show that models trained on these corpora acquire and propagate these biases, such that AAE tweets and tweets by self-identified African Americans are up to two times more likely to be labelled as offensive compared to others. Finally, we propose dialect and race priming as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.
