research-article

Abusive Language Detection in Online User Content

Authors:

Chikashi Nobata,

Joel Tetreault,

Yi Chang

WWW '16: Proceedings of the 25th International Conference on World Wide Web

Pages 145 - 153

https://doi.org/10.1145/2872427.2883062

Published: 11 April 2016

Abstract

Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods rely on blacklists and regular expressions; however, these measures fall short when contending with subtler, less ham-fisted examples of hate speech. In this work, we develop a machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.
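To make the limitation of blacklists and regular expressions concrete, the following is a minimal sketch of such a filter. It is not the authors' method or any commercial system: the word list, the function name `blacklist_flags`, and the example comments are all hypothetical, chosen only to illustrate how a simple obfuscation or an implicitly hostile sentence can slip past exact pattern matching.

```python
import re

# Hypothetical blacklist for illustration only; the paper does not publish one.
BLACKLIST = {"idiot", "moron"}

# One case-insensitive pattern that matches any blacklisted word as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in sorted(BLACKLIST)) + r")\b",
    re.IGNORECASE,
)


def blacklist_flags(comment: str) -> bool:
    """Return True if the comment contains a blacklisted word."""
    return PATTERN.search(comment) is not None


# Hypothetical comments: an explicit insult, an obfuscated spelling,
# and a hostile sentence that uses no blacklisted word at all.
examples = [
    "You are an idiot.",
    "You are an id1ot.",
    "People like you should not be allowed to speak.",
]

for text in examples:
    print(blacklist_flags(text), text)
```

Only the first comment is flagged. The obfuscated spelling and the implicitly hostile sentence both pass, which is the kind of gap a learned classifier is meant to close.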

Index Terms

Abusive Language Detection in Online User Content
1. Human-centered computing
  1. Collaborative and social computing

Published In

WWW '16: Proceedings of the 25th International Conference on World Wide Web

April 2016

1482 pages

ISBN:9781450341431

General Chairs:
Jacqueline Bourdeau, Tele-university (TELUQ), Montreal, QC, Canada
Jim A. Hendler, Rensselaer Polytechnic Institute, Troy, NY, USA
Roger Nkambou, Université du Québec à Montréal, Montreal, QC, Canada

Program Chairs:
Ian Horrocks, University of Oxford, UK
Ben Y. Zhao, University of California at Santa Barbara, CA, USA

Copyright © 2016 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Research-article

Conference

WWW '16

Sponsor:

IW3C2

WWW '16: 25th International World Wide Web Conference

April 11 - 15, 2016

Montréal, Québec, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
