ACM Conferences, CIKM, short paper. DOI: 10.1145/2396761.2398556

Detecting offensive tweets via topical feature discovery over a large scale twitter corpus

Published: 29 October 2012

Abstract

In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a large Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, using Logistic Regression, our approach achieves a true positive rate (TP) of 75.1% over 4,029 testing tweets, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline, about 3.77%. Our approach provides an alternative to the large-scale hand annotation efforts required by fully supervised learning approaches.
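To make the two reported metrics concrete, the following is a minimal sketch of a keyword matching baseline and of how a true positive rate and a false positive rate are computed from its predictions. It is illustrative only and is not the authors' implementation: the word list `KEYWORDS`, the helper names `keyword_baseline` and `tp_fp_rates`, and the six sample tweets are all invented for this example and do not come from the paper's data.

```python
import re

# Hypothetical stand-in lexicon; the paper's actual word list is not reproduced here.
KEYWORDS = {"darn", "heck"}


def keyword_baseline(tweet, keywords=KEYWORDS):
    """Flag a tweet as offensive if any lowercase token is a listed keyword."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in keywords for tok in tokens)


def tp_fp_rates(predictions, labels):
    """Return (true positive rate, false positive rate) for boolean predictions."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Toy, invented examples: (text, is_offensive)
SAMPLE = [
    ("what the heck is this", True),
    ("darn you all", True),
    ("you are a total d4rn fool", True),    # obfuscated spelling evades matching
    ("nice weather today", False),
    ("heck of a game last night", False),   # keyword used in a benign sense
    ("see you tomorrow", False),
]

preds = [keyword_baseline(text) for text, _ in SAMPLE]
labels = [label for _, label in SAMPLE]
tpr, fpr = tp_fp_rates(preds, labels)
print(f"TP rate = {tpr:.3f}, FP rate = {fpr:.3f}")
```

On this toy sample the baseline misses the obfuscated spelling and flags one benign use of a listed word. Those are the two kinds of error that the TP and FP rates measure, and they suggest why features learned from a corpus might recover cases that a fixed word list cannot.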

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN:9781450311564
DOI:10.1145/2396761
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hadoop
  2. machine learning
  3. topic modeling
  4. twitter

Qualifiers

  • Short-paper

Conference

CIKM'12
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%