short-paper

Detecting offensive tweets via topical feature discovery over a large scale twitter corpus

Authors:

Carolyn Rose

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

Pages 1980 - 1984

https://doi.org/10.1145/2396761.2398556

Published: 29 October 2012

Abstract

In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content on Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a large-scale Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively across a variety of machine learning (ML) algorithms. For instance, using Logistic Regression, our approach achieves a true positive rate (TP) of 75.1% over 4,029 testing tweets, significantly outperforming the popular keyword-matching baseline (TP of 69.7%) while keeping the false positive rate (FP) at about 3.77%, the same level as the baseline. Our approach provides an alternative to the large-scale hand-annotation efforts required by fully supervised learning approaches.
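The comparison the abstract describes (a classifier over topic-model features versus keyword matching, scored by true and false positive rates) can be illustrated with a small stdlib-only Python sketch. This is not the authors' implementation. The lexicon {"darn", "heck"}, the six tweets, their labels, and their two-dimensional topic proportions are hypothetical toy values; the topic vectors merely stand in for per-tweet output of a topic model such as LDA, which is not trained here. The helper names keyword_baseline, train_logreg, predict, and tp_fp_rates are likewise hypothetical, and train_logreg is a plain batch gradient-descent logistic regression written for illustration.

```python
import math
import re
from typing import List, Sequence, Set, Tuple


def keyword_baseline(tweet: str, lexicon: Set[str]) -> bool:
    """Keyword matching: flag a tweet if any lowercase token is in the lexicon."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in lexicon for tok in tokens)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Batch gradient-descent logistic regression; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops at len(xi), so the bias w[-1] is added separately.
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = yi - 1.0 / (1.0 + math.exp(-z))  # label minus predicted probability
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        # Ascend the mean log-likelihood (equivalently, descend the log loss).
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> bool:
    """Offensive iff sigmoid(z) > 0.5, i.e. z > 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return z > 0.0


def tp_fp_rates(preds: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """TP rate = TP / (TP + FN); FP rate = FP / (FP + TN)."""
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, g in pairs if p and g)
    fn = sum(1 for p, g in pairs if not p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    tn = sum(1 for p, g in pairs if not p and not g)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical toy data (not from the paper).
lexicon = {"darn", "heck"}
tweets = ["what the heck is this", "darn it, missed the bus", "you are such a clown",
          "heck yes, great game", "lovely weather today", "see you at lunch"]
labels = [True, True, True, False, False, False]
# Hypothetical per-tweet topic proportions: [profanity-like topic, all other topics].
topics = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]

base_preds = [keyword_baseline(t, lexicon) for t in tweets]
w = train_logreg(topics, [int(lab) for lab in labels])
topic_preds = [predict(w, x) for x in topics]

print("keyword baseline (TP rate, FP rate):", tp_fp_rates(base_preds, labels))
print("topic features   (TP rate, FP rate):", tp_fp_rates(topic_preds, labels))
```

On this toy data the keyword baseline misses the third tweet and wrongly flags the fourth, giving a TP rate of 2/3 and an FP rate of 1/3, while the classifier over the deliberately separable hypothetical topic vectors labels all six correctly. These numbers only show how the two rates are computed and say nothing about the results reported in the paper.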

Index Terms

Detecting offensive tweets via topical feature discovery over a large scale twitter corpus
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Information

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

October 2012

2840 pages

ISBN:9781450311564

DOI:10.1145/2396761

General Chair:
Xuewen Chen, Wayne State University, USA

Program Chairs:
Guy Lebanon, Georgia Institute of Technology
Haixun Wang, Microsoft Research Asia
Mohammed J. Zaki, Rensselaer Polytechnic Institute

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

CIKM'12

Sponsor:

CIKM'12: 21st ACM International Conference on Information and Knowledge Management

October 29 - November 2, 2012

Maui, Hawaii, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 145
Total Downloads: 1,370
Downloads (Last 12 months): 41
Downloads (Last 6 weeks): 5

Reflects downloads up to 24 Jan 2025

Cited By

Zhang L, Jin L, Xu G, Li X, Sun X (2025). COT: A Generative Approach for Hate Speech Counter-Narratives via Contrastive Optimal Transport. IEEE Transactions on Emerging Topics in Computational Intelligence 9:1 (740-756). Online publication date: Feb-2025.
https://doi.org/10.1109/TETCI.2024.3406691
Memon A, Sootahar D, Luhana K, Meyer K (2024). A corpus-based real-time text classification and tagging approach for social data. Frontiers in Computer Science 6. Online publication date: 13-Mar-2024.
https://doi.org/10.3389/fcomp.2024.1294985
Muntasir F, Noor J (2024). Explainable AI Discloses Gender Bias in Sexism Detection Algorithm. Proceedings of the 11th International Conference on Networking, Systems, and Security (120-127). Online publication date: 19-Dec-2024.
https://dl.acm.org/doi/10.1145/3704522.3704524
Pan T (2024). Cyberbullying detection based on aspect-level sentiment analysis. Proceedings of the 2024 3rd International Conference on Cryptography, Network Security and Communication Technology (200-204). Online publication date: 19-Jan-2024.
https://dl.acm.org/doi/10.1145/3673277.3673312
Gandhi A, Ahir P, Adhvaryu K, Shah P, Lohiya R, Cambria E, Poria S, Hussain A (2024). Hate speech detection: A comprehensive review of recent works. Expert Systems. Online publication date: 25-Feb-2024.
https://doi.org/10.1111/exsy.13562
Bolla B, Pattnaik S, Patra S (2024). Detection of Objectionable Song Lyrics Using Weakly Supervised Learning and Natural Language Processing Techniques. Procedia Computer Science 235 (1929-1942). Online publication date: 2024.
https://doi.org/10.1016/j.procs.2024.04.183
Malik J, Qiao H, Pang G, van den Hengel A (2024). Deep learning for hate speech detection: a comparative study. International Journal of Data Science and Analytics. Online publication date: 22-Oct-2024.
https://doi.org/10.1007/s41060-024-00650-6
Paraschiv A, Cojocaru A, Dascalu M (2024). Automated Offensive Comment Detection for the Romanian Language. AI Approaches for Designing and Evaluating Interactive Intelligent Systems (85-110). Online publication date: 10-Apr-2024.
https://doi.org/10.1007/978-3-031-53957-2_5
Kolajo T, Daramola O (2024). A Conceptual Framework for Human‐Centric and Semantics‐Based Explainable Event Detection. WIREs Data Mining and Knowledge Discovery 14:6. Online publication date: 17-Oct-2024.
https://doi.org/10.1002/widm.1565
Lupu Y, Sear R, Velásquez N, Leahy R, Restrepo N, Goldberg B, Johnson N (2023). Offline events and online hate. PLOS ONE 18:1 (e0278511). Online publication date: 25-Jan-2023.
https://doi.org/10.1371/journal.pone.0278511
