Search | arXiv e-print repository

SenTopX: Benchmark for User Sentiment on Various Topics

Authors: Hina Qayyum, Muhammad Ikram, Benjamin Zhao, Ian Wood, Mohamad Ali Kaafar, Nicolas Kourtellis

Abstract: Toxic sentiment analysis on Twitter (X) often focuses on specific topics and events such as politics and elections. Datasets of toxic users in such research are typically gathered through lexicon-based techniques, providing only a cross-sectional view. his approach has a tight confine for studying toxic user behavior and effective platform moderation. To identify users consistently spreading toxic… ▽ More Toxic sentiment analysis on Twitter (X) often focuses on specific topics and events such as politics and elections. Datasets of toxic users in such research are typically gathered through lexicon-based techniques, providing only a cross-sectional view. his approach has a tight confine for studying toxic user behavior and effective platform moderation. To identify users consistently spreading toxicity, a longitudinal analysis of their tweets is essential. However, such datasets currently do not exist. This study addresses this gap by collecting a longitudinal dataset from 143K Twitter users, covering the period from 2007 to 2021, amounting to a total of 293 million tweets. Using topic modeling, we extract all topics discussed by each user and categorize users into eight groups based on the predominant topic in their timelines. We then analyze the sentiments of each group using 16 toxic scores. Our research demonstrates that examining users longitudinally reveals a distinct perspective on their comprehensive personality traits and their overall impact on the platform. Our comprehensive dataset is accessible to researchers for additional analysis. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2401.14252 [pdf, other]

doi 10.1109/BigData59044.2023.10386248

On mission Twitter Profiles: A Study of Selective Toxic Behavior

Authors: Hina Qayyum, Muhammad Ikram, Benjamin Zi Hao Zhao, an D. Wood, Nicolas Kourtellis, Mohamed Ali Kaafar

Abstract: The argument for persistent social media influence campaigns, often funded by malicious entities, is gaining traction. These entities utilize instrumented profiles to disseminate divisive content and disinformation, shaping public perception. Despite ample evidence of these instrumented profiles, few identification methods exist to locate them in the wild. To evade detection and appear genuine, sm… ▽ More The argument for persistent social media influence campaigns, often funded by malicious entities, is gaining traction. These entities utilize instrumented profiles to disseminate divisive content and disinformation, shaping public perception. Despite ample evidence of these instrumented profiles, few identification methods exist to locate them in the wild. To evade detection and appear genuine, small clusters of instrumented profiles engage in unrelated discussions, diverting attention from their true goals. This strategic thematic diversity conceals their selective polarity towards certain topics and fosters public trust. This study aims to characterize profiles potentially used for influence operations, termed 'on-mission profiles,' relying solely on thematic content diversity within unlabeled data. Distinguishing this work is its focus on content volume and toxicity towards specific themes. Longitudinal data from 138K Twitter or X, profiles and 293M tweets enables profiling based on theme diversity. High thematic diversity groups predominantly produce toxic content concerning specific themes, like politics, health, and news classifying them as 'on-mission' profiles. Using the identified ``on-mission" profiles, we design a classifier for unseen, unlabeled data. Employing a linear SVM model, we train and test it on an 80/20% split of the most diverse profiles. The classifier achieves a flawless 100% accuracy, facilitating the discovery of previously unknown ``on-mission" profiles in the wild. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Journal ref: 2023 IEEE International Conference on Big Data (BigData)

arXiv:2401.14141 [pdf, other]

doi 10.1109/BigData59044.2023.10386248

Exploring the Distinctive Tweeting Patterns of Toxic Twitter Users

Authors: Hina Qayyum, Muhammad Ikram, Benjamin Zi Hao Zhao, Ian D. Wood, Nicolas Kourtellis, Mohamed Ali Kaafar

Abstract: In the pursuit of bolstering user safety, social media platforms deploy active moderation strategies, including content removal and user suspension. These measures target users engaged in discussions marked by hate speech or toxicity, often linked to specific keywords or hashtags. Nonetheless, the increasing prevalence of toxicity indicates that certain users adeptly circumvent these measures. Thi… ▽ More In the pursuit of bolstering user safety, social media platforms deploy active moderation strategies, including content removal and user suspension. These measures target users engaged in discussions marked by hate speech or toxicity, often linked to specific keywords or hashtags. Nonetheless, the increasing prevalence of toxicity indicates that certain users adeptly circumvent these measures. This study examines consistently toxic users on Twitter (rebranded as X) Rather than relying on traditional methods based on specific topics or hashtags, we employ a novel approach based on patterns of toxic tweets, yielding deeper insights into their behavior. We analyzed 38 million tweets from the timelines of 12,148 Twitter users and identified the top 1,457 users who consistently exhibit toxic behavior, relying on metrics like the Gini index and Toxicity score. By comparing their posting patterns to those of non-consistently toxic users, we have uncovered distinctive temporal patterns, including contiguous activity spans, inter-tweet intervals (referred to as 'Burstiness'), and churn analysis. These findings provide strong evidence for the existence of a unique tweeting pattern associated with toxic behavior on Twitter. Crucially, our methodology transcends Twitter and can be adapted to various social media platforms, facilitating the identification of consistently toxic users based on their posting behavior. This research contributes to ongoing efforts to combat online toxicity and offers insights for refining moderation strategies in the digital realm. We are committed to open research and will provide our code and data to the research community. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 2023 IEEE International Conference on Big Data (BigData)

arXiv:2303.14603 [pdf, other]

doi 10.1145/3578503.3583619

A longitudinal study of the top 1% toxic Twitter profiles

Authors: Hina Qayyum, Benjamin Zi Hao Zhao, Ian D. Wood, Muhammad Ikram, Mohamed Ali Kaafar, Nicolas Kourtellis

Abstract: Toxicity is endemic to online social networks including Twitter. It follows a Pareto like distribution where most of the toxicity is generated by a very small number of profiles and as such, analyzing and characterizing these toxic profiles is critical. Prior research has largely focused on sporadic, event centric toxic content to characterize toxicity on the platform. Instead, we approach the pro… ▽ More Toxicity is endemic to online social networks including Twitter. It follows a Pareto like distribution where most of the toxicity is generated by a very small number of profiles and as such, analyzing and characterizing these toxic profiles is critical. Prior research has largely focused on sporadic, event centric toxic content to characterize toxicity on the platform. Instead, we approach the problem of characterizing toxic content from a profile centric point of view. We study 143K Twitter profiles and focus on the behavior of the top 1 percent producers of toxic content on Twitter, based on toxicity scores of their tweets availed by Perspective API. With a total of 293M tweets, spanning 16 years of activity, the longitudinal data allow us to reconstruct the timelines of all profiles involved. We use these timelines to gauge the behavior of the most toxic Twitter profiles compared to the rest of the Twitter population. We study the pattern of tweet posting from highly toxic accounts, based on the frequency and how prolific they are, the nature of hashtags and URLs, profile metadata, and Botometer scores. We find that the highly toxic profiles post coherent and well articulated content, their tweets keep to a narrow theme with lower diversity in hashtags, URLs, and domains, they are thematically similar to each other, and have a high likelihood of bot like behavior, likely to have progenitors with intentions to influence, based on high fake followers score. Our work contributes insight into the top 1 percent of toxic profiles on Twitter and establishes the profile centric approach to investigate toxicity on Twitter to be beneficial. △ Less

Submitted 25 March, 2023; originally announced March 2023.

arXiv:2202.07853 [pdf, other]

A deep dive into the consistently toxic 1% of Twitter

Authors: Hina Qayyum, Benjamin Zi Hao Zhao, Ian D. Wood, Muhammad Ikram, Mohamed Ali Kaafar, Nicolas Kourtellis

Abstract: Misbehavior in online social networks (OSN) is an ever-growing phenomenon. The research to date tends to focus on the deployment of machine learning to identify and classify types of misbehavior such as bullying, aggression, and racism to name a few. The main goal of identification is to curb natural and mechanical misconduct and make OSNs a safer place for social discourse. Going beyond past work… ▽ More Misbehavior in online social networks (OSN) is an ever-growing phenomenon. The research to date tends to focus on the deployment of machine learning to identify and classify types of misbehavior such as bullying, aggression, and racism to name a few. The main goal of identification is to curb natural and mechanical misconduct and make OSNs a safer place for social discourse. Going beyond past works, we perform a longitudinal study of a large selection of Twitter profiles, which enables us to characterize profiles in terms of how consistently they post highly toxic content. Our data spans 14 years of tweets from 122K Twitter profiles and more than 293M tweets. From this data, we selected the most extreme profiles in terms of consistency of toxic content and examined their tweet texts, and the domains, hashtags, and URLs they shared. We found that these selected profiles keep to a narrow theme with lower diversity in hashtags, URLs, and domains, they are thematically similar to each other (in a coordinated manner, if not through intent), and have a high likelihood of bot-like behavior (likely to have progenitors with intentions to influence). Our work contributes a substantial and longitudinal online misbehavior dataset to the research community and establishes the consistency of a profile's toxic behavior as a useful factor when exploring misbehavior as potential accessories to influence operations on OSNs. △ Less

Submitted 15 February, 2022; originally announced February 2022.

Showing 1–5 of 5 results for author: Qayyum, H