1 Introduction
Social media was first seen as an opportunity for new modes of open civic discussion and interactive participation [31, 32]. However, it has become evident that open online media have properties that often overshadow the anticipated positive aspects by permitting a range of disruptive behaviors that deteriorate constructive discussion in the forms of manipulation, provocation, and circulation of disinformation [73]. Two cases of harmful behavioral phenomena that are often found in online forum interaction are aggression and trolling. Both are common and evidently problematic online: recent societal events have shown that they can be used to harass groups of people as well as to perform large-scale ideological manipulation, both of which pose significant threats to egalitarian and democratic ideals [1, 8]. Detection and prevention of such harmful behaviors are essential for protecting these ideals and maintaining civility online [12].
This article focuses on the identifiable conversational properties of disruptive behavior in online interaction where messages relate to one another in a conversation, e.g., through turn-taking. Disruptive online behavior can encompass phenomena such as (but not limited to) aggression, trolling, bullying, spamming, and hate speech, when these materialize as behaviors within conversational interaction that disrupt, harass, or break the flow of conversation. By attending to the interactive characteristics of disruptive behavior, our focus excludes cases that cannot be analyzed as part of an interaction, e.g., aggressive language or swearing in non-interactive contexts such as blog texts or isolated messages.
Disruptive behaviors can be hard or impossible to recognize and detect without examining the dynamics between messages. Thus, analysis of messages in isolation, outside of their conversational contexts, may not be sufficient for addressing covert, deceptive forms of disruptive online behavior. Although some message-level aggression detection systems can be very accurate (e.g., [93]), they are also vulnerable to user deception and manipulation [49]. Thus, to detect and prevent disruptive behavior online, we need detection methods that can identify both overt and covert interaction strategies. This is needed for models to be less vulnerable to deception and more generalizable to various contexts. As aggression and trolling are common phenomena in conversational interaction, analysis of their wider conversational contexts can help in understanding and detecting them. Such an analysis can build on the existing understanding of language use in conversation [40, 94]. Investigation of patterns in conversational dynamics can reveal covert manipulation, such as violations of conversational norms of turn-taking, that may go unnoticed when investigating only individual messages (see, e.g., [53, 82]).
In this study, we analyze aggression and trolling, two forms of disruptive behavior that cause harm in social media conversations [73]: the former being a mostly overt form of behavior [81] and the latter often being acted out more covertly [36, 54]. We focus primarily on covert behavior – trolling – while our analysis of aggression serves to highlight the differences between the two behaviors. By doing this, we emphasize the need for structural analysis of conversational dynamics, especially for covert disruptive behaviors such as trolling.
We suggest that identifying patterns in the use of conversational actions and related norm violations provides more conversational context, generalizability, and better accuracy in the computational detection of covert disruptive interaction. By conversational actions, we refer to the functions of a message, i.e., what a turn does in a conversation in relation to previous turns: e.g., asking someone a question. We investigate the use of common conversational actions and how they are used to violate conversational norms in disruptive as opposed to non-disruptive interaction. Besides analyzing action taking in general, we attend to action-related conversational norm violations in which the expectations set by actions are broken. An example of such a violation would be to avoid answering a question, replying with an unexpected action such as an accusation instead. Norms related to actions are pervasive across various contexts [39]; thus, people can be expected to usually follow them in quite a similar fashion online as well [77, 114]. This suggests that analyzing the dynamics of common conversational actions and their norm violations may offer a generalizable approach for detecting disruptive behavior.
The reason for focusing on aggression and trolling specifically is that while we consider them to differ in their portrayed levels of overtness (i.e., immediate perceivability), they have often been analyzed together [14, 15, 59, 69]. Building on earlier research on aggression enables us to conduct our analysis on an existing verified dataset [126]. Our findings on trolling detection, in turn, provide a basis for future research. In particular, computational trolling detection has not been widely studied from a conversational perspective as our study does; instead, computational approaches have remained mostly limited to message-level analyses. Moreover, the demarcation between aggression and trolling has not always been clear-cut. Aggression is often seen as hostile behavior involving the use of profanity, derogatory language, or aggressive sentiment [81]. These have also been considered definitive features of trolling (e.g., [15, 97]), although the literature has also maintained that the two are not equivalent phenomena [54]. In this article, by aggression we refer to overt means of offense within a conversation, actions that result in an ad hominem attack (per [126, p. 1353]). Trolling, in turn, has been defined in several ways, emphasizing different characteristics (see [26]), such as ideological [124] or political manipulation (or “Twitter trolling”; [27]), classic trolling (i.e., baiting newbies on discussion forums; [36, 48]), or deceptive harassment of minority groups [56]. In our article, by trolling we refer to conversational trolling, which manifests as strategically deceptive interaction online, aiming to confuse, provoke, and manipulate others into participating in pointless and even harmful discussions (per [53, 54]).
We seek to show that the ways in which people use common conversational actions and how they adhere to or violate action-related norms can be used to complement message-level features and thus to better detect covert disruptive interaction (trolling). To show the benefits of a conversational approach, we build on linguistics and conversation analysis (CA) to develop a framework for analyzing online interaction as turn-taking, in which each turn may perform one or several conversational actions.
Besides generally attending to the types of conversational actions used, we analyze one phenomenon in particular. We observe that humans have a tendency to prefer symmetry in conversation: some action pairs in turn-taking (e.g., question–answer) are normatively bound. Thereby, certain types of responses are expected after response-mobilizing actions [106]. Unexpected responses are, in turn, referred to as asymmetric actions in this article. We expect disruptive behaviors to manifest conversational norm violations where responses are asymmetric with respect to earlier actions (see [82]).
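To make this concrete, the (a)symmetry of a response can be modeled as a lookup against a table of normatively bound action pairs. The following sketch is purely illustrative: the norm table is a hypothetical example, not the action inventory used in the study.

```python
# Hypothetical norm table: for each response-mobilizing action, the response
# types that count as symmetric (i.e., normatively expected) replies.
SYMMETRIC_RESPONSES = {
    "question": {"answer"},
    "accusation": {"denial", "apology"},
    "request": {"acceptance", "rejection"},
    "proposal": {"acceptance", "rejection"},
}

def classify_pair(first_action, response_action):
    """Return the (a)symmetry type of a response to a preceding action."""
    expected = SYMMETRIC_RESPONSES.get(first_action)
    if expected is None:
        return "unconstrained"  # action sets no strong response expectation
    if response_action is None:
        return "missing"        # no reply at all: an asymmetric omission
    if response_action in expected:
        return "symmetric"
    return "mismatched"         # unexpected reply, e.g., a counter-accusation
```

Under this scheme, answering a question is symmetric, while replying to it with an accusation, or not replying at all, counts as an asymmetric action.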
However, a non-trivial problem is that all turns in a conversation need to be tagged with actions in order to analyze the conversational structure. We overcome the lack of suitable CA-based automated action tagging systems by utilizing a state-of-the-art zero-shot Natural Language Inference (NLI) model [121] to implement our action tagging scheme. Action-tagged messages are then transformed into various conversational features for computational detection. Drawing from our theoretical foundations, we address three research questions (RQs):
– RQ1 (Principle of symmetry): Do aggression and trolling involve more asymmetry than constructive discussion?
– RQ2 (Principle of conversational distinctiveness): Are important conversational features such as actions and norm violations different in online aggression as compared with trolling?
– RQ3 (Principle of context): Do features related to conversational context – i.e., conversational actions, action pairs, and norm violations instead of isolated messages – allow more accurate detection of disruptive interaction as compared with earlier approaches?
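The zero-shot action tagging step underlying these questions can be sketched as follows. For illustration, the NLI entailment scorer is a pluggable callable (in practice it would wrap the zero-shot NLI model [121]); the label set and hypothesis template are assumptions for the sketch, not the study's exact scheme.

```python
# Illustrative action labels; the study's actual tag set may differ.
ACTION_LABELS = ["question", "answer", "accusation", "request", "proposal",
                 "challenge", "apology", "statement"]

def tag_actions(message, entail_score, labels=ACTION_LABELS, threshold=0.5):
    """Zero-shot tagging: entail_score(message, hypothesis) -> [0, 1].
    A turn may perform several actions, so all labels scoring above the
    threshold are kept; if none qualifies, the best single label is used."""
    scores = {lab: entail_score(message, f"This message is a {lab}.")
              for lab in labels}
    tags = [lab for lab, s in scores.items() if s >= threshold]
    return tags or [max(scores, key=scores.get)]
```

With the Hugging Face transformers library, entail_score could be backed by a zero-shot-classification pipeline; any NLI model that scores a (premise, hypothesis) pair fits the same interface.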
We expect conversations involving aggression and trolling to have different identifiable characteristics in conversational features, e.g., the use of actions and conversational norm violations. Investigating conversations in the way presented above, we find that especially trolling detection can be significantly improved if conversational action features are included as predictive variables. RQ2 functions to provide context for RQ1 and RQ3, and to illustrate why structural analysis of conversation dynamics (i.e., analysis of the interrelations of turns in conversation, drawing from linguistic theories) is essential, showing that in comparison with aggression, trolling involves more indirectness and turn-taking-based conversational strategies such as asymmetries. Our work offers a new, theoretically grounded approach for understanding social media interaction and new means for detecting covert disruptive online behaviors.
3 Data
We studied disruptive interaction using two datasets: one that we used to distinguish aggressive conversations from nonaggressive ones and another that enabled us to compare trolling-like and nontrolling conversations. As aggression and trolling are different phenomena, we needed clearly and robustly labeled data. This was important in order to test whether our conversation-oriented approach would work for both aggression and trolling detection and whether the two behaviors would entail differing conversational features.
Many datasets are available on aggressive or hostile behavior and hate speech (e.g., [47]). However, most focus on aggression at the message level and thus provide isolated messages instead of full conversations (e.g., [47, 69]). Moreover, most are X data, which are not necessarily very conversational (see, e.g., datasets listed in [127]). When it comes to more conversational interaction, the Wikipedia Conversations Gone Awry dataset [126] fits our criteria of conversationality. Here, aggressive conversations are defined as starting out civil but ending up in a personal attack. The dataset has been annotated by crowdworkers who have filtered and then labeled conversations ending up in “rude, insulting, or disrespectful” behavior as aggressive. The dataset has a robust annotation scheme for personal attacks; thus, we selected it as our aggression data. The dataset contains 30,021 messages from 4,188 Wikipedia conversations (see Table 2), each conversation including at least three messages. The same authors also provide a similar Reddit dataset, which we excluded because it considers deleted messages as indicators of personal attacks. At this stage, we did not want to use data from which central messages were missing, as this would not allow sufficient analysis of entire conversations, especially because our idea was to analyze entire conversations and the interrelations between actions. The missing messages could have been extremely important for our analysis. Another potential dataset could have been the ComMA dataset of Kumar et al. [68], but as that data is multilingual, at this stage we chose to work with the English-language Wikipedia dataset, as the only suitable conversational trolling dataset is in English only. While multilingual analysis is important, we consider limiting the data to English an appropriate first step in the detection of covert disruptive online behavior in conversational interaction.
Finding a suitable dataset for analyzing trolling was more challenging. Trolling data collection has previously been based on (1) X’s revealed Russian troll accounts, (2) aggressive behavior [59], or (3) suspected or mentioned trolls [79]. However, these sources either do not exhibit enough conversationality or are unsuitable in other ways. First, as was the case with aggression datasets, few datasets on trolling include sufficient meta-level information for analyzing conversational behavior (e.g., which message another one is responding to), which is needed to analyze action pairs. Second, wanting to study conversational trolling behaviors exhibiting a range of different strategies (as in [54]), we could not use data consisting of only aggression-type trolling. Similarly, manipulative X accounts are but one form of trolling and may not be conversational either. Moreover, (possibly biased) lists of confirmed troll accounts reveal only a limited number of such accounts. Lastly, although data can be gathered by searching for troll mentions on online forums (e.g., [79]) and such data could portray enough conversationality, it could fail to include deceptive forms of trolling unnoticed by others. Conversations sampled this way may also include cases labeled as trolling in which trolling accusations are actually used as a rhetorical strategy to counter an unwanted argument in a conversation – cases that do not necessarily include any trolling (see [82]). We therefore concluded that user-based flagging or reporting of trolling behavior as an identification method would be liable to bias (e.g., [50]).
Due to these issues, we decided to use and extend a dataset collected by us in two earlier studies [82, 113]. These datasets followed our definition of trolling as conversational behavior. We expanded the original dataset following the original data collection procedure and annotation scheme used in Paakki et al. [82]. The data sampling was based on Hardaker’s categorization of commonly used trolling styles [54], and the operationalization of their identification on the guidelines defined by Paakki et al. [82]; the original paper provides further details on collection and annotation criteria.
In the operationalization of trolling identification for our manual data collection to expand the data, we assumed the definition of conversational trolling described in Paakki et al. [82], as well as the annotation guidelines for trolling identification used in the aforementioned qualitative study. Per this definition, a troll is a participant in a discussion who feigns sincerity but whose real goals are to disrupt, digress, provoke, or otherwise prolong futile discussion by utilizing strategies described in Hardaker’s typology: aggressing, shocking, endangering others, antipathizing, (hypo)criticizing, or digressing [54]. Trolling behaviors are delimited to ‘successful’ trolling, i.e., cases in which trolls manage to elicit responses.
We expanded the dataset using the collection and annotation criteria described in Paakki et al. [82] (see the article for detailed guidelines): identifying systematic or repetitious (see [128]) deceptive behavior utilizing one or more of the six trolling strategies per Hardaker [54]. We collected novel examples from the same sources the original data stemmed from: comment sections of The Washington Post, The Guardian, The Telegraph, and Reddit newsgroups, pertaining to similar topics as in Paakki et al. [82], namely, political discussions around Brexit and climate change, and interest-based or recreational topics around fitness and well-being, relationships (similar to [113]), and pets.
We first read comment sections on each topic until we had identified approximately 200 conversations containing trolling-like behavior (including the 68 cases present in the original data [82, 113]). A conversation here refers to a branch within a comment section thread. We set the minimum number of comments in a conversation at three messages to match the minimum conversation length in the aggression dataset [126]. We attempted to capture a wide range of trolling styles across different topics and platforms, similar to the earlier qualitative study on conversational trolling [82].
We collected the data by scraping the whole comment sections that we had first read manually, using screen scraping (Selenium WebDriver scripts) and the Reddit application programming interface (API), and transforming the data into the machine-readable JSON format. This resulted in 4,911 conversations (54,071 messages), including both trolling-like behavior and non-trolling conversations. We made sure that the data included message-level and user-level details (e.g., user ID, posting time, reply-to ID) similar to the Awry dataset that we chose for studying aggression. We used user IDs and reply-to IDs to identify response patterns.
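As a minimal illustration of how reply-to IDs recover response patterns, the sketch below links each message to the one it responds to; the JSON field names are hypothetical, not the exact schema of our data.

```python
def build_reply_pairs(records):
    """From message records (dicts parsed from JSON), recover who
    responded to whom as (parent_user, responding_user) pairs."""
    by_id = {m["id"]: m for m in records}
    pairs = []
    for m in records:
        parent = by_id.get(m.get("reply_to_id"))
        if parent is not None:
            pairs.append((parent["user_id"], m["user_id"]))
    return pairs
```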
We then carried out all the manual annotation work required for this article. The original dataset was validated as part of the qualitative study [82] by conducting an inter-annotator test with three annotators for discovering trolling in conversation threads extracted and separated from entire comment sections under either news articles or Reddit thread starters [82]. One annotator selected a random set of conversations separated from whole comment sections, including both non-trolling and trolling threads, altogether 100 threads. The three annotators then annotated these threads as either trolling or non-trolling, reaching an overall agreement of 86.5%. The free-marginal Fleiss kappa [42] was 0.74, which signifies substantial interrater agreement (range: 0.40–0.75) [82]. To further validate the dataset, we conducted an inter-annotator test for identifying trolling within whole comment sections, as we expected this to be a harder task. For this task, one annotator created a set of documents with whole comment sections (\(N=100\)), which either did or did not contain trolling. With two annotators, we reached inter-annotator reliability scores of 87.10% for overall agreement and 0.74 (substantial agreement) for the Fleiss kappa [42]. For both annotation procedures, we performed iterative annotation: we first annotated 3–4 practice batches of data, negotiating our annotations between batches. Disagreements often stemmed from overlooking some detail in the guidelines, missing an important feature in the conversation, or different understandings of the guidelines or the conversations. In the case of differing understandings, we analyzed the difficult case, the guidelines, and our interpretations in detail and, if needed, updated the guidelines so that difficult cases could thereafter be resolved correctly and the classes could be differentiated. We continued iterative annotation until we achieved a sufficient shared understanding of the guidelines and the classes, i.e., a simple agreement percentage above 80%. After this, we annotated the final set of approximately 100 conversations to calculate the inter-annotator agreement.
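For reference, the free-marginal Fleiss kappa used above is \((P_o - 1/k)/(1 - 1/k)\), where \(P_o\) is the observed pairwise agreement and \(k\) is the number of categories (here two: trolling vs. non-trolling). A minimal sketch, not the exact tooling we used:

```python
def free_marginal_kappa(ratings, k=2):
    """ratings: one list per item, holding each rater's category label."""
    n_raters = len(ratings[0])
    n_pairs = n_raters * (n_raters - 1) / 2
    per_item = []
    for item in ratings:
        # proportion of agreeing rater pairs on this item
        agree = sum(a == b for i, a in enumerate(item) for b in item[i + 1:])
        per_item.append(agree / n_pairs)
    p_o = sum(per_item) / len(per_item)   # observed agreement
    return (p_o - 1 / k) / (1 - 1 / k)    # chance-corrected (free-marginal)
```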
Both datasets’ main characteristics are summarized in Table 2. Similar to the aggression data, we included in our final trolling dataset only conversations that involved at least two responses to the message initiating a new conversation. The discussions in the aggression data involve a range of topics. The trolling data likewise include a variety of trolling strategies (as defined in [54]), as well as conversations from several different online forums – centered on politics and society as well as on leisure and recreational activities. Finally, we used data augmentation (SMOTE [19]) to address the high imbalance between trolling and non-trolling cases: there were 257 trolling conversations and 4,654 non-trolling conversations. This produced an augmented training set with a balanced number of trolling and non-trolling conversations. We used this augmented set for classification tasks only, following the recommendations of Chawla et al. [19].
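SMOTE synthesizes new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours [19]. A toy sketch of the idea (we used the standard method; this version picks a random neighbour rather than a k-nearest one):

```python
import random

def smote_like(minority, n_new, rng=random.Random(0)):
    """minority: equal-length feature vectors; returns n_new synthetic ones."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # a sample and a random "neighbour"
        gap = rng.random()               # interpolation factor in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic
```

Each synthetic conversation-level feature vector lies on the line segment between two real trolling conversations, which balances the classes without duplicating samples verbatim.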
5 Results
The following three subsections will present the findings related to the article’s three RQs: how common asymmetric and symmetric actions are in online aggression and trolling, how aggression and trolling conversations differ in most informative predictive features, and whether our conversational features relating to actions and norm violations can improve the detection of these behaviors. Following the best practice of using separate datasets for training, validation, and testing, the classification results in these subsections are obtained using the test dataset.
5.1 RQ1: Do Aggression and Trolling Involve More Asymmetry than Constructive Interaction?
Our first RQ asked whether disruptive online conversations contain more asymmetry in terms of conversational action pairs when compared with well-intending discussion, i.e., whether aggression contains more asymmetry than non-aggression, and trolling more than non-trolling. For instance, to analyze whether symmetric responses to accusations differ in aggression as compared with non-aggression, we first computed the percentage of symmetric responses in each conversation by dividing the number of symmetric responses by the total number of actions in the given conversation. We then compared the distributions of these percentages between conversations with and without aggression. We used both permutation tests and the generalized Mann–Whitney–Wilcoxon (i.e., B–M) test [66] to test for the equality of the group distributions, with the null hypothesis that the distributions are equal. Table 5 shows both the significance levels and effect sizes for these tests. Statistically significant results indicate cases in which the percentages of an action pair differed between disruptive and non-disruptive conversations. Effect sizes below 0.5 point out the action pairs in which asymmetries (or symmetries) were more frequent in disruptive conversations; effect sizes above 0.5 indicate higher frequencies in non-disruptive conversations.
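The per-conversation statistic and a simple permutation test on group means can be sketched as follows (we additionally use the B–M test [66]; this minimal version is only meant to make the procedure concrete):

```python
import random

def symmetric_pct(pair_labels):
    """pair_labels: one label per action pair in a conversation,
    e.g., 'symmetric', 'mismatched', or 'missing'."""
    return 100 * sum(p == "symmetric" for p in pair_labels) / len(pair_labels)

def permutation_p(group_a, group_b, n_perm=10_000, rng=random.Random(0)):
    """Two-sided permutation test for a difference in group means."""
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled, n_a = group_a + group_b, len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        extreme += diff >= observed
    return (extreme + 1) / (n_perm + 1)   # with the standard +1 correction
```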
Table 5 shows differences in the percentages of (a)symmetric action pairs in both datasets. For aggression, the B–M and permutation tests were closely in line with each other. Unexpectedly, the overall percentage of symmetric action pairs was slightly lower (28.2%) in conversations without aggression than in conversations with aggression (29.0%). This may be due to accusations and requests receiving symmetric responses more often in aggressive conversations. Also, overall, accusations seem to be more common in disruptive conversations (see Appendix B).
The findings regarding asymmetries also involved some unexpected results in our aggression dataset. Among the differences that proved statistically significant, accusations and requests in aggressive conversations were more frequent both in mismatched and in missing responses (i.e., their effect size was below 0.5). In other response types, however, asymmetries were more frequent in non-aggressive conversations. Finally, the overall percentages of asymmetric responses seemed higher for non-aggression both in mismatched responses (12.4% vs. 12.0%) and in missing responses (31.7% vs. 31.4%); these differences were not, however, statistically significant. Overall, effect sizes for the aggression task were quite close to 0.5.
For trolling, however, the results proved more interesting and were closer to our expectations. Asymmetries were more common in trolling overall. Among asymmetric mismatched responses, based on the B–M tests and effect sizes, asymmetries in all action pairs except for proposals were more frequent in conversations that contained trolling. Asymmetric missing responses yielded the same result. Also, the three comparisons of totals in asymmetries provided similar statistically significant results. The more conservative permutation tests corroborate the results for all totals and accusations, as well as for missing responses to questions and proposals.
The effect sizes, on the other hand, were not strong in many cases. While the effect sizes were between 0.379 and 0.435 for responses to accusations, mismatched or missing responses to questions, and comparisons of totals, the other values were close to 0.5 even in cases in which a significant difference in frequencies was observed. Also, in light of the permutation tests, requests and challenges did not show statistically significant differences in asymmetric responses in trolling versus non-trolling, and in the permutation tests for mismatched responses, only accusations reached statistical significance. As for requests and challenges, the results may be partly due to the action tagger's lower performance in identifying these actions (see Appendix A). This might have affected the identification of (a)symmetries in these cases as well as the statistical tests. In fact, when we investigated the action tagging results for requests more closely, the tagger often mistakenly interpreted questions or even challenges as requests or proposals. For these latter classes, the norms related to action pairs and symmetric responses would be different (especially so for questions; see Table 3), which likely also affected the results.
Surprisingly, although this was not part of our hypotheses, many symmetric responses were more frequent in trolling as compared with non-trolling. One possible reason is that these actions (questions, accusations, and challenges, to be specific) were also more frequent in trolling (see Appendix B). It might also be due to the fact that we are considering whole conversations instead of individual user behaviors. While norm violations in action pairs might be more common for troll users (or aggressive users), the same might not be the case for non-troll users engaging in the conversation. The opposite might even be true: non-troll users might start responding to other actions in an overtly symmetric manner to hold the norm violator accountable, as can be seen in some examples in studies by Hardaker [53] and Paakki et al. [82].
The results seem to support the importance of paying attention to the principle of symmetry between conversational actions (RQ1; see Section 2.1) for trolling but not for aggression. Trolling conversations are more asymmetric in terms of action pairs as compared with non-trolling, although this does not apply similarly to all actions we examined. The results also emphasize the need to recognize the differences in the frequencies of specific action types and how they are used in conversation.
5.2 RQ2: Are Important Conversational Features Such As Actions and Norm Violations Different in Online Aggression As Compared to Trolling?
Our second RQ asked whether there are significant differences in important features, both in features related to conversational norm violations (asymmetries in action pairs) and in conversational features in general (such as frequencies of different individual actions). With this question, we were interested in finding evidence for our claim that aggression and trolling should be treated as different phenomena, which would also be visible in their features.
The features extracted from the action-tagged data allowed comparisons between the most frequent action types and (a)symmetries in aggression on the one hand and in trolling on the other. We were also able to compare the importance of other features, including sentiments, different politeness strategies, prompt types, and agreement, between the datasets. We started by conducting stepwise feature-forward selection (stepAIC) using logistic regression to find the most important features out of all our conversation features (see Table 4) that set apart aggression versus non-aggression and trolling versus non-trolling conversations. We set out to show that the most important features would differ between trolling and aggression.
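Stepwise feature-forward selection greedily adds, at each step, the feature that most improves an information criterion and stops when no candidate helps. A schematic sketch with a pluggable criterion (in our case, the AIC of a logistic regression fitted on the candidate subset):

```python
def forward_select(features, aic_of):
    """features: candidate names; aic_of(subset) -> criterion (lower is better)."""
    selected, best_aic = [], aic_of([])
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        aic, feat = min((aic_of(selected + [f]), f) for f in candidates)
        if aic >= best_aic:   # no remaining feature improves the criterion
            break
        selected.append(feat)
        best_aic = aic
    return selected
```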
Table 6 presents the results. Out of all the features (47 comparison and 53 action-based features in total; see Table 4), the table lists those that proved optimal in stepwise feature-forward selection for distinguishing aggression from non-aggression and trolling from non-trolling. The algorithm selected 32 features for aggression and 29 for trolling.
The results suggest that the two types of disruptive behavior are best characterized by different feature sets. In light of the comparison features, sentiment and POLPR features are more predictive in aggression than in trolling. Sentiment features are especially important in aggression detection (1st, 2nd, and 9th places), which is in line with earlier research [4, 116]. For trolling detection, in contrast, sentiment features are less essential, with blacklist words reaching only the 18th position. POLPR features, in turn, appear important in both detection tasks, with 13/32 (41%) of the features in aggression detection and 9/29 (31%) in trolling detection belonging to that category.
However, most importantly for RQ2, the table shows that many features related to actions and (a)symmetries carry significant weight in both feature sets, although differently. Action features (i.e., frequencies of specific individual actions as well as their replies) are important for the aggression task: denials, appreciations, and apologies can be found at the 8th, 10th, and 20th places, and there are 8 action features other than asymmetries in total on the list. With regard to specific actions, the numbers of types of replies to specific actions, i.e., accusations, questions, and challenges, were informative. These actions are seen in the literature as having face-threatening qualities [11, 45]; thus, in ordinary well-intending discussion, people tend to avoid them. Several such actions were more frequent in aggression as compared with non-aggression (see Appendix B). Symmetries and asymmetries, on the other hand, appear lower in the list: only 5/32 (16%) of the top features listed in Table 6 for aggression are related to symmetries or asymmetries altogether.
For trolling, the findings are different: action-based features carry more weight than in aggression detection. Symmetries and asymmetries, as well as other action-based features, are central in trolling detection: 7/29 (24%) of the top features are related to symmetries or asymmetries in action pairs, and individual action features and other action pair–related features appear 6 times, resulting in 13/29 (45%) action-based features in total. The action features besides (a)symmetries do not consist merely of directly face-threatening actions or replies to them; rather, they include the textual symmetry of matching pairs, the number of questions asked, apologies, and statements made. Overall, trolling seems to include more accusations as compared with non-trolling (see Appendix B). Besides norm violations in action pairs (e.g., mismatched replies), the use of accusations, the types of responses to them, and asymmetries related to accusations are informative. Some politeness features, e.g., hedging and directness, also carry weight in separating trolling from non-trolling conversations, as they do in distinguishing aggression from non-aggression.
To sum, the results speak for the relevance of conversational features in trolling detection but to a lesser extent in aggression detection. Our results differ from earlier research findings in some respects. This demonstrates that comparing aggression and trolling has been fruitful. First, the level of disagreement is informative in aggression prediction, in contrast to the findings of Zakharov et al. [
122]. Second, trolling mentioned by several users is significant, although it needs to be noted that this happens only in about 26% of all trolling conversations (see Appendix
C). By nature, in many cases, troll mentions are indicators of trolling but might not necessarily constitute cases of trolling themselves. Third, while Zhang et al. [
126] reported that both politeness and prompt types are important in aggression, Table
6 shows that politeness features are more prominent than prompt features for both datasets.
Although trolling has been seen as a phenomenon that cannot solely be studied based on politeness, since there are many more subtle layers to the deceptive phenomenon (e.g., [
53]), POLPR features also seem to be important here in addition to action-based features.
Finally, while aggression and trolling have often been analyzed together (see the Introduction and Section
2.2), the results demonstrate their differences at the feature level. Aggression and trolling differ by their sentiment (e.g., toxicity is a top feature for aggression but does not appear in the trolling list), although negative sentiment has been a central feature in previous studies, both in aggression and trolling detection [
4,
14,
15,
59]. These results illustrate the differences between aggression and trolling, and emphasize that sentiment measures alone are not enough to detect especially covert disruptive behaviors (trolling) online.
Overall, the results demonstrate the differences between conversational strategies and directness (principle of conversational distinctiveness) in these disruptive behaviors (RQ2): aggression and trolling do, indeed, involve significantly different important features and norm violations. This can be seen in their use of specific individual actions, (a)symmetries, politeness, and sentiment. Aggressive interaction is characterized most prominently by sentiment-related features, pair similarity between matching action pairs, (lack of) reconciling actions, and the use of face-threatening actions. Some (a)symmetries are important in the task, but less so than in trolling. Conversations involving trolling are defined largely by various asymmetries such as mismatched responses, the use of specific actions such as accusations (and related asymmetries), and politeness strategies such as indirect greetings and hedging. These can be seen as more indirect strategies for directing or manipulating conversation, ones that the perpetrator can shrug off as unintended or misinterpreted and that cannot be concretely pointed out in the way offensive language can. We conclude that trolling behavior is thus essentially responsive and can be recognized by elusive norm violations in responses to other users.
5.3 RQ3: Do Features Related to Conversational Actions and Norm Violations Allow More Accurate Detection of Disruptive Interaction?
Our third RQ addressed ML-based disruptive behavior detection: whether conversational features related to actions and their norm violations, together with other relevant features, would allow accurate detection of online conversations involving aggression or trolling. Here too, following our interest in detecting covert behavior in particular, we were primarily focused on the models’ performance in trolling detection. We also report the results for aggression detection to support comparison with prior work and to evaluate whether the two behaviors should be treated as different phenomena.
To compare our action-based analysis with earlier research, we trained models with increasingly comprehensive feature sets. In aggression detection, we used the feature set of Zhang et al. [
126] (politeness and prompt types; POLPR) in order to include features used in earlier research. Since we used their dataset, we wished to use similar features. However, we stress that the original study by Zhang et al. [
126] sought to predict a toxic ending to a conversation, whereas we sought to distinguish aggressive conversations from non-aggressive conversations. Thus, the performances of the models should not be directly compared. After POLPR and toxicity features, we added our conversational action feature sets to investigate how action-based features and norm violation features would affect the performances of ML models in detecting disruptive behaviors in online conversation. For trolling data – for which suitable prior baseline data were not available – we used the same feature sets but also included trolling mentions in the message-level feature set since earlier research has reported on their importance [
78] and because our findings for RQ2 also demonstrated the importance of this feature. We will first introduce the overall results of our classification tasks and then discuss further details related to both tasks in more depth. Table
7 presents the results of both classification tasks.
For aggression detection, we obtained accuracy levels in the range 0.64–0.69 for the POLPR model. Our best models used POLPR + Tox + Actions and feature-forward selection, both of which included a combination of action (Actions), toxicity (Tox), and politeness and prompt type (POLPR) features. These surpassed the POLPR model, reaching an accuracy of 0.90. The other key measures (recall, precision, and
F1 score) of the best performing aggression detection model are presented in Table
8. In the buildup towards the best model in aggression detection, toxicity features (Tox + Actions, or POLPR + Tox) offered the most prominent accuracy improvement across all the models. Concerning RQ3’s focus on the contribution of conversational actions, the model using only action-related features (Actions; accuracy 0.68–0.75) was equal to or slightly better than the POLPR model.
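The feature-forward selection used to obtain the best models can be sketched as a greedy loop that repeatedly adds the feature giving the largest score improvement. The function below is a generic illustration, not our exact selection procedure; `score_fn` stands in for a cross-validated classifier evaluation.

```python
def forward_select(features, score_fn, max_features=None):
    """Greedy feature-forward selection.

    Repeatedly add the feature that most improves score_fn(subset),
    stopping when no candidate improves the score (or a size cap is hit).
    Returns (selected_features, best_score). Illustrative sketch only.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate each remaining candidate added to the current subset.
        gains = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feat = max(gains)
        if top_score <= best:       # no candidate improves the score
            break
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best
```

In practice, `score_fn` would fit and cross-validate the classifier on the candidate feature subset; the greedy loop then yields the ordered list of most informative features.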
The confusion matrix (Table
9) for aggression classification illustrates that the results for our best aggression detection model (using feature-forward selection) were well balanced for both aggression and non-aggression. The
Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) score for the model was 0.95, calculated with repeated stratified K-fold cross-validation using scikit-learn. To summarize the most important results, aggression classification improved most when toxicity features were used together with POLPR or conversational action–based features, as compared with using POLPR features alone.
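The cross-validated AUC computation can be reproduced with scikit-learn's standard utilities. The synthetic data and logistic regression below are placeholders for the article's actual feature matrix and model; only the cross-validation setup mirrors the procedure described.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the conversation feature matrix and labels.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Repeated stratified K-fold CV, scoring by ROC AUC.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
auc_scores = cross_val_score(LogisticRegression(max_iter=1000),
                             X, y, cv=cv, scoring="roc_auc")
mean_auc = auc_scores.mean()
```

Averaging the per-fold scores (here 5 folds × 3 repeats = 15 values) gives the reported AUC estimate.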
For trolling detection, we also expected that conversational action-based features would be important. Indeed, Table
7 shows that the best model including action-based features in its feature set can detect trolling with an accuracy as high as 0.92. The action features alone already yield an accuracy of 0.86. We conducted the Nemenyi test [
33] to compare the Macro-
F1 performances across the folds between the model using POLPR + Tox features and the model using POLPR + Tox + Actions, to confirm whether the addition of conversational action-based features would yield significant gains. The results show a statistically significant difference between the performances of the models (
\(p\lt 0.05\) ), showing that using action-based features together with features from earlier research can lead to significantly improved results in trolling detection.
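For two models compared over the same cross-validation folds, the Nemenyi test amounts to comparing average ranks against a critical difference; a minimal stdlib sketch, with the critical value for two models taken from Demšar's published tables, might look as follows. The function interface is illustrative, not the article's implementation.

```python
import math

def nemenyi_two_models(scores_a, scores_b, q_alpha=1.960):
    """Nemenyi test for two models scored on the same folds.

    scores_a, scores_b: per-fold macro-F1 scores, same folds, same order.
    q_alpha: critical value for k=2 models at alpha=0.05 (Demšar 2006).
    Returns (mean_rank_a, mean_rank_b, critical_difference, significant).
    """
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    ranks_a, ranks_b = [], []
    for a, b in zip(scores_a, scores_b):
        if a == b:                      # ties share the mean rank
            ranks_a.append(1.5); ranks_b.append(1.5)
        elif a > b:                     # higher score gets rank 1
            ranks_a.append(1.0); ranks_b.append(2.0)
        else:
            ranks_a.append(2.0); ranks_b.append(1.0)
    mean_a = sum(ranks_a) / n
    mean_b = sum(ranks_b) / n
    k = 2
    cd = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))
    return mean_a, mean_b, cd, abs(mean_a - mean_b) > cd
```

A rank difference exceeding the critical difference indicates a statistically significant performance gap at the chosen alpha level.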
These results are better than the classification scores achieved by earlier trolling detectors (e.g., [
50]: accuracy, 0.78–0.85,
\(F1,\) 0.79–0.85; [
79]: accuracy, 0.58–0.65; [
78]: accuracy, 0.81,
\(F1,\) 0.80). The results are not completely comparable, however, as the definitions of trolling and the units of analysis vary between the studies. Still, all of the mentioned studies use social media data, and for comparability we have incorporated the features deemed most relevant in previous work into our model.
A further ablation analysis in Table
10, examining the best model’s performance on conversations of different lengths, showed that our models performed well across conversation lengths, although very long conversations tended to be more difficult. The ROC AUC score for the best model (XGBoost) was 0.96. The confusion matrix in Table
9 reports that the results are quite well balanced for the trolling task. It is notable that, in contrast to aggression classification, the accuracy levels did not increase to the same extent when toxicity features were added to POLPR features.
As the forward selection of important features (see Table
6) revealed that the number of different users mentioning trolling or accusing someone of trolling is highly informative, we wished to further ensure that the classifier would also perform well on conversations in which trolling is not mentioned. As mentioned earlier, we considered it important for the classifier to be able to also detect covert trolling attempts that have not been identified by other users (see Section
2.3). This was also because detection based on trolling mentions would be biased, working only to detect trolling already noticed by other users (see Appendix
C). Thus, we further tested how our best trolling classifier performed on conversations including trolling mentions or accusations as compared with conversations in which trolling was not mentioned at all. As can be seen in Table
11, the classifier could also accurately detect trolling in conversations in which trolling was not mentioned and no one was accused of trolling. However, the rare cases in which trolling is mentioned in non-trolling conversations seemed to be the hardest for the model to identify correctly. Since cases of non-trolling with trolling mentions are rare in our data (see Appendix
C), we do not consider this a significant caveat.
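This kind of subset evaluation can be sketched as a partitioned accuracy computation; the function below is an illustrative stand-in for our evaluation code, splitting test conversations by whether trolling is explicitly mentioned.

```python
def accuracy_by_mention(y_true, y_pred, has_mention):
    """Classifier accuracy on conversations with vs. without explicit
    trolling mentions, to check the model is not relying on mentions alone.

    y_true, y_pred: gold and predicted labels per conversation.
    has_mention: booleans marking conversations with trolling mentions.
    """
    groups = {True: [0, 0], False: [0, 0]}   # mention flag -> [correct, total]
    for t, p, m in zip(y_true, y_pred, has_mention):
        groups[m][1] += 1
        if t == p:
            groups[m][0] += 1
    return {("mentioned" if m else "unmentioned"):
            (c / n if n else float("nan"))
            for m, (c, n) in groups.items()}
```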
The results provide evidence of the efficacy of using conversational features related to actions and norm violations together with message-level features to detect especially covert disruptive social media conversations (trolling): specific actions and norm violations in action pairs, complemented by features used in earlier research, allow better performance in trolling detection than earlier approaches. Although the classification results for aggression are more modest in comparison with earlier detection models, for trolling they surpass earlier models. This supports the importance of attending to the principle of context in disruptive behavior detection (RQ3), but also emphasizes that conversational action features can add significant information to the message-level features used in earlier research, enabling more efficient trolling detection. Action-based classification has advantages for the following reasons: (a) instead of investigating only message-specific properties, it focuses on conversational features, i.e., conversational actions taken within the conversation and the relationships between action pairs; and (b) the analysis of conversational symmetries and asymmetries between action pairs (i.e., dynamics between messages) attends to the interaction characteristics typical of especially covert disruptive behaviors (trolling). In sum, conversational action features and norm violations play a key role in our trolling classification task: they allow significantly improved classification results. Thus, instead of focusing solely on message-level features when developing trolling detection models, more attention should be given to conversational features related to conversational actions and norm violations.
6 Discussion
The results of this article emphasize the relevance of linguistic and interaction-based conversational features related to conversational actions and norm violations in analyzing and detecting covert disruptive online conversations. They provide more context for understanding how these phenomena emerge not only through offensive language or negative sentiment but also through violations of conversational norms related to actions. The latter may be acted out, for example, by evading response-expecting actions and social accountability. The results show that analyzing conversational structures, such as action types and the dynamics between action pairs, can reveal differences in their use across different conversational behaviors (aggression, non-aggression, trolling, non-trolling). We verified the validity of our approach by using aggression data from earlier research as a comparison point. Compared with previous studies, conversational action features and norm violations allowed accurate detection of conversations containing aggression and trolling. We demonstrated that our model can detect especially covert behaviors accurately and surpasses earlier models in performance; moreover, different styles of trolling [
54] from different contexts were included in the dataset (see [
82]). The CA-based computational approach to analyzing social media conversations proposed in this article is a novel method and robust in its action tagging scheme, which is rooted in a well-researched theoretical tradition.
Overall, the results stress the importance of three principles in the detection of covert disruptive conversations: the principle of symmetry between conversational actions, the principle of conversational distinctiveness in conversational strategies used, and the principle of context in interaction. Based on our three RQs and their results, we can further elaborate the need to account for these principles in detection models. To identify covert disruption in particular, it is important to pay attention to norm violations in actions in user interaction, indirect strategies of manipulation, and the wider conversational context of messages. This reveals more subtle or covert norm violations as compared with direct offense, e.g., evasion of accountability demands.
Our results in detecting conversations including covert disruption strategies provide a significant contribution for future detection and prevention of especially covert forms of disruption, such as trolling, on social media.
The results emphasize that an efficient and sustainable solution to detecting covert attempts at disruption requires moving beyond the individual message to look at conversational structure. The types of features we have described in this article could be used to computationally analyze and detect disruptive behavior online. Various norm violations, e.g., in action pairs, are amenable to user-based analysis as well: models could produce a probability of a user behaving in an unacceptable way, even during an ongoing discussion, by using previous turns and inter-message dynamics in the conversation as features. Such features are also more generalizable than, for example, word use: common conversational actions and their expectations repeat across various contexts (see, e.g., [
39]; although some exceptions can be found: [
2]). Identification of a range of conversational features such as covert norm violations could allow detection models that would not be as vulnerable to user deception as message-level detection (see [
49]). This, however, is a vein of research that still requires more work.
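Such user-level analysis could, for instance, aggregate each user's running rate of norm-violating turns over the conversation so far; the hypothetical sketch below assumes an upstream component (e.g., action pair asymmetry analysis) has already flagged individual turns.

```python
def user_violation_rates(turns):
    """Per-user rate of norm-violating turns in a conversation.

    turns: list of (user, violated_norm) pairs in conversation order,
    where violated_norm is a boolean produced upstream.
    Returns {user: violation_rate}, usable as a per-user risk feature.
    """
    counts = {}
    for user, violated in turns:
        total, bad = counts.get(user, (0, 0))
        counts[user] = (total + 1, bad + (1 if violated else 0))
    return {u: bad / total for u, (total, bad) in counts.items()}
```

A moderation model could recompute these rates as each new turn arrives, yielding an evolving per-user estimate rather than a one-off message-level verdict.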
The results of the ablation study in which we compared classifier performance on conversations with different lengths (Table
10) showed that our model performs well and in quite a stable manner across different conversation lengths. However, very short (3–4 messages) and very long conversations (11–79 messages) tended to be a bit more difficult to classify. Thus, as expected, it seems that a 3-message-long conversation does not always provide enough information. However, with a few more messages, the identification of disruptive behavior seems to become a little easier. For potential approaches in which disruptive behavior could be identified during conversation, we suspect that very early detection, when there are only very few messages in the conversation, might be most challenging, especially when it comes to deceptive behaviors such as covert trolling. In addition, when operating on extremely long conversations, these should perhaps be further analyzed in smaller batches. It must be taken into account in future research that a very long conversation (e.g., including over 100 messages) might include a problematic thread but might not be problematic in its entirety.
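The batch-wise analysis of very long conversations suggested above could be operationalized as a simple overlapping sliding window; the window and stride sizes below are arbitrary illustrative defaults, and a thread-level flag could then be taken as, e.g., the maximum over window-level predictions.

```python
def window_conversation(messages, window=10, stride=5):
    """Split a long conversation into overlapping windows so each
    window can be classified separately.

    Returns a list of message windows; short conversations are
    returned as a single window.
    """
    if len(messages) <= window:
        return [messages]
    windows = [messages[start:start + window]
               for start in range(0, len(messages) - window + 1, stride)]
    # Ensure the tail of the conversation is covered by a final window.
    if (len(messages) - window) % stride != 0:
        windows.append(messages[-window:])
    return windows
```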
6.1 Limitations
Our computational modeling and analysis of online conversations through the lens of conversational actions and asymmetries is still under development. From an applied digital CA perspective, more detailed analyses will be needed to better account for common patterns found in online interaction. Firstly, we could gain better insights through even more detailed analysis and distinction of actions and their functions. For instance, accusations targeted at a specific person could be analyzed separately from general accusations towards a group of people. Secondly, the analysis did not consider insert expansions, which commonly occur interjected between the parts of an action pair [
96, pp. 97–114]. For example, a question as a response to another question might be acceptable if it contains a request for more detailed information with regard to the original question and the original question is returned to afterwards. In this article, we have assumed that such cases are less common than action pairs without insert expansions and thus, in a large conversation dataset, should not greatly affect the end result. Moreover, since we assumed a conservative approach in counting asymmetries, e.g., in the case of mismatched responses, future studies could attempt to identify asymmetries in more detail (e.g., between parts of messages).
As no action identification model trained on accusations and challenges was available, the zero-shot approach was our best option at this point. However, although we reached fruitful results in our conversation classification tasks, it should be noted that a zero-shot model is likely to be less accurate than a supervised model trained on annotated data [
121]. Zero-shot NLI is also a black box: it lacks transparency, so the reasons for which it selects a particular class cannot be scrutinized even if we wished to do so. For this reason, supervised ML approaches might provide more insight into the analysis of conversational actions. NLI is also computationally demanding and, thus, very slow on very large datasets; faster models requiring fewer resources or less computation should be developed for such cases. The NLI-based model does not reach a very high performance on our 10-way action classification task. However, we consider its performance adequate for a first model that looks at conversational context as a basis for discerning conversations involving trolling and aggression from more peaceful conversations. Still, there is room for future research to implement more fine-grained, high-performance models of action classification for asynchronous forum conversations, including classes such as accusations and challenges that are especially informative for trolling [
7,
82]. However, developing such an action classification model was out of the scope of this article, as the effort to create an annotated dataset and to implement a robust model for this purpose would be worth a study of its own. In the future, a better action classification model could be trained incorporating digital CA-inspired action classes: accusations, challenges, and requests for action along with questions, proposals, statements, and their expected responses. Future research should investigate whether this will yield even better results in social media conversation classification tasks.
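The zero-shot NLI setup can be illustrated schematically: each candidate action class is turned into an entailment hypothesis, and the class whose hypothesis scores highest wins. The class list and template below are hypothetical illustrations (not our exact label set or prompts), and `nli_score` stands in for a real NLI model (e.g., an MNLI-finetuned checkpoint).

```python
# Hypothetical 10-way action label set for illustration.
ACTION_CLASSES = ["question", "answer", "statement", "accusation",
                  "challenge", "apology", "greeting", "proposal",
                  "acceptance", "rejection"]

def classify_action(message, nli_score, template="This message is a {}."):
    """Zero-shot action classification via NLI.

    nli_score(premise, hypothesis) -> entailment probability; in practice
    this would wrap an NLI model, but it is pluggable here so the
    sketch stays self-contained. Returns the best-scoring action class.
    """
    scores = {cls: nli_score(message, template.format(cls))
              for cls in ACTION_CLASSES}
    return max(scores, key=scores.get)
```

A supervised classifier trained on annotated actions would replace `nli_score` entirely, which is the direction suggested above.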
Another limitation is the selection of datasets. One concern regarding the aggression data is the length of the discussions: the data by Zhang et al. [
126] includes many conversations that are only 3–4 messages long. In the authors’ definition, a conversation that involves a personal attack always starts out neutral but ends with an attack; it is debatable whether 3–4 messages would count as such a conversation. Therefore, in the future, it may be reasonable to analyze conversations in which the minimum number of turns is higher. Furthermore, the availability of datasets poses a limitation: few datasets on trolling or aggression provide enough conversational information (the sequential organization of messages, message and author IDs, and which messages they are responding to). Therefore, it was not possible to find conversational datasets on trolling and aggression that would have been similar enough in terms of discussion topics and source platforms to be perfectly comparable. We also could not find similar enough classification studies that could have been used as a comparison baseline – the most similar study has a slightly different focus [
126]. Due to the lack of trolling (or aggression) datasets that would provide enough conversational information to conduct our analysis, we acknowledge that we had to choose datasets that are not exactly similar, as Wikipedia editor conversations may differ in interaction from the types of conversations included in the trolling dataset. Some of the potential datasets we did find either had a completely different definition of trolling [
59], failed to include some of the very central messages in their data (as in the Reddit dataset of Zhang et al. [
126]) or included multilingual content that we could not analyze at this point in our work [
68].
It should also be stressed that, at this point, aggression and trolling data are not fully comparable. This is because the aggression data is solely obtained from conversations among Wikipedia editors that may possess qualitatively different characteristics as compared with the conversations in the trolling dataset. Thus, some of the differences that we observe in our results might be based on the different nature of the types of interaction found in the respective datasets, or, for example, the differences of audiences. In other words, aggressive behavior in interaction might be more or less toxic in Wikipedia editor conversations as compared with newspaper comment sections or Reddit conversations. Furthermore, we have no way to ensure that the aggression data does not contain some amount of trolling, though we ascertained with random checks that the data portrayed mainly aggression. Similarly, the trolling data has not been filtered for aggressive behavior that is not trolling. However, since the main focus of this study is on trolling detection and whether actions and norm violations enable improved performance therein, we feel that this is a minor problem. This also relates to limitations regarding RQ2, which functions in this article to provide more context for RQ1 and RQ3, and to illustrate the different features relevant to trolling and aggression. We acknowledge that the datasets are not equal and, therefore, direct comparisons of the features are not reliable. In order to more reliably study the exact differences between them, further research using similar data with careful annotation practices for both would be needed. This was out of the scope of this article. We feel that future work should investigate how novel datasets could be gathered to allow more versatile conversational analyses of trolling and aggression, and how novel datasets could in the future be used to further corroborate our results.
A third limitation in our study is the small size of the trolling dataset. This could not be avoided because, as of this writing, there exist no large datasets with robust collection criteria, encompassing all known and well-researched trolling styles [
54] and including both audience-recognized trolling behavior (that has been filtered to account for audience bias) and behavior that has not been recognized or named by the audience as trolling. Incorporating the latter data is very important, as deceptiveness is at the very core of trolling [
36] and, thus, presumably most successful trolling attempts go unnoticed and unlabeled. The small size of our dataset might limit the generalizability of the models even with data augmentation. Nevertheless, the results are promising.
In future research, larger conversational trolling datasets should be gathered and the analyses corroborated by running them on the larger-scale data. Studies could also explore classification performance differences across conversation topics, a task that was out of the scope of this article. Furthermore, conversational datasets with multilingual and low-resource language content would be an important next step for applying our detection approach and analyses. At present, we have concentrated on English-only data as a relevant first step. However, trolling detection is a task that should be applicable to many language contexts as well as multilingual settings, which is why we consider this an important direction for future research.
6.2 Methodological Insights
Our results provide an interesting and novel angle on the study of computer-mediated communication (CMC). Although CMC is said to be more fragmented than ordinary face-to-face interaction (e.g., [
56]) and thus not to follow the rules or customs of face-to-face conversation as diligently, the results in this article show that well-intending online conversations also adhere to the norms of conversational action, e.g., the rules of action pairs (see also [
77]). Norm-violating conversational actions, even in online discussions, are less common in the flow of well-intended conversation. Thus, the evidence here backs up the claim [
77] that people orient to regular conversational social norms even when interacting with others on social media. We have shown that, at least to some extent, computational implementation of analyses of actions and action pairs is possible. This opens new possibilities for studying interaction in CMC – and for studying disruptive behavior online. Norm violation can sometimes be a way of using power in interaction, especially when it is systematic. Thus, some of the patterns found in this study could shed more light on how different actions are used to assert power online in trolling, aggressive behavior, political communication or other modes of online interaction attempting to influence other people. Action-based differences in conversational data could be used to differentiate between types of conversations based on their inner dynamics.
Our study reveals that online discussions can be analyzed computationally via conversational actions and their norm violations, which is a novel approach in computational CMC research. It also reveals that the sequential structure of conversation carries relevance in analyzing different online behaviors. In addition to bringing novel findings and methodologies to computational CMC, our study has important implications for applied CA. Firstly, we have shown that, as objects of study, online behaviors such as trolling and aggression can be analyzed and recognized using CA-based coding schemes, which, to our knowledge, is a new area of application for them. Also, for covert behavior (trolling), we have shown that the approach can outperform earlier approaches. Future work may look at how to implement more fine-grained CA-based action classification systems, which could result in better identification of trolling. Secondly, this study is an important indicator that applied computational CA, which thus far has been more of a hypothetical than a real field of study, is not only applicable but also integral when we want to detect covert disruptive online behavior (e.g., trolling).
We have successfully combined applied computational CA with other conversational and linguistic measures such as the toxicity of language use, and have shown that our approach may well be the most successful one when identifying covert disruptive online behaviors. Based on the importance of different conversational features, it seems that future research should also investigate how topical digressions play into how disruptive interactions unravel: pair similarities (cosine) between messages were included in the list of important features (Table
6), but the more exact dynamics of such similarities should be studied further. The results also suggest that there is more work to be done within computational detection of covert disruptive behaviors.
6.3 Automated Moderation and Social Media
Disruptive behavior online deteriorates online discussions and threatens democratic systems [
8], especially in the form of trolling-like manipulative communication strategies (e.g., [
1]). Since trolling strategies have been found to be used on the one hand to attack vulnerable groups and minorities (e.g., [
56]) and on the other hand to manipulate public opinion and to spread disinformation and uncertainty [
6,
102], automated detection of both aggressive and trolling behavior is sorely needed to reduce the impact of harmful and manipulative behaviors on individuals and society. We feel that norm violation–based identification such as that presented here could be effectively used to find possible systematic manipulation or covert disruption such as trolling in online conversations. It could also be used to prevent the effects of content or sentiment amplification attempts [
6], harassment of specific groups of people online (see, e.g., [
91]), and possible automated provocation and trolling attempts on conversational forums. The three principles we have defined in this article outline the approach that automatic detection of covert disruptive online behavior could be based on. Models can benefit from identifying covert norm violations of symmetry between conversational actions (principle of symmetry), accounting for indirect strategies in interaction (principle of conversational distinctiveness), and analyzing the wider conversational context around turns (principle of context).
Tools for recognizing disruptive behavior are important for better social media: during significant societal events, there is a need for recognizing harmful manipulative behaviors that attempt to amplify certain messages or manipulate social media behaviors to boost the motivations of political players or other social media influencers (see, e.g., [
6]). Recent conversations around social media responsibility and moderation (e.g., [
12,
73]) might necessitate more governance and moderation on the side of social media platform providers when it comes to information spread and important social phenomena such as elections or health behaviors. This would further emphasize the need for detecting both covert and overt systematically disruptive behaviors online.
The reasons why automated moderation is important are twofold. First, the simplest ways of violating social norms, such as asymmetric responses, are also easy to automate. Thus, it is likely that these could be used to disrupt and manipulate online conversations around sensitive or political issues. As bots have already been shown to be used in the amplification of political agendas on X [
6], it is plausible that bots capable of trolling and aggression on more conversational online forums could be used for similar purposes in the future. This emphasizes the importance of automated moderation. Moreover, with recent advances in NLP, bots participating in conversations might not be discernible from human participants even with meta-level account information. We feel that, in this case, moderation should be based on the systematic display of disruptive behavior, regardless of whether the actor is a bot or a human. Second, earlier research shows that many online conversations can easily turn volatile [
62]. Thus, many platforms are also vulnerable to aggression and trolling if there already is a history of disruptive conversation practices in these discussion environments. Trolling, for instance, is often infectious [
21]. This makes moderation doubly important.
To improve the identification of disruptive behaviors, future research will likely need to look into various possible information sources besides conversational analyses of interaction, e.g., combining conversation-internal information with conversation-external user information as well as message-level analyses. Although in this article we have stressed the importance of investigating the conversational context rather than analyzing standalone messages only, we think that message-level analyses are also needed and can be very helpful in some cases, for instance, when there is only one initial potentially provocative message in a thread. In such a case, of course, a conversational approach might not be needed. However, deceptive behaviors such as covert trolling cannot always be detected by examining only the first message in a thread. Thus, we think that even having just a few messages (and more than one from the troll) will provide more relevant information on potentially consistent disruptive behavior than a single message.
Beyond deciding whether a conversation includes disruptive behaviors or manipulation attempts, the next steps involve deciding whether specific users have acted as instigators of disruption or repeatedly violated community norms, and how to act on these findings. Various options are possible: computational identification of problematic messages and user behaviors within and across conversations, based on interactions, produced content, and meta-information, or identification by human moderators. In conversations involving tens or hundreds of messages, a mixed approach might be the most viable option.
Assuming that automated moderation of disruptive behaviors is implemented on online forums on a larger and more intricate scale than at present, what can be done about instances identified as possible trolling, and how, remain questions that call for in-depth ethical consideration [
12]. For instance, to discern between instigators of aggression or trolling justly, moderation could be based on the collaboration of human moderators and computational detection systems [
63], i.e., a human could make the final decision. Moreover, extreme care is required in assigning an “aggressor” or “troll” label to individual users, as this is a grave accusation, and erroneous accusations can also harm the person in question (e.g., through unfair blocking: [
64]). Safer and fairer options need to be explored. One possibility would be to first inform users that they have violated community norms (e.g., by notifying them of messages they have left unanswered). If the behavior continues, human moderators would then be needed in the loop. As algorithms and automated systems can produce biased outcomes, computational models must be used carefully to avoid placing blame on innocent individuals or on those who unwittingly disregard social norms. The ethical design and deployment of disruptive behavior detection is a line of research we believe will be necessary and valuable if and when large-scale automated identification of these behaviors is adopted on online forums – to maintain civility and platform norms without increasing censorship.
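The tiered workflow sketched above (notify the user first, escalate repeated violations to a human moderator) could be expressed as follows. This is an illustrative sketch only; the thresholds, action names, and state handling are assumptions, not a specification from this article:

```python
# Hypothetical sketch of a tiered moderation workflow: notify users of norm
# violations first, and escalate to a human moderator only on repeat offenses,
# so that a human makes the final decision on any "troll"/"aggressor" label.

def moderate(user_violations: dict, user: str,
             notify_threshold: int = 1,
             escalate_threshold: int = 3) -> str:
    """Record one detected norm violation by `user` and return an action."""
    user_violations[user] = user_violations.get(user, 0) + 1
    count = user_violations[user]
    if count < notify_threshold:
        return "log_only"            # below notification threshold: record only
    if count < escalate_threshold:
        return "notify_user"         # e.g., point out messages left unanswered
    return "escalate_to_human"       # human moderator makes the final decision
```

Keeping the final decision with a human moderator reflects the collaboration between human moderators and computational detection systems discussed above, and limits the harm of erroneous automated labeling.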
6.4 Conclusion
These results offer two important contributions to the analysis of covert disruptive behavior in online interaction. First, the conversational approach, building on the concepts of conversational action and norm violation, offers a new theoretically grounded way of understanding disruptive online behavior. This is a significant contribution to developing new computational models for analyzing online discussions and to (automated) data collection for online trolling research, e.g., helping to capture a range of different styles of conversational trolling. Second, our computational model for detecting conversations involving trolling demonstrates practical improvements that surpass earlier detection accuracy levels. Our approach paves the way for computational models for the automated moderation of harmful covert disruptive behaviors, especially trolling, which has so far been difficult to detect automatically. As the main interest in the computational identification of trolling is the prevention of disruption, more real-time methods are needed, in contrast to leaked lists of paid trolls or troll mentions, which often come too late or identify only a subset of trolling attempts. Thus, the results of this article offer a significant contribution to developing methods that detect covert disruption more accurately. They also pave the way for future models that utilize analyses of conversational actions and norm violations to help prevent covert disruptive behaviors.