Search | arXiv e-print repository

arXiv:2401.02095 [pdf, other]

Politics and Propaganda on Social Media: How Twitter and Meta Moderate State-Linked Information Operations

Authors: Nihat Mugurtay, Umut Duygu, Onur Varol

Abstract: Why do Social Media Corporations (SMCs) engage in state-linked information operations? Social media can significantly influence the global political landscape, allowing governments and other political entities to engage in concerted information operations, shaping or manipulating domestic and foreign political agendas. In response to state-linked political manipulation tactics on social media, Twitter and Meta carried out take-down operations against propaganda networks, accusing them of interfering in foreign elections, organizing disinformation campaigns, manipulating political debates and many other issues. This research investigates the two SMCs' policy orientation to explain which factors can affect these two companies' reaction against state-linked information operations. We find that good governance indicators such as democracy are significant elements of SMCs' country-focus. This article also examines whether Meta and Twitter's attention to political regime characteristics is influenced by international political alignments. This research illuminates recent trends in SMCs' take-down operations and the interplay between geopolitics and domestic regime characteristics.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 4 figures, 2 tables and Supp. Info

arXiv:2312.17423 [pdf, other]

Social Bots: Detection and Challenges

Authors: Kai-Cheng Yang, Onur Varol, Alexander C. Nwala, Mohsen Sayyadiharikandeh, Emilio Ferrara, Alessandro Flammini, Filippo Menczer

Abstract: While social media are a key source of data for computational social science, their ease of manipulation by malicious actors threatens the integrity of online information exchanges and their analysis. In this Chapter, we focus on malicious social bots, a prominent vehicle for such manipulation. We start by discussing recent studies about the presence and actions of social bots in various online discussions to show their real-world implications and the need for detection methods. Then we discuss the challenges of bot detection methods and use Botometer, a publicly available bot detection tool, as a case study to describe recent developments in this area. We close with a practical guide on how to handle social bots in social media research.

Submitted 28 December, 2023; originally announced December 2023.

Comments: This is a draft of the chapter. The final version will be available in the Handbook of Computational Social Science edited by Taha Yasseri, forthcoming 2024, Edward Elgar Publishing Ltd. The material cannot be used for any other purpose without further permission of the publisher and is for private use only

arXiv:2311.18063 [pdf, other]

TurkishBERTweet: Fast and Reliable Large Language Model for Social Media Analysis

Authors: Ali Najafi, Onur Varol

Abstract: Turkish is one of the most popular languages in the world. Wide use of this language on social media platforms such as Twitter, Instagram, or TikTok and the strategic position of the country in world politics make it appealing for social network researchers and industry. To address this need, we introduce TurkishBERTweet, the first large scale pre-trained language model for Turkish social media built using almost 900 million tweets. The model shares the same architecture as the base BERT model with smaller input length, making TurkishBERTweet lighter than BERTurk and able to achieve significantly lower inference time. We trained our model using the same approach as the RoBERTa model and evaluated it on two text classification tasks: Sentiment Classification and Hate Speech Detection. We demonstrate that TurkishBERTweet outperforms the other available alternatives on generalizability and its lower inference time gives a significant advantage for processing large-scale datasets. We also compared our models with the commercial OpenAI solutions in terms of cost and performance to demonstrate that TurkishBERTweet is a scalable and cost-effective solution. As part of our research, we released TurkishBERTweet and fine-tuned LoRA adapters for the mentioned tasks under the MIT License to facilitate future research and applications on Turkish social media. Our TurkishBERTweet model is available at: https://github.com/ViralLab/TurkishBERTweet

Submitted 29 November, 2023; originally announced November 2023.

Comments: 21 pages, 4 figures, 8 tables

arXiv:2310.20407 [pdf, other]

Unsupervised detection of coordinated fake-follower campaigns on social media

Authors: Yasser Zouzou, Onur Varol

Abstract: Automated social media accounts, known as bots, are increasingly recognized as key tools for manipulative online activities. These activities can stem from coordination among several accounts and these automated campaigns can manipulate social network structure by following other accounts, amplifying their content, and posting messages to spam online discourse. In this study, we present a novel unsupervised detection method designed to target a specific category of malicious accounts designed to manipulate user metrics such as online popularity. Our framework identifies anomalous following patterns among all the followers of a social media account. Through the analysis of a large number of accounts on the Twitter platform (rebranded as X after the acquisition by Elon Musk), we demonstrate that irregular following patterns are prevalent and are indicative of automated fake accounts. Notably, we find that these detected groups of anomalous followers exhibit consistent behavior across multiple accounts. This observation, combined with the computational efficiency of our proposed approach, makes it a valuable tool for investigating large-scale coordinated manipulation campaigns on social media platforms.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 17 pages, 5 figures, 1 table and supplementary information

arXiv:2310.16181 [pdf, other]

Hidden Citations Obscure True Impact in Science

Authors: Xiangyi Meng, Onur Varol, Albert-László Barabási

Abstract: References, the mechanism scientists rely on to signal previous knowledge, lately have turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.

Submitted 11 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 71 pages (main + SI)

arXiv:2301.11429 [pdf, other]

Just Another Day on Twitter: A Complete 24 Hours of Twitter Data

Authors: Juergen Pfeffer, Daniel Matter, Kokil Jaidka, Onur Varol, Afra Mashhadi, Jana Lasser, Dennis Assenmacher, Siqi Wu, Diyi Yang, Cornelia Brantner, Daniel M. Romero, Jahna Otterbacher, Carsten Schwemmer, Kenneth Joseph, David Garcia, Fred Morstatter

Abstract: At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected all 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.

Submitted 11 April, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2211.13121 [pdf, other]

#Secim2023: First Public Dataset for Studying Turkish General Election

Authors: Ali Najafi, Nihat Mugurtay, Ege Demirci, Serhat Demirkiran, Huseyin Alper Karadeniz, Onur Varol

Abstract: In the context of Turkey's upcoming parliamentary and presidential elections ("seçim" in Turkish), social media is playing an important role in shaping public debate. The increasing engagement of citizens on social media platforms has led to the growing use of social media by political actors. It is of utmost importance to capture the upcoming Turkish elections, as social media is becoming an essential component of election propaganda, political debates, smear campaigns, and election manipulation by domestic and international actors. We provide a comprehensive dataset for social media researchers to study the upcoming election, develop tools to prevent online manipulation, and gather novel information to inform the public. We are committed to continually improving the data collection and updating it regularly leading up to the election. Using the Secim2023 dataset, researchers can examine the social and communication networks between political actors, track current trends, and investigate emerging threats to election integrity. Our dataset is available at: https://github.com/ViralLab/Secim2023_Dataset

Submitted 22 November, 2022; originally announced November 2022.

Comments: 22 pages, 9 figures

arXiv:2209.10006 [pdf, other]

Should we agree to disagree about Twitter's bot problem?

Authors: Onur Varol

Abstract: Bots, simply defined as accounts controlled by automation, can be used as a weapon for online manipulation and pose a threat to the health of platforms. Researchers have studied online platforms to detect, estimate, and characterize bot accounts. Concerns about the prevalence of bots were raised following Elon Musk's bid to acquire Twitter. Twitter's recent estimate that 5% of monetizable daily active users are bot accounts raised questions about their methodology. This estimate is based on a specific number of active users and relies on Twitter's criteria for bot accounts. In this work, we want to stress that crucial questions need to be answered in order to make a proper estimation and compare different methodologies. We argue how assumptions on bot-likely behavior, the detection approach, and the population inspected can affect the estimation of the percentage of bots on Twitter. Finally, we emphasize the responsibility of platforms to be vigilant, transparent, and unbiased in dealing with threats that may affect their users.

Submitted 5 November, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: 22 pages, 5 figures

arXiv:2203.03430 [pdf, other]

Academic Support Network Reflects Doctoral Experience and Productivity

Authors: Ozgur Can Seckin, Onur Varol

Abstract: Current practices of quantifying performance by productivity lead to serious concerns for the psychological well-being of doctoral students, and the influence of the research environment is often neglected in research evaluations. Acknowledgements in dissertations reflect the student experience and provide an opportunity to thank the people who support them. We conduct textual analysis of acknowledgments to build the "academic support network," uncovering five distinct communities: Academic, Administration, Family, Friends & Colleagues, and Spiritual; each of which is acknowledged differently by genders and disciplines. Female students mention fewer people from each community except for their families, and the total number of people mentioned in acknowledgements allows disciplines to be categorized as either individual science or team science. We also show that the number of people mentioned from the academic community is positively correlated with productivity, and institutional rankings are found to be correlated with productivity and size of academic support networks but show no effect on students' sentiment on acknowledgements. Our results indicate the importance of academic support networks by explaining how they differ and how they influence productivity.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 31 pages, 4 figures, 1 table

arXiv:2107.02263 [pdf, other]

Information Access Equality on Network Generative Models

Authors: Xindi Wang, Onur Varol, Tina Eliassi-Rad

Abstract: It is well known that networks generated by common mechanisms such as preferential attachment and homophily can disadvantage the minority group by limiting their ability to establish links with the majority group. This has the effect of limiting minority nodes' access to information. We present the results of an empirical study on the equality of information access in network models with different growth mechanisms and spreading processes. For growth mechanisms, we focus on the majority/minority dichotomy, homophily, preferential attachment, and diversity. For spreading processes, we investigate simple vs. complex contagions, different transmission rates within and between groups, and various seeding conditions. We observe two phenomena. First, information access equality is a complex interplay between network structures and the spreading processes. Second, there is a trade-off between equality and efficiency of information access under certain circumstances (e.g., when inter-group edges are low and information transmits asymmetrically). Our findings can be used to make recommendations for mechanistic design of social networks with information access equality.

Submitted 19 December, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2104.10864 [pdf, other]

doi 10.1371/journal.pone.0263381

Misinformation, Believability, and Vaccine Acceptance Over 40 Countries: Takeaways From the Initial Phase of The COVID-19 Infodemic

Authors: Karandeep Singh, Gabriel Lima, Meeyoung Cha, Chiyoung Cha, Juhi Kulshrestha, Yong-Yeol Ahn, Onur Varol

Abstract: The COVID-19 pandemic has been damaging to the lives of people all around the world. Accompanied by the pandemic is an infodemic, an abundant and uncontrolled spreading of potentially harmful misinformation. The infodemic may severely change the pandemic's course by interfering with public health interventions such as wearing masks, social distancing, and vaccination. In particular, the impact of the infodemic on vaccination is critical because it holds the key to reverting to pre-pandemic normalcy. This paper presents findings from a global survey on the extent of worldwide exposure to the COVID-19 infodemic, assesses different populations' susceptibility to false claims, and analyzes its association with vaccine acceptance. Based on responses gathered from over 18,400 individuals from 40 countries, we find a strong association between perceived believability of misinformation and vaccination hesitancy. Additionally, our study shows that only half of the online users exposed to rumors might have seen the fact-checked information. Moreover, depending on the country, between 6% and 37% of individuals considered these rumors believable. Our survey also shows that poorer regions are more susceptible to encountering and believing COVID-19 misinformation. We discuss implications of our findings on public campaigns that proactively spread accurate information to countries that are more susceptible to the infodemic. We also highlight fact-checking platforms' role in better identifying and prioritizing claims that are perceived to be believable and have wide exposure. Our findings give insights into better handling of risk communication during the initial phase of a future pandemic.

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2006.06867 [pdf, other]

doi 10.1145/3340531.3412698

Detection of Novel Social Bots by Ensembles of Specialized Classifiers

Authors: Mohsen Sayyadiharikandeh, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, Filippo Menczer

Abstract: Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. While researchers have developed sophisticated methods to detect abuse, novel bots with diverse behaviors evade detection. We show that different types of bots are characterized by different behavioral features. As a result, supervised learning techniques suffer severe performance deterioration when attempting to detect behaviors not observed in the training data. Moreover, tuning these models to recognize novel bots requires retraining with a significant amount of new annotations, which are expensive to obtain. To address these issues, we propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule. The ensemble of specialized classifiers (ESC) can better generalize, leading to an average improvement of 56% in F1 score for unseen accounts across datasets. Furthermore, novel bot behaviors are learned with fewer labeled examples during retraining. We deployed ESC in the newest version of Botometer, a popular tool to detect social bots in the wild, with a cross-validation AUC of 0.99.

Submitted 14 August, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 8 pages, 10 figures, Accepted to CIKM'20

Journal ref: Proc. 29th ACM International Conference on Information and Knowledge Management (CIKM), pages 2725-2732, 2020
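
The "maximum rule" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `esc_max_rule`, the toy scorers, and the two-element feature tuple are all hypothetical, and the published method may include further components not described in the abstract. The sketch only assumes that each specialized classifier returns a bot score in [0, 1] and that the ensemble reports the largest one.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Scorer = Callable[[Features], float]  # returns a bot score in [0, 1]


def esc_max_rule(scorers: Dict[str, Scorer], account: Features) -> Tuple[float, str]:
    """Combine per-bot-class scores by taking the maximum (hypothetical helper)."""
    if not scorers:
        raise ValueError("at least one specialized scorer is required")
    # Score the account with every specialized classifier.
    scores = {name: scorer(account) for name, scorer in scorers.items()}
    # The ensemble decision is the highest score and the class that produced it.
    best = max(scores, key=scores.get)
    return scores[best], best


# Toy stand-ins for trained classifiers (assumptions, for illustration only).
# Feature 0: posts per day; feature 1: follower-to-following ratio.
toy_scorers: Dict[str, Scorer] = {
    "spammer": lambda x: min(1.0, x[0] / 100.0),
    "fake_follower": lambda x: 1.0 if x[1] < 0.01 else 0.1,
}

score, bot_class = esc_max_rule(toy_scorers, (20.0, 0.001))
print(score, bot_class)
```

One reading of this design is that an account only needs to resemble a single known bot class to receive a high score, so a weak match against the other classes does not dilute the result.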

arXiv:2004.07229 [pdf]

doi 10.1073/pnas.2025581118

Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19

Authors: Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, JJ Patten, Robert Davey, Joseph Loscalzo, Albert-László Barabási

Abstract: The current pandemic has highlighted the need for methodologies that can quickly and reliably prioritize clinically approved compounds for their potential effectiveness for SARS-CoV-2 infections. In the past decade, network medicine has developed and validated multiple predictive algorithms for drug repurposing, exploiting the sub-cellular network-based relationship between a drug's targets and disease genes. Here, we deployed algorithms relying on artificial intelligence, network diffusion, and network proximity, tasking each of them to rank 6,340 drugs for their expected efficacy against SARS-CoV-2. To test the predictions, we used as ground truth 918 drugs that had been experimentally screened in VeroE6 cells, and the list of drugs under clinical trial, that capture the medical community's assessment of drugs with potential COVID-19 efficacy. We find that while most algorithms offer predictive power for these ground truth data, no single method offers consistently reliable outcomes across all datasets and metrics. This prompted us to develop a multimodal approach that fuses the predictions of all algorithms, showing that a consensus among the different predictive methods consistently exceeds the performance of the best individual pipelines. We find that 76 of the 77 drugs that successfully reduced viral infection do not bind the proteins targeted by SARS-CoV-2, indicating that these drugs rely on network-based actions that cannot be identified using docking-based strategies. These advances offer a methodological pathway to identify repurposable drugs for future pathogens and neglected diseases underserved by the costs and extended timeline of de novo drug development.

Submitted 9 August, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:1911.09179 [pdf, other]

doi 10.1609/aaai.v34i01.5460

Scalable and Generalizable Social Bot Detection through Data Selection

Authors: Kai-Cheng Yang, Onur Varol, Pik-Mai Hui, Filippo Menczer

Abstract: Efficient and reliable social bot classification is crucial for detecting information manipulation on social media. Despite rapid development, state-of-the-art bot detection models still face generalization and scalability challenges, which greatly limit their applications. In this paper we propose a framework that uses minimal account metadata, enabling efficient analysis that scales up to handle the full stream of public tweets of Twitter in real time. To ensure model accuracy, we build a rich collection of labeled datasets for training and validation. We deploy a strict validation system so that model performance on unseen datasets is also optimized, in addition to traditional cross-validation. We find that strategically selecting a subset of training data yields better model accuracy and generalization than exhaustively training on all available data. Thanks to the simplicity of the proposed model, its logic can be interpreted to provide insights into social bot characteristics.

Submitted 20 November, 2019; originally announced November 2019.

Comments: AAAI 2020

arXiv:1908.04628 [pdf, other]

L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes

Authors: Xindi Wang, Onur Varol, Tina Eliassi-Rad

Abstract: Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.

Submitted 12 October, 2023; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: 9 pages, 6 figures, 2 tables. Nature of changes from previous version: (1) added complexity analysis in Section 2.2; (2) datasets change; (3) added LambdaMART in the baseline methods, also a brief discussion on why LambdaMART failed in our problem; (4) figure updates
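
The placing phase described in the L2P abstract above can be sketched roughly as follows. This is a simplified illustration and not the authors' algorithm: the function `place`, the midpoint rule for assigning a value, and the toy comparator `toy_is_greater` are all assumptions. In the paper the comparator would be a trained pairwise preference classifier; here it is a hand-written stand-in that compares a single feature.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Known = List[Tuple[Features, float]]  # (features, observed outcome)


def place(new_x: Features, known: Known,
          is_greater: Callable[[Features, Features], bool]) -> float:
    """Place a new instance among known ones and assign it an outcome value."""
    if not known:
        raise ValueError("need at least one known instance")
    # Order the known instances by their observed outcome.
    ordered = sorted(known, key=lambda pair: pair[1])
    # Count how many known instances the new one is predicted to exceed.
    wins = sum(1 for x, _ in ordered if is_greater(new_x, x))
    if wins == 0:
        return ordered[0][1]       # predicted below every known instance
    if wins == len(ordered):
        return ordered[-1][1]      # predicted above every known instance
    # Assumed rule: midpoint of the two neighbours around the placement.
    return (ordered[wins - 1][1] + ordered[wins][1]) / 2


def toy_is_greater(a: Features, b: Features) -> bool:
    """Stand-in for a trained preference classifier (assumption)."""
    return a[0] > b[0]


# Toy heavy-tailed outcomes, e.g. copies sold.
known_books: Known = [((1.0,), 10.0), ((2.0,), 100.0), ((3.0,), 10000.0)]
print(place((2.5,), known_books, toy_is_greater))
```

Because the prediction is anchored to the outcomes of neighbouring known instances, a new instance placed near the top of the ordering can receive a very large value, which is the intuition behind handling the heavy tail.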

arXiv:1906.11371 [pdf, other]

doi 10.5210/fm.v22i5.7883

Deception Strategies and Threats for Online Discussions

Authors: Onur Varol, Ismail Uluturk

Abstract: Communication plays a major role in social systems. Effective communication, which requires transmission of the messages between individuals without disruptions or noise, can be a powerful tool to deliver intended impact. Language and style of the content can be leveraged to deceive and manipulate recipients. These deception and persuasion strategies can be applied to exert power and amass capital in politics and business. In this work, we provide a modest review of how such deception and persuasion strategies were applied to different communication channels over the years. We provide examples of campaigns that have occurred in different periods over the last 100 years, together with their corresponding dissemination mediums. In the Internet age, we enjoy access to the vast amount of information and the ability to communicate without borders. However, malicious actors work toward abusing online systems to disseminate disinformation, disrupt communication, and manipulate people by the means of automated tools, such as social bots. It is important to study the old practices of persuasion to be able to investigate modern practices and tools. Here we provide a discussion of current threats against society while drawing parallels with the historical practices and the recent research efforts on systems of detection and prevention.

Submitted 26 June, 2019; originally announced June 2019.

Comments: 47 pages, 3 figures

Journal ref: First Monday 23.5 (2018)

arXiv:1901.00912 [pdf, other]

doi 10.1002/hbe2.115

Arming the public with artificial intelligence to counter social bots

Authors: Kai-Cheng Yang, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, Filippo Menczer

Abstract: The increased relevance of social media in our daily life has been accompanied by efforts to manipulate online conversations and opinions. Deceptive social bots -- automated or semi-automated accounts designed to impersonate humans -- have been successfully exploited for these kinds of abuse. Researchers have responded by developing AI tools to arm the public in the fight against social bots. Here we review the literature on different types of bots, their impact, and detection methods. We use the case study of Botometer, a popular bot detection tool developed at Indiana University, to illustrate how people interact with AI countermeasures. A user experience survey suggests that bot detection has become an integral part of the social media experience for many users. However, barriers in interpreting the output of AI tools can lead to fundamental misunderstandings. The arms race between machine learning methods to develop sophisticated bots and effective countermeasures makes it necessary to update the training data and features of detection tools. We again use the Botometer case to illustrate both algorithmic and interpretability improvements of bot scores, designed to meet user expectations. We conclude by discussing how future AI developments may affect the fight between malicious bots and the public.

Submitted 6 February, 2019; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: Published in Human Behavior and Emerging Technologies

Journal ref: Hum Behav & Emerg Tech. 2019;e115

arXiv:1807.09725 [pdf, other]

Does putting your emotions into words make you feel better? Measuring the minute-scale dynamics of emotions from online data

Authors: Rui Fan, Ali Varamesh, Onur Varol, Alexander Barron, Ingrid van de Leemput, Marten Scheffer, Johan Bollen

Abstract: Studies of affect labeling, i.e. putting your feelings into words, indicate that it can attenuate positive and negative emotions. Here we track the evolution of individual emotions for tens of thousands of Twitter users by analyzing the emotional content of their tweets before and after they explicitly report having a strong emotion. Our results reveal how emotions and their expression evolve at the temporal resolution of one minute. While the expression of positive emotions is preceded by a short but steep increase in positive valence and followed by short decay to normal levels, negative emotions build up more slowly, followed by a sharp reversal to previous levels, matching earlier findings of the attenuating effects of affect labeling. We estimate that positive and negative emotions last approximately 1.25 and 1.5 hours from onset to evanescence. A separate analysis for male and female subjects is suggestive of possible gender-specific differences in emotional dynamics.

Submitted 25 July, 2018; originally announced July 2018.

arXiv:1707.07592 [pdf, other]

doi 10.1038/s41467-018-06930-7

The spread of low-credibility content by social bots

Authors: Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, Filippo Menczer

Abstract: The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Submitted 24 May, 2018; v1 submitted 24 July, 2017; originally announced July 2017.

Comments: 41 pages, 20 figures, 3 tables

Journal ref: Nature Communications, 9: 4787, 2018

arXiv:1703.07518 [pdf, other]

Early Detection of Promoted Campaigns on Social Media

Authors: Onur Varol, Emilio Ferrara, Filippo Menczer, Alessandro Flammini

Abstract: Social media expose millions of users every day to information campaigns --- some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes. It is therefore important to be able to detect whether a meme is being artificially promoted at the very moment it becomes wildly popular. This problem has important social implications and poses numerous technical challenges. As a first step, here we focus on discriminating between trending memes that are either organic or promoted by means of advertisement. The classification is not trivial: ads cause bursts of attention that can be easily mistaken for those of organic trends. We designed a machine learning framework to classify memes that have been labeled as trending on Twitter. After trending, we can rely on a large volume of activity data. Early detection, occurring immediately at trending time, is a more challenging problem due to the minimal volume of activity data that is available prior to trending. Our supervised learning framework exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user meta-data. We explore different methods for encoding feature time series. Using millions of tweets containing trending hashtags, we achieve 75% AUC score for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithms by introducing random temporal shifts on the trend time series. Feature selection analysis reveals that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.

Submitted 22 March, 2017; originally announced March 2017.

Comments: 29 pages, 9 figures, 3 tables

arXiv:1703.03107 [pdf, other]

Online Human-Bot Interactions: Detection, Estimation, and Characterization

Authors: Onur Varol, Emilio Ferrara, Clayton A. Davis, Filippo Menczer, Alessandro Flammini

Abstract: Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that include both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.

Submitted 27 March, 2017; v1 submitted 8 March, 2017; originally announced March 2017.

Comments: Accepted paper for ICWSM'17, 10 pages, 8 figures, 1 table

arXiv:1702.02007 [pdf]

The Installation Costs of a Satellite and Space Shuttle Launch Complex as a Public Expenditure Project

Authors: Dogus Ozuyar, Sevilay Gumus Ozuyar, Oguzhan Karadeniz, Ozge Varol

Abstract: From the 1940's to the present, space explorations, which is a highly important topic for the world and human beings, penetrate into many areas from the communication to the national security as well as from the discovery of exoplanets and new life forms to space mining. On the top of the countries which do researches on these fields are the developed countries and the developing countries are only used as launch areas, in an irrelevant manner of the research and development. However, developing countries can significantly reduce foreign dependency and security flaws as well as providing important reputation gain in international platforms by conducting space research and development activities as already done by developed countries. All the large scale space probes conducted by developed countries oblige Turkey to develop space researches in terms of economy, security and scientific aspects. Due to these reasons, the approximate costs of a launch base, which will be installed to conduct space researches in Turkey, and of a satellite or a spacecraft, which will be able to launch from this base and serve a variety of purposes, are calculated in this study. In an effort to make the mentioned calculations, examples of various countries that have already established a launch base and already launched from these bases are analyzed and some projections are built for Turkey by calculating the estimated costs. Since these projections must be carried out by the Republic of Turkey since the private sector in Turkey will not be willing to invest in such activities, the possible public advantages can be gained through these activities are also mentioned and evaluated.

Submitted 7 February, 2017; originally announced February 2017.

Comments: 2nd International Annual Meeting of Sosyoekonomi Society

MSC Class: 91-06 ACM Class: J.4

Journal ref: Sosyoekonomi proceedings book, ISBN:978-605-9190-61-9, 2016

arXiv:1605.00659 [pdf, other]

doi 10.1007/978-3-319-47874-6_3

Predicting online extremism, content adopters, and interaction reciprocity

Authors: Emilio Ferrara, Wen-Qiang Wang, Onur Varol, Alessandro Flammini, Aram Galstyan

Abstract: We present a machine learning framework that leverages a mixture of metadata, network, and temporal features to detect extremist users, and predict content adopters and interaction reciprocity in social media. We exploit a unique dataset containing millions of tweets generated by more than 25 thousand users who have been manually identified, reported, and suspended by Twitter due to their involvement with extremist campaigns. We also leverage millions of tweets generated by a random sample of 25 thousand regular users who were exposed to, or consumed, extremist content. We carry out three forecasting tasks, (i) to detect extremist users, (ii) to estimate whether regular users will adopt extremist content, and finally (iii) to predict whether users will reciprocate contacts initiated by extremists. All forecasting tasks are set up in two scenarios: a post hoc (time independent) prediction task on aggregated data, and a simulated real-time prediction task. The performance of our framework is extremely promising, yielding in the different forecasting scenarios up to 93% AUC for extremist user detection, up to 80% AUC for content adoption prediction, and finally up to 72% AUC for interaction reciprocity forecasting. We conclude by providing a thorough feature analysis that helps determine which are the emerging signals that provide predictive power in different scenarios.

Submitted 2 May, 2016; originally announced May 2016.

Comments: 9 pages, 3 figures, 8 tables

Journal ref: International Conference on Social Informatics (pp. 22-39). Springer. 2016

arXiv:1602.00975 [pdf, other]

doi 10.1145/2872518.2889302

BotOrNot: A System to Evaluate Social Bots

Authors: Clayton A. Davis, Onur Varol, Emilio Ferrara, Alessandro Flammini, Filippo Menczer

Abstract: While most online social media accounts are controlled by humans, these platforms also host automated agents called social bots or sybil accounts. Recent literature reported on cases of social bots imitating humans to manipulate discussions, alter the popularity of users, pollute content and spread misinformation, and even perform terrorist propaganda and recruitment actions. Here we present BotOrNot, a publicly-available service that leverages more than one thousand features to evaluate the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Since its release in May 2014, BotOrNot has served over one million requests via our website and APIs.

Submitted 2 February, 2016; originally announced February 2016.

Comments: 2 pages, 2 figures, WWW Developers Day

Journal ref: Proceedings of the 25th International Conference Companion on World Wide Web (pp. 273-274). 2016

arXiv:1601.05140 [pdf]

doi 10.1109/MC.2016.183

The DARPA Twitter Bot Challenge

Authors: V. S. Subrahmanian, Amos Azaria, Skylar Durst, Vadim Kagan, Aram Galstyan, Kristina Lerman, Linhong Zhu, Emilio Ferrara, Alessandro Flammini, Filippo Menczer, Andrew Stevens, Alexander Dekhtyar, Shuyang Gao, Tad Hogg, Farshad Kooti, Yan Liu, Onur Varol, Prashant Shiralkar, Vinod Vydiswaran, Qiaozhu Mei, Tim Hwang

Abstract: A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and describes the methods used by the three top-ranked teams.

Submitted 21 April, 2016; v1 submitted 19 January, 2016; originally announced January 2016.

Comments: IEEE Computer Magazine, in press

Journal ref: Computer 49 (6), 38-46. IEEE, 2016

arXiv:1411.0652 [pdf, other]

doi 10.1007/s13278-014-0237-x

Clustering memes in social media streams

Authors: Mohsen JafariAsbagh, Emilio Ferrara, Onur Varol, Filippo Menczer, Alessandro Flammini

Abstract: The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them over time with the new ones. The evaluation of our framework is carried out by using a dataset of Twitter trending topics. Over a one-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that only use content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. We finally show that our online learning framework is flexible, due to its independence of the adopted clustering algorithm, and best suited to work in a streaming scenario.

Submitted 3 November, 2014; originally announced November 2014.

Comments: 25 pages, 8 figures, accepted on Social Network Analysis and Mining (SNAM). The final publication is available at Springer via http://dx.doi.org/10.1007/s13278-014-0237-x

Journal ref: Social Network Analysis and Mining, 4(1), 1-13. 2014

arXiv:1407.5225 [pdf, other]

doi 10.1145/2818717

The Rise of Social Bots

Authors: Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, Alessandro Flammini

Abstract: The Turing test aimed to recognize the behavior of a human from that of a computer algorithm. Such challenge is more relevant than ever in today's social media context, where limited attention and technology constrain the expressive power of humans, while incentives abound to develop software agents mimicking humans. These social bots interact, often unnoticed, with real people in social media ecosystems, but their abundance is uncertain. While many bots are benign, one can design harmful bots with the goals of persuading, smearing, or deceiving. Here we discuss the characteristics of modern, sophisticated social bots, and how their presence can endanger online ecosystems and our society. We then review current efforts to detect social bots on Twitter. Features related to content, network, sentiment, and temporal patterns of activity are imitated by bots but at the same time can help discriminate synthetic behaviors from human ones, yielding signatures of engineered social tampering.

Submitted 6 March, 2017; v1 submitted 19 July, 2014; originally announced July 2014.

Comments: Check http://cacm.acm.org/magazines/2016/7/204021-the-rise-of-social-bots/fulltext for the final version; 'Bot or Not?' is available at: http://truthy.indiana.edu/botornot/

Journal ref: Communications of the ACM 59 (7), 96-104, 2016

arXiv:1406.7197 [pdf, other]

doi 10.1145/2615569.2615699

Evolution of Online User Behavior During a Social Upheaval

Authors: Onur Varol, Emilio Ferrara, Christine L. Ogan, Filippo Menczer, Alessandro Flammini

Abstract: Social media represent powerful tools of mass communication and information diffusion. They played a pivotal role during recent social uprisings and political mobilizations across the world. Here we present a study of the Gezi Park movement in Turkey through the lens of Twitter. We analyze over 2.3 million tweets produced during the 25 days of protest that occurred between May and June 2013. We first characterize the spatio-temporal nature of the conversation about the Gezi Park demonstrations, showing that similarity in trends of discussion mirrors geographic cues. We then describe the characteristics of the users involved in this conversation and what roles they played. We study how roles and individual influence evolved during the period of the upheaval. This analysis reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. We conclude by observing how the online and offline worlds are tightly intertwined, showing that exogenous events, such as political speeches or police actions, affect social media conversations and trigger changes in individual behavior.

Submitted 27 June, 2014; originally announced June 2014.

Comments: Best Paper Award at ACM Web Science 2014

Journal ref: Proceedings of the 2014 ACM conference on Web science, Pages 81-90

arXiv:1402.2297 [pdf, other]

doi 10.1145/2567948.2579697

Connecting Dream Networks Across Cultures

Authors: Onur Varol, Filippo Menczer

Abstract: Many species dream, yet there remain many open research questions in the study of dreams. The symbolism of dreams and their interpretation is present in cultures throughout history. Analysis of online data sources for dream interpretation using network science leads to understanding symbolism in dreams and their associated meaning. In this study, we introduce dream interpretation networks for English, Chinese and Arabic that represent different cultures from various parts of the world. We analyze communities in these networks, finding that symbols within a community are semantically related. The central nodes in communities give insight about cultures and symbols in dreams. The community structure of different networks highlights cultural similarities and differences. Interconnections between different networks are also identified by translating symbols from different languages into English. Structural correlations across networks point out relationships between cultures. Similarities between network communities are also investigated by analysis of sentiment in symbol interpretations. We find that interpretations within a community tend to have similar sentiment. Furthermore, we cluster communities based on their sentiment, yielding three main categories of positive, negative, and neutral dream symbols.

Submitted 10 February, 2014; originally announced February 2014.

Comments: 6 pages, 3 figures

arXiv:1310.8598 [pdf, other]

Mode-coupling points to functionally important residues in Myosin II

Authors: Onur Varol, Deniz Yuret, Burak Erman, Alkan Kabakçıoğlu

Abstract: Relevance of mode coupling to energy/information transfer during protein function, particularly in the context of allosteric interactions, is widely accepted. However, existing evidence in favor of this hypothesis comes essentially from model systems. We here report a novel formal analysis of the near-native dynamics of myosin II, which allows us to explore the impact of the interaction between possibly non-Gaussian vibrational modes on fluctuational dynamics. We show that an information-theoretic measure based on mode coupling alone yields a ranking of residues with a statistically significant bias favoring the functionally critical locations identified by experiments on myosin II.

Submitted 31 October, 2013; originally announced October 2013.

Comments: 17 pages, 6 figures, 1 table

arXiv:1310.2671 [pdf, other]

doi 10.1145/2512938.2512956

Traveling Trends: Social Butterflies or Frequent Fliers?

Authors: Emilio Ferrara, Onur Varol, Filippo Menczer, Alessandro Flammini

Abstract: Trending topics are the online conversations that grab collective attention on social media. They are continually changing and often reflect exogenous events that happen in the real world. Trends are localized in space and time as they are driven by activity in specific geographic areas that act as sources of traffic and information flow. Taken independently, trends and geography have been discussed in recent literature on online social media; although, so far, little has been done to characterize the relation between trends and geography. Here we investigate more than eleven thousand topics that trended on Twitter in 63 main US locations during a period of 50 days in 2013. This data allows us to study the origins and pathways of trends, how they compete for popularity at the local level to emerge as winners at the country level, and what dynamics underlie their production and consumption in different geographic areas. We identify two main classes of trending topics: those that surface locally, coinciding with three different geographic clusters (East coast, Midwest and Southwest); and those that emerge globally from several metropolitan areas, coinciding with the major air traffic hubs of the country. These hubs act as trendsetters, generating topics that eventually trend at the country level, and driving the conversation across the country. This poses an intriguing conjecture, drawing a parallel between the spread of information and diseases: Do trends travel faster by airplane than over the Internet?

Submitted 9 October, 2013; originally announced October 2013.

Comments: Proceedings of the first ACM conference on Online social networks, pp. 213-222, 2013

Journal ref: Proceedings of the first ACM conference on Online social networks (pp. 213-222). ACM. 2013

arXiv:1310.2665 [pdf, other]

doi 10.1145/2492517.2492530

Clustering Memes in Social Media

Authors: Emilio Ferrara, Mohsen JafariAsbagh, Onur Varol, Vahed Qazvinian, Filippo Menczer, Alessandro Flammini

Abstract: The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.

Submitted 9 October, 2013; originally announced October 2013.

Comments: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM'13), 2013

Journal ref: Advances in social networks analysis and mining (ASONAM), 2013 IEEE/ACM international conference on (pp. 548-555). IEEE

Showing 1–32 of 32 results for author: Varol, O