research-article

Public Access

Detection of Novel Social Bots by Ensembles of Specialized Classifiers

Authors:

Mohsen Sayyadiharikandeh,

Kai-Cheng Yang,

Alessandro Flammini,

Filippo Menczer

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management

Pages 2725 - 2732

https://doi.org/10.1145/3340531.3412698

Published: 19 October 2020

Abstract

Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. While researchers have developed sophisticated methods to detect abuse, novel bots with diverse behaviors evade detection. We show that different types of bots are characterized by different behavioral features. As a result, supervised learning techniques suffer severe performance deterioration when attempting to detect behaviors not observed in the training data. Moreover, tuning these models to recognize novel bots requires retraining with a significant amount of new annotations, which are expensive to obtain. To address these issues, we propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule. The ensemble of specialized classifiers (ESC) generalizes better, leading to an average improvement of 56% in F1 score for unseen accounts across datasets. Furthermore, novel bot behaviors are learned with fewer labeled examples during retraining. We deployed ESC in the newest version of Botometer, a popular tool to detect social bots in the wild; the deployed model achieves a cross-validation AUC of 0.99.
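The abstract describes ESC only at a high level: one classifier per bot class, with decisions combined by the maximum rule. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The class labels (`human`, `spammer`, `fake_follower`), the two-dimensional toy features, the plain-Python logistic-regression base learner, the 0.5 decision threshold, and the names `LogisticSpecialist`, `SpecializedEnsemble` and `explain` are all hypothetical; the abstract does not specify the base learner or how scores are post-processed. Each specialist is trained on all human accounts plus the accounts of a single bot class, and the ensemble's bot score is the largest score any specialist assigns.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticSpecialist:
    """Hypothetical base learner: one bot class (label 1) vs. humans (label 0),
    fit by batch gradient descent on the log-loss."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LogisticSpecialist":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.proba(xi) - yi  # d(log-loss)/d(logit)
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def proba(self, x: Vector) -> float:
        """Score that x belongs to this specialist's bot class."""
        return _sigmoid(self.b + sum(wj * xj for wj, xj in zip(self.w, x)))


class SpecializedEnsemble:
    """One specialist per bot class, combined with the maximum rule."""

    def __init__(self, human_label: str = "human") -> None:
        self.human_label = human_label
        self.specialists: Dict[str, LogisticSpecialist] = {}

    def fit(self, X: Sequence[Vector], labels: Sequence[str]) -> "SpecializedEnsemble":
        humans = [x for x, lab in zip(X, labels) if lab == self.human_label]
        for bot_class in sorted(set(labels) - {self.human_label}):
            bots = [x for x, lab in zip(X, labels) if lab == bot_class]
            # Each specialist sees all humans but only its own bot class.
            y = [0] * len(humans) + [1] * len(bots)
            self.specialists[bot_class] = LogisticSpecialist().fit(humans + bots, y)
        return self

    def explain(self, x: Vector) -> Tuple[str, float]:
        """Return (bot class of the most confident specialist, its score)."""
        return max(((c, s.proba(x)) for c, s in self.specialists.items()),
                   key=lambda pair: pair[1])

    def bot_score(self, x: Vector) -> float:
        # Maximum rule: the ensemble score is the largest specialist score.
        return self.explain(x)[1]

    def is_bot(self, x: Vector, threshold: float = 0.5) -> bool:
        return self.bot_score(x) >= threshold


# Toy, hand-made data: two features, three hypothetical account classes.
X = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1),  # human
     (3.0, 0.0), (3.2, 0.1), (2.9, -0.1), (3.1, 0.2), (2.8, 0.0),     # spammer
     (0.0, 3.0), (0.1, 3.2), (-0.1, 2.9), (0.2, 3.1), (0.0, 2.8)]     # fake_follower
labels = ["human"] * 5 + ["spammer"] * 5 + ["fake_follower"] * 5

esc = SpecializedEnsemble().fit(X, labels)
print(esc.explain((0.0, 3.1)))  # the fake_follower specialist gives the top score
print(esc.is_bot((0.1, 0.1)))   # a human-like account is not flagged
# A model that never saw fake followers scores this account low:
print(esc.specialists["spammer"].proba((0.0, 3.1)))
```

Training each specialist only against humans keeps its decision boundary focused on one behavioral signature, and the maximum rule lets any single specialist flag an account. The last line mirrors the failure mode the abstract describes: the spammer-only model misses an account whose behavior it never observed, while the ensemble still flags it.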

Supplementary Material

MP4 File (3340531.3412698.mp4, 8.59 MB)

In this video we present the limitations of current bot detection tools on Twitter. We then propose a new method that generalizes better when tested on new classes of bots. We employed the new method in the new version of Botometer (v4), which was launched recently and is publicly available.




Information

Published In

ACM Conferences

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management

October 2020

3619 pages

ISBN:9781450368599

DOI:10.1145/3340531

General Chairs: Mathieu d'Aquin (DSI, Insight, NUI Galway, Ireland); Stefan Dietze (GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany)

Program Chairs: Claudia Hauff (TU Delft, The Netherlands); Edward Curry (DSI, Insight, NUI Galway, Ireland); Philippe Cudre Mauroux (eXascale, University of Fribourg, Switzerland)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States




Conference

CIKM '20: The 29th ACM International Conference on Information and Knowledge Management

October 19-23, 2020

Virtual Event, Ireland

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

Total Citations: 115
Total Downloads: 2,356
Downloads (last 12 months): 520
Downloads (last 6 weeks): 56

Reflects downloads up to 16 Oct 2024.

