DOI: 10.1145/3643991.3644877
research-article

RABBIT: A tool for identifying bot accounts based on their recent GitHub event history

Published: 02 July 2024

Abstract

Collaborative software development through GitHub repositories frequently relies on bot accounts to automate repetitive and error-prone tasks. This highlights the need to have accurate and efficient bot identification tools. Several such tools have been proposed in the past, but they tend to rely on a substantial amount of historical data, or they limit themselves to a reduced subset of activity types, making them difficult to use at large scale. To overcome these limitations, we developed RABBIT, an open source command-line tool that queries the GitHub Events API to retrieve the recent events of a given GitHub account and predicts whether the account is a human or a bot. RABBIT is based on an XGBoost classification model that relies on six features related to account activities and achieves high performance, with an AUC, F1 score, precision and recall of 0.92. Compared to the state-of-the-art in bot identification, RABBIT exhibits a similar performance in terms of precision, recall and F1 score, while being more than an order of magnitude faster and requiring considerably less data. This makes RABBIT usable on a large scale, capable of processing several thousand accounts per hour efficiently.

Cited By

  • (2024) Hidden in the Code: Visualizing True Developer Identities. In 2024 IEEE Working Conference on Software Visualization (VISSOFT), 24-35. DOI: 10.1109/VISSOFT64034.2024.00013. Online publication date: 6 Oct 2024.

Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN:9798400705878
DOI:10.1145/3643991
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GitHub events
  2. classification model
  3. bot identification

Conference

MSR '24

Article Metrics

  • Downloads (last 12 months): 28
  • Downloads (last 6 weeks): 5
Reflects downloads up to 16 Jan 2025.
