Web Bot Detection Evasion Using Deep Reinforcement Learning

Authors:

Christos Iliou,

Theodoros Kostoulas,

Theodora Tsikrika,

Vasilios Katos,

Stefanos Vrochidis,

Ioannis Kompatsiaris

ARES '22: Proceedings of the 17th International Conference on Availability, Reliability and Security

Article No.: 15, Pages 1 - 10

https://doi.org/10.1145/3538969.3538994

Published: 23 August 2022

Abstract

Web bots are vital for the web, as they automate many actions, some of which would otherwise be impossible or very time-consuming. These actions can be benign, such as website testing and web indexing, or malicious, such as unauthorised content scraping, scalping, and vulnerability scanning. To detect malicious web bots, recent approaches examine visitors’ fingerprints and behaviour. For the latter, several values (i.e., features) are usually extracted from visitors’ web logs and used as input to train machine learning models. In this research, we show that web bots can use recent advances in machine learning, and more specifically Reinforcement Learning (RL), to effectively evade behaviour-based detection techniques. To evaluate these evasive bots, we examine (i) how well they can evade a pre-trained bot detection framework, (ii) how well they can still evade detection after the detection framework is re-trained on new behaviours generated by the evasive web bots, and (iii) how the bots perform if they are re-trained against the re-trained detection framework. We show that web bots can repeatedly evade detection and adapt to the re-trained detection framework, which underlines the importance of considering such bots when designing web bot detection frameworks.
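The three evaluation stages (i) to (iii) can be illustrated with a minimal sketch. It is not the authors' implementation: the detector is a toy that simply memorises behaviours it has seen from bots, the learner is a tabular epsilon-greedy bandit rather than a deep RL agent, and every name here (`ToyDetector`, `BanditBot`, `run_three_stages`, the eight discretised behaviours) is hypothetical and chosen only for illustration.

```python
import random

N_BEHAVIOURS = 8  # assumed number of discretised browsing behaviours


class ToyDetector:
    """Stand-in for a behaviour-based detector: flags behaviours seen from bots."""

    def __init__(self, known_bot_behaviours):
        self.flagged = set(known_bot_behaviours)

    def is_bot(self, behaviour):
        return behaviour in self.flagged

    def retrain(self, new_bot_behaviours):
        # "Re-training" here just memorises the newly observed bot behaviours.
        self.flagged |= set(new_bot_behaviours)


class BanditBot:
    """Epsilon-greedy bandit; a far simpler learner than deep RL."""

    def __init__(self, rng, epsilon=0.1):
        self.rng = rng
        self.epsilon = epsilon
        self.value = [0.0] * N_BEHAVIOURS

    def act(self, explore=False):
        # Explore with probability epsilon, otherwise pick the best-valued behaviour.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(N_BEHAVIOURS)
        return max(range(N_BEHAVIOURS), key=lambda a: self.value[a])

    def train(self, detector, episodes=500):
        # Start from fresh estimates, then learn from the detector's verdicts.
        self.value = [0.0] * N_BEHAVIOURS
        counts = [0] * N_BEHAVIOURS
        for _ in range(episodes):
            a = self.act(explore=True)
            reward = 0.0 if detector.is_bot(a) else 1.0  # 1 = passed as human
            counts[a] += 1
            self.value[a] += (reward - self.value[a]) / counts[a]  # running mean


def evasion_rate(bot, detector, sessions=200):
    """Fraction of greedy (non-exploring) sessions the detector fails to flag."""
    evaded = sum(not detector.is_bot(bot.act()) for _ in range(sessions))
    return evaded / sessions


def run_three_stages(seed=0):
    rng = random.Random(seed)
    detector = ToyDetector(known_bot_behaviours={0, 1})
    bot = BanditBot(rng)

    # (i) Bot trained against the pre-trained detector.
    bot.train(detector)
    stage_i = evasion_rate(bot, detector)

    # (ii) Detector re-trained on the evasive bot's behaviour; bot unchanged.
    detector.retrain({bot.act()})
    stage_ii = evasion_rate(bot, detector)

    # (iii) Bot re-trained against the re-trained detector.
    bot.train(detector)
    stage_iii = evasion_rate(bot, detector)
    return stage_i, stage_ii, stage_iii


if __name__ == "__main__":
    print(run_three_stages())
```

Because the toy detector is deterministic, the sketch is expected to show an exaggerated version of the pattern described above: the bot evades, is caught after the detector is re-trained, and evades again once it is re-trained. A learned detector and a deep RL agent would give far noisier rates.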


Cited By

Jiang M, Cui B, Fu J, Wang T, Wang Z. (2024). KimeraPAD: A Novel Low-Overhead Real-Time Defense Against Website Fingerprinting Attacks Based on Deep Reinforcement Learning. IEEE Transactions on Network and Service Management 21:3 (2944-2961). Online publication date: Jun-2024.
https://doi.org/10.1109/TNSM.2024.3360082

Index Terms

Web Bot Detection Evasion Using Deep Reinforcement Learning
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
      2. Supervised learning
        Supervised learning by classification
2. Information systems
  1. World Wide Web
    1. Web mining
      1. Web log analysis


Published In


ARES '22: Proceedings of the 17th International Conference on Availability, Reliability and Security

August 2022

1371 pages

ISBN:9781450396707

DOI:10.1145/3538969

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

European Commission

Conference

ARES 2022

ARES 2022: The 17th International Conference on Availability, Reliability and Security

August 23 - 26, 2022

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 228 of 451 submissions, 51%
