DOI: 10.1145/3538969.3538994
research-article

Web Bot Detection Evasion Using Deep Reinforcement Learning

Published: 23 August 2022

Abstract

Web bots are vital for the web as they can be used to automate several actions, some of which would have otherwise been impossible or very time consuming. These actions can be benign, such as website testing and web indexing, or malicious, such as unauthorised content scraping, scalping, vulnerability scanning, and more. To detect malicious web bots, recent approaches examine the visitors’ fingerprint and behaviour. For the latter, several values (i.e., features) are usually extracted from visitors’ web logs and used as input to train machine learning models. In this research we show that web bots can use recent advances in machine learning, and, more specifically, Reinforcement Learning (RL), to effectively evade behaviour-based detection techniques. To evaluate these evasive bots, we examine (i) how well they can evade a pre-trained bot detection framework, (ii) how well they can still evade detection after the detection framework is re-trained on new behaviours generated from the evasive web bots, and (iii) how bots perform if re-trained again on the re-trained detection framework. We show that web bots can repeatedly evade detection and adapt to the re-trained detection framework to showcase the importance of considering such types of bots when designing web bot detection frameworks.
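As a rough illustration of what extracting features from visitors' web logs can look like, the sketch below summarises one session as a small set of numbers. The record layout (`LogEntry`) and the four features chosen here are assumptions made for this example; they are typical of web-log-based detection in general and are not taken from the paper's own feature set.

```python
from statistics import mean
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One request in a visitor session (hypothetical, simplified record)."""
    timestamp: float  # seconds since the session started
    path: str         # requested resource
    status: int       # HTTP status code


# Assumed list of image extensions, for illustration only.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def session_features(entries: list[LogEntry]) -> dict[str, float]:
    """Summarise one session as a fixed set of behavioural features."""
    if not entries:
        raise ValueError("session has no requests")
    ordered = sorted(entries, key=lambda e: e.timestamp)
    # Time between consecutive requests; empty for single-request sessions.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    n = len(ordered)
    return {
        "total_requests": float(n),
        "mean_gap_seconds": mean(gaps) if gaps else 0.0,
        "image_ratio": sum(e.path.lower().endswith(IMAGE_EXTENSIONS) for e in ordered) / n,
        "error_4xx_ratio": sum(400 <= e.status < 500 for e in ordered) / n,
    }


session = [
    LogEntry(0.0, "/index.html", 200),
    LogEntry(2.0, "/logo.png", 200),
    LogEntry(4.0, "/missing", 404),
    LogEntry(6.0, "/about.html", 200),
]
features = session_features(session)
```

A feature dictionary of this kind, computed per session, is the sort of input a behaviour-based classifier would be trained on; an evasive bot would in turn try to shape its requests so that these values resemble those of human sessions.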

Cited By

  • (2024) KimeraPAD: A Novel Low-Overhead Real-Time Defense Against Website Fingerprinting Attacks Based on Deep Reinforcement Learning. IEEE Transactions on Network and Service Management 21:3 (2944-2961). DOI: 10.1109/TNSM.2024.3360082. Online publication date: Jun-2024

Published In

ARES '22: Proceedings of the 17th International Conference on Availability, Reliability and Security
August 2022
1371 pages
ISBN: 9781450396707
DOI: 10.1145/3538969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. advanced web bots
  2. evasive web bots
  3. reinforcement learning
  4. web bot detection
  5. web logs

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ARES 2022

Acceptance Rates

Overall Acceptance Rate 228 of 451 submissions, 51%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 82
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 30 Aug 2024
