Blog or block: Detecting blog bots through behavioral biometrics

Authors: Steven Gianvecchio, Sushil Jajodia

Computer Networks: The International Journal of Computer and Telecommunications Networking, Volume 57, Issue 3

Pages 634 - 646

https://doi.org/10.1016/j.comnet.2012.10.005

Published: 01 February 2013

Abstract

Blog bots are automated scripts or programs that post comments to blog sites, often including spam or other malicious links. An effective defense against the automatic form filling and posting from blog bots is to detect and validate the human presence. Conventional detection methods usually require direct participation of human users, such as recognizing a CAPTCHA image, which can be burdensome for users. In this paper, we present a new detection approach by using behavioral biometrics, primarily mouse and keystroke dynamics, to distinguish between human and bot. Based on passive monitoring, the proposed approach does not require any direct user participation. We collect real user input data from a very active online community and blog site, and use this data to characterize behavioral differences between human and bot. The most useful features for classification provide the basis for a detection system consisting of two main components: a webpage-embedded logger and a server-side classifier. The webpage-embedded logger records mouse movement and keystroke data while a user is filling out a form, and provides this data in batches to a server-side detector, which classifies the poster as human or bot. Our experimental results demonstrate an overall detection accuracy greater than 99%, with negligible overhead.
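
To make the two-component design concrete, here is a minimal Python sketch of what the server-side step might look like for one batch of logged events. It is not the classifier from the paper: the event fields (`down_ms`, `up_ms`, `t_ms`), the feature names, and the 1 ms regularity threshold are all assumptions chosen for illustration, and a real system would learn its decision rule from labeled data rather than hard-code one.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean, pstdev


@dataclass
class KeyEvent:
    key: str
    down_ms: float  # timestamp of key press (hypothetical field name)
    up_ms: float    # timestamp of key release (hypothetical field name)


@dataclass
class MouseEvent:
    x: float
    y: float
    t_ms: float     # timestamp of the mouse sample (hypothetical field name)


def keystroke_features(keys):
    """Summarize key hold durations and press-to-press gaps for one batch."""
    durations = [k.up_ms - k.down_ms for k in keys]
    gaps = [b.down_ms - a.down_ms for a, b in zip(keys, keys[1:])]
    return {
        "mean_duration": mean(durations) if durations else 0.0,
        "std_duration": pstdev(durations) if durations else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }


def mouse_features(moves):
    """Summarize pointer speed (pixels per millisecond) for one batch."""
    speeds = []
    for a, b in zip(moves, moves[1:]):
        dt = b.t_ms - a.t_ms
        if dt > 0:  # skip samples with duplicate timestamps
            speeds.append(hypot(b.x - a.x, b.y - a.y) / dt)
    return {
        "n_moves": len(moves),
        "mean_speed": mean(speeds) if speeds else 0.0,
    }


def looks_like_bot(keys, moves, min_std_ms=1.0):
    """Toy rule, assumed for illustration only.

    Flags a batch when the form was filled with no input events at all,
    or when keystroke timing is almost perfectly regular.
    """
    kf = keystroke_features(keys)
    mf = mouse_features(moves)
    no_input = len(keys) == 0 and mf["n_moves"] == 0
    too_regular = (
        len(keys) >= 3
        and kf["std_duration"] < min_std_ms
        and kf["std_gap"] < min_std_ms
    )
    return no_input or too_regular
```

In this sketch, a script that emits keystrokes at fixed intervals, or that submits the form without generating any events, is flagged, while typing with natural timing variation is not. A bot that replays or randomizes timing would pass this toy rule, which is one reason a classifier trained on many features is the more realistic choice.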

Copyright © 2012 Elsevier B.V.

Publisher: Elsevier North-Holland, Inc., United States
