Blog or block: Detecting blog bots through behavioral biometrics

Published: 01 February 2013

Abstract

Blog bots are automated scripts or programs that post comments to blog sites, often including spam or other malicious links. An effective defense against the automatic form filling and posting from blog bots is to detect and validate the human presence. Conventional detection methods usually require direct participation of human users, such as recognizing a CAPTCHA image, which can be burdensome for users. In this paper, we present a new detection approach by using behavioral biometrics, primarily mouse and keystroke dynamics, to distinguish between human and bot. Based on passive monitoring, the proposed approach does not require any direct user participation. We collect real user input data from a very active online community and blog site, and use this data to characterize behavioral differences between human and bot. The most useful features for classification provide the basis for a detection system consisting of two main components: a webpage-embedded logger and a server-side classifier. The webpage-embedded logger records mouse movement and keystroke data while a user is filling out a form, and provides this data in batches to a server-side detector, which classifies the poster as human or bot. Our experimental results demonstrate an overall detection accuracy greater than 99%, with negligible overhead.
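
The server-side step described in the abstract, where a batch of logged mouse and keystroke events is turned into features and classified, can be illustrated with a minimal sketch. The abstract does not list the paper's actual features, thresholds, or classifier, so everything below is an assumption made for illustration: the `KeyEvent` and `MousePoint` record types, the dwell-time, flight-time, and path-efficiency features, and the hand-written rules in `looks_like_bot` are hypothetical stand-ins, not the authors' method.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class KeyEvent:
    key: str
    down_ms: float  # timestamp when the key was pressed
    up_ms: float    # timestamp when the key was released


@dataclass
class MousePoint:
    x: float
    y: float
    t_ms: float


def keystroke_features(keys):
    """Dwell time (press to release) and flight time (release to next press)."""
    dwell = [k.up_ms - k.down_ms for k in keys]
    flight = [b.down_ms - a.up_ms for a, b in zip(keys, keys[1:])]
    return {
        "dwell_mean": mean(dwell) if dwell else 0.0,
        "dwell_std": pstdev(dwell) if dwell else 0.0,
        "flight_std": pstdev(flight) if flight else 0.0,
    }


def mouse_features(points):
    """Path efficiency: straight-line displacement divided by distance travelled."""
    path = sum(math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(points, points[1:]))
    if path == 0:
        return {"path_len": 0.0, "efficiency": 0.0}
    disp = math.hypot(points[-1].x - points[0].x, points[-1].y - points[0].y)
    return {"path_len": path, "efficiency": disp / path}


def looks_like_bot(keys, points):
    """Toy rule-based classifier; all thresholds are illustrative assumptions."""
    f = {**keystroke_features(keys), **mouse_features(points)}
    # A form submitted with no recorded keyboard or mouse activity at all.
    no_input = not keys and f["path_len"] == 0.0
    # Key timings with almost no variation, as a scripted replay might produce.
    robotic_typing = len(keys) >= 3 and f["dwell_std"] < 1.0 and f["flight_std"] < 1.0
    # Mouse movement along a perfectly straight line.
    perfectly_straight = f["path_len"] > 0 and f["efficiency"] > 0.999
    return no_input or robotic_typing or perfectly_straight
```

In a real deployment the hand-written rules would be replaced by a classifier trained on labeled human and bot sessions; the sketch only shows how raw event batches could be reduced to a small feature vector before classification.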

Publisher

Elsevier North-Holland, Inc.

United States

Author Tags

  1. Automatic classification
  2. Behavioral biometrics
  3. Blog Bot
  4. Security
  5. Web

Qualifiers

  • Article

Cited By

  • (2022) Detecting IMAP Credential Stuffing Bots Using Behavioural Biometrics. Proceedings of the 2022 12th International Conference on Communication and Network Security, pp. 7-15. doi:10.1145/3586102.3586104. Online publication date: 1-Dec-2022
  • (2022) How gullible are web measurement tools? Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies, pp. 171-186. doi:10.1145/3555050.3569131. Online publication date: 30-Nov-2022
  • (2021) HLISA. Proceedings of the 21st ACM Internet Measurement Conference, pp. 380-389. doi:10.1145/3487552.3487843. Online publication date: 2-Nov-2021
  • (2021) Defending Web Servers Against Flash Crowd Attacks. Applied Cryptography and Network Security, pp. 338-361. doi:10.1007/978-3-030-78375-4_14. Online publication date: 21-Jun-2021
  • (2020) Bot recognition in a Web store. Journal of Network and Computer Applications 157:C. doi:10.1016/j.jnca.2020.102577. Online publication date: 1-Jul-2020
  • (2019) A Deep Learning Approach to Web Bot Detection Using Mouse Behavioral Biometrics. Biometric Recognition, pp. 388-395. doi:10.1007/978-3-030-31456-9_43. Online publication date: 12-Oct-2019
  • (2019) Fingerprint Surface-Based Detection of Web Bot Detectors. Computer Security – ESORICS 2019, pp. 586-605. doi:10.1007/978-3-030-29962-0_28. Online publication date: 23-Sep-2019
  • (2017) Improving blog spam filters via machine learning. International Journal of Data Analysis Techniques and Strategies 9:2, pp. 99-121. doi:10.1504/IJDATS.2017.085901. Online publication date: 1-Jan-2017
  • (2017) A Survey On Automated Dynamic Malware Analysis Evasion and Counter-Evasion. Proceedings of the 1st Reversing and Offensive-oriented Trends Symposium, pp. 1-21. doi:10.1145/3150376.3150378. Online publication date: 16-Nov-2017
  • (2016) Combating the evasion mechanisms of social bots. Computers and Security 58:C, pp. 230-249. doi:10.1016/j.cose.2016.01.007. Online publication date: 1-May-2016