research-article

A soft computing approach for benign and malicious web robot detection

Published: 30 November 2017
    Abstract

    We propose a method called SMART (Soft computing for MAlicious RoboT detection). The method detects benign robots, malicious robots, and human visitors to a web server. SMART selects its features for a particular web server using fuzzy rough set theory. A graph-based clustering algorithm classifies sessions into the three agent types. Analyses of web logs suggest state-of-the-art results in detecting both robot types. Accurate detection of web robot sessions in a web server log is essential for taking accurate traffic-level measurements and for protecting the performance and privacy of information on a web server. Moreover, the irrecoverable risks of visits from malicious robots, which intentionally try to evade web server intrusion detection systems by covering up their visits with fabricated fields in their HTTP request packets, cannot be ignored. To separate both types of robots from humans in practice, analysts turn to heuristic methods or to state-of-the-art soft computing approaches that have been tuned only to the specification of one kind of web server. Because the landscape of web robot agents is ever changing, and behavioral patterns and characteristics vary across web servers, both options fall short. To overcome this challenge, this paper presents SMART, a soft computing system that simultaneously detects benign and malicious robot agents from web server logs and can automatically adapt to the session characteristics of a web server. Experiments on access logs from several web servers, each serving a different domain of the web, show that the proposed method outperforms state-of-the-art methods for benign and malicious robot detection.
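
The abstract says only that a graph-based clustering algorithm groups sessions, and the author tags name the Markov clustering algorithm. As a rough illustration of that family of methods, and not the authors' implementation, the sketch below runs plain Markov clustering (alternating expansion and inflation) on a small session-similarity graph. The function names, the `inflation` default, and the toy similarity matrix are all assumptions made for this example; how SMART actually builds its session graph is not stated here.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for clarity."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster graph nodes with basic Markov clustering.

    adjacency: square list of lists with non-negative edge weights
    (hypothetical session-to-session similarities in this sketch).
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Add self-loops, then turn weights into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        # Inflation: raise entries to a power, then renormalize columns.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy graph (assumed data): two dense groups joined by one weak edge.
    similarity = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0.1, 0, 0],
        [0, 0, 0.1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(similarity))
```

A larger `inflation` value tends to produce more, smaller clusters. Mapping the resulting clusters to the three agent types (human, benign robot, malicious robot) would need labeled sessions or further rules, which this sketch leaves out.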



    Published In

    Expert Systems with Applications: An International Journal  Volume 87, Issue C
    November 2017
    377 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Fuzzy Rough Set Theory
    2. Malicious web agents
    3. Markov clustering algorithm
    4. Web Robot Detection
    5. Web crawler

    Qualifiers

    • Research-article

    Cited By

    • (2023) DISET: a distance based semi-supervised self-training for automated users’ agent activity detection from web access log. Multimedia Tools and Applications 82:13 (19853-19876). DOI: 10.1007/s11042-022-14258-0. Online publication date: 1-May-2023
    • (2023) Toward the search for the perfect blade runner: a large-scale, international assessment of a test that screens for “humanness sensitivity”. AI & Society 38:4 (1543-1563). DOI: 10.1007/s00146-022-01398-y. Online publication date: 1-Aug-2023
    • (2022) Web Bot Detection Evasion Using Deep Reinforcement Learning. Proceedings of the 17th International Conference on Availability, Reliability and Security (1-10). DOI: 10.1145/3538969.3538994. Online publication date: 23-Aug-2022
    • (2021) Detection of Advanced Web Bots by Combining Web Logs with Mouse Behavioural Biometrics. Digital Threats: Research and Practice 2:3 (1-26). DOI: 10.1145/3447815. Online publication date: 8-Jun-2021
    • (2020) Bot recognition in a Web store. Journal of Network and Computer Applications 157:C. DOI: 10.1016/j.jnca.2020.102577. Online publication date: 1-Jul-2020
    • (2020) Content-aware web robot detection. Applied Intelligence 50:11 (4017-4028). DOI: 10.1007/s10489-020-01754-9. Online publication date: 7-Jul-2020
    • (2019) Towards a framework for detecting advanced Web bots. Proceedings of the 14th International Conference on Availability, Reliability and Security (1-10). DOI: 10.1145/3339252.3339267. Online publication date: 26-Aug-2019
    • (2019) A Broad Evaluation of the Tor English Content Ecosystem. Proceedings of the 10th ACM Conference on Web Science (333-342). DOI: 10.1145/3292522.3326031. Online publication date: 26-Jun-2019
    • (2019) Concise Fuzzy System Modeling Integrating Soft Subspace Clustering and Sparse Learning. IEEE Transactions on Fuzzy Systems 27:11 (2176-2189). DOI: 10.1109/TFUZZ.2019.2895572. Online publication date: 30-Oct-2019
    • (2019) Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-6). DOI: 10.1109/FUZZ-IEEE.2019.8858884. Online publication date: 23-Jun-2019
