DOI: 10.1145/1599272.1599278
research-article

Malware detection using statistical analysis of byte-level file content

Published: 28 June 2009
  Abstract

    Commercial anti-virus software is unable to provide protection against newly launched (a.k.a. "zero-day") malware. In this paper, we propose a novel malware detection technique based on the analysis of byte-level file content. The novelty of our approach, compared with existing content-based mining schemes, is that it does not memorize specific byte sequences or strings appearing in the actual file content. Our technique is non-signature based and therefore has the potential to detect previously unknown and zero-day malware. We compute a wide range of statistical and information-theoretic features in a block-wise manner to quantify the byte-level file content. We leverage standard data mining algorithms to classify the content of every block as normal or potentially malicious. Finally, we correlate the block-wise classification results of a given file to categorize it as benign or malware. Since the proposed scheme operates on byte-level file content, it does not require any a priori information about the filetype. We have tested our proposed technique using a benign dataset comprising six different filetypes (DOC, EXE, JPG, MP3, PDF and ZIP) and a malware dataset comprising six different malware types (backdoor, trojan, virus, worm, constructor and miscellaneous). We also compare our technique with existing data mining based malware detection techniques. The results of our experiments show that the proposed non-signature based technique surpasses the existing techniques and achieves more than 90% detection accuracy.
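The block-wise pipeline described in the abstract (split the file into blocks, compute features per block, classify each block, then correlate the block labels into one verdict) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the block size, the full feature set, the classifier, nor the correlation rule, so `BLOCK_SIZE`, the three features (mean, standard deviation, Shannon entropy), `toy_block_classifier` and `min_suspicious_fraction` are all hypothetical choices.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical block size: the abstract does not say how large a block is.
BLOCK_SIZE = 1024


@dataclass
class BlockFeatures:
    """A small, illustrative subset of per-block statistical features."""
    mean: float     # average byte value (0-255)
    stdev: float    # population standard deviation of byte values
    entropy: float  # Shannon entropy of the byte distribution, bits per byte (0-8)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Split raw file content into consecutive blocks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shannon_entropy(block: bytes) -> float:
    """H = sum over byte values of p * log2(1/p), where p is the value's relative frequency."""
    n = len(block)
    return sum((c / n) * math.log2(n / c) for c in Counter(block).values())


def extract_features(block: bytes) -> BlockFeatures:
    """Quantify one non-empty block of raw bytes without inspecting its filetype."""
    values = list(block)
    return BlockFeatures(
        mean=statistics.fmean(values),
        stdev=statistics.pstdev(values),
        entropy=shannon_entropy(block),
    )


def toy_block_classifier(features: BlockFeatures) -> bool:
    """Hypothetical stand-in for a trained per-block model: flags very high-entropy blocks."""
    return features.entropy > 7.5


def classify_file(
    data: bytes,
    block_is_suspicious: Callable[[BlockFeatures], bool],
    min_suspicious_fraction: float = 0.5,  # hypothetical correlation rule
) -> str:
    """Classify every block, then combine the block labels into a file-level verdict."""
    blocks = split_blocks(data)
    if not blocks:
        return "benign"
    flags = [block_is_suspicious(extract_features(b)) for b in blocks]
    suspicious_fraction = sum(flags) / len(flags)
    return "malware" if suspicious_fraction >= min_suspicious_fraction else "benign"


if __name__ == "__main__":
    low_entropy = b"hello world " * 200    # repetitive, text-like content
    high_entropy = bytes(range(256)) * 16  # every byte value equally frequent
    print(classify_file(low_entropy, toy_block_classifier))   # benign
    print(classify_file(high_entropy, toy_block_classifier))  # malware
```

The entropy threshold in `toy_block_classifier` is only a placeholder. On its own it would also flag compressed benign formats such as ZIP, JPG and MP3, so in practice it would be replaced by a block classifier trained on labelled benign and malicious blocks. Because the classifier is passed in as a callable, swapping in a trained model does not change the splitting, feature extraction or correlation steps.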

      Information

      Published In

      CSI-KDD '09: Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics
      June 2009
      94 pages
      ISBN:9781605586694
      DOI:10.1145/1599272
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. computer malware
      2. data mining
      3. forensics

      Qualifiers

      • Research-article

      Conference

      KDD09
      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 47
      • Downloads (last 6 weeks): 1
      Reflects downloads up to

      Cited By

      • (2024) Design and Performance Analysis of an Anti-Malware System Based on Generative Adversarial Network Framework. IEEE Access 12, pp. 27683-27708. DOI: 10.1109/ACCESS.2024.3358454. Online publication date: 2024.
      • (2024) A hybrid approach for Android malware detection using improved multi-scale convolutional neural networks and residual networks. Expert Systems with Applications: An International Journal 249:PB. DOI: 10.1016/j.eswa.2024.123675. Online publication date: 1-Sep-2024.
      • (2023) Web-Based Malware Detection System Using Convolutional Neural Network. Digital 3(3), pp. 273-285. DOI: 10.3390/digital3030017. Online publication date: 12-Sep-2023.
      • (2023) Do NoT Open (DOT): A Unified Generic and Specialized Models for Detecting Malicious Email Attachments. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 412-421. DOI: 10.1109/TrustCom60117.2023.00072. Online publication date: 1-Nov-2023.
      • (2023) A Data-plane Approach for Detecting Malware in IoT Networks. 2023 International Conference on Information Networking (ICOIN), pp. 578-583. DOI: 10.1109/ICOIN56518.2023.10048918. Online publication date: 11-Jan-2023.
      • (2023) Malware Detection Based on Deep Learning. 2023 3rd International Conference on Computing and Information Technology (ICCIT), pp. 427-432. DOI: 10.1109/ICCIT58132.2023.10273961. Online publication date: 13-Sep-2023.
      • (2023) Application of deep reinforcement learning in attacking and protecting structural features-based malicious PDF detector. Future Generation Computer Systems 141, pp. 325-338. DOI: 10.1016/j.future.2022.11.015. Online publication date: Apr-2023.
      • (2023) A Survey of strategy-driven evasion methods for PE malware: Transformation, concealment, and attack. Computers & Security, article 103595. DOI: 10.1016/j.cose.2023.103595. Online publication date: Nov-2023.
      • (2022) PDF Malware Detection Based on Optimizable Decision Trees. Electronics 11(19), article 3142. DOI: 10.3390/electronics11193142. Online publication date: 30-Sep-2022.
      • (2022) Machine Learning Analysis of Memory Images for Process Characterization and Malware Detection. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 162-169. DOI: 10.1109/DSN-W54100.2022.00035. Online publication date: Jun-2022.
