research-article

Malicious sequential pattern mining for automatic malware detection

Published: 15 June 2016

Abstract

    • An effective framework using sequence mining techniques is proposed for automatic malware detection.
    • An efficient sequential pattern mining algorithm is presented for discovering patterns that discriminate between malware and benign samples.
    • A new nearest neighbor classifier serves as the detection module to identify unknown malware.
    • The proposed framework achieves strong results compared with existing malware detection methods in detecting new malicious samples.

    Because of the damage it causes to Internet security, malware (e.g., viruses, worms, trojans) and its detection have attracted the attention of both the anti-malware industry and researchers for decades. The most significant line of defense protecting legitimate users against malware is anti-malware software, which mainly relies on signature-based detection. However, this method fails to recognize new, unseen malicious executables. To solve this problem, we propose an effective sequence mining algorithm that discovers malicious sequential patterns from the instruction sequences extracted from the file sample set, and we then construct an All-Nearest-Neighbor (ANN) classifier for malware detection based on the discovered patterns. The resulting data mining framework, composed of the proposed sequential pattern mining method and the ANN classifier, characterizes the malicious patterns in the collected file sample set well enough to detect new, unseen malware samples effectively. A comprehensive experimental study on a real data collection is performed to evaluate the detection framework. The experimental results are promising and show that the framework outperforms alternative data-mining-based detection methods in identifying new malicious executables.
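The pipeline described in the abstract (extract instruction sequences, mine patterns that separate malware from benign files, then classify with a nearest-neighbor rule) can be illustrated with a rough sketch. This is not the paper's algorithm. Every function name and threshold below is hypothetical, candidates are generated naively from contiguous n-grams, and a plain 1-nearest-neighbor rule with Hamming distance stands in for the All-Nearest-Neighbor classifier, whose details the abstract does not give. The opcode sequences are invented toy data.

```python
def contains(sequence, pattern):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # "op in it" advances the iterator, so later items must appear later.
    return all(op in it for op in pattern)

def support(samples, pattern):
    """Fraction of samples that contain the pattern as a subsequence."""
    return sum(contains(s, pattern) for s in samples) / len(samples)

def candidate_patterns(samples, max_len):
    """Naive candidate set: all contiguous n-grams up to max_len (an assumption)."""
    candidates = set()
    for s in samples:
        for n in range(1, max_len + 1):
            for i in range(len(s) - n + 1):
                candidates.add(tuple(s[i:i + n]))
    return candidates

def mine_malicious_patterns(malware, benign, max_len=2,
                            min_support=1.0, min_gap=1.0):
    """Keep patterns frequent in malware and much rarer in benign samples."""
    kept = []
    for p in candidate_patterns(malware, max_len):
        s_mal, s_ben = support(malware, p), support(benign, p)
        if s_mal >= min_support and s_mal - s_ben >= min_gap:
            kept.append(p)
    return sorted(kept)

def featurize(sequence, patterns):
    """Binary vector: one entry per pattern, 1 if the sequence contains it."""
    return [int(contains(sequence, p)) for p in patterns]

def classify_1nn(sequence, patterns, labeled):
    """Label of the training sample with the smallest Hamming distance."""
    vec = featurize(sequence, patterns)
    def distance(item):
        other = featurize(item[0], patterns)
        return sum(a != b for a, b in zip(vec, other))
    return min(labeled, key=distance)[1]

# Invented toy opcode sequences, for illustration only.
malware = [["push", "mov", "xor", "call"], ["mov", "xor", "jmp", "call"]]
benign = [["push", "mov", "add", "ret"], ["mov", "add", "call", "ret"]]

patterns = mine_malicious_patterns(malware, benign)
labeled = [(s, "malware") for s in malware] + [(s, "benign") for s in benign]
print(patterns)
print(classify_1nn(["mov", "xor", "add", "call"], patterns, labeled))
```

The thresholds are deliberately strict so that the toy data yields only a handful of patterns. A real miner would need an efficient search strategy and tuned thresholds, which this sketch does not attempt.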

        Published In

        Expert Systems with Applications: An International Journal  Volume 52, Issue C
        June 2016
        108 pages

        Publisher

        Pergamon Press, Inc.

        United States

        Author Tags

        1. Classification
        2. Instruction sequence
        3. Malware detection
        4. Sequential pattern mining

        Qualifiers

        • Research-article
