Abstract
We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains: the vertices are the instructions, and the transition probabilities are estimated from the data contained in the trace. We use a combination of graph kernels to build a similarity matrix between the instruction trace graphs; the resulting kernel measures similarity between graphs at both the local and the global level. Finally, the similarity matrix is passed to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but instead use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine-learning-based detection methods.
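As a rough illustration of the pipeline described above (from instruction trace, to Markov-chain graph, to kernel similarity matrix), the following sketch uses only the Python standard library. The abstract does not name the specific kernels, so the choices here are assumptions: a Gaussian kernel on the flattened transition matrices stands in for the local kernel, and a Gaussian kernel on each chain's damped stationary distribution (a PageRank-style power iteration) stands in for the global kernel. All function names, the `weight`, `sigma_local`, `sigma_global`, and `damping` parameters, and their default values are hypothetical, and every trace is assumed to be non-empty.

```python
import math


def build_vocab(traces):
    """Collect the distinct instructions (graph vertices) seen in any trace."""
    return sorted({op for trace in traces for op in trace})


def transition_matrix(trace, vocab):
    """Estimate Markov-chain transition probabilities from one instruction trace.

    P[i][j] is the fraction of observed successors of vocab[i] that are
    vocab[j]. Rows for instructions with no observed successor stay all-zero.
    """
    index = {op: k for k, op in enumerate(vocab)}
    n = len(vocab)
    P = [[0.0] * n for _ in range(n)]
    for a, b in zip(trace, trace[1:]):
        P[index[a]][index[b]] += 1.0
    for row in P:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return P


def stationary_distribution(P, damping=0.85, iters=200):
    """Hypothetical global summary of the chain via damped power iteration.

    Damping (as in PageRank) guarantees convergence; rows with no outgoing
    transitions spread their mass uniformly. Illustrative stand-in only.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i in range(n):
            has_out = sum(P[i]) > 0
            for j in range(n):
                p = P[i][j] if has_out else 1.0 / n
                nxt[j] += damping * pi[i] * p
        pi = nxt
    return pi


def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) on equal-length vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def similarity_matrix(traces, weight=0.5, sigma_local=1.0, sigma_global=0.1):
    """Combine a local and a global kernel into one Gram matrix.

    Local: Gaussian kernel on flattened transition matrices (edge level).
    Global: Gaussian kernel on stationary distributions (whole-chain level).
    A convex combination of positive semi-definite kernels is itself PSD,
    so the result is still a valid kernel for an SVM.
    """
    vocab = build_vocab(traces)
    mats = [transition_matrix(t, vocab) for t in traces]
    flat = [[v for row in P for v in row] for P in mats]
    pis = [stationary_distribution(P) for P in mats]
    m = len(traces)
    K = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            k_loc = gaussian_kernel(flat[a], flat[b], sigma_local)
            k_glob = gaussian_kernel(pis[a], pis[b], sigma_global)
            K[a][b] = weight * k_loc + (1.0 - weight) * k_glob
    return K
```

For instance, `similarity_matrix([["mov", "push", "call"], ["mov", "mov", "ret"]])` returns a 2×2 matrix with ones on the diagonal. A matrix of this kind can be handed to any SVM implementation that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`; the Gaussian choices above were picked only because they are simple and guaranteed to be positive semi-definite.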
Cite this article
Anderson, B., Quist, D., Neil, J. et al. Graph-based malware detection using dynamic analysis. J Comput Virol 7, 247–258 (2011). https://doi.org/10.1007/s11416-011-0152-x
DOI: https://doi.org/10.1007/s11416-011-0152-x