Abstract
Recent work has presented hidden Markov models (HMMs) as a compelling option for malware identification. However, some advanced metamorphic malware, such as MetaPHOR and MWOR, has proven more challenging to detect with these techniques. In this paper, we develop the dueling HMM strategy, which leverages our knowledge about different compilers for more precise identification. We also show how this approach may be combined with previous techniques to minimize the performance overhead. Additionally, we examine the HMMs in order to identify the meaning of their hidden states. We examine HMMs for four different compilers, hand-written assembly code, three virus construction kits, and two metamorphic malware families in order to note similarities and differences in the hidden states of the HMMs.
Notes
An expanded version of this section discussing hidden Markov models is available at http://www.cs.sjsu.edu/~stamp/RUA/HMM.
Alternatively, we could reasonably define “most likely” as the state sequence with the highest probability from among all possible state sequences. Dynamic programming (DP) can be used to efficiently find this particular solution. Note that the DP solution and the HMM solution are not necessarily the same, because the HMM solution instead chooses the most probable state at each position individually.
In the dynamic programming (DP) sense, we would simply choose the sequence with the highest probability, namely \(UUUG\). Note that this differs from the optimal solution in the HMM sense.
While NGVCK remains difficult to detect, its false positive rate plummets.
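The distinction drawn in the notes above between the DP sense and the HMM sense of "most likely" can be illustrated with a minimal sketch. The model below is a hypothetical toy example and is not taken from the paper: the two state labels (H and C), the matrices `PI`, `A`, and `B`, and the observation sequence `OBS` are all assumptions chosen only so that the two decodings disagree. The function `viterbi` finds the single highest-probability state sequence (DP sense), while `posterior_decode` picks the most probable state at each position using forward and backward probabilities (HMM sense). Scaling is omitted, so this is suitable only for short sequences.

```python
# Hypothetical toy HMM (illustrative values, not from the paper).
STATES = ("H", "C")                       # assumed hidden-state labels
PI = (0.6, 0.4)                           # initial state distribution
A = ((0.7, 0.3), (0.4, 0.6))              # A[i][j] = P(next state j | state i)
B = ((0.1, 0.4, 0.5), (0.7, 0.2, 0.1))    # B[i][k] = P(symbol k | state i)
OBS = (0, 1, 0, 2)                        # assumed observation sequence


def viterbi(obs):
    """DP sense: the single state sequence with the highest joint probability."""
    n = len(STATES)
    delta = [PI[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev = delta
        pointers = [max(range(n), key=lambda i, j=j: prev[i] * A[i][j])
                    for j in range(n)]
        delta = [prev[pointers[j]] * A[pointers[j]][j] * B[j][o]
                 for j in range(n)]
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join(STATES[s] for s in path)


def posterior_decode(obs):
    """HMM sense: the individually most probable state at each position."""
    n, T = len(STATES), len(obs)
    # Forward pass: alpha[t][i] = P(obs[0..t], state at t is i)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ])
    # Backward pass: beta[t][i] = P(obs[t+1..T-1] | state at t is i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
    # alpha[t][i] * beta[t][i] is proportional to P(state at t is i | obs).
    return "".join(
        STATES[max(range(n), key=lambda i: alpha[t][i] * beta[t][i])]
        for t in range(T)
    )
```

For this toy model, `viterbi(OBS)` returns `CCCH` while `posterior_decode(OBS)` returns `CHCH`, so the two notions of "most likely" select different state sequences, as the notes observe.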
Cite this article
Kalbhor, A., Austin, T.H., Filiol, E. et al. Dueling hidden Markov models for virus analysis. J Comput Virol Hack Tech 11, 103–118 (2015). https://doi.org/10.1007/s11416-014-0232-9
DOI: https://doi.org/10.1007/s11416-014-0232-9