Abstract
Today, malware threats are more dangerous than ever, with thousands of new samples emerging every day. A wide range of static and dynamic tools exists to detect malware signatures. Unfortunately, most of these tools are ineffective at automatically detecting polymorphic malware, i.e., signature variants belonging to the same malware family. Recent work proposes to handle these difficulties with symbolic execution and machine learning. In contrast to classical analysis, symbolic execution offers a deep exploration of a malware’s code and consequently contributes to building more informative signatures. These signatures can then be generalized to an entire family by training a machine learning model. The contribution of this tool paper is the presentation of SEMA, an open-source Symbolic Execution toolchain for Malware Analysis. SEMA is based on a dedicated extension of ANGR, a well-known symbolic analyser that can be used to extract API calls and their corresponding arguments. In particular, we extend ANGR with strategies to create representative signatures based on System Call Dependency Graphs (SCDGs). These SCDGs can be exploited in two machine learning modules based on graphs and vectors. Last but not least, SEMA offers the first federated learning module for symbolic malware analysis.
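To make the idea of an SCDG concrete, the following is a minimal sketch and not SEMA's actual implementation. It assumes that a trace of API calls with concrete arguments and return values has already been extracted (in SEMA this comes from the ANGR-based exploration). The names `Call` and `build_scdg` are hypothetical, and the linking rule used here (two calls are connected when a later call takes a value that an earlier call returned or also took as an argument) is a simplifying assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One API call observed along an execution path (hypothetical record type)."""
    name: str
    args: tuple
    ret: object = None


def build_scdg(trace):
    """Return SCDG edges as a set of (src_index, dst_index, shared_value) triples.

    An edge i -> j (with i < j) is added when call j takes as an argument a
    value that call i returned or also took as an argument.
    """
    seen = defaultdict(list)  # value -> indices of earlier calls that touched it
    edges = set()
    for j, call in enumerate(trace):
        # Link this call to every earlier call sharing one of its arguments.
        for value in call.args:
            for i in seen[value]:
                edges.add((i, j, value))
        # Register this call's values only after linking, to avoid self-loops.
        values = set(call.args)
        if call.ret is not None:
            values.add(call.ret)
        for value in values:
            seen[value].append(j)
    return edges


# Illustrative trace (invented for this sketch, not taken from a real sample).
trace = [
    Call("CreateFileA", ("C:\\tmp\\a.txt",), ret=0x10),
    Call("WriteFile", (0x10, "payload"), ret=1),
    Call("CloseHandle", (0x10,), ret=1),
]
edges = build_scdg(trace)
```

In this trace the file handle `0x10` returned by `CreateFileA` ties the three calls together, which is the kind of dependency a graph-based signature can capture and a flat list of call names cannot.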
Charles-Henry Bertrand Van Ouytsel is a FRIA grantee of the Belgian Fund for Scientific Research (FNRS-F.R.S.).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bertrand Van Ouytsel, CH., Crochet, C., Dam, K.H.T., Legay, A. (2023). Tool Paper - SEMA: Symbolic Execution Toolchain for Malware Analysis. In: Kallel, S., Jmaiel, M., Zulkernine, M., Hadj Kacem, A., Cuppens, F., Cuppens, N. (eds) Risks and Security of Internet and Systems. CRiSIS 2022. Lecture Notes in Computer Science, vol 13857. Springer, Cham. https://doi.org/10.1007/978-3-031-31108-6_5
DOI: https://doi.org/10.1007/978-3-031-31108-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31107-9
Online ISBN: 978-3-031-31108-6
eBook Packages: Computer Science (R0)