Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets

Authors: Jason Gray, Daniele Sgandurra, Lorenzo Cavallaro, and Jorge Blasco Alis

ACM Computing Surveys, Volume 56, Issue 8

Article No.: 212, Pages 1 - 36

https://doi.org/10.1145/3653973

Published: 30 April 2024

Abstract

Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution is harder still, as it relies mostly on the ability to disassemble binaries to obtain authorship-related features. We perform a systematic analysis of works in the area of malware authorship attribution. We identify key findings and some shortcomings of current approaches, and explore the open research challenges. To mitigate the lack of ground-truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset, comprising 17,513 malware samples labeled with 275 threat actor groups.

Published In

ACM Computing Surveys Volume 56, Issue 8

August 2024

963 pages

ISSN: 0360-0300

EISSN: 1557-7341

DOI: 10.1145/3613627

Editors: David Atienza, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland; Michela Milano, University of Bologna, Italy
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2024

Online AM: 26 March 2024

Accepted: 07 March 2024

Revised: 23 February 2024

Received: 18 November 2020
