survey

Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets

Published: 30 April 2024

Abstract

    Attributing a piece of malware to its creator typically requires threat intelligence. Attribution from binaries is harder still, because it relies mostly on disassembling the binary to obtain authorship-related features. We perform a systematic analysis of work on malware authorship attribution, identify key findings and shortcomings of current approaches, and explore the open research challenges. To mitigate the lack of ground-truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 17,513 malware samples labeled with 275 threat actor groups.

    Published In

    ACM Computing Surveys, Volume 56, Issue 8
    August 2024
    963 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/3613627
    Editors: David Atienza, Michela Milano

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 April 2024
    Online AM: 26 March 2024
    Accepted: 07 March 2024
    Revised: 23 February 2024
    Received: 18 November 2020

    Author Tags

    1. adversarial
    2. malware
    3. authorship attribution
    4. advanced persistent threats
    5. datasets

    Qualifiers

    • Survey
