DOI: 10.1145/3015135.3015136
research-article

Metadata recovery from obfuscated programs using machine learning

Published: 05 December 2016

Abstract

Obfuscation is a mechanism used to hinder reverse engineering of programs. To cope with the large number of obfuscated programs, especially malware, reverse engineers automate the process of deobfuscation, i.e., extracting information from obfuscated programs. Deobfuscation techniques target specific obfuscation transformations, which requires reverse engineers to manually identify the transformations used by a program, in what is known as a metadata recovery attack. In this paper, we present Oedipus, a Python framework that uses machine learning classifiers, viz. decision trees and naive Bayes, to automate metadata recovery attacks against obfuscated programs. We evaluated Oedipus' performance using two datasets totaling 1960 unobfuscated C programs, which were used to generate 11,075 programs obfuscated using 30 configurations of 6 different obfuscation transformations. Our results empirically show the feasibility of using machine learning to implement metadata recovery attacks, with classification accuracies of 100% in some cases.
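
As a rough illustration of the classification step the abstract describes, the sketch below trains a multinomial naive Bayes classifier to guess which transformation produced a program. It is not Oedipus' implementation: the token features (instruction mnemonics), the toy training samples, the label names, and the helper functions `train` and `predict` are all assumptions made for this example, and it relies only on the Python standard library rather than on any particular machine learning package. The decision tree classifier mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Fit a multinomial naive Bayes model.

    samples: list of (tokens, label) pairs, where tokens is a list of strings.
    Returns a tuple (class_counts, token_counts, vocabulary).
    """
    class_counts = Counter()             # number of training programs per label
    token_counts = defaultdict(Counter)  # per-label token frequencies
    vocabulary = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    class_counts, token_counts, vocabulary = model
    total_samples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: fraction of training programs carrying this label.
        score = math.log(class_counts[label] / total_samples)
        total_tokens = sum(token_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore tokens never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (token_counts[label][token] + 1) / (
                total_tokens + len(vocabulary)
            )
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: mnemonic sequences paired with made-up labels.
TRAINING = [
    ("mov add mov ret".split(), "none"),
    ("mov sub mov ret".split(), "none"),
    ("mov cmp jne jmp cmp jne jmp".split(), "flatten"),
    ("cmp je jmp cmp jne jmp mov".split(), "flatten"),
    ("xor not and or xor not".split(), "encode_arithmetic"),
    ("not and xor or not and mov".split(), "encode_arithmetic"),
]

model = train(TRAINING)
print(predict(model, "cmp jne jmp cmp jmp".split()))  # prints "flatten"
print(predict(model, "xor not or and".split()))       # prints "encode_arithmetic"
```

With realistic data, the samples would be feature vectors extracted from the obfuscated programs and the labels would be the obfuscation configurations used to generate them; the toy mnemonics here only make the mechanics visible.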

      Published In

      SSPREW '16: Proceedings of the 6th Workshop on Software Security, Protection, and Reverse Engineering
      December 2016
      85 pages
      ISBN:9781450348416
      DOI:10.1145/3015135
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. machine learning
      2. obfuscation
      3. reverse engineering

      Qualifiers

      • Research-article

      Conference

      SSPREW '16

      Acceptance Rates

      Overall Acceptance Rate 6 of 13 submissions, 46%

      Cited By

      • (2024) Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis. Proceedings of the ACM on Programming Languages, 8:OOPSLA2 (2211-2241). DOI: 10.1145/3689789. Online publication date: 8-Oct-2024
      • (2024) Obfuscation undercover: Unraveling the impact of obfuscation layering on structural code patterns. Journal of Information Security and Applications, 85 (103850). DOI: 10.1016/j.jisa.2024.103850. Online publication date: Sep-2024
      • (2023) Design of a new detection system for anti-virtualization malicious code. 2023 International Conference on Networks, Communications and Intelligent Computing (NCIC) (302-306). DOI: 10.1109/NCIC61838.2023.00057. Online publication date: 17-Nov-2023
      • (2023) Explaining Binary Obfuscation. 2023 IEEE International Conference on Cyber Security and Resilience (CSR) (22-27). DOI: 10.1109/CSR57506.2023.10224825. Online publication date: 31-Jul-2023
      • (2022) Robust learning against relational adversaries. Proceedings of the 36th International Conference on Neural Information Processing Systems (16246-16260). DOI: 10.5555/3600270.3601452. Online publication date: 28-Nov-2022
      • (2022) A Preliminary Study on Using Text- and Image-Based Machine Learning to Predict Software Maintainability. Software Quality: The Next Big Thing in Software Engineering and Quality (41-60). DOI: 10.1007/978-3-031-04115-0_4. Online publication date: 12-Apr-2022
      • (2021) Dynamic Taint Analysis versus Obfuscated Self-Checking. Proceedings of the 37th Annual Computer Security Applications Conference (182-193). DOI: 10.1145/3485832.3485926. Online publication date: 6-Dec-2021
      • (2019) Fine-grained static detection of obfuscation transforms using ensemble-learning and semantic reasoning. Proceedings of the 9th Workshop on Software Security, Protection, and Reverse Engineering (1-12). DOI: 10.1145/3371307.3371313. Online publication date: 9-Dec-2019
      • (2019) Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis. Proceedings of the 3rd ACM Workshop on Software Protection (3-14). DOI: 10.1145/3338503.3357719. Online publication date: 15-Nov-2019
      • (2019) Hypervisor-Based Protection of Code. IEEE Transactions on Information Forensics and Security, 14:8 (2203-2216). DOI: 10.1109/TIFS.2019.2894577. Online publication date: 1-Aug-2019
