Introduction to Malware Analysis
Disclaimer 
• This material requires the analyst to dive 
extremely deep into technical details 
• This quick talk will attempt to give you a 1,000-foot 
view of malware analysis 
• I draw a careful distinction between Malware 
Analysis and Reverse Engineering
Malware Analysis Overview 
• Static Analysis: involves analyzing the code 
without actually running the code 
– File identification, header information, strings, etc. 
– Disassembler – IDA Pro 
• Dynamic Analysis: involves executing the code in 
a controlled manner and monitoring system 
changes 
– Sysinternals, memory forensics, etc. 
– Debuggers – Immunity Debugger, OllyDbg
Coding Terms 
• Malware authors often write code in a high-level programming 
language such as C/C++
Static Analysis: File Identification 
• Linux “file” utility 
• Python-magic module
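Both `file` and python-magic identify a file largely by matching its leading "magic" bytes against a database of known signatures, regardless of the file extension. A minimal stdlib-only sketch of that idea follows. It is not the python-magic API, and the five-entry signature table is an illustrative assumption, far smaller than libmagic's real database.

```python
# Minimal magic-byte identification sketch; NOT the python-magic/libmagic API.
# The signature table is a small, illustrative subset of real file signatures.
SIGNATURES = [
    (b"MZ", "PE/DOS executable (MZ header)"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP archive (also DOCX, JAR, APK)"),
    (b"%PDF", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
]


def identify(data: bytes) -> str:
    """Return the description of the first signature found at the start of data."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the leading bytes of a file; that is all the signatures need."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

A file named `invoice.pdf` whose first bytes are `MZ` is reported as an executable, which is exactly the kind of mismatch worth flagging. For a full description, python-magic's `magic.from_file(path)` queries libmagic directly.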
Static Analysis: MD5 Hash 
• Linux “md5sum” utility: md5sum <fileName> 
• Python hashlib module:
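A hashlib equivalent of `md5sum <fileName>` might look like the sketch below (the helper name `md5_file` is hypothetical). It reads the sample in chunks, so large files never have to fit in memory, and prints the same hex digest as `md5sum`.

```python
import hashlib


def md5_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file (the same value md5sum prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read() until EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```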
Static Analysis: Strings 
• Can be a quick way to gain intelligence from 
the file: 
– Domains, IPs, URLs, function names, hardcoded 
information
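A rough Python stand-in for the `strings` utility is sketched below, under the assumption that a 4-character minimum (GNU `strings`' default) is a reasonable cutoff. Windows binaries often store text as UTF-16LE ("wide") strings, which plain `strings` skips unless run with `-e l`, so the sketch searches for both encodings. The helper name `extract_strings` is hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    # Runs of printable ASCII bytes (space through tilde)
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE text: each printable ASCII byte followed by a zero byte
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

Grepping the output for patterns such as `http`, `.dll` or dotted-quad IPs is a quick way to pull out the domains, URLs and function names listed above.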
Static Analysis: Packers 
• Packers are used to obfuscate code, which: 
– Changes the file signature (MD5 hash) 
– Obfuscates the file’s strings and code 
– Compresses the file size (sometimes) 
• Packed code can be identified by: 
– Examining the PE sections and imports: a PE file that imports only 
LoadLibrary/GetProcAddress is normally packed 
– Strings: UPX0, UPX1, aspack, adata, NSP0, NSP1, WinRAR SFX, 
PEC2, PECompact2, Themida, Orean.sys, NTkrnl, Secure Suite 
• Tools: PEiD, LordPE, and the Python peutils module
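The two heuristics on this slide (packer marker strings, and an import table holding little more than LoadLibrary/GetProcAddress) can be combined in a small checker. This is a hypothetical helper, not peutils' signature matching. It assumes you already have the raw file bytes and the list of imported function names, for example from pefile. Every hit is only an indicator. Short markers such as `adata` can also match ordinary text like "metadata".

```python
# Hypothetical helper applying the packer heuristics from the slide.
# A hit is an indicator only; legitimate software can match too.
PACKER_STRINGS = [
    b"UPX0", b"UPX1", b"aspack", b"adata", b"NSP0", b"NSP1", b"WinRAR SFX",
    b"PEC2", b"PECompact2", b"Themida", b"Orean.sys", b"NTkrnl", b"Secure Suite",
]
# Imports a packer stub typically keeps so it can resolve everything else at runtime
LOADER_IMPORTS = {"LoadLibraryA", "LoadLibraryW", "GetProcAddress"}


def packer_indicators(data: bytes, imports: list) -> list:
    """Return human-readable packer indicators for raw file bytes and an import list."""
    hits = []
    lowered = data.lower()  # section names such as ".aspack" vary in case
    for marker in PACKER_STRINGS:
        if marker.lower() in lowered:
            hits.append("string: " + marker.decode("ascii"))
    # Heuristic from the slide: the import table holds only LoadLibrary/GetProcAddress
    if imports and set(imports) <= LOADER_IMPORTS:
        hits.append("imports: only LoadLibrary/GetProcAddress")
    return hits
```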
Static Analysis: Packers 
• Unpacked vs. Packed Strings:
Data Encoding 
• Malware uses encoding for a number of reasons, 
including disguising its internal workings, hiding C2 
information, and exfiltrating data 
– Some simple encoding algorithms are: 
– Character substitution 
– XOR – uses a static key to XOR with the original value 
– Base64 – can use the default or a custom character set 
– Default Base64 character set: A-Z, a-z, 0-9, +, / 
• We will examine two common data encoding 
techniques used in malware: XOR and Base64
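Character substitution replaces each character with another according to a fixed table, and decoding means applying the inverse table. The sketch below uses ROT13 purely as an example key. Real samples may use any custom table, and the names `ENCODE`, `DECODE` and `substitute` are hypothetical.

```python
import string

# Hypothetical substitution key for illustration: ROT13 (each letter shifted 13 places).
PLAIN = string.ascii_lowercase + string.ascii_uppercase
KEY = (string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
       + string.ascii_uppercase[13:] + string.ascii_uppercase[:13])

ENCODE = str.maketrans(PLAIN, KEY)
DECODE = str.maketrans(KEY, PLAIN)  # the inverse mapping recovers the plaintext


def substitute(text: str, table: dict) -> str:
    """Apply a substitution table; characters not in the table pass through unchanged."""
    return text.translate(table)
```

ROT13 happens to be its own inverse, so here `DECODE` maps the same way as `ENCODE`. With an arbitrary table, the two differ, and recovering the table (usually from the disassembly) is the real work.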
Data Encoding: XOR 
• Strings often have to be stored in a program in order 
to pass them as parameters to functions, so encoding them hides them from a simple strings dump
• XOR once = encoded 
• XOR again with same key = plaintext
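The "XOR once to encode, XOR again to decode" property can be sketched with a repeating static key, as below. The `brute_force_single_byte` helper is a hypothetical addition showing one common shortcut. When the key is a single byte, there are only 256 candidates, so you can try them all and keep those whose output contains a known fragment such as `http`.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a static (repeating) key; the same call encodes and decodes."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def brute_force_single_byte(encoded: bytes, crib: bytes = b"http") -> list:
    """Return every one-byte key whose decoding contains a known plaintext fragment."""
    return [k for k in range(256) if crib in xor_bytes(encoded, bytes([k]))]
```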
Data Encoding: Base64 
• Storing Base64 strings in HTML comments is how the APT group 
“Comment Crew” got its name. This technique is still leveraged in 
malware today 
• Base64 is a common encoding scheme because it is very easy to encode and decode
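Decoding default Base64 is a single `base64.b64decode` call. A custom character set only changes which character stands for each 6-bit value, so it can be handled by remapping the characters onto the standard alphabet first. The sketch below assumes a hypothetical custom alphabet (lowercase block placed first) purely for illustration. In practice, you would recover the real one from the sample.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Hypothetical custom alphabet for illustration: lowercase block placed first
CUSTOM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"


def b64encode_custom(data: bytes, alphabet: str = CUSTOM) -> str:
    """Encode with standard Base64, then remap each character onto the custom alphabet."""
    standard = base64.b64encode(data).decode("ascii")
    return standard.translate(str.maketrans(STANDARD, alphabet))


def b64decode_custom(text: str, alphabet: str = CUSTOM) -> bytes:
    """Map custom-alphabet characters back to standard Base64, then decode normally."""
    return base64.b64decode(text.translate(str.maketrans(alphabet, STANDARD)))
```

A string that looks like Base64 but decodes to garbage with the default alphabet is a hint that the sample uses a custom one.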
Static Analysis: PE File Format 
• The PE data structure contains all the information required for the 
Windows OS loader to manage executable code: 
– .text – instructions the CPU executes 
– .rdata – imports and exports 
– .data – global data 
– .rsrc – resources (icons, images, strings, etc.) 
• Useful information in the PE header: 
– Imports and exports – give an idea of the malware’s functionality 
– Compilation time, language settings, and strings 
– Section names – packed code can have non-standard section names 
• Tools to analyze the PE header: pescanner.py, CFF Explorer, Python 
pefile, Resource Hacker, Dependency Walker, LordPE, etc.
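To show where the compilation time and section names live, here is a minimal stdlib-only sketch that walks the headers by hand, using offsets from the PE/COFF layout. It does no validation of malformed files, so tools such as pefile remain the right choice for real samples. The helper name `parse_pe_header` is hypothetical.

```python
import struct
from datetime import datetime, timezone


def parse_pe_header(data: bytes) -> dict:
    """Return the machine type, compile timestamp, and section names of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (DOS header offset 0x3C) gives the file offset of the "PE\0\0" signature
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the signature
    (machine, num_sections, timestamp,
     _symtab_ptr, _num_symbols, opt_header_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    # The section table starts right after the optional header; each entry is 40 bytes
    table = e_lfanew + 4 + 20 + opt_header_size
    sections = []
    for i in range(num_sections):
        (raw_name,) = struct.unpack_from("<8s", data, table + 40 * i)
        sections.append(raw_name.rstrip(b"\x00").decode("ascii", errors="replace"))
    return {
        "machine": machine,
        # TimeDateStamp is seconds since the Unix epoch; it can be forged, so treat it as a hint
        "compile_time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "sections": sections,
    }
```

A machine value of `0x14C` means 32-bit x86. With pefile, the same fields are exposed as `pe.FILE_HEADER.TimeDateStamp` and the `Name` of each entry in `pe.sections`.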
Windows API Calls: 
• When performing advanced static or dynamic analysis it’s 
important to have a good understanding of Windows API calls 
• By looking at the imported functions within the PE header you 
can see which Windows API functions the PE file wants to 
utilize 
• Recognizing API calls gives you a quick idea of the 
malware’s functionality, both when reviewing strings output and 
during advanced static analysis with a disassembler 
• An excellent resource for Windows API calls is MSDN: search for 
“API_Function MSDN”, substituting the function name
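Recognizing imports can be partly automated with a lookup table over the import list. The mapping below is a small, illustrative assumption covering a handful of real, well-known Windows API names, and the helper `api_capabilities` is hypothetical. Many APIs come in ANSI and Unicode variants (an `A` or `W` suffix), so the sketch folds those together.

```python
# Illustrative, far-from-exhaustive mapping of well-known Windows APIs to capabilities.
API_HINTS = {
    "VirtualAllocEx": "process injection",
    "WriteProcessMemory": "process injection",
    "CreateRemoteThread": "process injection",
    "RegSetValueEx": "registry modification / persistence",
    "CreateService": "service installation / persistence",
    "URLDownloadToFile": "downloading files",
    "InternetOpenUrl": "HTTP communication",
    "SetWindowsHookEx": "hooking / keylogging",
    "GetAsyncKeyState": "keylogging",
    "IsDebuggerPresent": "anti-debugging",
}


def api_capabilities(imports: list) -> dict:
    """Group imported function names by the capability they commonly indicate."""
    found = {}
    for name in imports:
        base = name
        # Fold ANSI/Unicode variants, e.g. RegSetValueExA/RegSetValueExW -> RegSetValueEx
        if base not in API_HINTS and base[-1:] in ("A", "W"):
            base = base[:-1]
        if base in API_HINTS:
            found.setdefault(API_HINTS[base], []).append(name)
    return found
```

A match is a lead for the MSDN lookup and the disassembler, not a verdict. Legitimate software imports many of these APIs too.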
Windows API: MSDN Example 
• The parameters control how the function behaves on the 
system 
• The return type is the type of value the function returns to the 
caller
Windows API: Disassembly 
• On 32-bit x86, parameters are pushed onto the stack from right to left; 
because the stack is Last In, First Out (LIFO), the first parameter ends up 
on top, which is why parameters appear in reverse order in the 
disassembly
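A toy model of that ordering follows, assuming the 32-bit stdcall convention most Win32 APIs use and the documented parameter order of `MessageBoxA(hWnd, lpText, lpCaption, uType)`. The helpers are hypothetical and only generate text. They do not disassemble anything. On x64, the first four arguments are passed in RCX, RDX, R8 and R9 instead, so reverse-order pushes are mainly a 32-bit sight.

```python
def push_sequence(func: str, params: list) -> list:
    """Model the x86 instructions a 32-bit caller emits for func(params...)."""
    # Arguments are pushed right to left: the LAST parameter is pushed first
    lines = ["push " + p for p in reversed(params)]
    lines.append("call " + func)
    return lines


def callee_view(params: list) -> list:
    """Simulate the stack after those pushes and pop it (LIFO) from the top down."""
    stack = []
    for p in reversed(params):
        stack.append(p)  # push
    # Popping returns the most recently pushed item first: the FIRST parameter
    return [stack.pop() for _ in range(len(stack))]
```

`push_sequence("MessageBoxA", ["hWnd", "lpText", "lpCaption", "uType"])` lists `push uType` first and `push hWnd` last, mirroring what you see in a disassembler, while `callee_view` shows the function still receives the parameters in their declared order.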
Wake Up  
• Okay, that was likely starting to bore some 
people – SORRY 
• Let’s move on to dynamic analysis, which is 
flashier
Getting Infected 
• Double-clicking the executable doesn’t always work 
– Sometimes you need to register or load the malware as a DLL 
(regsvr32.exe and rundll32.exe) 
– Or install the malware as a service 
• Interact with the system like a normal user: the 
malware may be waiting for a certain application to open 
so it can inject code into it (e.g., Internet Explorer) 
• It could require a command-line argument: one sample required 
<filename> /install in order to actually run 
– Static analysis is normally required to determine command-line 
switches
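The launch options on this slide can be captured as command lines to run inside an isolated analysis VM. The sketch below only builds argv lists and does not execute anything. The helper name and the `method` labels are hypothetical. The export name for rundll32 is an assumption you would confirm from the sample's export table during static analysis.

```python
def launch_commands(sample: str, method: str, export: str = "", service: str = "") -> list:
    """Return argv lists (not executed) for getting a sample to run in an analysis VM."""
    if method == "exe":
        return [[sample]]
    if method == "exe_install":
        # e.g. the sample from the talk that needed "<filename> /install"
        return [[sample, "/install"]]
    if method == "rundll32":
        # rundll32 syntax: rundll32.exe <dll>,<exported function>
        if not export:
            raise ValueError("rundll32 needs an exported function name")
        return [["rundll32.exe", f"{sample},{export}"]]
    if method == "regsvr32":
        # regsvr32 calls the DLL's DllRegisterServer export; /s suppresses dialogs
        return [["regsvr32.exe", "/s", sample]]
    if method == "service":
        if not service:
            raise ValueError("a service name is required")
        # sc.exe expects "binPath=" as its own token, followed by the value
        return [["sc.exe", "create", service, "binPath=", sample],
                ["sc.exe", "start", service]]
    raise ValueError(f"unknown method: {method}")
```

Inside the VM, each list can be handed to `subprocess.run(argv)` in order.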
Sysinternals Tool Suite 
• If I could pick just one tool, I’d pick the 50+ tools in 
the Sysinternals suite 
• Written by Mark Russinovich, who now 
works for Microsoft
• Process Explorer, Process Monitor, Autoruns, 
etc.
Process Explorer
Process Monitor 
• Very verbose tool that generates a lot of events 
• Filtering is required to make sense of the data
Process Monitor Cont. 
• Press Ctrl+L to bring up the filter dialog box 
– Useful quick filters: “Operation is WriteFile” and “Category is Write”
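The same filtering can be applied offline to a Process Monitor CSV export. This sketch assumes the export kept Process Monitor's default column names ("Process Name", "Operation", "Path"). If your column set differs, adjust the keys. The helper name is hypothetical.

```python
import csv


def filter_procmon_csv(fileobj, operation: str = "WriteFile", process: str = ""):
    """Yield rows of a Process Monitor CSV export that match an operation (and process)."""
    # Assumes default column names such as "Process Name", "Operation", "Path"
    for row in csv.DictReader(fileobj):
        if row.get("Operation") != operation:
            continue
        if process and row.get("Process Name", "").lower() != process.lower():
            continue
        yield row
```

Opening the export with `open(path, newline="", encoding="utf-8-sig")` tolerates a leading byte-order mark if one is present.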
Malware Persistence - Autoruns 
• Persistence really is the key to identifying malware: how does it 
gain persistence? 
• Autoruns can help enumerate persistence mechanisms:
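One way to use Autoruns during dynamic analysis is to capture a snapshot on the clean VM, run the sample, capture again, and compare. The sketch below assumes each autostart entry has already been exported as a `(location, name, command)` tuple. That tuple shape and the helper name are assumptions for illustration, not an Autoruns format.

```python
def diff_autostarts(baseline, current):
    """Compare autostart snapshots taken before and after running a sample."""
    before, after = set(baseline), set(current)
    added = sorted(after - before)    # candidate persistence mechanisms
    removed = sorted(before - after)  # entries the sample may have deleted or disabled
    return added, removed
```

A new value under `HKCU\Software\Microsoft\Windows\CurrentVersion\Run` pointing into a user-writable folder is a classic finding.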
Monitoring Network Activity 
• Some interesting network indicators of malware are: 
– SYNs out to an IP or domain 
– UDP traffic to IP or domain 
– HTTP GET/POST requests 
– DNS Queries 
– Connection attempt timing is important: every 1 min, every 30 mins, etc.
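Regular connection timing (beaconing) can be checked from a list of connection-attempt timestamps, for example pulled from a packet capture. The sketch below calls the attempts periodic when the gaps between them vary by no more than a tolerance. The 10% default is an assumed threshold, not a standard, and malware that adds random jitter may need a looser one. The helper name is hypothetical.

```python
from statistics import mean, pstdev


def beacon_interval(timestamps, tolerance: float = 0.1):
    """Return the mean gap (seconds) if connection times look periodic, else None.

    tolerance: max allowed standard deviation of the gaps, as a fraction of the mean gap.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return None  # too few attempts to judge regularity
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return None
    return avg if pstdev(gaps) <= tolerance * avg else None
```

Connections at 0, 61, 119, 180 and 241 seconds yield about 60.25, pointing to a roughly one-minute beacon.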
Automation? Sandboxes 
• So far the basic dynamic analysis we have talked about 
can be automated 
• Sandboxes are a good tool in any malware analyst’s 
toolbox – they have pros and cons: 
– Pros: fast, speed up analysis, save time 
– Cons: miss details, can be fooled 
• Sandboxes can be open source or commercial: 
– A really good free option is Cuckoo Sandbox 
• Install tutorial: http://www.primalsecurity.net/im-cuckoo-for-malware-with-a-spice-of-reverse-engineering/
Summary 
• Malware analysis requires both static and 
dynamic analysis techniques to accurately 
enumerate indicators of compromise 
• As with any automated tool, an analyst needs to be 
able to validate findings manually
