Chapter 2
Static Analysis
Contents
• Determining the File Type
• Fingerprinting the Malware
• Multiple Anti-Virus Scanning
• Extracting Strings
• Determining File Obfuscation
• Inspecting PE Header Information
• Comparing And Classifying The Malware
Determining the File Type
• During your analysis, determining the file type of a suspect binary will
help you identify the malware's target operating system (Windows,
Linux, and so on) and architecture (32-bit or 64-bit platforms).
• For example, if the suspect binary has a file type of Portable
Executable (PE), which is the file format for Windows executable files
(.exe, .dll, .sys, .drv, .com, .ocx, and so on), then you can deduce that
the file is designed to target the Windows operating system.
• Most Windows-based malware consists of executable files with
extensions such as .exe, .dll, and .sys, but relying on file
extensions alone is not recommended.
• File extension is not the sole indicator of file type.
Determining the File Type
• Attackers often disguise a file by modifying its extension or changing its
appearance to trick users into executing it.
• Instead of relying on the file extension, the file signature can be used to
determine the file type.
• A file signature is a unique sequence of bytes that is written to the file's
header.
• Different files have different signatures, which can be used to identify the
type of file.
• Windows executable files, also called PE files (such as files ending
with .exe, .dll, .com, .drv, .sys, and so on), have a file signature of MZ, or
hexadecimal 4D 5A, in the first two bytes of the file.
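The signature check described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name has_mz_signature is a hypothetical helper, not part of any tool mentioned in these slides.

```python
def has_mz_signature(path):
    """Return True if the file starts with the two bytes 4D 5A ("MZ")."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"MZ"
```

Because the check reads the content rather than the name, a file renamed to something like invoice.pdf would still be reported as starting with MZ. An MZ signature alone only suggests a Windows executable; it does not confirm that the rest of the PE structure is valid.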
File Signature
• Files carry file signatures (file header signatures), which operating
systems and programs can use to select the appropriate program to
open or run the file.
• For example, an image file will be opened in an image viewer. The
image viewer recognizes the header signature as that of an image file
and opens it correctly.
• On a Windows system, a file signature is normally contained within
the first 20 bytes of the file.
File Signature
• Different file types have different file signatures; for example, a
Windows Bitmap image file (.bmp extension) begins with the
hexadecimal characters 42 4D in the first two bytes of the file,
characters that translate to the letters “BM.”
• Most Windows-based malware specimens are executable files, often
ending in the extensions .exe, .dll, .com, .pif, .drv, .qtx, .qts, .ocx, or
.sys. The file signature for these files is “MZ” or the hexadecimal
characters 4D 5A, found in the first two bytes of the file.
Identifying File Type Using Tools
• On Linux systems, the file type can be determined using the file utility.
• On Windows, CFF Explorer, part of Explorer Suite
(http://www.ntcore.com/exsuite.php), can be used to determine the file
type.
• CFF Explorer is not limited to determining the file type. It is also a
useful tool for inspecting executable files (both 32-bit and 64-bit): it
lets you examine the PE internal structure, modify fields, and extract
resources.
Determining File Type Using Python
• In Python, the python-magic module can be used to determine the file
type.
• On Windows, to install the python-magic module, you can follow the
procedure mentioned at https://github.com/ahupp/python-magic.
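A minimal sketch of how python-magic might be used, assuming the package (and the libmagic library it wraps) is installed. The helper name describe_file is hypothetical; the sketch returns None instead of failing when the module is not available.

```python
try:
    import magic  # provided by the python-magic package (needs libmagic)
except ImportError:
    magic = None


def describe_file(path):
    """Return libmagic's description of the file, or None if python-magic is unavailable."""
    from_file = getattr(magic, "from_file", None)
    if from_file is None:
        return None
    return from_file(path)
```

For a Windows executable, the returned description would be expected to mention PE32 or PE32+, although the exact wording depends on the installed libmagic version.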
Fingerprinting the Malware
• Fingerprinting involves generating the cryptographic hash values for
the suspect binary based on its file content.
• Cryptographic hashing algorithms such as MD5, SHA1, and
SHA256 are the de facto standard for generating file hashes of
malware specimens.
Uses of Cryptographic Hashes
• Identifying a malware specimen based on filename is ineffective because the same
malware sample can use different filenames, but the cryptographic hash that is
calculated based on the file content will remain the same.
• During dynamic analysis, when malware is executed, it can copy itself to a
different location or drop another piece of malware. Having the cryptographic
hash of the sample can help in identifying whether the newly dropped/copied
sample is the same as the original sample or a different one. This information can
assist you in deciding whether the analysis needs to be performed on a single
sample or multiple samples.
• A file hash is frequently shared as an indicator with other security
researchers to help them identify the sample.
• A file hash can be used to determine whether the sample has been previously
detected, either by searching online or by searching the database of a
multi-antivirus scanning service such as VirusTotal.
Generating Cryptographic Hash Using Tools
• On a Linux system, file hashes can be generated using the md5sum,
sha256sum, and sha1sum utilities:
$ md5sum log.exe
6e4e030fbd2ee786e1b6b758d5897316 log.exe
$ sha256sum log.exe
01636faaae739655bf88b39d21834b7dac923386d2b52efb4142cb278061f97f log.exe
$ sha1sum log.exe
625644bacf83a889038e4a283d29204edc0e9b65 log.exe
Generating Cryptographic Hash Using Tools (for Windows)
• On Windows, HashMyFiles (http://www.nirsoft.net/utils/hash_my_files.html)
generates hash values for single or multiple files and highlights
identical hashes with the same color. For instance, identical hash
values for log.exe and bunny.exe indicate that the two files are the
same sample.
Determining Cryptographic Hash in Python
• In Python, it is possible to generate file hashes using the hashlib module, as
shown here:
$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
>>> import hashlib
>>> content = open(r"log.exe","rb").read()
>>> print hashlib.md5(content).hexdigest()
6e4e030fbd2ee786e1b6b758d5897316
>>> print hashlib.sha256(content).hexdigest()
01636faaae739655bf88b39d21834b7dac923386d2b52efb4142cb278061f97f
>>> print hashlib.sha1(content).hexdigest()
625644bacf83a889038e4a283d29204edc0e9b65
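The session above uses Python 2 and reads the whole file into memory. As a hedged alternative, the following Python 3 sketch computes the same three digests while reading the file in chunks. The function name file_hashes and the 64 KB chunk size are illustrative choices, not taken from the original material.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return MD5, SHA1 and SHA256 hex digests computed from the file content."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Because the digests depend only on content, calling file_hashes on a sample and on a copy dropped under a different filename returns identical values, which is how a copied sample can be told apart from a genuinely new one.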
Multiple Anti-Virus Scanning
• Scanning the suspect binary with multiple anti-virus scanners helps in
determining whether malicious code signatures exist for the suspect
file.
• The signature name for a particular file can provide additional
information about the file and its capabilities.
• By visiting the respective anti-virus vendor websites or searching for
the signature name in search engines, you can obtain further details
about the suspect file.
• Such information can help in your subsequent investigation and can
reduce the analysis time.
Scanning the Suspect Binary with VirusTotal
• VirusTotal is a web-based service that allows you to upload a file, which is
then scanned with various anti-virus scanners; the scan results are presented
in real time on the web page.
• In addition to uploading files for scanning, the VirusTotal web interface
lets you search its database by hash, URL, domain, or IP address.
• VirusTotal offers another useful feature called VirusTotal Graph, built on
top of the VirusTotal dataset.
• Using VirusTotal Graph, you can visualize the relationship between the file
that you submit and its associated indicators such as domains, IP addresses,
and URLs.
• It also allows you to pivot and navigate over each indicator; this feature is
extremely useful if you want to quickly determine the indicators associated
with a malicious binary.
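Hash lookups can also be scripted. The sketch below only prepares a request and does not send it. It assumes the VirusTotal public API version 3, its file-report endpoint, and a personal API key passed in the x-apikey header; none of these details come from the slides, and build_hash_lookup is a hypothetical helper name.

```python
import urllib.request

# Assumed endpoint of the VirusTotal v3 API for file reports.
VT_FILE_ENDPOINT = "https://www.virustotal.com/api/v3/files/"


def build_hash_lookup(file_hash, api_key):
    """Prepare (but do not send) a VirusTotal v3 file-report request for a hash."""
    return urllib.request.Request(
        VT_FILE_ENDPOINT + file_hash,
        headers={"x-apikey": api_key},
    )
```

Passing the prepared request to urllib.request.urlopen would perform the lookup. Querying by hash avoids uploading the binary itself, which matters when the sample is part of a confidential investigation.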
Tips
• There are a few factors/risks to consider when scanning a binary with
Anti-Virus scanners or when submitting a binary to online anti-virus
scanning services:
• If a suspect binary is not detected by the anti-virus scanning engines, it
does not necessarily mean that the binary is safe. Anti-virus engines rely
on signatures and heuristics to detect malicious files. Malware authors can
easily modify their code and use obfuscation techniques to bypass these
detections, so some anti-virus engines might fail to detect the binary as
malicious.
Tips
• When you upload a binary to a public site, the binary you submit may
be shared with third parties and vendors. The suspect binary may
contain sensitive, personal, or proprietary information specific to your
organization, so it is not advisable to submit a binary that is part of a
confidential investigation to public anti-virus scanning services.
• When you submit a binary to online anti-virus scanning engines, the
scan results are stored in their database, and most of the scan data is
publicly available and can be queried later. Attackers can use the
search feature to query the hash of their sample to check whether their
binary has been detected. Detection of their sample may cause the
attackers to change their tactics to avoid detection.
Extracting Strings
• Strings are printable ASCII and Unicode character sequences embedded
within a file.
• Extracting strings can give clues about the program functionality and
indicators associated with a suspect binary.
• For example, if malware creates a file, the filename is stored as a string in the
binary. Likewise, if malware resolves a domain name controlled by the attacker,
the domain name is stored as a string.
• Strings extracted from the binary can contain references to filenames,
URLs, domain names, IP addresses, attack commands, registry keys, and so
on.
• Although strings do not give a clear picture of the purpose and capability of
a file, they can give a hint about what malware is capable of doing.
String Extraction Using Tools
• In Linux, the strings command, by default, extracts the ASCII strings that are at
least four characters long. With the -a option it is possible to extract strings from
the entire file. The following ASCII strings extracted from the malicious binary
show reference to an IP address. This indicates that when this malware is
executed, it probably establishes a connection with that IP address:
$ strings -a log.exe
!This program cannot be run in DOS mode.
Rich
.text
`.rdata
@.data
L$"%
h4z@
128.91.34.188
%04d-%02d-%02d %02d:%02d:%02d %s
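The same idea can be approximated in Python. This is a rough sketch rather than a reimplementation of the strings utility: it treats bytes 0x20 to 0x7E as printable, uses a minimum length of four, and also looks for UTF-16LE ("wide") strings, which the default strings invocation above does not extract. The name extract_strings is hypothetical.

```python
import re

# Runs of at least four printable ASCII bytes.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# UTF-16LE text: printable ASCII characters each followed by a null byte.
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_strings(data):
    """Return printable ASCII and UTF-16LE strings of at least four characters."""
    ascii_strings = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)]
    return ascii_strings, wide_strings
```

Typical use would be extract_strings(open("log.exe", "rb").read()), followed by a search of the results for IP addresses, domain names, or registry keys.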
Determining File Obfuscation
• Even though string extraction is an excellent technique to harvest valuable
information, often malware authors obfuscate or armor their malware
binary.
• Obfuscation is used by malware authors to protect the inner workings of the
malware from security researchers, malware analysts, and reverse engineers.
• These obfuscation techniques make it difficult to detect and analyze the
binary; extracting strings from such a binary yields very few strings, and
most of them are obscured.
• Malware authors often use programs such as packers and cryptors to
obfuscate their files, both to evade detection by security products such as
anti-virus and to thwart analysis.
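One common heuristic for spotting packed or encrypted content, which is not covered in these slides and is offered here only as an assumption-laden sketch, is to measure the Shannon entropy of the file's bytes. Compressed or encrypted data tends toward values close to 8 bits per byte, while ordinary code and text are usually lower; any cut-off value is a judgment call, and high entropy alone does not prove obfuscation.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A low string count combined with high entropy is a stronger hint of packing than either observation on its own.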
Inspecting PE Header Information
• When the binary is executed, the operating system loader reads the
information from the PE header and then loads the binary content from
the file into the memory.
• The PE header contains information such as where the executable
needs to be loaded into memory, the address where execution
starts, the list of libraries and functions on which the application
relies, and the resources used by the binary.
• Examining the PE header yields a wealth of information about the
binary and its functionality.
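To make the header fields concrete, here is a minimal standard-library sketch that reads a few of them directly from the bytes of a file. The offsets follow the published PE/COFF layout; the function name read_pe_header, the returned dictionary keys, and the short machine-type table are illustrative choices. Dedicated tools and libraries handle many more fields and malformed files.

```python
import struct

# Two common COFF machine values; other values are reported as hex.
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def read_pe_header(data):
    """Parse a few COFF and optional-header fields from the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The 4-byte field at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header starts right after the signature: Machine, NumberOfSections, TimeDateStamp.
    machine, section_count, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    # The optional header follows the 20-byte COFF header; AddressOfEntryPoint is at offset 16.
    (entry_point,) = struct.unpack_from("<I", data, pe_offset + 24 + 16)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "sections": section_count,
        "timestamp": timestamp,
        "entry_point": entry_point,
    }
```

The machine field answers the 32-bit versus 64-bit question raised at the start of the chapter, and the entry point is the address where execution starts relative to the image base.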
Common sections in a PE file
Section Name   Description
.rdata         Contains read-only data. Sometimes it also contains import and export information.
.idata         If present, contains the import table. If not present, the import information is stored in the .rdata section.
.edata         If present, contains export information. If not present, the export information is found in the .rdata section.
.rsrc          Contains the resources used by the executable, such as icons, dialogs, menus, strings, and so on.
Comparing And Classifying The Malware
• Classifying Malware Using Fuzzy Hashing
• Classifying Malware Using Import Hash
• Classifying Malware Using Section Hash
• Classifying Malware Using YARA
Fuzzy Hashing
• Fuzzy hashing is a great method to compare files for similarity. ssdeep
(http://ssdeep.sourceforge.net) is a useful tool to generate the fuzzy
hash for a sample, and it also helps in determining percentage
similarity between the samples.
• This technique is useful in comparing a suspect binary with the
samples in a repository to identify the samples that are similar.
Import Hash
• Import Hashing is another technique that can be used to identify
related samples and the samples used by the same threat actor groups.
• Import hash (or imphash) is a technique in which hash values are
calculated based on the library/imported function (API) names and
their particular order within the executable.
• If the files were compiled from the same source and in the same
manner, those files would tend to have the same imphash value.
• During your malware investigation, if you come across samples that
have the same imphash values, it means that they have the same
import address table and are probably related.
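The calculation can be sketched as follows, assuming the list of imports has already been extracted from the binary in the order it appears there. The sketch follows the commonly described imphash convention (lowercase names, library extension dropped, entries joined with commas, MD5 of the result) and skips imports by ordinal, so its output may differ from established tools for some files. The name import_hash is hypothetical; the pefile library provides its own get_imphash() method for real samples.

```python
import hashlib


def import_hash(imports):
    """Compute an imphash-style MD5 from ordered (library, function) name pairs."""
    entries = []
    for library, function in imports:
        library = library.lower()
        # Drop a trailing .dll, .ocx or .sys extension from the library name.
        stem, dot, extension = library.rpartition(".")
        if dot and extension in ("dll", "ocx", "sys"):
            library = stem
        entries.append("%s.%s" % (library, function.lower()))
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()
```

Because the entries are hashed in order, two binaries that import the same functions in a different order produce different values, which is why the hash tends to match only for files built from the same source in the same manner.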
Section Hash
• Similar to import hashing, section hashing can also help in identifying
related samples.
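A minimal sketch of section hashing, assuming that it means computing a hash over the raw data of each section listed in the PE section table. It uses only the standard library and MD5; the function name section_hashes is hypothetical, and no handling is included for truncated or deliberately malformed headers.

```python
import hashlib
import struct


def section_hashes(data):
    """Return {section name: MD5 of the section's raw data} for the bytes of a PE file."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[:2] != b"MZ" or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    (section_count,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # The section table follows the signature (4), COFF header (20) and optional header.
    table = pe_offset + 24 + optional_size
    hashes = {}
    for index in range(section_count):
        entry = table + index * 40  # each section header is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20 of the entry.
        raw_size, raw_offset = struct.unpack_from("<II", data, entry + 16)
        hashes[name] = hashlib.md5(data[raw_offset:raw_offset + raw_size]).hexdigest()
    return hashes
```

Two samples that share, for example, the same .text hash but differ in .rsrc are likely variants with the same code and different embedded resources.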
YARA
• A malware sample can contain many strings or binary indicators;
recognizing the strings or binary data that are unique to a malware
sample or a malware family can help in malware classification.
• Security researchers classify malware based on the unique strings and
the binary indicators present in the binary. Sometimes, malware can
also be classified based on general characteristics.
• YARA (http://virustotal.github.io/yara/) is a powerful malware
identification and classification tool.
• Malware researchers can create YARA rules based on textual or binary
information contained within the malware specimen.
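As an illustration, the sketch below embeds a small YARA rule in Python and applies it with the yara-python package, assuming that package is installed (the function returns None otherwise). The rule name and the choice of strings are hypothetical; they simply reuse the IP address and format string seen in the earlier strings output, combined with a check for the MZ signature.

```python
try:
    import yara  # provided by the yara-python package
except ImportError:
    yara = None

# Hypothetical rule: the rule name and the chosen strings are illustrative only.
RULE_SOURCE = r'''
rule example_log_exe_indicators
{
    strings:
        $ip = "128.91.34.188"
        $fmt = "%04d-%02d-%02d %02d:%02d:%02d %s"
    condition:
        uint16(0) == 0x5A4D and all of them
}
'''


def matching_rules(data):
    """Return names of rules matching the bytes, or None if yara-python is unavailable."""
    if yara is None:
        return None
    rules = yara.compile(source=RULE_SOURCE)
    return [match.rule for match in rules.match(data=data)]
```

A rule built on a single IP address is brittle; in practice, rules usually combine several strings or byte patterns that are characteristic of a malware family.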
Thank You