
ISSN 1330-3651 (Print), ISSN 1848-6339 (Online) https://doi.org/10.17559/TV-20210202132203
Original scientific paper

Malicious Behavior Detection Method Using API Sequence in Binary Execution Path

Jihun KIM, Sungwon LEE, Jonghee YOUN*

Abstract: Today, the amount of malware is growing very rapidly, and the types and behaviors of malware are becoming very diverse. Beyond existing malicious code, new types and variants of malicious code keep being identified, and analyzing all of them takes a great deal of time. To address these problems, malware analysts research effective ways to reduce analysis time and cost. In this paper, we propose a method that uses API sequences to express the characteristics of malicious code for its detection and classification. We compare the proposed expression method with several existing ones and verify its effectiveness on actual malicious code samples. Using the proposed expression method, we detected six malicious behaviors: DLL Injection, Downloader, IAT Hooking, Key Logger, Screen Capture, and Antidebugging. As a result, more malicious behaviors were detected than with conventional detection methods, and the more complex the malicious behavior, the higher the detection efficiency. In addition, although static analysis is the main method, the flow of malicious behavior can be analyzed because the method searches execution paths.

Keywords: API sequence; binary execution path; malware analysis; malware detection

1 INTRODUCTION

In 1949, von Neumann published a theory that self-replication and proliferation can be performed in computers as with biological viruses [1]. Thereafter, computer programs equipped with self-replication and proliferation functions came to be called "viruses". However, since the term virus is quite limited in what it can express, a term that covers the concept comprehensively became necessary. The term coined as a result is malware, an abbreviation of malicious software, meaning software that contains malicious code. Recently, with the rapid development of network and ICT technologies, the amount of malware has been increasing exponentially. To prevent and respond to such security threats, analysts and anti-virus vendors are trying to maximize the efficiency of malware analysis using various analysis methods.

In general, malware analysis is performed in two forms, static analysis and dynamic analysis [2]. First, static analysis is a technique that detects malicious behaviors by analyzing the structure of malware or specific binary patterns at the code level. Although static analysis enables more in-depth and detailed analysis, if techniques that obstruct it, such as executable file packing and code obfuscation, are applied to malware, much more time and effort are required and the analysis becomes considerably more difficult [3]. Second, dynamic analysis is a method of analyzing malicious behaviors by executing actual malware in a virtual machine. Because the malware is actually executed, malicious behaviors can be clearly observed even when executable file packing or code obfuscation has been applied. However, dynamic analysis is not suitable for trigger-based malware that runs only at a certain time or only when the user takes a specific action.

In this paper, we propose a malicious behavior detection method based on static-analysis execution path searches to maximize the efficiency of malicious behavior detection. The method aims to improve detection efficiency by detecting execution paths through static analysis and determining malicious behavior from the correlation of APIs. Although the main analysis method is static analysis conducted at the code level, effects similar to those of dynamic analysis can be expected because the method traces binary execution paths to capture behavior. Even if the malicious code cannot run completely because of problems such as missing dependencies, or if it is packed, analysis is still possible because the method targets the various branches in the binary code. In addition, the API behavior information in the analysis results is visualized as graphs for a clear understanding of malicious behaviors, and the advantage of the proposed detection method is demonstrated through comparison with the existing simple API collection and listing methods.

Chapter 2 introduces the malicious behavior detection methods of previous studies and their limitations. Chapter 3 introduces the proposed method. Chapter 4 presents experiments based on the proposed method, verifies the results, and discusses the efficiency of the method. Chapter 5 concludes the paper.

2 RELATED WORKS

This paper reviews the process and limitations of previous malware analysis research and, building on existing studies, proposes an efficient analysis method that overcomes those limitations. Previous static analysis studies based on binary code extracted features from attributes in the code. Statistical algorithms were generally applied in such studies, mainly to compare malware with normal programs. For instance, statistics of op codes [4] or strings [5] extracted from malware code sections are compared with those of normal programs. These methods simply collect signatures and identify malicious behaviors by extracting information on the structure of executable files [6]. Among the sequence-based malicious behavior identification methods, the most basic ones list the sequences of op codes [7] or the sequences of strings [8]. These studies were later extended to studies

810 Technical Gazette 28, 3(2021), 810-818


Jihun KIM et al.: Malicious Behavior Detection Method Using API Sequence in Binary Execution Path

that identify malicious behaviors by using the n-gram technique [9], which cuts the information in a file into pieces according to a certain criterion and processes those pieces. In addition, studies that express the byte sequences of binary code as n-grams in order to classify malware were also carried out [10]. Signatures collected in this way from the internal structure of a file can be regarded as the unique DNA of the file [11] and are used for malware similarity analysis and classification. A clear and definite basis for judging malicious behaviors is discovering the functions used by malware, and previous studies have attempted to identify or classify malware by processing the APIs within programs [12-14]. Representative examples are methods that list the sequences of APIs [15] or that collect log information on API use to determine malicious behaviors [16]. Since APIs are functions used when the program is executed, they are either collected statically [17] or monitored dynamically [18]. In addition to methods that simply list APIs, there are methods that extract malware features according to the frequency of API use inside the file [19]. These statistical methods have advantages such as small storage requirements, little computation, and high speed. However, they cannot respond to malware in real time and cannot accurately judge diverse malware behaviors because they are based on simple statistics [20]. To compensate, some recent studies have applied various algorithms on top of these statistical properties to detect malicious behaviors. The eigenvalues of op-code-based graph images can be calculated by measuring the distances between nodes with the K-Nearest Neighbor (KNN) algorithm, a machine learning algorithm [21]. In addition, processed strings can be reprocessed with the Longest Common Subsequence (LCS) algorithm to measure the eigenvalues of the strings [22]. In the studies introduced above, static-analysis-based methods collect signatures or list them in sequence, but they cannot identify the precise features of behaviors because they rely on code-based feature extraction. To compensate for this problem, dynamic analysis has been adopted as the main detection method [23], or static and dynamic analyses have been combined [24]. However, since dynamic analysis directly executes malware, it has disadvantages in energy efficiency and analysis time [25]. Several other studies and methods for classifying malware are also under way [26, 27]. In our previous study, we examined how to express the features of malware using APIs [28]. In this paper, we propose a method that is based on static analysis but tracks the execution flow, so that effects similar to those of dynamic analysis can be expected, as an alternative to studies that mix static and dynamic analyses.

3 PROPOSED METHOD
3.1 Method Architecture

First, the malware executable file is converted into disassembled code with the binary reverse engineering tool IDA [29] (IDA Pro 6.6). IDA exports the disassembled code of the executable to an .asm file, which is used as input data for the static execution path search. Since malicious behaviors are determined by the API call sequences found in the execution path search, an API extraction step follows. In this study, graph images are generated from the APIs. However, since the extracted APIs number in the tens of thousands of kinds, they cannot all be used directly in the images; they are therefore grouped by classification into graph images that represent malicious behaviors, and finally, the mutual similarity of malware samples is shown through image-based similarity determination.

3.2 Static Execution Path Exploration

The core of the static execution path search in this paper is that instruction sets and subroutines are divided into true and false branches according to the branch instruction before they are searched. Among the assembly instructions, the branching instructions are jz, jnz, and so on. The branch condition is set before the branch instruction, either by a comparison instruction such as cmp or test or by a logical operation instruction such as xor. We assign true and false marks according to the branch instructions in order to search all instruction sets and subroutines.

IDA's disassembly identifies the instruction sets and subroutines used in the search. Instruction sets, which are code blocks that perform some behavior within a function, are labeled loc_xxxxxx, and subroutines are labeled with prefixes such as sub_xxxxxx. IDA provides IDAPython, a Python scripting interface, for powerful processing of binary code. In this study, searches were performed using IDA APIs such as GetFunctionName, CodeRefsFrom, and CodeRefsTo [30].

Table 1 Example binary code to display static execution path search
1  loc_401460:
2      mov  eax, [esp + argc]
3      sub  esp, 44h
4      cmp  eax, 2
5      push ebx
6      push ebp
7      push ebi
8      jnz  loc_401488
9  loc_401484:
10     xor  eax, eax
11     jmp  short loc_40148D
12 loc_401488:
13     sbb  eax, eax
14     sbb  eax, 0FFFFFFFFh

Tab. 1 is the binary code used to illustrate the static execution path search. After the cmp instruction in the fourth line, loc_401460 branches into the instruction sets loc_401484 and loc_401488 because of the jnz branch instruction in the eighth line. If the result of the comparison is true, execution branches to loc_401488; if it is false, execution continues into loc_401484. In summary, loc_401460 branches into loc_401488 when the condition is true and into loc_401484 when it is false.

Visualizing the binary code in Tab. 1 gives Fig. 1.
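The branch-splitting search described above can be sketched as a depth-first traversal over a control-flow graph. This is an illustrative sketch under stated assumptions, not the authors' IDAPython implementation: the graph is written by hand from Tab. 1 instead of being recovered from IDA (where successors would come from cross-reference queries such as CodeRefsFrom), the names `TAB1_CFG`, `explore_paths`, and `format_path` are hypothetical, and back edges are skipped so that loops terminate. A conditional jump yields a True edge (jump taken) and a False edge (fall-through); an unconditional jmp, or a call into a subroutine as in Tab. 2, is modeled as a Normal edge.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source block, mark, target block)

# Hypothetical hand-written CFG for Tab. 1: block -> list of (mark, successor).
# jnz at line 8: jump taken -> True edge to loc_401488,
# fall-through -> False edge to loc_401484; jmp at line 11 -> Normal edge.
TAB1_CFG: Dict[str, List[Tuple[str, str]]] = {
    "loc_401460": [("True", "loc_401488"), ("False", "loc_401484")],
    "loc_401484": [("Normal", "loc_40148D")],
    "loc_401488": [],  # successors are not shown in Tab. 1
    "loc_40148D": [],  # body is not shown in Tab. 1
}


def explore_paths(cfg: Dict[str, List[Tuple[str, str]]], entry: str) -> List[List[Edge]]:
    """Enumerate acyclic execution paths from `entry` as lists of marked edges."""
    paths: List[List[Edge]] = []

    def dfs(block: str, edges: List[Edge], visited: frozenset) -> None:
        extended = False
        for mark, target in cfg.get(block, []):
            if target in visited:  # back edge: skip it so loops terminate
                continue
            extended = True
            dfs(target, edges + [(block, mark, target)], visited | {target})
        if not extended:  # no unvisited successor, so this path ends here
            paths.append(edges)

    dfs(entry, [], frozenset({entry}))
    return paths


def format_path(path: List[Edge]) -> str:
    """Render a path such as 'a -True-> b -Normal-> c'."""
    if not path:
        return "(empty)"
    text = path[0][0]
    for _src, mark, dst in path:
        text += f" -{mark}-> {dst}"
    return text


if __name__ == "__main__":
    for p in explore_paths(TAB1_CFG, "loc_401460"):
        print(format_path(p))
```

Run on the Tab. 1 model, the sketch prints one path per branch outcome: `loc_401460 -True-> loc_401488` and `loc_401460 -False-> loc_401484 -Normal-> loc_40148D`. Enumerating whole paths, rather than just visiting each block once, keeps the True/False context of every edge available for the API-level marking that follows.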

Tehnički vjesnik 28, 3(2021), 810-818 811



Figure 1 Visualization of binary code

This mechanism applies equally when there are subroutines in the instruction set. For example, if an instruction set contains a subroutine, as in the binary code shown in Tab. 2, the subroutine is entered and the execution path search described above is performed.

Table 2 Subroutine in the instruction set
1  start proc near
2  …
3  …
4      cmp  eax, 6
5      jz   short loc_403366
6      push ebx
7      call sub_405B0E
8  …
9  loc_403366:
10     mov  esi, offset
11 …
12 sub_405B0E
13 …
14     test eax, eax
15     jnz  short loc_405B32

The relevant binary code is visualized as shown in Fig. 2.

Figure 2 Visualization with subroutines

3.3 Processing of the RGL Scheme

In this study, malicious behaviors are detected based on Windows APIs. APIs are predefined functions provided by the operating system or the programming language so that applications can use system resources or libraries. Applications call APIs to use system resources or to interact with other applications. Windows applications use Windows APIs in most cases, and these APIs are contained in dynamic-link libraries (DLLs). Malware that runs on Windows also uses the Windows APIs, so information on these APIs is a good basis for determining malware behaviors.

In previous studies, malware API information was simply collected based on signatures or simply listed, as with n-grams, to detect the similarity and behaviors of malware. Because these methods simply list the APIs, they are efficient for simple classification of malware and for detecting variants of known malware, but they cannot accurately detect malware behaviors.

In this study, malicious behaviors are detected based on the API information discovered during binary execution path searches according to the proposed method.

Table 3 Static Execution Path Exploration and APIs' Behaviors
1  loc_417AA4:
2  …
3      mov  [esp + 128Ch + var_1274], ebx
4      call ds:CreateMutexA
5      mov  [esp + 128Ch + hObject], eax
6      call ds:GetLastError
7      cmp  eax, 0B7h
8      jz   loc_41A5DD
9  loc_41A5DD:
10 …
11     call ds:CloseHandle
12 …
13     cmp  byte ptr [esi + 4], 0
14     jz   short loc_428986
15 loc_428986:
16 …
17     call ds:EnterCriticalSection
18     cmp  word ptr [ebx + 40h]
19     mov  [esp + 78h + var_64], 1
20     jnz  loc_4268DE
21 loc_4268DE:
22     lea  ecx, [esp + 78h + var_68]
23     call sub_428930
24 sub_488930:
25 …
26     call ds:LeaveCriticalSection

Tab. 3 is an example of binary code used to examine the behaviors of APIs in static execution path searches. The code branches from loc_417AA4 into loc_41A5DD when the comparison on the seventh line is true, and loc_41A5DD branches into loc_428986 when its comparison is true. Lines 15 through 21 follow the same search process, and the transition from loc_4268DE into the subroutine sub_488930 receives a normal mark instead of a true or false mark, because the subroutine is simply called in line 23 without any comparison instruction. The mutual relationship between the instruction sets and the subroutine of the binary code in Tab. 3 can be visualized as shown in Fig. 3 using the method proposed in this paper.

From the binary code in Tab. 3, it can be seen that Windows API functions such as CreateMutexA,


GetLastError, CloseHandle, EnterCriticalSection, and LeaveCriticalSection are called in lines 4, 6, 11, 17, and 26, respectively. In this paper, the same Normal, True, and False marking mechanism is applied to APIs to analyze the interactions between the APIs and their behaviors.

Figure 3 Search for the static execution path of the binary code

Figure 4 APIs' Interactions in the Binary Code

Fig. 4 visualizes the interactions between the APIs of the binary code shown in Tab. 3. CreateMutexA is called first in line 4 of Tab. 3, and GetLastError is called in line 6 thereafter. Since no comparison instruction causes a branch between lines 4 and 6, the interaction between CreateMutexA and GetLastError is marked as normal. Since the transition between the instruction sets loc_417AA4 and loc_41A5DD is marked as true by lines 7 and 8, the interaction between GetLastError, which is called last in loc_417AA4, and CloseHandle, which is called first in loc_41A5DD, is marked as true. One noteworthy case is the interaction between EnterCriticalSection and LeaveCriticalSection. The transition from loc_428986 to loc_4268DE is marked as true, and the transition from loc_4268DE into sub_488930 is marked as normal; since loc_4268DE contains no API call, no API is called between EnterCriticalSection and LeaveCriticalSection, and their interaction is marked as true. Finally, only the interactions between APIs are visualized, as shown in Fig. 5.

Figure 5 APIs' Interactions Decision

3.4 API Classification

In this study, malicious behaviors are visualized as graphs consisting of nodes and edges, as shown in Fig. 5. However, the number of APIs is too large to make each API a node. To solve this problem, the APIs are reclassified into 24 upper categories according to their functions, so that behaviors can be judged clearly and time efficiency is improved [31, 32]. For instance, CreateFile and CreateProcess perform functions related to "files" and "processes", respectively, and APIs such as GetSystemTime and GetLocalTime collect "time" information from the system. In addition, APIs such as strcmp and strcat all perform functions related to strings. Although APIs are already classified in MSDN, that classification has many categories to which APIs commonly belong and is too abstract for understanding behaviors. For instance, all process-related APIs fall into the category process, but whether a given API created, deleted, or accessed a process cannot be known. Therefore, the functions of the APIs were further divided into three types: CREATE_OR_OPEN, READ_OR_ACCESS, and CLOSE. Tab. 4 shows the final 24 API categories.

Table 4 API Categorization
FILE-CREATE_OR_OPEN | FILE-READ_OR_ACCESS | FILE_CLOSE
PROCESS-CREATE_OR_OPEN | PROCESS-READ_OR_ACCESS | PROCESS_CLOSE
NETWORK-CREATE_OR_OPEN | NETWORK-READ_OR_ACCESS | NETWORK_CLOSE
REGEDIT-CREATE_OR_OPEN | REGEDIT-READ_OR_ACCESS | REGEDIT_CLOSE
SERVICE | STRING | DEBUGGING
RESOURCE | TIME | MUTEX
WINDOW-GUI-AND-BITMAP | SHELL-AND-CONSOLE | THREAD
SYSTEM-INFORMATION | LIBRARY | HANDLE

However, not all APIs are subdivided into these behavior types. Since APIs such as strcat and strcmp do not separately create or access strings, string-related APIs are placed in a single category. Likewise, "time"-related APIs such as GetLocalTime and GetSystemTime are placed in the single category TIME even though they access the system to obtain time information, because the behavior "time" itself carries the most important value in identifying malicious behaviors.

4 EXPERIMENTS AND DISCUSSION
4.1 Malicious Behavior Detection

In this study, malicious behaviors are detected based on the interactions between APIs using binary static execution path searches. Previous API-sequence-based static analysis studies could not accurately understand malware behavior because they simply listed or collected APIs. However, the method proposed in this study captures the interactions between APIs because it uses execution path


searches, so effects similar to those of dynamic analysis can be expected even though it is a static analysis.

As a representative example, analyzing the malicious behavior of Trojan.Graftor.D4C56B with the proposed method yields the graph image shown in Fig. 6. Fig. 6 shows that the malware uses APIs in categories such as String, SystemInformation, Module, and Process. A detailed analysis of the API behaviors shaded in red is as follows. Given that the relevant APIs include process-related calls such as OpenProcess, Process32Next, WriteProcessMemory, and VirtualAllocEx, together with APIs for manipulating DLLs, the API behaviors can be identified as DLL injection: code that calls LoadLibrary is inserted into a remote process to force the DLL to be loaded into the context of that process.

Figure 6 Trojan.Graftor.D4C56B's malicious behavior

4.2 Comparison with Dynamic Analysis

For the behaviors shown in Section 4.1, the existing simple API collection and listing method, dynamic analysis based on API Monitor [33], and the method proposed in this paper are compared in Tab. 5.

Table 5 Comparison with Dynamic Analysis
Method | API sequence
API sequence listing method | …LoadLibrary VirtualAllocEx lstrcmpA … OpenProcess GetCurrentProcessId Process32Next … GetModuleHandleA GetProcAddress GetCurrentProcess … CreateRemoteThread …
Dynamic analysis | … NtCreateMutant … Process32Next OpenProcess VirtualAllocEx GetProcAddress WriteProcessMemory CreateRemoteThread … NtClose …
Proposed method | …OpenProcess →(True) VirtualAllocEx →(True) WriteProcessMemory →(Normal) GetModuleHandleA →(Normal) CreateRemoteThread …

The previous sequence listing method produces API sequences that differ from those observed in dynamic analysis, which actually executes the APIs. In contrast, the API sequences produced by the proposed method are similar to those observed in dynamic analysis.

4.3 Common Graph of Malicious Behaviors

In this study, common graphs of representative malicious behaviors (DLL injection, Downloader, IAT Hooking, Key Logger, Screen Capture, and Antidebugging) were identified in the same way, as shown in Tab. 6.

Table 6 Malicious Behavior in Categorization of API
Malicious behavior | Common behavior graph
Antidebugging | (graph image)
DLL Injection | (graph image)
Downloader | (graph image)
IAT Hooking | (graph image)
Key Logger | (graph image)
Screen Capture | (graph image)

Tab. 6 shows the interactions of the behaviors of all APIs. For clarity of analysis and efficiency of analysis time, these behavior interactions are combined by API category, as shown in Tab. 7.
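The Normal/True/False marking of API interactions walked through for Tab. 3 in Section 3.3, followed by the category abstraction of Section 3.4, can be sketched as below. This is an illustrative sketch, not the authors' code, and it rests on stated assumptions: one explored path is given as a list of (mark on the edge entering the block, block label, APIs called in the block); when API-free blocks lie between two calls, the first True or False mark seen since the previous call is kept, which reproduces the EnterCriticalSection to LeaveCriticalSection case described for Tab. 3; and `HYPOTHETICAL_CATEGORY` is an invented assignment for illustration, since the text does not list the full mapping into the 24 categories of Tab. 4. The function names are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str, List[str]]  # (mark into block, block label, APIs in call order)
ApiEdge = Tuple[str, str, str]     # (source, mark, destination)

# One explored path through Tab. 3 (block labels and API names taken from the table).
TAB3_PATH: List[Step] = [
    ("Entry", "loc_417AA4", ["CreateMutexA", "GetLastError"]),
    ("True", "loc_41A5DD", ["CloseHandle"]),
    ("True", "loc_428986", ["EnterCriticalSection"]),
    ("True", "loc_4268DE", []),  # this block calls no API
    ("Normal", "sub_488930", ["LeaveCriticalSection"]),
]


def api_interactions(path: List[Step]) -> List[ApiEdge]:
    """Mark each pair of consecutive API calls on the path as Normal, True, or False."""
    edges: List[ApiEdge] = []
    prev_api = None
    pending = "Normal"  # mark accumulated since the previous API call
    for mark, _block, apis in path:
        # Assumption: the first branch mark seen after the previous call wins.
        if prev_api is not None and pending == "Normal" and mark in ("True", "False"):
            pending = mark
        for api in apis:
            if prev_api is not None:
                edges.append((prev_api, pending, api))
            prev_api, pending = api, "Normal"  # calls within one block interact as Normal
    return edges


# Hypothetical category assignments, for illustration only.
HYPOTHETICAL_CATEGORY: Dict[str, str] = {
    "CreateMutexA": "MUTEX",
    "GetLastError": "DEBUGGING",
    "CloseHandle": "HANDLE",
    "EnterCriticalSection": "THREAD",
    "LeaveCriticalSection": "THREAD",
}


def category_interactions(edges: List[ApiEdge], category: Dict[str, str]) -> List[ApiEdge]:
    """Lift API edges to category edges, dropping self-loops and repeated edges."""
    lifted: List[ApiEdge] = []
    for src, mark, dst in edges:
        a = category.get(src, "UNMAPPED")
        b = category.get(dst, "UNMAPPED")
        if a != b and (a, mark, b) not in lifted:
            lifted.append((a, mark, b))
    return lifted


if __name__ == "__main__":
    api_edges = api_interactions(TAB3_PATH)
    for edge in api_edges:
        print(edge)
    for edge in category_interactions(api_edges, HYPOTHETICAL_CATEGORY):
        print(edge)
```

On the Tab. 3 path, the sketch yields CreateMutexA to GetLastError as Normal, and GetLastError to CloseHandle, CloseHandle to EnterCriticalSection, and EnterCriticalSection to LeaveCriticalSection as True, matching the walkthrough. Dropping same-category self-loops when lifting to categories is a design choice of this sketch; it keeps graphs such as those in Tab. 7 small, but other aggregation rules are possible.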


Table 7 Malicious Behavior in Categorization of API
Behavior | Behavior graph | Sequence
DLL injection | (graph image) | PROCESS-READ_OR_ACCESS (TRUE) RESOURCE (TRUE) LIBRARY (NORMAL) THREAD
Downloader | (graph image) | NETWORK-READ_OR_ACCESS (TRUE) LIBRARY
IAT Hooking | (graph image) | LIBRARY (TRUE) STRING (NORMAL) RESOURCE
KeyLogger | (graph image) | WINDOW-GUI-BITMAP (TRUE) HOOK

4.4 Efficiency

The malicious behaviors used for the comparison are DLL injection, downloader, IAT hooking, key logger, screen capture, and anti-debugging. The studies being compared are those that simply collect or list op codes or APIs, such as the OPCODE, N-gram, and API sequence methods.

Table 8 Example Binary Code
start:  push esi
        mov  esi, [esp + 4 + arg_0]
        push edi
        shl  esi, 3
        mov  edi, off_409068
        push edi
        call ds:GetModuleHandleA
        test eax, eax
        jnz  short loc_405B32
        push edi
        call sub_405AA0
        test eax, eax
        jz   short loc_405B41
loc_405B32:
        push off_40906C
        push eax
        call ds:GetProcAddress

In this paper, we show the efficiency of the proposed method compared with previous methods that simply collect signature information. Tab. 9 shows seven methods that simply list or collect code-level information such as OPCODE, N-GRAM, BYTE CODE, API, and strings. In addition, an example of applying each method to the code in Tab. 8 is also shown.

The test set consists of 1,236 randomly selected malware samples, all of which include an IAT (Import Address Table), because the proposed method analyzes the interactions between APIs. First, using the identified common behavior graphs, each malicious behavior was analyzed on this data set of 1,236 malware samples.

Figs. 7 to 12 are graphs comparing the method proposed in this study with the existing methods. The proposed method detects larger numbers of the malicious behaviors DLL injection, IAT Hooking, Screen Capture, and Anti Debugging than the existing detection methods.

Figure 7 DLL Injection

Figure 8 Key Logger

Figure 9 Screen Capture
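To make the baselines in Tab. 9 concrete, the sketch below derives four of the listed representations (OPCODE, OPCODE 3-GRAM, API Listing, API Frequency) from the code in Tab. 8. It is a hypothetical reconstruction, not the compared studies' code: it assumes IDA-style text in which imported calls appear as `call ds:Name`, the names `TAB8`, `parse`, and `representations` are invented for illustration, and it reproduces the OPCODE 3-GRAM example of Tab. 9 by truncating each mnemonic to its first three characters, which is what that example string shows, rather than by sliding an n-gram window.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Tab. 8 listing, with the spacing after "ds:" normalized.
TAB8 = """\
start:  push esi
        mov  esi, [esp + 4 + arg_0]
        push edi
        shl  esi, 3
        mov  edi, off_409068
        push edi
        call ds:GetModuleHandleA
        test eax, eax
        jnz  short loc_405B32
        push edi
        call sub_405AA0
        test eax, eax
        jz   short loc_405B41
loc_405B32:
        push off_40906C
        push eax
        call ds:GetProcAddress
"""

# Matches "loc_xxx:" or "start: push esi"; "call ds:Name" does not match.
LABEL = re.compile(r"^\s*(\w+)\s*:\s*(.*)$")


def parse(listing: str) -> Tuple[List[str], List[str]]:
    """Return (mnemonics, imported API names) in program order."""
    mnemonics: List[str] = []
    apis: List[str] = []
    for line in listing.splitlines():
        match = LABEL.match(line)
        body = match.group(2) if match else line.strip()
        if not body:  # label-only line
            continue
        parts = body.split(None, 1)
        mnemonic = parts[0].lower()
        operand = parts[1].strip() if len(parts) > 1 else ""
        mnemonics.append(mnemonic)
        if mnemonic == "call" and operand.startswith("ds:"):
            apis.append(operand[3:].strip())  # imported API via the IAT
    return mnemonics, apis


def representations(listing: str) -> Dict[str, str]:
    """Build the simple listing/frequency strings compared in Tab. 9."""
    mnemonics, apis = parse(listing)
    return {
        "OPCODE": "".join(mnemonics),
        "OPCODE, 3-GRAM": "".join(m[:3] for m in mnemonics),
        "API Listing": "".join(apis),
        "API Frequency": "".join(f"{api}({n})" for api, n in Counter(apis).items()),
    }


if __name__ == "__main__":
    for name, value in representations(TAB8).items():
        print(f"{name}: {value}")
```

Each output begins with the corresponding example string in Tab. 9, for instance `pushmovpushshlmovpushcall…` and `GetModuleHandleA(1)GetProcAddress(1)`. None of these strings records which branch led from one API to the next, which is the information the proposed Normal/True/False marks add.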


Table 9 Comparison of previous method
No. | Analysis method | Example
1 | OPCODE | pushmovpushshlmovpushcall…
2 | OPCODE, 3-GRAM | pusmovpusshlmovpuscaltesjnz…
3 | OPCODE, N-GRAM | P(n-gram)m(n-gram)p(n-gram)…
4 | API Listing | GetModuleHandleAGetProcAddress…
5 | API Listing, N-GRAM | G(n-gram)G(n-gram)…
6 | API Frequency | GetModuleHandleA(1)GetProcAddress(1)…
7 | STRING Listing, N-gram | Inst(n-gram)Erro(n-gram)uxth(n-gram)… (the strings are not visible in Tab. 8)

Figure 10 Anti Debugging

Figure 11 Downloader

Figure 12 IAT Hooking

However, for malicious behaviors such as Downloader and Key Logger, the method proposed in this study shows detection counts very similar to those of the existing detection methods. This is because these behaviors involve API interactions too simple for them to be judged malicious. In other words, the two malicious behaviors Downloader and Key Logger each consist of only two nodes, and the edge showing their interaction is marked True, so only one sequence is identified for each behavior. The sequence structure is therefore too simple for detecting these malicious behaviors, which is why detection accuracy is lower. However, the more complex the malicious behavior, the higher the detection efficiency.

Figure 13 Detection efficiencies according to behavior complexity

Fig. 13 shows the detection efficiencies according to behavior complexity. The efficiencies shown in the graph are those of the proposed method relative to the highest efficiencies reported in existing studies. In this study, the complexity of malicious behaviors can also be regarded as the complexity of sequences. When sequence complexity was low, the efficiency of the proposed method was 105% of that of existing studies, because there are fewer grounds for detection at low complexity. When sequence complexity was high, the efficiency of the proposed method was 158% of that of existing studies, because the grounds for detection increase with complexity. This means that the more complex the malicious behavior, the higher the detection efficiency.

4.5 Binary Classification Result

Compared to previous studies, the proposed method did not show high detection rates for behaviors such as Downloader and KeyLogger. This is because these behaviors involve simple API interactions, so the grounds for judging them malicious are insufficient. However, the higher the complexity of malicious behaviors, the higher the detection efficiency of the method proposed in this


paper. These results are supported by accuracy and F-measure values, which reflect false detection and missed detection rates, compared with those of existing studies.

In this paper, we compared the binary classification results of the proposed method with those of previous studies. Since this paper uses API sequences based on static analysis, all compared previous studies are also based on static analysis. [19, 34] used API frequency, and [35, 36] used API sequences as the main method. The accuracy and F-measure values of the proposed method were all higher than those of previous studies. In addition, the malware samples detected by the proposed method showed an average VirusTotal [37] detection rate of 69%. The results are summarized in Tab. 10.

Table 10 Binary Classification Result
Previous works | Used method | Malware samples | Accuracy | F-measure | VirusTotal average (%)
[19] | API Frequency | 66,703 | 0.985 | 0.984 | Not mentioned
[33] | API Frequency | 32,000 | 0.983 | 0.878 | Not mentioned
[34] | API Sequence | 800 | 0.841 | 0.909 | Not mentioned
[35] | API Sequence | 17,366 | 0.930 | 0.941 | Not mentioned
Proposed method | API Sequence | 1,236 | 0.991 | 0.992 | 69
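For reference, the accuracy and F-measure columns of Tab. 10 follow the standard binary-classification definitions, which connect them to false detections (false positives) and missed detections (false negatives). The function below is a generic sketch of those definitions, not the authors' evaluation code; taking the false detection rate as FP/(FP+TN) and the missed detection rate as FN/(FN+TP) is an assumption about how the paper's terms map onto the usual ones, and the function name is hypothetical.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (malware = positive class)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # equals 1 - missed detection rate
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        # F-measure: harmonic mean of precision and recall
        "f_measure": (2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0),
        "false_detection_rate": fp / (fp + tn) if fp + tn else 0.0,
        "missed_detection_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from this paper.
    print(binary_metrics(tp=9, fp=1, fn=3, tn=7))
```

With these illustrative counts, accuracy is 0.8 while the F-measure is about 0.818, showing that the two columns can diverge when false detections and missed detections are unbalanced.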

5 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a method that detects execution paths through static analysis and judges malicious behaviors from the interrelationships among APIs. Although static analysis is the main technique, the proposed method can analyze the flow of behaviors because it searches execution paths. In other words, even though static analysis is adopted as the main approach, the method also gains some of the advantages of dynamic analysis, which analyzes APIs by actually executing them. In this study, execution flows were traced according to branch instructions, and the interactions among the APIs collected along those flows were analyzed. API interactions are marked as normal, true, or false, and are then reclassified and grouped into 24 upper categories. Detection based on this method was compared with the existing simple API collection method and the API listing method. Six malicious behaviors were used for the comparison: DLL injection, downloader, IAT hooking, key logger, screen capture, and anti-debugging. The proposed method discriminated four of the six behaviors with high efficiency; the exceptions were downloader and key logger, whose API interactions are insufficient to judge the behaviors as malicious. This is related to the complexity of malicious behaviors: the more complicated a behavior, the higher the detection efficiency, because a complex behavior provides more grounds for judging it malicious.

In future studies, the frequencies of behaviors will be added to provide grounds for judging detailed behaviors. Such numerical data can be used to apply machine learning and various statistics-based algorithms, and, based on these data, malware will be visualized and malware similarity will be calculated.

Acknowledgements

This research was supported by the 2018 Yeungnam University Research Grant (218A061016, 218A380138) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1D1A1B07050647).

6 REFERENCES

[1] Neumann, J. & Burks, A. W. (1966). Theory of self-reproducing automata. Urbana: University of Illinois Press.
[2] Gandotra, E., Bansal, D., & Sofat, S. (2014). Malware analysis and classification: A survey. Journal of Information Security, 2014. https://doi.org/10.4236/jis.2014.52006
[3] Sharif, M. I., Lanzi, A., Giffin, J. T., & Lee, W. (2008). Impeding Malware Analysis Using Conditional Code Obfuscation. NDSS.
[4] Bilar, D. (2007). Opcodes as predictor for malware. International journal of electronic security and digital forensics, 1(2), 156-168. https://doi.org/10.1504/IJESDF.2007.016865
[5] Griffin, K., Schneider, S., Hu, X., & Chiueh, T. C. (2009). Automatic generation of string signatures for malware detection. International workshop on recent advances in intrusion detection, 101-120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04342-0_6
[6] Shafiq, M. Z., Tabish, S. M., Mirza, F., & Farooq, M. (2009, September). Pe-miner: Mining structural information to detect malicious executables in real time. International workshop on recent advances in intrusion detection, 121-141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04342-0_7
[7] Santos, I., Brezo, F., Nieves, J., Penya, Y. K., Sanz, B., Laorden, C., & Bringas, P. G. (2010, February). Idea: Opcode-sequence-based malware detection. International Symposium on Engineering Secure Software and Systems, 35-43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11747-3_3
[8] Hu, K. G. S. S. X. & Chiueh, T. C. (2008). Automatic Generation of String Signatures for Malware Detection. Symantec Research Laboratories, 1-29.
[9] Santos, I., Penya, Y. K., Devesa, J., & Bringas, P. G. (2009). N-grams-based File Signatures for Malware Detection. ICEIS, 9(2), 317-320. https://doi.org/10.5220/0001863603170320
[10] Moskovitch, R., Feher, C., Tzachar, N., Berger, E., Gitelman, M., Dolev, S., & Elovici, Y. (2008). Unknown malcode detection using opcode representation. European conference on intelligence and security informatics, 204-215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89900-6_21
[11] Choi, Y. H., Han, B. J., Bae, B. C., Oh, H. G., & Sohn, K. W. (2012). Toward extracting malware features for classification using static and dynamic analysis. 8th International Conference on Computing and Networking Technology (INC, ICCIS and ICMIC), 126-129.
[12] Zhang, M., Duan, Y., Yin, H., & Zhao, Z. (2014). Semantics-aware android malware classification using weighted
contextual api dependency graphs. Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, 1105-1116. https://doi.org/10.1145/2660267.2660359
[13] Lu, H., Wang, X., & Su, J. (2013). SCMA: Scalable and collaborative malware analysis using system call sequences. International Journal of Grid and Distributed Computing, 6(2), 11-28.
[14] Elhadi, A. A. E., Maarof, M. A., & Barry, B. I. (2013). Improving the detection of malware behaviour using simplified data dependent API call graph. International Journal of Security and Its Applications, 7(5), 29-42. https://doi.org/10.14257/ijsia.2013.7.5.03
[15] Uppal, D., Sinha, R., Mehra, V., & Jain, V. (2014, September). Malware detection and classification based on extraction of API sequences. International conference on advances in computing, communications and informatics (ICACCI), 2337-2342. https://doi.org/10.1109/ICACCI.2014.6968547
[16] Fan, C. I., Hsiao, H. W., Chou, C. H., & Tseng, Y. F. (2015). Malware detection systems based on API log data mining. 39th annual computer software and applications conference, 3, 255-260. https://doi.org/10.1109/COMPSAC.2015.241
[17] Alazab, M., Venkataraman, S., & Watters, P. (2010, July). Towards understanding malware behaviour by the extraction of API calls. Second cybercrime and trustworthy computing workshop, 52-59. https://doi.org/10.1109/CTC.2010.8
[18] Rajagopalan, M., Hiltunen, M. A., Jim, T., & Schlichting, R. D. (2006). System call monitoring using authenticated system calls. IEEE Transactions on Dependable and Secure Computing, 3(3), 216-229. https://doi.org/10.1109/TDSC.2006.41
[19] Alazab, M., Venkatraman, S., Watters, P., & Alazab, M. (2010). Zero-day malware detection based on supervised learning algorithms of API call signatures.
[20] Moser, A., Kruegel, C., & Kirda, E. (2007). Limits of static analysis for malware detection. Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), 421-430. https://doi.org/10.1109/ACSAC.2007.21
[21] Firdausi, I., Erwin, A., & Nugroho, A. S. (2010, December). Analysis of machine learning techniques used in behavior-based malware detection. Second international conference on advances in computing, control, and telecommunication technologies, 201-203. https://doi.org/10.1109/ACT.2010.33
[22] Blount, J. J., Tauritz, D. R., & Mulder, S. A. (2011, July). Adaptive rule-based malware detection employing learning classifier systems: a proof of concept. 35th Annual Computer Software and Applications Conference Workshops, 110-115. https://doi.org/10.1109/COMPSACW.2011.28
[23] Nair, V. P., Jain, H., Golecha, Y. K., Gaur, M. S., & Laxmi, V. (2010). Medusa: Metamorphic malware dynamic analysis using signature from api. Proceedings of the 3rd International Conference on Security of Information and Networks, 263-269. https://doi.org/10.1145/1854099.1854152
[24] Roundy, K. A. & Miller, B. P. (2010). Hybrid analysis and control of malware. International Workshop on Recent Advances in Intrusion Detection, 317-338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15512-3_17
[25] Egele, M., Scholte, T., Kirda, E., & Kruegel, C. (2008). A survey on automated dynamic malware-analysis techniques and tools. ACM computing surveys (CSUR), 44(2), 1-42. https://doi.org/10.1145/2089125.2089126
[26] Sharma, A. & Sahay, S. K. (2016). An effective approach for classification of advanced malware with high accuracy. https://doi.org/10.14257/ijsia.2016.10.4.24
[27] Hordri, N. F., Ahmad, N. A., Yuhaniz, S. S., Sahibuddin, S., Ariffin, A. F. M., Saupi, N. A. M., Senan, M. F. E. M., et al. (2018). Classification of malware analytics techniques: a systematic literature review. International journal of security and its applications, 12(2), 9-18. https://doi.org/10.14257/ijsia.2018.12.2.02
[28] Jihun, K., Sung, W. L., & Jonghee, Y. (2021). Expression of malware characteristics using API sequence. Journal of Smart Technology Applications, 2(1).
[29] Eagle, C. (2011). The IDA pro book.
[30] See https://www.hexrays.com/products/ida/support/idapython_docs/
[31] Zhou, B., Xia, X., Lo, D., Tian, C., & Wang, X. (2014). Towards more accurate content categorization of API discussions. Proceedings of the 22nd International Conference on Program Comprehension, 95-105. https://doi.org/10.1145/2597008.2597142
[32] Uppal, D., Sinha, R., Mehra, V., & Jain, V. (2014). Exploring behavioral aspects of API calls for malware identification and categorization. International Conference on Computational Intelligence and Communication Networks, 824-828. https://doi.org/10.1109/CICN.2014.176
[33] See http://www.rohitab.com/apimonitor
[34] Sami, A., Yadegari, B., Rahimi, H., Peiravian, N., Hashemi, S., & Hamze, A. (2010). Malware detection based on mining API calls. Proceedings of the 2010 ACM symposium on applied computing, 1020-1025. https://doi.org/10.1145/1774088.1774303
[35] Sathyanarayan, V. S., Kohli, P., & Bruhadeshwar, B. (2008). Signature generation and detection of malware families. Australasian Conference on Information Security and Privacy, 336-349. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70500-0_25
[36] Ye, Y., Wang, D., Li, T., & Ye, D. (2007). IMDS: Intelligent malware detection system. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 1043-1047. https://doi.org/10.1145/1281192.1281308
[37] See https://www.virustotal.com

Contact information:

Jihun KIM, M.S.
Dept. of Computer Engineering, Yeungnam University,
280 Daehak-Ro, Gyeongsan, Gyeongbuk, Republic of Korea
E-mail: f13521@naver.com

Sungwon LEE, M.S.
Dept. of Computer Engineering, Yeungnam University,
280 Daehak-Ro, Gyeongsan, Gyeongbuk, Republic of Korea
E-mail: noke15@ynu.ac.kr

Jonghee YOUN, PhD, Professor
(Corresponding author)
Dept. of Computer Engineering, Yeungnam University,
280 Daehak-Ro, Gyeongsan, Gyeongbuk, Republic of Korea
E-mail: youn@yu.ac.kr
