Binary Code Vulnerability Detection Based On Multi-Level Feature Fusion

This document discusses a method for binary code vulnerability detection based on multi-level feature fusion. The proposed model considers both word-level features obtained using ELMo embeddings and instruction-level features extracted using bidirectional GRUs. These features are then fused using weighted fusion to improve vulnerability detection accuracy. The model is evaluated on two datasets, achieving F1-scores of 98.9% and 87.7%. The experimental results demonstrate that the multi-level feature fusion approach can enhance binary code vulnerability detection performance.
Received 21 May 2023, accepted 19 June 2023, date of publication 23 June 2023, date of current version 28 June 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3289001

Binary Code Vulnerability Detection Based on Multi-Level Feature Fusion
GUANGLI WU, (Member, IEEE), AND HUILI TANG
School of Cyberspace Security, Gansu University of Political Science and Law, Lanzhou 730070, China
Corresponding author: Guangli Wu (272956638@qq.com)
This work was supported in part by the Natural Science Foundation of Gansu Province under Grant 21JR7RA570; in part by the Gansu
University of Political Science and Law Major Scientific Research and Innovation Project under Grant GZF2020XZDA03; in part by the
Young Doctoral Fund Project of Higher Education Institutions in Gansu Province under Grant 2022QB-123 in 2022; in part by the Gansu
Provincial University Innovation Fund Project 2022A-097; in part by the Gansu Province Excellent Graduate Student Innovation Star
Project, in 2022, under Grant 2022CXZX-790; and in part by the University-Level Innovative Research Team of Gansu University of
Political Science and Law.

ABSTRACT Software vulnerabilities can lead to serious network attacks and information leakage.
Timely and accurate detection of vulnerabilities in software has become a research focus in the
security field. Most existing work only considers instruction-level features, which to some
extent overlooks certain syntax and semantic information in the assembly code segments, affecting the
accuracy of the detection model. In this paper, we propose a binary code vulnerability detection model
based on multi-level feature fusion. The model considers both word-level features and instruction-level
features. To address the problem that traditional text embedding methods cannot handle polysemy,
this paper uses the Embeddings from Language Models (ELMo) model to obtain dynamic word vectors
containing word semantics and other information. Considering the grammatical structure in the assembly
code segment, the model randomly embeds the normalized assembly code segment to represent it. Then
the model uses a bidirectional Gated Recurrent Unit (GRU) to extract word-level sequence features and
instruction-level sequence features respectively. Next, the weighted feature fusion method is used to study
the impact of different sequence features on the model performance. During model training, standard
deviation regularization is added to constrain the model parameters and prevent overfitting.
To evaluate our proposed method, we conduct experiments on two datasets. Our method achieves an F1-score
of 98.9 percent on the Juliet Test Suite dataset and an F1-score of 87.7 percent on the NDSS18 (Whole)
dataset. The experimental results show that the model can improve the accuracy of binary code vulnerability
detection.

INDEX TERMS Binary code vulnerability detection, embeddings from language models, feature fusion,
instruction level sequence features, word level sequence features.

The associate editor coordinating the review of this manuscript and approving it for publication was Sedat Akleylek.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023
G. Wu, H. Tang: Binary Code Vulnerability Detection Based on Multi-Level Feature Fusion

I. INTRODUCTION
With technology now highly developed, the widespread use of software has greatly facilitated people's lives, but at the same time it has brought many security problems. Software vulnerabilities are a topic that cannot be ignored because once criminals maliciously invade a system through a software vulnerability, it may cause immeasurable losses. Therefore, developers attach great importance to security vulnerabilities during software development and maintenance. Detecting software vulnerabilities in a timely and accurate manner is significant for individuals, enterprises, and even society as a whole.
Software vulnerability detection [1], [2], [3], [4], [5] analyzes the input code of the software to detect whether there are exploitable defects. Generally, it is very difficult for researchers to obtain the software's source code, but they can get the corresponding binary files. Therefore, the study of
binary code vulnerability detection [6] has crucial practical significance. At present, static analysis [7], [8], [9], dynamic analysis [10], [11], [12], and a combination of dynamic and static analysis can be used for binary code vulnerability detection.
The static analysis method analyzes the program to detect whether it contains vulnerabilities without running it. This method often converts the program to be tested into an intermediate language, analyzes the typical features contained in the intermediate language, and then uses specific techniques to detect vulnerabilities.
The dynamic analysis method [13] needs to run the target program and observe and analyze its execution status in order to detect vulnerabilities. For example, environment error injection injects error information into the execution environment without modifying the program to be tested and then observes the running status of the program. If the running status is abnormal, the artificially injected error has triggered a potential defect in the program, which reveals the vulnerability.
The dynamic and static analysis method [14] combines the accuracy of dynamic vulnerability analysis technology and the path completeness of static vulnerability analysis technology to detect vulnerabilities.
Since the static analysis method studies all the execution paths of the target program, it has high path coverage. However, the dynamic analysis method and the combination of dynamic and static analysis methods still have the problem of low path coverage. Therefore, static methods are also studied extensively in binary code vulnerability detection.
Static binary code vulnerability detection can be divided into traditional detection methods and detection methods based on deep learning. Traditional static binary code vulnerability detection methods usually convert binary code into an intermediate language and then use static methods to analyze the program to detect potential vulnerabilities. For example, the pattern matching method detects vulnerabilities according to vulnerability patterns defined by human experts in advance. Vulnerability patterns are obtained by analyzing a large amount of code and abstracting and summarizing typical features of specific vulnerability types, which may be regular expressions, string matching, code structure, etc. By analyzing intermediate languages, these typical features can better distinguish whether the code contains defects. However, pattern matching relies too much on the manually abstracted features and can only detect binary code that exhibits those typical features. Traditional static binary vulnerability detection methods have the advantage of detecting defects in the early stages of program development, reducing software development costs and time. However, when using them for vulnerability detection, there may be false positives and false negatives, and they may also consume a lot of manpower and computing resources.
Due to its excellent ability to automatically extract features, deep learning has been validated as effective in code vulnerability detection [15]. Li et al. [16] were the first to apply deep learning to software vulnerability detection tasks. Their VulDeePecker (Vulnerability Deep Pecker) system provides a new research perspective for vulnerability detection, and that paper also contributes the first vulnerability dataset suitable for deep learning. Laura et al. [17] proposed the Vulnerability detection using natural code bases (VUDENC) method based on natural code repositories. VUDENC utilizes Word2Vec to represent word vectors, and LSTM networks classify sequences of vulnerable code tokens. The works of [16] and [17] are both excellent approaches for applying deep learning to source code vulnerability detection.
Deep learning approaches to binary code vulnerability detection can be divided into code scanning and similarity detection. Binary code similarity detection calculates the similarity between the code to be tested and code with identified vulnerabilities, and then determines whether the code to be tested has vulnerabilities. Wang et al. [18] proposed the jump-aware Transformer for binary code similarity detection (jTrans), which is the first solution to embed control-flow information into Transformer. jTrans combines the natural language processing (NLP) model that captures instruction semantics with the control-flow graph (CFG) that captures control information to infer the similarity representation of binary code, and thereby fuses control-flow information into Transformer. At the same time, many scholars have applied the binary code similarity method to vulnerability detection. However, this method has a disadvantage: it cannot detect unknown types of vulnerabilities. It is therefore extremely important to detect vulnerabilities in binary code directly using code scanning methods.
Code scanning refers to the process of traversing and slicing the assembly code segments obtained from binary code conversion to determine whether the slices contain vulnerabilities. Existing code scanning techniques often utilize methods from natural language processing for vulnerability detection. The specific steps include data preprocessing, code embedding, and feature extraction. The code embedding network typically aims to vectorize the assembly code segments, but it remains challenging to transform the code into vectors that neural networks can recognize while preserving as much syntax and semantic information as possible. The Instruction2vec method proposed by Lee et al. [19] represents each word in the assembly code segments using Word2Vec word vectors. It takes into consideration the composition structure of instructions and represents each line of instructions using nine values: one for the opcode and four for each of the two operands. If there are fewer than two operands, padding is applied to fill the remaining values. Based on this, Yan et al. [20] proposed the Hierarchical Attention Network for Binary Code Vulnerability Detection (HAN-BSVD), which expands on the structural composition
of instructions mentioned in [19]. They introduced an additional field for operand type in the instruction structure. Unlike the aforementioned instruction structure, in the method proposed by Le et al. [21], each instruction is divided into opcode and instruction information, which are embedded separately and then concatenated to form the embedded representation of the instruction. In addition, in the BVDetector method proposed by Tian et al. [22], Word2Vec is directly used to generate word vectors. However, this method can only detect vulnerabilities caused by library/API functions, which limits its applicability. In most studies using code scanning techniques for binary code vulnerability detection, code embedding networks often utilize the traditional text embedding method Word2Vec to represent word vectors. However, Word2Vec can only learn based on a relatively large window, and the obtained word vector is static, with identical vectors for the same word regardless of its position, limiting the capture of contextual relationships. To better extract syntactic information from assembly code segments and obtain different vector representations of the same word in various contexts, we use the ELMo model to generate dynamic vector representations of words.
In the feature extraction module, the work of [19] uses TextCNN to automatically extract features from the concatenated vector representations of each line of instructions. On the other hand, the work of [20] utilizes bidirectional GRU and word attention modules to obtain the embedding representation of instructions. These embeddings are subsequently processed by TextCNN in conjunction with a spatial attention mechanism to automatically extract features. The VUDENC [17] method selects Long Short-Term Memory (LSTM) networks to classify vulnerable code sequences at a fine-grained level. Narayana et al. [23] used artificial neural networks, autoencoders, etc. to automatically extract features of vulnerable code. Ouyang et al. [24] applied Word2Vec and a Long Short-Term Memory (LSTM) network to binary code vulnerability detection.
The above methods have been verified to have good performance in the field of vulnerability detection. However, these methods only deal with instruction-level sequence features, and to a certain extent ignore part of the syntax and semantic information in the assembly code segment. To better learn the contextual relations between words in assembly code segments, in this paper we consider word-level sequence features. Finally, the instruction-level features and word-level sequence features in the assembly code segment are fused, and a binary code vulnerability detection model based on multi-level feature fusion is proposed.
The contributions of the model proposed in this paper are:
• In order to better identify vulnerability features of the assembly code segment, a bidirectional Gated Recurrent Unit (GRU) is used to extract word-level sequence features and instruction-level sequence features. We compare the effects of feature concatenation, feature addition, and weighted feature fusion on the model performance and show that weighted feature fusion is the most suitable method for vulnerability detection.
• In this paper, the ELMo pre-training model is used to obtain dynamic vectors according to the context semantic information of a word, which handles polysemy well. The obtained features are richer and can better represent the assembly code segment.

II. RELATED WORK
A. ELMo
ELMo [25] is a pre-trained language model. It was proposed to solve the problem of polysemy, which traditional language models cannot handle. Through it, we can capture the deep contextual information of words in a sentence and get an embedded representation that conforms to the current context. Even within the same sentence, the same word can have different embedding vector representations. The ELMo model structure is shown in Fig.1.
FIGURE 1. Internal structure diagram of ELMo.
The ELMo model includes an input layer, a bidirectional language model, and an output layer. The bidirectional language model is the core of the ELMo model and is composed of multi-layer bidirectional LSTMs. Through it, the deep contextual semantic features of the original input can be obtained.
For the current input $(t_1, t_2, \ldots, t_m)$ of the model, ELMo's forward language model uses the preceding information $(t_1, t_2, \ldots, t_{k-1})$ of the current word $t_k$ in the input sequence to predict the current word. The calculation is shown in (1).
$$p(t_1, t_2, \ldots, t_m) = \prod_{k=1}^{m} p(t_k \mid t_1, t_2, \ldots, t_{k-1}) \tag{1}$$
ELMo's backward language model uses the following information $(t_{k+1}, t_{k+2}, \ldots, t_m)$ of the current word $t_k$ in the input sequence to predict the current word. The calculation is shown in (2).
$$p(t_1, t_2, \ldots, t_m) = \prod_{k=1}^{m} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_m) \tag{2}$$
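As a rough illustration of the factorizations in (1) and (2), the following Python sketch scores a short token sequence with a forward and a backward chain of conditional probabilities. The probability tables, the `<s>` boundary marker, and all function names are hypothetical and not taken from the paper. The sketch also conditions on a single neighboring token only, whereas ELMo conditions on the full left or right context through LSTMs.

```python
import math

# Hypothetical toy tables, not from the paper. Each maps a one-token context
# to a distribution over the token being predicted. "<s>" marks the sequence
# boundary. ELMo conditions on the full context through LSTMs; the one-token
# context used here is a simplification for illustration only.
FORWARD_TABLE = {
    "<s>": {"mov": 0.5, "push": 0.5},
    "mov": {"eax": 0.5, "ebx": 0.5},
    "push": {"eax": 0.5, "ebx": 0.5},
}
BACKWARD_TABLE = {
    "<s>": {"eax": 0.5, "ebx": 0.5},
    "eax": {"mov": 0.5, "push": 0.5},
    "ebx": {"mov": 0.5, "push": 0.5},
}


def chain_log_prob(tokens, table):
    """Sum log p(token | context), where context is the token visited just before."""
    total = 0.0
    context = "<s>"
    for token in tokens:
        total += math.log(table[context][token])
        context = token
    return total


def forward_log_prob(tokens):
    """Log of the product in (1): each word is predicted from its left context."""
    return chain_log_prob(tokens, FORWARD_TABLE)


def backward_log_prob(tokens):
    """Log of the product in (2): each word is predicted from its right context."""
    return chain_log_prob(list(reversed(tokens)), BACKWARD_TABLE)


segment = ["mov", "eax"]
print(forward_log_prob(segment), backward_log_prob(segment))
```

Working in log space turns the products in (1) and (2) into sums, which avoids numerical underflow on long sequences.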

FIGURE 2. The structure diagram of LSTM.


FIGURE 3. The structure diagram of GRU.

For each word $t_k$, $2L + 1$ vectors are obtained through the $L$-layer bidirectional language model, including the forward and backward vectors of each layer and the original input vector. The final output vector is shown in (3).
$$R_k = \left\{ x_k^{LM}, \overrightarrow{h}_{k,j}^{LM}, \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \ldots, L \right\} = \left\{ h_{k,j}^{LM} \mid j = 0, \ldots, L \right\} \tag{3}$$
In (3), $\overrightarrow{h}_{k,j}^{LM}$ and $\overleftarrow{h}_{k,j}^{LM}$ are the forward and backward language model outputs, respectively. LSTM networks are often used to process data with time series characteristics. In the bidirectional language model, LSTM extracts the contextual semantic information of the current word according to the sequence. As shown in Fig.2, the LSTM network [26], [27] is composed of an input gate, a forget gate, and an output gate.
In Fig.2, $x_t$ is the input at the current time $t$, $h_t$ is the output at the current time, $c_t$ is the cell state, $c'_t$ is the new data at the current time, and $\sigma$ is the activation function. $f_t$, $i_t$ and $o_t$ are the forget gate, input gate and output gate at the current time, respectively. $w_f$, $w_i$ and $w_o$ are the weight matrices corresponding to the forget gate, input gate, and output gate, respectively.

B. GRU
GRU [28] is a variant of LSTM, which can process sequence data and alleviate the long-term dependency problem of traditional recurrent neural networks. Compared with the three gate structures of the LSTM shown in Fig.2, the GRU structure is simpler, including only update and reset gates.
As shown in Fig.3, the GRU network consists of the update gate $z_t$ and the reset gate $\gamma_t$. The input of the GRU at time $t$ is the hidden layer state $h_{t-1}$ at time $t - 1$ and the input $x_t$ at the current time.
The update gate determines how much of the state at time $t - 1$ is carried into the current state. The calculation equations are:
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{4}$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \tag{5}$$
The reset gate determines how much of the state at time $t - 1$ is written into the current candidate state $\tilde{h}_t$. The calculation equations are:
$$\gamma_t = \sigma\left(W_\gamma \cdot [h_{t-1}, x_t]\right) \tag{6}$$
$$\tilde{h}_t = \tanh\left(W_{\tilde{h}} \cdot [\gamma_t * h_{t-1}, x_t]\right) \tag{7}$$
The output of the GRU is calculated as:
$$y_t = \sigma\left(W_O \cdot h_t\right) \tag{8}$$
where $[\cdot]$ represents the concatenation of two vectors, $*$ represents the element-wise product, and $\sigma(\cdot)$ represents the sigmoid activation function.

III. THE MODEL
For binary code vulnerability detection tasks, this paper proposes a vulnerability detection model based on multi-level feature fusion. The model mainly includes three parts: a feature extraction network, a feature fusion module, and a classifier. The feature extraction network extracts word-level and instruction-level sequence features, respectively. In the word-level module, the model uses the ELMo pre-training model to obtain the dynamic vector representation of words, and a bidirectional GRU obtains the sequence characteristics between words. In the instruction-level feature extraction module, this paper first considers the syntax and semantic structure of the instruction, obtains the embedding matrix of the assembly code segment, uses a bidirectional GRU to extract context information of the instruction, and obtains instruction-level sequence features. In the feature fusion module, the weighted feature fusion mechanism is used to complete the fusion between word-level and instruction-level features. Finally, the model uses a classifier to get the classification results, that is, to determine whether the assembly code segment contains vulnerabilities. The overall framework of the model is shown in Fig.4.

A. WORD-LEVEL FEATURE EXTRACTION MODULE
In this section, the ELMo pre-training language model is used to obtain the embedded representation of each word in the assembly code segment, and the bidirectional GRU obtains the long-term context dependency of the word.
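The GRU update in (4) to (7), on which both feature extraction modules rely, can be sketched in plain Python as a single time step. This is a minimal illustration rather than the authors' implementation: the helper names, the list-based vectors, and the weight shapes are assumptions, and bias terms are omitted because the equations in the text do not include them.

```python
import math


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(h_prev, x, w_z, w_r, w_h):
    """One GRU time step following (4) to (7); biases omitted as in the text.

    h_prev: previous hidden state h_{t-1}, length H
    x:      current input x_t, length D
    w_z, w_r, w_h: hypothetical weight matrices, each with H rows and H + D columns
    """
    concat = h_prev + x                              # [h_{t-1}, x_t]
    z = sigmoid(matvec(w_z, concat))                 # update gate, (4)
    r = sigmoid(matvec(w_r, concat))                 # reset gate, (6)
    gated = [ri * hi for ri, hi in zip(r, h_prev)]   # gamma_t * h_{t-1}
    h_tilde = [math.tanh(v) for v in matvec(w_h, gated + x)]  # candidate, (7)
    # New hidden state, (5): blend of the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_tilde)]


# With all-zero weights both gates equal 0.5 and the candidate state is 0,
# so the new state is half of the previous one.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(gru_step([1.0, -2.0], [0.3], zeros, zeros, zeros))
```

A bidirectional GRU runs this step over the sequence once from left to right and once from right to left and then combines the two hidden states at each position.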

FIGURE 4. Overall architecture of the model.

When using deep learning for vulnerability detection, identifying potentially vulnerable code requires learning the contextual semantic information in the code. Expressing the meaning of the code correctly and reasonably can improve the detection performance of the model. Word2Vec [29] and other embedding methods obtain word vectors from a large corpus and a relatively long context, so for the same word the vector obtained through Word2Vec is fixed and does not consider the context and other factors of the word. For this reason, this paper uses ELMo to obtain dynamic word vectors. The specific steps are as follows: Firstly, a pre-trained language model is generated to learn the embedding of the word. When used, the word already has specific context information. Secondly, the word vector is adjusted according to the context information of the word in the current task so that the word in different positions has different vector representations.
For the word-level feature extraction module, the input of the model is $T = (t_1, t_2, \ldots, t_m)^T$, where $t_i$, $i = 1, \ldots, m$ represents the words in the assembly code segment, and $m$ represents the number of words in the input assembly code segment. Each word is represented by a vector through the input layer of the ELMo module. Here, the input layer of the ELMo model uses the random embedding representation method to obtain the embedding matrix $E = (E_1, E_2, \ldots, E_m)^T$. Then this paper uses a two-layer bidirectional language model to model the grammar, semantics and other characteristics of words. Take the vector $E_i$ of a word in the embedding matrix $E$ as the input, get the hidden state $\overrightarrow{h}_{i,1}^{LM}$ through the forward LSTM of the first layer, and get the hidden state $\overrightarrow{h}_{i,2}^{LM}$ by inputting it into the forward LSTM of the second layer, so the forward language model outputs $\overrightarrow{h}_{i,j}^{LM}$, $j = 1, 2$ for $E_i$. Similarly, the backward language model outputs $\overleftarrow{h}_{i,j}^{LM}$, $j = 1, 2$ for $E_i$.
So the output for word $E_i$ is $R_i = \left\{ E_i, \overrightarrow{h}_{i,j}^{LM}, \overleftarrow{h}_{i,j}^{LM} \mid j = 1, 2 \right\} = \left\{ h_{i,j}^{LM} \mid j = 0, 1, 2 \right\}$.
The output layer of the ELMo model takes into account the output of the last layer of LSTM, the original input vector, and the intermediate word vector. The calculation equation is as follows:
$$ELMo_i = \gamma \cdot \left( \lambda_0 \cdot h_{i,0}^{LM} + \lambda_1 \cdot h_{i,1}^{LM} + \lambda_2 \cdot h_{i,2}^{LM} \right) \tag{9}$$
where $\gamma$ represents the scaling coefficient applied to all word vectors obtained by ELMo when they are finally used; in this experiment, its initial value is 1. $\lambda_0, \lambda_1, \lambda_2$ are the weight coefficients after softmax processing, and their initial value is 0. They are learnable parameters, and the specific values are obtained by model training.
The input dimension in the word-level module is (Batch_size, m). The model input is embedded through the input layer of the ELMo module, and the dimension is (Batch_size, m, embed_dim). Here m represents the number of words in the assembly code segment, and embed_dim indicates the dimension of the word embedding, that is, the dimension of $E_i$. Then through the bidirectional language model and the output
layer, the dimension of the embedded matrix is (Batch_size, m, hidden_size * 2).
After learning each word vector in the assembly code, the model uses GRU to extract the word-level sequence features. In order to better consider the sequence impact, this paper uses a bidirectional GRU because the output at the current time $t$ is related not only to the previous state but also to the future state. The input of the bidirectional GRU is $EL = (ELMo_1, ELMo_2, \ldots, ELMo_m)^T$. At the current time $t$, the calculation equation of the forward GRU is as follows:
$$\overrightarrow{h}_t = \overrightarrow{f}(ELMo_t) \tag{10}$$
The calculation equation of the backward GRU is as follows:
$$\overleftarrow{h}_t = \overleftarrow{f}(ELMo_t) \tag{11}$$
Then the two results are merged to get the output $h_t = W_{\overrightarrow{h}} \overrightarrow{h}_t + W_{\overleftarrow{h}} \overleftarrow{h}_t + b$ of the bidirectional GRU at the current time $t$. Max pooling is used in the final part of the word-level feature extraction module, which aims to make the model pay more attention to important features so as to improve model performance.

B. INSTRUCTION-LEVEL FEATURE EXTRACTION MODULE
Considering that neighboring instructions in the assembly code are also related, this module uses a bidirectional GRU to extract the context information of the input vector.
The input of the instruction-level feature extraction module is the assembly code segment, but each instruction line has a different length, so each line needs to be standardized. When standardizing instructions, first, considering the syntax structure of assembly instructions and referring to the method of processing instructions in Instruction2Vec, the instructions are expressed in the form of one opcode and two operands, where four values represent each operand. Second, the type of each operand in the instruction is analyzed, and the operand is placed in a fixed position according to its type. If the operand has fewer values than the fixed length, it is filled with the invalid operand 'PH'. The normalization result of an instruction line is shown in Fig.5.
FIGURE 5. Normalization result of one line instruction.
The standardized result of the original assembly code segment is shown in Fig.6.
In this paper, the random embedding method is used to obtain the embedding matrix $X = [x_1, x_2, \ldots, x_n]^T$ of the assembly code segment with the dimension of (Batch_size, n, 117). $x_i$ represents the $i$-th instruction in the assembly code segment, $i = 1, \ldots, n$. Then the model inputs each vector in the embedding matrix into the bidirectional GRU in the form of a time step and finally obtains the instruction-level sequence feature. The result of the bidirectional GRU is also processed by max pooling, and the final output dimension is (Batch_size, hidden_gru_size * 2).
After extracting the features of the input assembly code segment, this paper focuses on how to combine the word-level and instruction-level sequence features. When using deep learning to detect vulnerabilities, existing work often takes a line of instructions as a whole and uses a neural network to obtain instruction-level features, or, as in Wei et al. [30], only considers word-level features. Therefore, this paper combines word-level and instruction-level sequence features and fuses the two features, fully considering their impact on vulnerability detection. Deep learning has many feature fusion methods, including feature concatenation, feature addition, and weighted feature fusion. The weighted feature fusion method is used in this paper, and the calculation equations are as follows:
$$F = \alpha \cdot F_{word} + (1 - \alpha) \cdot F_{ins} \tag{12}$$
$$F_{word} = \mathrm{Maxpool}_{word}\left(\mathrm{BiGRU}_{word}\left(f_{LM}(T)\right)\right) \tag{13}$$
$$F_{ins} = \mathrm{Maxpool}_{ins}\left(\mathrm{BiGRU}_{ins}(X)\right) \tag{14}$$
where $F_{word}$ represents the word-level sequence feature obtained in Section III-A, and $F_{ins}$ represents the instruction-level sequence feature obtained in Section III-B. The value of the weighted feature fusion parameter $\alpha$ is given in Section IV.

C. ALGORITHM STEPS
The vulnerability detection model proposed in this paper begins by processing the input assembly code segments. It uses an embedding network to obtain a matrix representation of the input, and through a feature extraction network, it obtains word-level features and instruction-level features. Then, a feature fusion module is employed to obtain the fused features. Finally, the classification results are obtained based on these features.
The specific algorithm steps are as follows:
Input: Raw input assembly code segments and labels $D = \left\{ \left( x^{(n)}, y^{(n)} \right) \right\}_{n=1}^{N}$
step1. Use the ELMo model to get the dynamic vector representation of words in different contexts;
step2. Obtain the word-level features through the bidirectional GRU network and max pooling, as shown in (13);
step3. Use random embedding to obtain the vector representation of the words in the instruction, and obtain the instruction-level features through the bidirectional GRU network and max pooling, as shown in (14);
step4. Fuse the word-level sequence features and instruction-level sequence features, as shown in (12);
step5. Input the final feature into the fully connected layer with Softmax, and update the weight parameters according to (17);
Output: The results of the vulnerability detection model in this paper.
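The fusion in (12), applied to max-pooled sequence features as in (13) and (14), can be sketched as follows. This is an illustrative sketch only: the function names and the toy hidden states are hypothetical, and the value `alpha=0.5` is a placeholder rather than the value used in the paper. The sketch also assumes that both pooled feature vectors have the same length, which the element-wise sum in (12) requires.

```python
def max_pool_over_time(states):
    """Column-wise maximum over a (timesteps x features) list of hidden states,
    standing in for the Maxpool operators in (13) and (14)."""
    return [max(column) for column in zip(*states)]


def weighted_fusion(f_word, f_ins, alpha):
    """F = alpha * F_word + (1 - alpha) * F_ins, as in (12)."""
    if len(f_word) != len(f_ins):
        raise ValueError("both feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0, 1]")
    return [alpha * w + (1.0 - alpha) * s for w, s in zip(f_word, f_ins)]


# Hypothetical BiGRU outputs: two time steps, two features each.
word_states = [[0.1, 0.9], [0.4, 0.2]]
ins_states = [[0.8, 0.0], [0.2, 0.6]]

f_word = max_pool_over_time(word_states)   # [0.4, 0.9]
f_ins = max_pool_over_time(ins_states)     # [0.8, 0.6]
fused = weighted_fusion(f_word, f_ins, alpha=0.5)  # placeholder alpha
print(fused)
```

Unlike concatenation, the weighted sum keeps the fused vector at the same width as each input, and `alpha` directly controls how much the word-level features contribute relative to the instruction-level ones.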

FIGURE 6. Assembly Code Segment Normalization Result.
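The fixed-length instruction layout described in Section III-B (one opcode slot plus four slots for each of two operands, padded with 'PH') can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the normalization used in the paper: the tokenizer regular expression and the function name are hypothetical, operands are assumed to be separated by commas, and the type-based placement of operand values in fixed positions is not reproduced; tokens are simply filled from left to right.

```python
import re

PAD = "PH"                # padding token named in the text
SLOTS_PER_OPERAND = 4     # four values per operand, as described in the text
# Hypothetical tokenizer: identifiers/registers, hex constants, decimal constants.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|0x[0-9A-Fa-f]+|\d+")


def normalize_instruction(line):
    """Return nine slots: the opcode followed by 4 slots for each of 2 operands."""
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty instruction line")
    opcode = parts[0]
    operand_text = parts[1] if len(parts) > 1 else ""
    # Keep at most two operands; extra operands are dropped in this sketch.
    operands = [op for op in operand_text.split(",") if op.strip()][:2]
    slots = [opcode]
    for index in range(2):
        tokens = TOKEN_RE.findall(operands[index]) if index < len(operands) else []
        tokens = tokens[:SLOTS_PER_OPERAND]
        slots.extend(tokens + [PAD] * (SLOTS_PER_OPERAND - len(tokens)))
    return slots


print(normalize_instruction("mov eax, [ebp-0x8]"))
print(normalize_instruction("ret"))
```

Every instruction is mapped to the same number of slots, so a code segment with n instructions becomes an n-row table that an embedding layer can consume directly.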

D. LOSS CONSOLIDATION
In this paper, a regularization technique [31] based on the standard deviation is used. The motivation for adding this regularization is to constrain each weight value so as to mitigate the over-fitting problem in the model, which can improve the detection performance of the model. The standard deviation regularization term is:

$$\lambda \sum_{i=1}^{k} \sigma(w_i) \tag{15}$$

where λ is the regularization coefficient and σ(·) is the standard deviation, calculated as:

$$\sigma(w) = \sqrt{\frac{1}{nk} \sum_{i=1}^{nk} w_i^{2} - \left( \frac{1}{nk} \sum_{i=1}^{nk} w_i \right)^{2}} \tag{16}$$

In (16), n depends on the number of features in the data set and represents the number of columns of a specific weight matrix. Similarly, k is the number of rows in the weight matrix.

In this task, the input is $D = \{(x^{(n)}, y^{(n)})\}_{n=1}^{N}$, and the loss function of the model is finally expressed as:

$$J(\theta) = \frac{1}{N} \sum_{n=1}^{N} L\left(y^{(n)}, f\left(x^{(n)}; \theta\right)\right) + \lambda \sum_{i=1}^{k} \sigma(w_i) \tag{17}$$

where L(·) is the cross-entropy loss function and f(x^(n); θ) is the output of the model. The ultimate goal of the task is to minimize the loss between the real value and the predicted value of the model and reach the optimal solution. The model in this paper is tested under the PyTorch framework. The Adam optimizer is used to learn and update the parameters of the neural network, and the learning rate is set to 0.0005.

IV. EXPERIMENT
The previous sections have introduced this paper's related work and model. This section focuses on the experimental process, including datasets, evaluation indicators, and experimental results and analysis.

A. DATASET
The experiments were mainly conducted on the Juliet Test Suite dataset described in [20] and the NDSS18 dataset described in [21].

1) JULIET TEST SUITE
The dataset is compiled and disassembled from source code taken from the Juliet Test Suite v1.3 for C/C++ test cases. The work of [20] selected the CWE121 type (stack-based buffer overflow) for research, resulting in 6506 files, of which 3244 contain defects and 3262 have no defects.

Data preprocessing specifically includes the following steps:

Compilation: The compilation process converts source code into binary code. In this paper, the makefile included in the Juliet Test Suite v1.3 for C/C++ is used for compilation. A makefile typically includes rules, each of which defines a set of targets, their dependencies, and the commands that describe how to generate the target files. The makefile included in the dataset defines the compiler and linker instructions for this type of vulnerability, as well as other necessary variables and macro definitions, to ensure that test cases are compiled and built correctly. For a single test case of the CWE121 type, the source code contains two types of functions: one with vulnerabilities and one without vulnerabilities. If it is compiled directly into a binary file using the makefile, only one file containing both types will be generated. Therefore, we obtain the two classes needed for binary classification by modifying the CFLAGS parameter in the makefile. Specifically, in order to obtain vulnerable samples, the macro OMITGOOD is appended to CFLAGS. This instructs the compiler to omit the good (non-vulnerable) code and compile only the code that contains potential vulnerabilities. In order to obtain normal, non-vulnerable samples, the macro OMITBAD is appended to CFLAGS, which omits the potentially vulnerable code in the test cases.

Finally, using the modified makefile, we run the make command, which compiles all source code into the corresponding ELF files, divided into two categories: one containing vulnerabilities and the other without vulnerabilities.

Disassembly: Disassembly is the process of converting binary code into assembly code segments. In this paper, we used the disassembly tool IDA Pro 7.0 to read the compiled ELF (Executable and Linkable Format) files obtained from the compilation process and convert them into assembly language form. By analyzing the output of the disassembly, we can obtain information such as program instructions, function names, and comments. Instructions are the basic operations


that the program performs, encoded in binary form and stored in the program. Function names provide a way of naming functions, which helps us better understand the structure of the program. Comments add human-readable explanations to a program, helping us better understand the function and implementation of the program.

In this paper, we consider a program as a collection of functions and replace the call instruction of each function with the corresponding function body. When extracting the dataset, we chose the functions called by the main function as the entry points for fragment extraction. This is because the main function is usually the entry point of the program and it calls other functions to perform various tasks. By choosing the functions called by the main function as the entry points for fragment extraction, we can extract code fragments related to the main functionality of the program, and ultimately obtain a dataset containing positive and negative samples.

2) NDSS18(Linux)
This dataset is a public dataset in the field of binary code vulnerability detection, which contains the code weaknesses CWE119 and CWE399 and CVE (Common Vulnerabilities and Exposures) samples [32]. It was compiled by Le et al. [21], who extracted functions from the source code dataset NDSS18 according to different platforms and finally obtained binary code datasets under Windows and Linux systems. Table 1 shows the number of binary codes that contain vulnerabilities and do not contain vulnerabilities under the two platforms. "Whole" in Table 1 includes all binary code datasets under Windows and Linux systems.

TABLE 1. Number of NDSS18 datasets under different platforms.

B. EVALUATING INDICATOR
In order to compare with other methods, this paper refers to the evaluation indicators of [21], including accuracy, precision, recall, and F1-score. The calculations are given in (18) to (21), where TP, TN, FP, and FN are obtained from the confusion matrix. TP denotes samples that are actually positive and are predicted as positive. TN denotes samples that are actually negative and are predicted as negative. FP denotes samples that are actually negative but are predicted as positive. FN denotes samples that are actually positive but are predicted as negative.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{19}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{20}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{21}$$

AUC is the area under the ROC curve and is an objective evaluation index for measuring the quality of a binary classification model. Its value is between 0 and 1. The higher the value, the better the classification performance of the model.

C. EXPERIMENTAL RESULTS AND ANALYSIS
This paper conducts experiments on the Juliet Test Suite dataset. Moreover, experiments were carried out on the Windows subset, the Linux subset, and the dataset containing all binary codes of NDSS18, verifying the effectiveness of the model in this paper. At the same time, some classic methods that have been proposed are compared on these datasets.

1) COMPARATIVE EXPERIMENT
The experimental results of this paper on the Juliet Test Suite dataset are shown in Table 2, where bold indicates the best result among the compared methods. (1) The Instruction2Vec method in Table 2 efficiently models the instructions in the assembly code segment and uses TextCNN to extract local features. (2) O-TextCNN uses the embedding method in the literature to express instructions and then uses TextCNN to extract the local features of the embedding matrix. (3) VulDeePecker is a method proposed by Li et al. [16] for source code vulnerability detection, which can also be used for binary code vulnerability detection. The model uses a bidirectional LSTM, a Dense layer, and a Softmax classifier.

TABLE 2. Experimental results on Juliet Test Suite dataset (unit:%).

It can be seen from Table 2 that the model proposed in this paper performs better on the Juliet Test Suite dataset, and its results on each evaluation index are higher than those of the baseline methods. Among them, accuracy and F1-score are both increased by 1.5 percent compared with the best baseline results in Table 2. The baseline methods in Table 2 only extract instruction-level sequence features. The comparison shows that the results of the model in this paper are higher than those of the other methods in Table 2 that are based on instruction-level feature extraction, which further verifies the effectiveness of the method in this paper.

The model proposed in this paper was tested on the Windows subset, the Linux subset, and the Whole dataset of NDSS18. The experimental results are shown in Table 3, Table 4, and Table 5. Among them, MDSAE is a maximal divergence sequential autoencoder, an improved method based on the VAE. The model encourages maximum divergence


between defective and non-defective files and can detect binary code vulnerabilities.

TABLE 3. Experimental results on NDSS18(Windows) dataset (unit:%).

TABLE 4. Experimental results on NDSS18(Linux) dataset (unit:%).

TABLE 5. Experimental results on NDSS18(Whole) dataset (unit:%).

TABLE 6. Results of ablation experiments for different modules on the Juliet Test Suite dataset (unit:%).

TABLE 7. Results of ablation experiments for different modules on the NDSS18(Windows) dataset (unit:%).

TABLE 8. Results of ablation experiments for different modules on the NDSS18(Linux) dataset (unit:%).

TABLE 9. Results of ablation experiments for different modules on the NDSS18(Whole) dataset (unit:%).
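
As a minimal sketch of how the indicators in (18) to (21) follow from the confusion matrix, the Python snippet below counts TP, TN, FP, and FN for binary labels and derives the four scores reported in the result tables. The function names (`confusion_counts`, `evaluate`), the label convention (1 = vulnerable, 0 = non-vulnerable), and the fallback to 0.0 when a denominator is zero are illustrative assumptions and are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 = vulnerable (assumed)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # actually positive, predicted positive
        elif truth == 0 and pred == 0:
            tn += 1  # actually negative, predicted negative
        elif truth == 0 and pred == 1:
            fp += 1  # actually negative, predicted positive
        else:
            fn += 1  # actually positive, predicted negative
    return tp, tn, fp, fn


def _safe_div(numerator, denominator):
    """Avoid division by zero; returning 0.0 is an assumption of this sketch."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1-score as in (18) to (21)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = _safe_div(tp + tn, tp + tn + fp + fn)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; they are not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    print(confusion_counts(y_true, y_pred))
    print(evaluate(y_true, y_pred))
```

On the toy labels above there are three true positives, three true negatives, one false positive, and one false negative, so all four scores equal 0.75. Because recall depends only on TP and FN, a model that misses fewer vulnerable samples raises recall even if its precision stays the same, which is why recall is discussed separately in the analysis.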
According to the results shown in Table 3, Table 4, and Table 5, the proposed model exhibits improvements in accuracy, F1-score, and AUC on the Windows subset, the Linux subset, and the Whole dataset of NDSS18. On the Windows dataset of NDSS18, compared with the other baseline models in Table 3, the model in this paper improves on every evaluation indicator except precision. On the Linux dataset of NDSS18, our model improves F1-score by 1.1 percent compared to the baseline model.

In the field of computer security, recall is a crucial evaluation metric that represents the proportion of samples predicted as containing defects by the network model out of all actual samples containing defects in the dataset. A higher recall indicates that the network model will have fewer defect samples incorrectly classified as non-defective during prediction. The experimental results demonstrate that on the Windows subset and the Linux subset of NDSS18, the proposed model achieves better recall values than the other models. F1-score balances precision and recall and holds more reference value in practical applications. It can be seen that the proposed model shows improvements in all evaluation metrics, including F1-score.

2) ABLATION EXPERIMENT
In order to verify the effectiveness of the innovation points proposed in this paper, in this section we conducted ablation experiments for different modules and different feature fusion methods.

a: DIFFERENT MODULES
In order to verify the effectiveness of the word-level feature extraction module and the instruction-level feature extraction module, we conducted ablation experiments on the Juliet Test Suite and NDSS18, as shown in Table 6, Table 7, Table 8, and Table 9. Comparing the performance of the different modules shows that accuracy and F1-score differ depending on which modules are used, and that model performance is best when both the word-level and instruction-level feature modules are considered, which verifies the effectiveness of the multi-level features proposed in this paper.

Analyzing the experimental results reveals that after incorporating both word-level sequence features and instruction-level sequence features, the proposed model exhibits improvements across all datasets. However, the magnitude of the improvement varies across datasets. The ablation experiments show that performance on the various NDSS18 datasets improved significantly. Accuracy and F1-score improved the most on the Linux dataset, with accuracy increasing by 2.6 percent and F1-score increasing by 2.7 percent. On the Juliet Test Suite dataset, the experimental results have also improved. From the experimental results, it can be seen that the detection performance of the model that combines word-level features and instruction-level features is better, while the model that only considers word-level features or only instruction-level features exhibits slightly worse performance. It can be concluded that in the


multi-level feature fusion vulnerability detection model proposed in this paper, it is indeed effective to consider both word-level sequence features and instruction-level sequence features. The features obtained through feature fusion can better express the syntax, semantics, and other information in the original assembly code segment, thereby improving the detection performance of the model.

b: DIFFERENT FEATURE FUSION METHODS
For word-level and instruction-level sequence features, this paper uses feature concatenation (splicing), feature addition, and weighted feature fusion. The ablation results on the two datasets are shown in Table 10, Table 11, Table 12, and Table 13.

TABLE 10. Results of ablation experiments for different fusion methods on the Juliet Test Suite dataset (unit:%).

TABLE 11. Results of ablation experiments for different fusion methods on the NDSS18(Windows) dataset (unit:%).

TABLE 12. Results of ablation experiments for different fusion methods on the NDSS18(Linux) dataset (unit:%).

TABLE 13. Results of ablation experiments for different fusion methods on the NDSS18(Whole) dataset (unit:%).

Different fusion methods have different characteristics and different representation capabilities. By analyzing Table 10, Table 11, Table 12, and Table 13, it can be seen that the weighted feature fusion method performs best on each dataset. Compared with the other fusion methods, the weighted feature fusion method improves the most on the NDSS18 (Whole) dataset, with accuracy increased by 2.1 percent and F1-score increased by 1.8 percent. The analysis shows that feature splicing and feature addition perform worse, whereas the weighted feature fusion method accounts for the contribution of the different features to the detection task in this paper and achieves the best experimental results.

3) PARAMETER SELECTION
The selection of parameter values in a neural network has a great influence on model performance. In this experiment, two parameters that are particularly important to the model are selected for discussion, namely α in the weighted feature fusion method and the coefficient λ of the standard deviation regularization in the loss.

a: α IN WEIGHTED FEATURE FUSION
It can be seen from Section III that the weighted feature fusion method has good experimental results on both datasets. Different weight values indicate a different emphasis on word-level and instruction-level features. The selection of the weight value α is the focus of this section.

Fig.7 shows the impact of different weights on the model. As shown in Fig.7(a), when α is 0.4 or 0.8, the F1-score is higher on the Juliet Test Suite dataset. However, a comparison with Fig.7(b) shows that on the NDSS18 (Whole) dataset, α = 0.4 yields a higher F1-score than α = 0.8. Therefore, this paper finally chooses α = 0.4 as the weight of the word-level vector features. Although the NDSS18 (Whole) dataset contains all the data of the Windows and Linux platforms and contains more types of vulnerabilities than the Juliet Test Suite dataset, its representation is single, and word-level sequence features have a greater impact on it.

FIGURE 7. F1-score variation diagram using different weight values α on different datasets.

b: COEFFICIENT λ OF STANDARD DEVIATION REGULARIZATION IN LOSS
The standard deviation regularization constrains the model parameters and obtains the regularization term by multiplying


the standard deviation of the weight matrix so as to reduce the error and prevent over-fitting.

Fig.8 shows the impact of different values of λ on the model. By analyzing Fig.8, it can be found that when λ is 1, the results of the model on both datasets are optimal. A value of 0 means that the model does not use standard deviation regularization during training. When the value of λ is between 0 and 1, the F1-score improves steadily, which means that when the value of λ is within a certain range, the standard deviation regularization has a certain fitting ability for the model parameters, which can improve the model performance. However, when the value of λ is greater than 1, the F1-score decreases, so the value of λ should not be too small or too large.

FIGURE 8. F1-score variation diagram using different weight values λ on different datasets.

V. DISCUSSION
In summary, the model proposed in this paper can improve detection performance to a certain extent in the binary code vulnerability detection task. However, this paper also has certain limitations. The main limitation is the scarcity of publicly available binary code datasets. The Juliet Test Suite dataset used in this paper is obtained by processing its source code dataset, which involves compiling the source code into binary code and disassembling the binary code into assembly language. In this process, various issues such as the choice of compiler, different compilation platforms, and the choice of disassembler must be considered in actual situations. The data processing process is very complex. This article follows the method described in the work of [20], which involves using the makefile provided with the dataset for compilation and the IDA Pro tool for disassembly, resulting in assembly language with Intel syntax.

In future work, further research can be conducted in the data processing phase to investigate the impact of different compilation platforms and disassemblers on the task of binary code vulnerability detection.

VI. CONCLUSION
In this paper, we propose a binary code vulnerability detection model based on multi-level feature fusion. The proposed model considers the word-level sequence features in the assembly code segment and learns the dynamic vector representation of the same word in different contexts through the ELMo model. Then, word-level sequence features and instruction-level sequence features in assembly code segments are fused. In the process of feature fusion, this paper uses feature splicing, feature addition, and weighted feature fusion to discuss the influence of different features on the binary code vulnerability detection task. Considering the phenomenon of overfitting in model training, this paper uses standard deviation regularization to improve model performance. We conduct experimental evaluation and comparison on the Juliet Test Suite and NDSS18 datasets. On the Juliet Test Suite dataset, the F1-score reaches 98.9 percent, and on the NDSS18 (Whole) dataset, the F1-score reaches 87.7 percent. Compared to the baseline model, the model proposed in this paper exhibits higher accuracy in the task of binary code vulnerability detection.

REFERENCES
[1] H. Hanif, M. H. N. M. Nasir, M. F. Ab Razak, A. Firdaus, and N. B. Anuar, "The rise of software vulnerability: Taxonomy of software vulnerabilities detection and machine learning approaches," J. Netw. Comput. Appl., vol. 179, Apr. 2021, Art. no. 103009, doi: 10.1016/j.jnca.2021.103009.
[2] F. Lomio, E. Iannone, A. De Lucia, F. Palomba, and V. Lenarduzzi, "Just-in-time software vulnerability detection: Are we there yet?" J. Syst. Softw., vol. 188, Jun. 2022, Art. no. 111283, doi: 10.1016/j.jss.2022.111283.
[3] A. C. Eberendu, V. I. Udegbe, E. O. Ezennorom, A. C. Ibegbulam, and T. I. Chinebu, "A systematic literature review of software vulnerability detection," Eur. J. Comput. Sci. Inf. Technol., vol. 10, no. 1, pp. 23–37, Apr. 2022.
[4] G. Lin, S. Wen, Q. Han, J. Zhang, and Y. Xiang, "Software vulnerability detection using deep neural networks: A survey," Proc. IEEE, vol. 108, no. 10, pp. 1825–1848, Oct. 2020, doi: 10.1109/JPROC.2020.2993293.
[5] X. Yuan, G. Lin, Y. Tai, and J. Zhang, "Deep neural embedding for software vulnerability discovery: Comparison and optimization," Secur. Commun. Netw., vol. 2022, pp. 1–12, Jan. 2022, doi: 10.1155/2022/5203217.
[6] P. Xu, Z. Mai, Y. Lin, Z. Guo, and V. S. Sheng, "A survey on binary code vulnerability mining technology," J. Inf. Hiding Privacy Protection, vol. 3, no. 4, pp. 165–179, 2021, doi: 10.32604/jihpp.2021.027280.
[7] S. Alrabaee, M. Debbabi, and L. Wang, "A survey of binary code fingerprinting approaches: Taxonomy, methodologies, and features," ACM Comput. Surv., vol. 55, no. 1, pp. 1–41, Jan. 2022, doi: 10.1145/3486860.
[8] C. B. Sahin and L. Abualigah, "A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection," Neural Comput. Appl., vol. 33, no. 20, pp. 14049–14067, Oct. 2021, doi: 10.1007/s00521-021-06047-x.
[9] R. Scandariato, J. Walden, and W. Joosen, "Static analysis versus penetration testing: A controlled experiment," in Proc. IEEE 24th Int. Symp. Softw. Rel. Eng. (ISSRE), Nov. 2013, pp. 451–460.


[10] S. Dinesh, N. Burow, D. Xu, and M. Payer, "RetroWrite: Statically instrumenting COTS binaries for fuzzing and sanitization," in Proc. IEEE Symp. Secur. Privacy (SP), May 2020, pp. 1497–1511, doi: 10.1109/SP40000.2020.00009.
[11] C. Beaman, M. Redbourne, J. D. Mummery, and S. Hakak, "Fuzzing vulnerability discovery techniques: Survey, challenges and future directions," Comput. Secur., vol. 120, Sep. 2022, Art. no. 102813, doi: 10.1016/j.cose.2022.102813.
[12] O. Zaazaa and H. El Bakkali, "Dynamic vulnerability detection approaches and tools: State of the art," in Proc. 4th Int. Conf. Intell. Comput. Data Sci. (ICDS), Oct. 2020, pp. 1–6, doi: 10.1109/ICDS50568.2020.9268686.
[13] J. Jurn, T. Kim, and H. Kim, "An automated vulnerability detection and remediation method for software security," Sustainability, vol. 10, no. 5, p. 1652, May 2018, doi: 10.3390/su10051652.
[14] R. Zhang, S. Huang, Z. Qi, and H. Guan, "Combining static and dynamic analysis to discover software vulnerabilities," in Proc. 5th Int. Conf. Innov. Mobile Internet Services Ubiquitous Comput., Jun. 2011, pp. 175–181, doi: 10.1109/IMIS.2011.59.
[15] Q. Wang, Y. Li, Y. Wang, and J. Ren, "An automatic algorithm for software vulnerability classification based on CNN and GRU," Multimedia Tools Appl., vol. 81, no. 5, pp. 7103–7124, Jan. 2022, doi: 10.1007/s11042-022-12049-1.
[16] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong, "VulDeePecker: A deep learning-based system for vulnerability detection," 2018, arXiv:1801.01681.
[17] L. Wartschinski, Y. Noller, T. Vogel, T. Kehrer, and L. Grunske, "VUDENC: Vulnerability detection with deep learning on a natural codebase for Python," Inf. Softw. Technol., vol. 144, Apr. 2022, Art. no. 106809, doi: 10.1016/j.infsof.2021.106809.
[18] H. Wang, W. Qu, G. Katz, W. Zhu, Z. Gao, H. Qiu, J. Zhuge, and C. Zhang, "JTrans: Jump-aware transformer for binary code similarity detection," in Proc. 31st ACM SIGSOFT Int. Symp. Softw. Test. Anal., Jul. 2022, pp. 1–13.
[19] Y. Lee, H. Kwon, S.-H. Choi, S.-H. Lim, S. H. Baek, and K.-W. Park, "instruction2vec: Efficient preprocessor of assembly code to detect software weakness with CNN," Appl. Sci., vol. 9, no. 19, p. 4086, Sep. 2019, doi: 10.3390/app9194086.
[20] H. Yan, S. Luo, L. Pan, and Y. Zhang, "HAN-BSVD: A hierarchical attention network for binary software vulnerability detection," Comput. Secur., vol. 108, Sep. 2021, Art. no. 102286, doi: 10.1016/j.cose.2021.102286.
[21] T. Le, T. Nguyen, T. Le, D. Phung, P. Montague, O. De Vel, and L. Qu, "Maximal divergence sequential autoencoder for binary software vulnerability detection," in Proc. Int. Conf. Learn. Represent., 2019, pp. 1–15. [Online]. Available: https://openreview.net/pdf?id=ByloIiCqYQ
[22] J. Tian, W. Xing, and Z. Li, "BVDetector: A program slice-based binary code vulnerability intelligent detection system," Inf. Softw. Technol., vol. 123, Jul. 2020, Art. no. 106289, doi: 10.1016/j.infsof.2020.106289.
[23] K. L. Narayana and K. Sathiyamurthy, "Automation and smart materials in detecting smart contracts vulnerabilities in blockchain using deep learning," Mater. Today, Proc., vol. 81, pp. 653–659, Jan. 2023, doi: 10.1016/j.matpr.2021.04.125.
[24] W. Ouyang, M. Li, Q. Liu, and J. Wang, "Binary vulnerability mining based on long short-term memory network," in Proc. World Autom. Congr. (WAC), Aug. 2021, pp. 71–76, doi: 10.23919/WAC50355.2021.9559467.
[25] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., 2018, pp. 2227–2237. [Online]. Available: https://aclanthology.org/N18-1202.pdf
[26] M. Sundermeyer, R. Schluter, and H. Ney, "LSTM neural networks for language modeling," in Proc. Interspeech, Sep. 2012, pp. 1–4. [Online]. Available: https://www.isca-speech.org/archive_v0/archive_papers/interspeech_2012/i12_0194.pdf
[27] A. Salah, M. Bekhit, E. Eldesouky, A. Ali, and A. Fathalla, "Price prediction of seasonal items using time series analysis," Comput. Syst. Sci. Eng., vol. 46, no. 1, pp. 445–460, 2023.
[28] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, arXiv:1412.3555.
[29] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," 2013, arXiv:1301.3781.
[30] H. Wei, G. Lin, L. Li, and H. Jia, "A context-aware neural embedding for function-level vulnerability detection," Algorithms, vol. 14, no. 11, p. 335, Nov. 2021, doi: 10.3390/a14110335.
[31] M. A. Albahar, "A modified maximal divergence sequential auto-encoder and time delay neural network models for vulnerable binary codes detection," IEEE Access, vol. 8, pp. 14999–15006, 2020, doi: 10.1109/ACCESS.2020.2965726.
[32] K. Filus, P. Boryszko, J. Domanska, M. Siavvas, and E. Gelenbe, "Efficient feature selection for static analysis vulnerability prediction," Sensors, vol. 21, no. 4, p. 1133, Feb. 2021, doi: 10.3390/s21041133.

GUANGLI WU (Member, IEEE) was born in Weifang, Shandong, China, in 1981. He received the Ph.D. degree. He is currently a professor. His research interests include network security and artificial intelligence.

HUILI TANG is currently pursuing the master's degree with the Gansu University of Political Science and Law. Her current research interests include artificial intelligence and vulnerability detection.
