
2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, March 29-31, 2019

Research on Text Classification Based on CNN and LSTM

Yuandong Luan
Faculty of Information Technology
Beijing University of Technology
Beijing, China
1183559690@qq.com

Shaofu Lin
Faculty of Information Technology
Beijing University of Technology
Beijing, China
linshaofu@bjut.edu.cn

Abstract—With the rapid development of deep learning technology, CNN and LSTM have become two of the most popular neural networks. This paper combines CNN with LSTM or its variant and makes a slight change: it proposes a text classification model, named NA-CNN-LSTM or NA-CNN-COIF-LSTM, in which the CNN has no activation function. Experimental results on the subjective and objective text categorization dataset [1] show that the proposed model performs better than the standard CNN or LSTM.

Keywords—CNN, LSTM, text classification, deep learning
I. INTRODUCTION

Text categorization has long been a basic and popular research topic in natural language processing. It has a wide range of applications, such as film review classification and subjective/objective sentence classification, and it is of great significance in assisting people to make decisions in real life. Traditional text categorization methods mainly include dictionary-based methods and machine learning methods. Since the emergence of deep learning algorithms, the accuracy of text categorization has been greatly improved, and in deep-learning-based text categorization, convolutional neural networks (CNN) and long short-term memory networks (LSTM) are the most widely used models. A convolutional neural network is a kind of multi-layer neural network, an improvement on the error back-propagation network, and is good at machine learning problems involving images, especially large images. CNN was first proposed by Yann LeCun and applied to handwritten character recognition [2]. A recurrent neural network (RNN) is a neural network structure that contains a loop and therefore has the ability to preserve information: information is transmitted from step to step through the recurrent module, and the output of the hidden layer at every moment depends on information from past time steps. This chain structure makes the model naturally suited to sequence labeling problems, and RNNs have been widely used in natural language processing tasks such as text classification and machine translation. However, because the current output of a recurrent neural network depends on a long input sequence, gradient explosion or vanishing caused by long-term dependencies over long sequences is a problem. To avoid this long-term dependence problem, researchers proposed long short-term memory networks [3]. In this paper, we use a CNN without activation function together with LSTM and one of its variants with coupled input and forget gates, and test the models on the subjective and objective text dataset provided by [1]. The validity of the model is proved by comparative experiments.

II. RELATED WORK

Deep learning is one of the latest trends in machine learning and artificial intelligence research, and many significant breakthroughs have been made in this field all over the world [4]. Similarly, in natural language processing, deep learning has produced results that easily surpass traditional methods. Ref. [5] briefly introduced the architecture and methods of deep learning and its application in natural language processing, discussed the current state of the technology in detail, and put forward suggestions for future research in this field. Convolutional neural networks are widely used in the image field; Kim [6] first applied convolutional neural networks to the text categorization task, using a simple single-layer convolutional neural network on many classification datasets while achieving excellent classification results, together with detailed parameter tuning. This inspires us to use deep learning methods in some tasks without the need for complex network structures. Ref. [7] introduced three text categorization methods based on multi-task learning with recurrent neural networks; these methods can improve performance on a task with the help of other related tasks. Ref. [8] introduced a text classification method combining standard CNN with standard LSTM: CNN extracts features of high-level phrase sequences and sends them to LSTM to obtain a sentence representation, so the method captures both local phrase features and the global sentence representation. Ref. [9] summarized the differences between standard LSTM and eight of its variants; tests on three representative tasks showed that none of the variants significantly improves performance, and that the forget gate and the output activation function are the keys to the model. Ref. [10] summed up the effects of different parameter settings on the performance of CNN models in practical applications and provided specific suggestions for practitioners.

III. TEXT CLASSIFICATION MODEL BASED ON CNN AND LSTM

The text categorization model based on CNN and LSTM or its variant can be divided into four layers: the input layer, the convolutional network layer, the LSTM (or its variant) layer, and the softmax classifier layer. The model structure is shown in Figure 1.

Fig. 1. Model structure.

A. Input Layer

This paper uses the subjective and objective text data from [1]. After reading in the data, the text is preprocessed first: the required clean data are obtained by removing redundant spaces, special characters other than numbers and letters, and special expressions in English such as tenses. Then we use the learn module of the TensorFlow library to generate word vectors, and the word vectors of the text are fed into the convolutional network layer as features.
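The paper does not show its preprocessing code; the following is a minimal Python sketch of the cleaning and word-to-id mapping described above. The regular expressions, the `<pad>` token, and the `max_len` parameter are illustrative assumptions, and the authors' actual pipeline uses TensorFlow's learn module rather than plain Python.

```python
import re

def clean_text(text):
    """Remove special characters other than letters/digits and
    collapse redundant spaces, roughly as described for the input layer."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # drop punctuation etc.
    return re.sub(r"\s+", " ", text).strip().lower()

def build_vocab(sentences):
    """Map each word to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for sent in sentences:
        for word in sent.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def to_ids(sentence, vocab, max_len):
    """Convert a cleaned sentence to a fixed-length list of word ids
    (unknown words fall back to the padding id in this sketch)."""
    ids = [vocab.get(w, 0) for w in sentence.split()[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Example with two toy "subjective"/"objective" sentences.
sents = [clean_text(s) for s in ["A thrilling, heartfelt film!",
                                 "The film runs 120 minutes."]]
vocab = build_vocab(sents)
X = [to_ids(s, vocab, max_len=10) for s in sents]
print(X)
```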
B. Convolutional Network Layer

In a typical convolutional neural network, a non-linear activation function is applied to the results of the convolution operation, and after a pooling operation a fully connected layer is used for classification. The core of the convolution operation is called a filter, also known as a kernel function. It extracts features by sliding from top to bottom and from left to right over the input matrix. In natural language processing, the width of the kernel is generally equal to the width of the input matrix, so the kernel slides only vertically, which preserves the integrity of the word as the smallest unit of granularity in the language [11]. During sliding there are two padding strategies, zero-padding and valid-padding, depending on whether zeros are added around the input matrix; here we adopt valid-padding. Our design departs from the typical CNN in two ways. First, the results of our convolutional layer are sent to the LSTM layer; LSTM requires sequentially ordered input, and pooling would destroy this sequential relationship, so we remove the pooling operation. Second, instead of applying an activation function to the convolution results as a typical CNN would, we omit the activation function here.
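As a concrete illustration of this layer, here is a minimal NumPy sketch of a linear, valid-padded text convolution whose kernel spans the full embedding width, with no activation and no pooling; the shapes and random data are illustrative, not the paper's implementation.

```python
import numpy as np

def text_conv_valid(X, W, b):
    """Linear 'valid' convolution over a text matrix.

    X: (seq_len, embed_dim) word-vector matrix.
    W: (filter_size, embed_dim, num_filters) kernel; its width equals
       the embedding dimension, so it slides only vertically.
    b: (num_filters,) bias.
    Returns (seq_len - filter_size + 1, num_filters): one feature
    vector per window, in order — no activation and no pooling, so
    the sequential structure is preserved for the LSTM layer.
    """
    filter_size, embed_dim, num_filters = W.shape
    out_len = X.shape[0] - filter_size + 1
    out = np.empty((out_len, num_filters))
    for t in range(out_len):
        window = X[t:t + filter_size]          # (filter_size, embed_dim)
        out[t] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return out

# Toy check: 10 words, 8-dim embeddings, 4 filters of height 3.
rng = np.random.default_rng(0)
feats = text_conv_valid(rng.normal(size=(10, 8)),
                        rng.normal(size=(3, 8, 4)),
                        np.zeros(4))
print(feats.shape)  # (8, 4)
```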
C. LSTM or Its Variant Layer

The long short-term memory network (LSTM) is a special type of recurrent neural network (RNN) that has the ability to learn long-term dependencies. Its structure is shown in Figure 2.

Fig. 2. LSTM: standard.

The core of LSTM is the cell state, to which information can be added or from which it can be deleted; a gate mechanism selectively lets information flow through to achieve this. LSTM consists of three gates: the forget gate, the input gate, and the output gate. First, the forget gate decides which information to delete from the cell state, and then the input gate decides what information to update into the cell state; after these two decisions, the cell state can be updated. Finally, the output gate decides the final output of the network. The state of each node in this process is determined by Equations (1)-(6).

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (1)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (2)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (3)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \quad (4)$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (5)$$

$$h_t = o_t * \tanh(C_t) \quad (6)$$

Here $h_{t-1}$ denotes the hidden state at the previous time step, $x_t$ the current input, $W$ and $b$ the weights and biases, $\sigma$ the sigmoid function, $f_t$ the output of the forget gate, $i_t$ the output of the input gate, $\tilde{C}_t$ the intermediate candidate state, $C_{t-1}$ the cell state at the previous time step, $C_t$ the updated cell state, $o_t$ the output of the output gate, and $h_t$ the new hidden state.
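For concreteness, a direct NumPy transcription of Equations (1)-(6) for a single time step might look as follows. This is a sketch only: the weight shapes assume each $W$ acts on the concatenation $[h_{t-1}, x_t]$, and the random weights are illustrative, not trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One standard LSTM step following Equations (1)-(6).
    W and b are dicts with keys 'f', 'i', 'C', 'o'; each W[k] has
    shape (hidden + input_dim, hidden)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(z @ W['f'] + b['f'])        # (1) forget gate
    i_t = sigmoid(z @ W['i'] + b['i'])        # (2) input gate
    C_tilde = np.tanh(z @ W['C'] + b['C'])    # (3) candidate state
    C_t = f_t * C_prev + i_t * C_tilde        # (4) new cell state
    o_t = sigmoid(z @ W['o'] + b['o'])        # (5) output gate
    h_t = o_t * np.tanh(C_t)                  # (6) new hidden state
    return h_t, C_t

# Toy check with 4 hidden units and 3-dimensional input (7 = 4 + 3).
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(7, 4)) for k in 'fiCo'}
b = {k: np.zeros(4) for k in 'fiCo'}
h, C = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
print(h.shape, C.shape)  # (4,) (4,)
```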
This paper compares the standard LSTM with one of its variants, called COIF-LSTM, which has coupled input and forget gates. Its structure is shown in Figure 3.

Fig. 3. LSTM: coupled input and forget gates.

The only difference between the two lies in how the input gate is calculated. Instead of computing the outputs of the forget gate and the input gate separately, COIF-LSTM determines the output of the input gate as $1 - f_t$, so the cell state update becomes Equation (7).

$$C_t = f_t * C_{t-1} + (1 - f_t) * \tilde{C}_t \quad (7)$$
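Continuing the sketch above (and reusing `numpy` and the `sigmoid` helper from it), the COIF variant only changes the gating: the separate input gate of Equation (2) disappears, and $1 - f_t$ takes its place in the cell update of Equation (7).

```python
def coif_lstm_step(x_t, h_prev, C_prev, W, b):
    """One COIF-LSTM step: the input gate is tied to the forget gate,
    so Equation (4) is replaced by Equation (7). W and b need no 'i'
    entry because no separate input gate is computed."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(z @ W['f'] + b['f'])            # forget gate, Eq. (1)
    C_tilde = np.tanh(z @ W['C'] + b['C'])        # candidate, Eq. (3)
    C_t = f_t * C_prev + (1.0 - f_t) * C_tilde    # coupled update, Eq. (7)
    o_t = sigmoid(z @ W['o'] + b['o'])            # output gate, Eq. (5)
    h_t = o_t * np.tanh(C_t)                      # hidden state, Eq. (6)
    return h_t, C_t
```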
D. Softmax Classifier Layer

After high-level features have been extracted from the text by the CNN without activation function combined with LSTM or its variant, the features are sent through a fully connected layer to the softmax classifier for classification. Softmax is a special function that maps the outputs of the neurons into the interval (0, 1); the class with the largest probability value is selected as the prediction. The softmax value is computed as in Equation (8).

$$P_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \quad (8)$$

where $P_i$ denotes the probability of the $i$th category, $z_i$ the output value corresponding to the $i$th category, and the sum over $j$ runs over all categories.
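A small, numerically stable Python sketch of Equation (8) follows; subtracting the maximum logit before exponentiating is a standard stability trick that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Map raw class scores to probabilities in (0, 1), Equation (8)."""
    e = np.exp(logits - np.max(logits))   # stability shift
    return e / e.sum()

# Toy scores for [subjective, objective]; the larger probability wins.
probs = softmax(np.array([2.0, 0.5]))
print(probs, "-> predicted class", probs.argmax())
```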

IV. EXPERIMENT

A. Data Set

The experimental data in this paper are derived from the subjective and objective text data used in [1]. The data set version is subjectivity dataset v1.0, which includes 5000 subjective and 5000 objective text samples.

B. Experimental Settings

In this experiment, for the convolutional network layer we use a word embedding dimension of 256, filter sizes of 3, 4 and 5, 128 filters, a sliding stride of 1, and the valid padding method. For the LSTM layer we use a two-layer stacked LSTM and set the number of hidden units to 128.
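To make these settings concrete, here is a minimal tf.keras sketch of an NA-CNN-LSTM configuration. The `vocab_size` and `seq_len` values are placeholders the paper does not report, and how the three filter-size branches are merged before the LSTM is not specified, so concatenating them along the time axis is an assumption of this sketch; the COIF variant would additionally require a custom recurrent cell (e.g., via tf.keras.layers.RNN) and is not shown.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hyperparameters from Section IV-B; vocab_size and seq_len are
# placeholders (the paper does not report them).
vocab_size, seq_len, embed_dim = 20000, 60, 256

inputs = layers.Input(shape=(seq_len,), dtype="int32")
x = layers.Embedding(vocab_size, embed_dim)(inputs)

# One linear (no activation) valid-padded convolution per filter size
# 3, 4, 5, each with 128 filters and stride 1; no pooling, so the
# outputs keep their sequential order for the LSTM.
branches = [layers.Conv1D(128, k, strides=1, padding="valid",
                          activation=None)(x) for k in (3, 4, 5)]
# Merging strategy is an assumption: concatenate along the time axis.
x = layers.Concatenate(axis=1)(branches)

# Two-layer stacked LSTM with 128 hidden units each.
x = layers.LSTM(128, return_sequences=True)(x)
x = layers.LSTM(128)(x)

# Softmax classifier over the two classes (subjective / objective).
outputs = layers.Dense(2, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```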
C. Evaluating Indicators

To evaluate the performance of our models, we use precision, recall and F1 score as the evaluation criteria of this experiment. To explain the meanings of these indicators, the confusion matrix is introduced first [12], shown in Table I.

TABLE I. CONFUSION MATRIX

                    Predicted Negative    Predicted Positive
Actual Negative            TN                    FP
Actual Positive            FN                    TP

TN (True Negative) is the number of true negatives, that is, samples predicted as objective text that actually are objective text.

FN (False Negative) is the number of false negatives, that is, samples predicted as objective text that actually are subjective text.

FP (False Positive) is the number of false positives, that is, samples predicted as subjective text that actually are objective text.

TP (True Positive) is the number of true positives, that is, samples predicted as subjective text that actually are subjective text.

Precision, the percentage of retrieved information that is relevant, measures the signal-to-noise ratio of a retrieval system. It can be expressed as Equation (9).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (9)$$

Recall, the percentage of the relevant information in the system that is actually retrieved, measures how successfully a retrieval system detects relevant documents in a collection. It can be expressed as Equation (10).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \quad (10)$$

The F1 score takes both recall and precision into account: it is the harmonic mean of precision and recall, so the higher recall and precision are, the higher the F1 score. It can be expressed as Equation (11).

$$F1 = \frac{TP}{TP + \frac{FN + FP}{2}} \times 100\% \quad (11)$$
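All three indicators follow directly from the confusion-matrix counts; a small Python sketch (with made-up counts, not the paper's) is:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts,
    Equations (9)-(11)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = tp / (tp + (fn + fp) / 2)   # algebraically equal to 2PR/(P+R)
    return precision, recall, f1

# Toy counts: 990 true positives, 8 false positives, 12 false negatives.
p, r, f = prf1(990, 8, 12)
print(f"{p:.4%} {r:.4%} {f:.4%}")
```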
D. Experimental Comparison

In this paper, single CNN and LSTM models are used as the contrast models, and four kinds of subjective and objective text classification models are produced by combining CNN and LSTM and their variants: standard CNN combined with standard LSTM, called the CNN-LSTM model; non-activation-function CNN combined with standard LSTM, called the NA-CNN-LSTM model; standard CNN combined with the variant LSTM, called the CNN-COIF-LSTM model; and non-activation-function CNN combined with the variant LSTM, called the NA-CNN-COIF-LSTM model.

V. RESULTS ANALYSIS

The models above were run on the given data set. The final experimental results are shown in Table II.

TABLE II. COMPARISON RESULTS

Model               Precision    Recall      F1 score
CNN                 98.9353%     98.5197%    98.7270%
LSTM                98.9816%     99.1598%    99.0706%
CNN-LSTM            99.4769%     98.9197%    99.1975%
NA-CNN-LSTM         99.2201%     99.2598%    99.2400%
CNN-COIF-LSTM       98.9816%     99.1598%    99.0706%
NA-CNN-COIF-LSTM    99.1415%     99.3398%    99.2406%
From the above results we can draw some interesting findings.

In terms of precision, CNN-LSTM performs best, followed by NA-CNN-LSTM. As far as recall is concerned, NA-CNN-COIF-LSTM performs best, followed by NA-CNN-LSTM. Overall, NA-CNN-LSTM and NA-CNN-COIF-LSTM are the best performers in terms of F1 score. This is in line with our expectation that CNN without activation function combined with LSTM or its variant performs better, which supports the validity of the model proposed in this paper.

Combining CNN with an LSTM variant does not necessarily improve performance: for example, CNN-COIF-LSTM performs exactly the same as plain LSTM. However, the performance of CNN without activation combined with the LSTM variant is clearly improved, which supports the validity of this model once again.
VI. CONCLUSION

Unlike the typical CNN, which contains both a convolution operation and an activation function, this paper constructs two text classification models, NA-CNN-LSTM and NA-CNN-COIF-LSTM, by combining CNN without activation function with LSTM and one of its variants, COIF-LSTM. Comparative experiments prove that the combination of CNN without activation function and LSTM or its variant has better performance. Ref. [9] examined eight variant models of LSTM; the next step of this work is to explore the performance of CNN combined with the other variants of LSTM.

REFERENCES

[1] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the ACL, 2004.
[2] Y. LeCun et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[3] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[4] M. R. Minar and J. Naher, "Recent advances in deep learning: An overview," 2018.
[5] D. W. Otter, J. R. Medina, and J. K. Kalita, "A survey of the usages of deep learning in natural language processing," 2018.
[6] Y. Kim, "Convolutional neural networks for sentence classification," arXiv preprint, 2014.
[7] P. Liu, X. Qiu, and X. Huang, "Recurrent neural network for text classification with multi-task learning," 2016.
[8] C. Zhou et al., "A C-LSTM neural network for text classification," Computer Science, 2015.
[9] K. Greff et al., "LSTM: A search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, 2015.
[10] Y. Zhang and B. C. Wallace, "A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification," 2016.
[11] Y. Fang, "An analysis of the internal structure of words," 2014.
[12] K. M. Ting, "Confusion matrix," 2011.
