Learning Improved Class Vector For Multi-Class Question Type Classification
ABSTRACT
Recent research in NLP has exploited word embeddings to achieve outstanding results in various tasks such as spam filtering, text classification and summarization. Present word embedding algorithms have the power to capture semantic and syntactic knowledge about a word, but not enough to portray the distinct meanings of a polysemous word. Many works have utilized sense embeddings to integrate all possible meanings into the word vector, which is computationally expensive. Context embedding is another way to identify a word's actual meaning, but it is hard to enumerate every context with a small dataset. This paper proposes a methodology to generate improved class-specific word vectors that enhance the distinctive property of a word in a class in order to tackle the light polysemy problem in question classification. The proposed approach is compared with baseline approaches, tested using deep learning models on the TREC, Kaggle and Yahoo questions datasets, and attains 93.6%, 91.8% and 89.2% accuracy respectively.
Keywords: Class-specific vector, Deep Learning Models, Polysemy, Question Classification, Word2vec.
This paper introduces an approach that overcomes the drawbacks of the work mentioned in [7] and other commonly used techniques. Our approach generates improved class vectors that are independent of class labels and also aims to classify multi-class categorized datasets. In contrast to previous work, the proposed approach can generate class vectors for questions labelled with their answer type, such as yes/no, factoid, list and summary, and also for questions labelled with their domain, such as business, entertainment and so on. For example, in previous work, tweets were collected from two classes: hurricane related and unrelated. The class vector for a hurricane related tweet is based on the words having the least cosine similarity with the label 'hurricane'. This approach does not work for questions classified by their expected answer type. Therefore, a general methodology is proposed in this paper to produce class vectors that do not depend on the class label. Our work has been tested on both types of datasets: TREC questions are classified by their semantic class, Kaggle questions are classified by topics or issues related to Python such as run time error, installation etc., and Yahoo Questions are classified by list, summary, factoid and yes/no answer type.

Our proposed approach uses a modified skip gram model to generate word vectors and an improved tf-idf score (tf-idf-cf) to express class information for question classification. The experiments performed on the mentioned datasets using the improved class-specific vector representation help to demonstrate the merit of the proposed approach in tackling the light polysemy problem in the question classification task. The contributions made in this paper are:

● Our work is the first to utilize class-specific word embeddings for multi-class question classification.
● Unlike the previous approach, class vectors are unrelated to class labels.
● Our work targets the polysemy problem in the question classification task.

The paper is structured as follows: section 2 discusses the related literature, feature extraction methods are explained in section 3, the proposed approach is presented in section 4, the experimental setup, dataset descriptions and results are given in section 5, and section 6 concludes the paper with insight into future work, followed by references.

2. RELATED WORK

Question classification is a vital step of the question processing module in a question answering system. It helps to find out the type of answer expected by the user and also the related documents. This section describes the existing work in question classification. The techniques involved in classifying questions fall into three groups: rule based, machine learning and hybrid techniques. The rule based or hand crafted methodology is the very basic approach to classification, in which experts extract words or their combinations as features. In [8], Biswas and others designed syntactic patterns for questions in the TREC QA track. The TREC QA track is a collection of questions which are categorized by Li and Roth into two levels: 6 coarse classes and 50 fine classes. The syntactic rule based approach achieved ~98% accuracy for classification. Another rule based approach added semantic hand crafted rules to the syntactic rules in [9] on the same dataset and tested these rules using machine learning models. The linear SVM gives 91.6% accuracy and the manual approach gives 97% accuracy. Although rule based approaches are more accurate in the classification task, this practice takes enormous effort and time. With machine learning models, these drawbacks can be overcome and more latent features of the text can be extracted. [10] explored deep learning models such as CNN, LSTM and a hybrid framework for question classification using the word2vec algorithm on a Turkish translation of the UIUC English dataset. These models were tested on word vectors generated using the skip gram and CBOW embedding models. The study found that the skip gram model's word vectors brought the highest accuracy with the CNN model. The RNN model and its variants have also gained popularity [11,12] in the same task. Stefan and others [11] developed an algorithm for assessing question quality using a sequence model. The questions are a collection of student feedback from a tutoring system, iSTART. In total, 4575 questions were collected and manually coded between very shallow (1) and very deep (4). The experiments were implemented on multiple RNN models such as GRU, Bi-GRU and LSTM using GloVe and FastText embeddings. The Bi-GRU model gives the best performance with 81.22% accuracy. Bi-LSTM has also given improved results on a question collection about daily meetings and conversation [12]. The study reported an accuracy of 90.9% and a loss value of 0.316. Question classification on a Chinese dataset tested the word2vec algorithm with an attention based deep learning model and compared the results with other models such as CNN, LSTM and Bi-GRU [13]. The results show that the precision of the attention based LSTM and the attention based Bi-GRU CNN model is the same, but the f-score of the attention based Bi-GRU CNN is the highest among all models, i.e. 0.784. Various previous works have contributed to the word2vec algorithm [14,15,22], giving a new direction to different NLP classification tasks. But all the existing techniques generate one embedding per word, which fails to determine the correct sense of a word when the word actually has multiple senses, i.e. a polysemous word.

To eliminate the polysemy problem, several models have been introduced to induce multiple embeddings for a word, and these multiple embeddings were trained on machine learning classifiers [17,16]. One way to discriminate between different senses of a word is to learn the context of the
target word. The k-means clustering algorithm used by Huang et al. [18] empirically assigns k senses to each ambiguous word. The local contexts of a word are grouped into k clusters, which limits the knowledge that can be gathered for distinguishing the related sense. Neelakantan et al. [17] extended this idea and utilized the skip-gram model for context clustering. A fusion of the skip-gram model and context clustering was proposed, where the cluster centroid is equivalent to the sense vector and is sent to the skip gram model for updating. This approach suffers from an expensive training computation cost.

Some research work has also focused on morphology, i.e. the sub-word level, to obtain multiple embeddings per word. Unlike previous work, Bojanowski et al. [19] explored the internal structure of a word and modified the skip-gram model with character n-grams. Each character n-gram has a vector representation and their sum generates the word vector. This approach has been combined with a Gaussian mixture model, where each Gaussian component represents a different sense of the word [20,21].

The convolutional neural network has been explored in much research work to help produce context relevant embeddings. Jingyun et al. [23] designed a two layer recurrent CNN model to capture context relevant concepts. The first layer presents the (word, concept) pair using pre-trained word vectors (local information) and the second layer hidden states are concatenated using a Bi-GRU according to the word input time (global information). Both kinds of information are aggregated at an attention layer to generate context relevant word embeddings for classifying short text datasets: TREC, MR (movie review) and the AG news corpus. The GCN (Graph Convolutional Network), a variant of the CNN, has been explored to extract local information using lexical relations in language and global information from a BERT model [24-26]. Both kinds of knowledge are combined via an attention mechanism through different layers of the network. The hybrid of GCN and BERT (VGCN-BERT) has performed outstandingly in text classification on various datasets: SST-2, MR, CoLA, ArangoHate and FountaHate acquired 91.93%, 86.49%, 83.68%, 88.43% and 81.26% f1-score respectively.

The class-specific vector representation of words in a corpus [7] is another popular work for text classification. Modified versions of the skip-gram model and the continuous bag of words (CBOW) model were proposed for generating class vectors. The linear compositionality property of word vectors has been exploited for adding class information to general word vectors [35-38]. A parallel CNN framework was designed for classifying binary categorized datasets: SemEval 2013 and hurricane related tweets. With the proposed features and deep learning model, SemEval 2013 attains 73.15% and the hurricane related tweets attain 88.19%. The drawbacks observed in this work (see section 1) have been removed in our proposed approach. We test our approach on the TREC, Kaggle and Yahoo question datasets using deep learning models.

3. FEATURE EXTRACTION METHODS

3.1 Modified Skip Gram Model

Figure 1 Architecture of Modified Skip-Gram Model. N is the size of the vocabulary; M is the context window size.

The skip gram model is a shallow neural network architecture used to implement the word2vec algorithm. It predicts the context word embeddings from a given target word. The earliest version of the model represents a word with a single vector, which is not sufficient to address the polysemy obstacle in text classification. In the proposed approach, the modified skip gram model [7] is used to update the context vectors of a word using its class-specific embedding; its architecture is given in Figure 1, and the updated objective function for the modified skip gram model is given in Equation 1.

\mathcal{L} = \sum_{w \in C} \log p(V_{w,c})    (1)

where w is the target word and C is the corpus.
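As an illustration of the objective behind Equation 1, the following is a minimal NumPy sketch of one skip-gram update with a full softmax; it omits negative sampling, sub-sampling and the class-specific update details of the full model, and the function and variable names (skipgram_step, W_in, W_out) are illustrative rather than taken from the actual implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def skipgram_step(W_in, W_out, target_idx, context_idx, lr=0.01):
    """One (target, context) update of a plain skip-gram model.

    W_in  : (N, d) input/word embeddings; in the modified model these rows
            would hold the class-specific vectors V_{w,c}.
    W_out : (N, d) output/context embeddings.
    """
    v = W_in[target_idx]               # current embedding of the target word
    scores = W_out @ v                 # (N,) unnormalised scores over the vocabulary
    probs = softmax(scores)            # p(context | target), the term inside Equation 1
    loss = -np.log(probs[context_idx] + 1e-12)

    # Gradient of the negative log likelihood w.r.t. the scores
    grad_scores = probs.copy()
    grad_scores[context_idx] -= 1.0

    # Back-propagate into both embedding matrices
    grad_v = W_out.T @ grad_scores     # (d,)
    W_out -= lr * np.outer(grad_scores, v)
    W_in[target_idx] -= lr * grad_v
    return loss
```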
3.2 Tf-idf-cf

The popular tf-idf algorithm is one of the most widely used statistical measures for finding the relevance of each word in a dataset [27,28]. Many researchers have continuously improved the weighting formula to perform better in various NLP applications. We use an improved tf-idf scoring formula to find the relevant terms in each class. Liu et al. [29] introduced an in-class characteristic into the tf-idf (term frequency - inverse document frequency) weighting formula to exploit its ability to differentiate documents among different classes. The in-class feature says that if a term's frequency is high and the term is present in a small portion of documents, then it is qualified to discriminate documents into classes. The formula for tf-idf-cf is written in Equation 2:

a_{ij} = tf_{ij} \cdot \log\left(\frac{N}{n_j}\right) \cdot \frac{n_{c_{ij}}}{N_{c_i}}    (2)

where tf_{ij} is the term frequency of term j in document i, N is the total count of documents, n_j is the count of documents in which term j appears, n_{c_{ij}} is the count of documents in which term j is present within the same class c that document i belongs to, and N_{c_i} is the number of documents within the same class that document i belongs to. The improved formula with a smoothing technique is given in Equation 3:

a_{ij} = \log(tf_{ij} + 1.0) \cdot \log\left(\frac{N + 1.0}{n_j}\right) \cdot \frac{n_{c_{ij}}}{N_{c_i}}    (3)
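To make the computation of Equation 3 concrete, the following is a minimal Python sketch that scores every term of every document; it assumes tokenized questions with one class label each, and the function and variable names are illustrative rather than those of the actual implementation.

```python
import math
from collections import Counter

def tf_idf_cf(docs, labels):
    """Smoothed tf-idf-cf weight (Equation 3) for every term of every document.

    docs   : list of token lists (one per question)
    labels : class label of each document
    Returns a list of {term: weight} dicts, one per document.
    """
    N = len(docs)
    # n_j: number of documents containing term j
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # N_ci: number of documents per class; n_cij: per-(class, term) document counts
    class_docs = Counter(labels)
    class_df = Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc):
            class_df[(label, term)] += 1

    weights = []
    for doc, label in zip(docs, labels):
        tf = Counter(doc)
        w = {}
        for term, tf_ij in tf.items():
            idf = math.log((N + 1.0) / df[term])
            cf = class_df[(label, term)] / class_docs[label]
            w[term] = math.log(tf_ij + 1.0) * idf * cf
        weights.append(w)
    return weights
```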
4. PROPOSED APPROACH

In this paper, a methodology for automatic question classification is proposed to handle the light text polysemy problem. Using the linear compositionality property of word embeddings, the different meanings of a word are captured in a class information vector, one vector for each class [39]. This section describes the procedure for executing this approach; its flowchart is shown in Figure 2.

(a) Input dataset

The details of the question datasets used for this work are given in section 5.2. In the TREC dataset, the training set has questions with their labels and the testing set has questions only, whereas the Kaggle dataset is split into training and testing sets with a 7:3 ratio and all questions are labelled.

(b) Pre-processing

Text pre-processing is an important step to convert raw textual data into a structure better suited for machine learning models. The following steps are taken in sequence to pre-process the questions: Tokenization, Stop Words Removal and Stemming. The output of the pre-processing step gives the necessary words, which are given as input to the word2vec algorithm [40-43].
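For illustration, a minimal sketch of such a pipeline using NLTK is shown below; the specific tokenizer, stop word list and stemmer are assumptions, since the paper only names the three steps.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time resource downloads; the exact stemmer and stop list are illustrative choices.
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

_STOP = set(stopwords.words("english"))
_STEM = PorterStemmer()

def preprocess(question):
    """Tokenization -> stop word removal -> stemming, in that order."""
    tokens = word_tokenize(question.lower())
    tokens = [t for t in tokens if t.isalpha() and t not in _STOP]
    return [_STEM.stem(t) for t in tokens]

# Example: preprocess("What causes a runtime error during Python installation?")
```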
(c) Vector Representation of Words

The question classification accuracy depends upon the quality of the vector representation of words. In our proposed approach, improved class-specific word vectors are generated to tackle the light polysemy problem in question classification. Figure 3 illustrates the flow of the proposed approach, which is based upon Sicong's work [7]. After pre-processing, the words belonging to a class are separated and the tf-idf-cf weight is calculated for each word in the class. A higher tf-idf-cf score signifies a higher occurrence of the word in the class; in short, that word can represent the class. For generating word embeddings, the set of words of a class is given as input to the modified Skip-Gram model of Sicong's work [7]. Then the top-n words with the highest tf-idf-cf score are picked and the average of their word vectors (Equation 4) gives the class vector, vector(class); the approach has been tested for different values of n. Thus, this methodology generates as many class vectors as there are classes in the dataset.

vector(class) = \frac{1}{n}\left(vector(w_1) + vector(w_2) + \cdots + vector(w_n)\right)    (4)

The representation of the class-specific word embedding V_{w,c} is obtained through the linear compositionality property of vectors, as the summation of the general word vector V_w and the class vector vector(class), as shown in Equation 5.

V_{w,c} = V_w + vector(class)    (5)

The V_{w,c} for all the words of a class are given for training in the modified skip gram model to obtain updated word vectors. These vector representations capture semantic as well as conceptual knowledge for each occurrence of a word. Essentially, the number of embeddings per word depends on the number of classes in which the word occurs. Thus we get multiple class-specific word embeddings for training classifiers on multi-class datasets.
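A small Python sketch of the class vector construction of Equations 4 and 5 is shown below; it assumes word vectors and per-class tf-idf-cf scores are available as dictionaries, and the function names are illustrative, not taken from the actual implementation. The resulting class-specific embeddings would then be fed back into the modified skip gram model as described above.

```python
import numpy as np

def class_vector(word_vectors, class_scores, n=10):
    """Equation 4: average the vectors of the top-n tf-idf-cf words of a class.

    word_vectors : dict word -> np.ndarray of shape (d,)
    class_scores : dict word -> tf-idf-cf score of the word in this class
    """
    top = sorted(class_scores, key=class_scores.get, reverse=True)[:n]
    top = [w for w in top if w in word_vectors]
    return np.mean([word_vectors[w] for w in top], axis=0)

def class_specific_embeddings(word_vectors, class_words, vec_class):
    """Equation 5: V_{w,c} = V_w + vector(class) for every word of the class."""
    return {w: word_vectors[w] + vec_class
            for w in class_words if w in word_vectors}
```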
(d) Question Classification

The proposed methodology of generating class-specific word vectors for question classification is evaluated using deep learning models against the existing class vector work and other baseline approaches [44-48].
5. EXPERIMENT AND DATASET

5.1 Experimental Setup

The introduced approach for automatic question classification has been analyzed on a system with 8GB main memory and an Intel Core i5 processor, implemented in a Python 3.6 environment using Keras [30]. The word2vec algorithm is implemented with the NumPy library using the following hyper-parameter values: window_size=3, embedding_dimension=150, epochs=40, learning_rate=0.01. The vector representations of the questions and their labels are fed to the classifiers under 10-fold cross validation to calculate the accuracy of the proposed approach. The confusion matrix is computed to find the classification accuracy on the testing data with the formula given in Equation 6:

Accuracy = \frac{\text{Correctly labelled questions}}{\text{Total number of questions}}    (6)
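As an illustration of the evaluation protocol, the following is a minimal NumPy sketch of the accuracy measure of Equation 6 and a simple 10-fold split of the question indices; it is a simplified stand-in for the actual Keras training loop, and the function names are illustrative.

```python
import numpy as np

def accuracy(pred_labels, true_labels):
    """Equation 6: correctly labelled questions / total number of questions."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    return float((pred_labels == true_labels).sum()) / len(true_labels)

def kfold_indices(num_questions, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_questions)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx
```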
5.2 Datasets

Table 1. Number of Questions in the TREC, Kaggle and Yahoo Questions Datasets

| Dataset Name | About | Total Samples | Training Samples | Testing Samples |
| TREC [32] | Open domain questions / 6 coarse and 50 fine classes | 6463 | 5453 | 1000 |
| Kaggle [33] | Python question collection from Stack Overflow / 13378 classes | 342184 | 239528 | 102656 |
| Yahoo Questions [34] | Collection of questions posted on Yahoo / 4 classes | 43500 | 30450 | 13050 |

The proposed approach has been evaluated on three multi-class question datasets. The first is the TREC dataset, an open domain dataset which is classified into six main categories and further into 50 fine categories. The second is a Kaggle dataset, a collection of Python related questions containing 342184 questions categorized into 13378 classes. The third dataset is a collection of Yahoo Questions that are categorized on the basis of the expected answer, such as yes/no, summary, factoid and list. The classifiers are fed with 70% of the questions for training and the remaining 30% for testing. The details of the training and testing questions are given in Table 1.

5.3 Baseline Approaches

This section explains the commonly used methodologies for question classification.

1. Word2vec [13,15]: With the Word2vec algorithm, words are represented as continuous vectors. One word has one vector representation.

2. Word2vec + tf-idf-cf [31]: This methodology provides weights to word embeddings using the tf-idf-cf score [29], which makes it possible to compute more than one embedding for a word. The higher the tf-idf-cf score of a word, the more important the word is in the class.

3. Word2vec + class vector [7]: The approach described in [7] generates class vectors to represent a class, but with a few drawbacks (mentioned in section 1). Our work eliminates these gaps and increases the classification accuracy.

5.4 Results

Table 2. Results on the TREC dataset

| Feature set | CNN [12] | Bi-LSTM [10] | ABBC [13] |
| Word2vec | 82 | 83.5 | 85 |
| Word2vec + tf-idf-cf | 83.9 | 85.7 | 87.7 |
| Word2vec + class vector | 86.3 | 88.6 | 90.2 |
| Proposed approach | 88.2 | 91.7 | 93.6 |

*ABBC - Attention Based Bi-GRU CNN

Table 3. Results on the Kaggle dataset

| Feature set | CNN [12] | Bi-LSTM [10] | ABBC [13] |
| Word2vec | 81.1 | 84.4 | 87.3 |
| Word2vec + tf-idf-cf | 83 | 86 | 87.3 |
| Word2vec + class vector | 85.8 | 87.4 | 89.1 |
| Proposed approach | 87.4 | 89.5 | 91.8 |
Figure 5 Graphical Representation of Results on the Kaggle dataset.

Table 4. Results on the Yahoo Questions dataset

| Feature set | CNN [12] | Bi-LSTM [10] | ABBC [13] |
| Word2vec | 75.8 | 76.5 | 80 |
| Word2vec + tf-idf-cf | 78 | 78.5 | 82.3 |
| Word2vec + class vector | 81.2 | 82.7 | 85.8 |
| Proposed approach | 84.5 | 86.4 | 89.2 |

We use the CNN, Bi-LSTM and ABBC models with different feature sets to examine the efficiency of the proposed approach. The word embedding is the essential feature for classification, but the addition of class information to the word vectors increases their discriminative power in the dataset. The class vectors proposed in the last section are used to update the context word embedding of a word in a class and also expand the distinctive property of a word for each appearance.

The elementary target of our work is to determine the efficacy of the question classification task utilizing the proposed improved class-specific word embeddings. The graphical representation (Figures 4-6) of the results attained on the TREC, Kaggle and Yahoo Questions datasets shows that the proposed approach achieves competitive and better results when compared with the baseline approaches.

The proposed approach gives 93.6%, 91.8% and 89.2% classification accuracy on the TREC, Kaggle and Yahoo Question datasets respectively with the ABBC model, which shows an improvement with respect to the other baseline approaches (Tables 2-4). The comparative analysis made with different vector features using deep learning models concludes that the ABBC framework is best at extracting semantic and contextual information from class-specific vectors to handle the light polysemy problem in question classification. When compared with the main baseline approach, the proposed work shows a ~3% improvement.

The tf-idf-cf weight of a word has also shown its significance in question classification, as seen in the second baseline approach and the proposed approach. The word2vec + tf-idf-cf feature increases accuracy by ~2.5% for all datasets when compared with word2vec