RP Journal 2245-1439 825
1 Assistant Professor, Department of CSE, Vignan’s Foundation for Science, Technology and Research, Vadlamudi, Andhra Pradesh, India
2 Professor, Department of IT, Vignan’s Foundation for Science, Technology and Research, Vadlamudi, Andhra Pradesh, India
Email: veeru2006n@gmail.com
*Corresponding Author
Abstract
The impressive gains in performance obtained using deep neural networks (DNNs) for various tasks encouraged us to apply them to the image classification task. We use a variant of DNN called the deep convolutional neural network (DCNN) for feature extraction and image classification. Neural networks can be used for classification as well as for feature extraction. Our work is best seen as two different tasks. In the first task, a DCNN is used for both feature extraction and classification. In the second task, features are extracted using a DCNN and then an SVM, a shallow classifier, is used to classify the extracted features. The performance of these two tasks is compared. Various configurations of DCNN are used in our experimental studies. Among the different architectures we considered, the architecture with 3 levels of convolutional and pooling layers, followed by a fully connected output layer, is used for feature extraction. In task 1, the DCNN-extracted features are fed to a neural network with 2 hidden layers for classification. In task 2, an SVM is used to classify the features extracted by the DCNN. Experimental studies show that the performance of ν-SVM classification on DCNN features is slightly better than that of neural network classification on DCNN-extracted features.
1 Introduction
Pattern recognition is the process of identifying patterns in given data, and it has received increasing attention from researchers in the field. The area of pattern recognition has many subfields, such as classification, clustering, regression and dimensionality reduction. These are all highly demanding tasks in multiple real-time applications [1]. These tasks can be classified as either supervised or unsupervised, depending on whether supervised information is provided during training. Since supervised information is used while training, classification and regression fall under the category of supervised learning. In clustering and dimensionality reduction, supervised information is not available during training, so these tasks fall into the category of unsupervised learning.
Regression is a function approximation task in which a continuous value is to be assigned to a given data point. Classification can be viewed as a special case of regression in which the output for a test point is discrete. For classification, models such as the simple K-nearest neighbour (KNN) classifier, Gaussian mixture model (GMM)-based Bayes classification [7], artificial neural network (ANN)-based multilayer feedforward neural networks (MLFFNN) [10] and support vector machine (SVM)-based classification can be used.
Dimensionality reduction is the process of representing high dimensional data in a lower dimensional space. This process can be viewed as a projection of data from a higher dimensional space to a lower dimensional one. Dimensionality reduction techniques are broadly categorized into linear and non-linear methods, depending on how the data is projected. Principal component analysis (PCA), linear discriminant analysis (LDA), ICA, CCA and NMF are linear dimensionality reduction techniques, as the projected data is linearly related to the data in the input space. KPCA [13], KLDA and auto-encoders are non-linear dimensionality reduction techniques, as the projected data is non-linearly related to the data in the input space [3].
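As a concrete illustration of the linear case, the following is a minimal PCA sketch in numpy (the function name and data are illustrative, not from the paper): the data is centred, projected onto the top-k eigenvectors of its covariance matrix, and the resulting projection is a linear function of the input.

```python
import numpy as np

# Minimal sketch of linear dimensionality reduction with PCA.
# Rows of X are data points; we project onto the top-k principal directions.
def pca_project(X, k):
    Xc = X - X.mean(axis=0)              # centre the data
    cov = np.cov(Xc, rowvar=False)       # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]    # sort directions by decreasing variance
    W = eigvecs[:, order[:k]]            # d x k projection matrix
    return Xc @ W                        # projection is linear in the input

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca_project(X, 2)
print(Z.shape)  # (100, 2)
```

A kernel method such as KPCA would replace the covariance eigendecomposition with one on a kernel matrix, making the projection non-linear in the input.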
Auto-associative neural networks (AANNs) are feedforward neural networks (FFNNs) with structures satisfying the requirements for performing auto-association. Mapping in an AANN [12] is achieved by a dimension reduction followed by a dimension expansion. The dimension reduction part of the network is called the encoder, while the dimension expansion part is known as the decoder. After training an auto-associative neural network, the decoder part is removed and the encoder part is used for non-linear dimension reduction, provided the activation function of the hidden layer is non-linear [2].
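The encoder/decoder structure can be sketched as follows (a structural illustration only: the weights here are random placeholders, not trained values, and the dimensions are chosen arbitrarily):

```python
import numpy as np

# Structural sketch of an auto-associative network (AANN): a bottleneck
# hidden layer performs dimension reduction (encoder), and the output
# layer expands back to the input dimension (decoder).
rng = np.random.default_rng(1)
d, h = 8, 3                       # input dimension, bottleneck dimension
W_enc = rng.normal(size=(d, h))   # encoder weights (untrained placeholders)
W_dec = rng.normal(size=(h, d))   # decoder weights (untrained placeholders)

def encode(x):
    return np.tanh(x @ W_enc)     # non-linear hidden activation

def decode(z):
    return z @ W_dec              # expansion back to the input space

x = rng.normal(size=(1, d))
z = encode(x)                     # after training, only this part is kept
x_hat = decode(z)
print(z.shape, x_hat.shape)       # (1, 3) (1, 8)
```

Training would minimize the reconstruction error between x and x_hat; since the hidden activation is non-linear (tanh), the retained encoder performs non-linear dimension reduction.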
A convolutional neural network (CNN) is a type of artificial neural network designed to extract features from data and to classify high dimensional data. A CNN is designed specifically to recognize two dimensional shapes with a high degree of invariance to translation, scaling, skewing and other forms of distortion. Its structure includes feature extraction, feature mapping and subsampling layers. A CNN consists of a number of convolutional and subsampling layers, optionally followed by fully connected output layers. The back propagation algorithm is used to train the model.
Convolutional neural networks are biologically inspired variants of multilayer neural networks [6]. From the experiments mentioned in [3] it is known that the visual cortex of animals is a complex network of cells. Each cell is sensitive to a small sub-region of the visual field, known as the receptive field. According to [4], there are two kinds of cells: simple cells and complex cells, where simple cells extract features and complex cells combine several such local features from a spatial neighbourhood. A CNN tries to imitate this structure by extracting features in a similar way from the input space and then performing classification, unlike standard techniques where features are extracted manually and provided to the model for classification [3].
A 32 × 32 grayscale image can be represented as a 1024 dimensional vector. Considering the three colour channels (R, G and B), a colour image is better represented as a vector of size 3072 (32 × 32 × 3 dimensions). Modelling such high dimensional data using shallow networks involves estimating a large number of parameters. Unless the training data is large, such models tend to overfit the data.
Convolutional neural networks handle these problems by leveraging the ideas of local connectivity, parameter sharing and pooling/subsampling.
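The parameter savings can be made concrete with some back-of-the-envelope arithmetic (the layer sizes below are illustrative, assuming a 32 × 32 RGB input):

```python
# Rough parameter-count comparison motivating weight sharing.
input_dim = 32 * 32 * 3            # 3072-dimensional input vector
hidden_units = 100

# Fully connected layer: every hidden unit sees every input component.
fc_params = input_dim * hidden_units
print(fc_params)                   # 307200

# Convolutional layer: 6 feature maps, each sharing one 3 x 3 kernel
# across all spatial positions and 3 input channels (plus one bias per map).
conv_params = 6 * (3 * 3 * 3 + 1)
print(conv_params)                 # 168
```

Three orders of magnitude fewer parameters for the convolutional layer is what makes such models trainable on modest datasets.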
Local Connectivity:
Each image is divided into equal-sized units called blocks or patches. These blocks of the image are also known as receptive fields. The blocks can be overlapping or non-overlapping in nature: overlapping blocks share some common part of the image, while non-overlapping blocks do not. In order to extract smooth features, overlapping blocks are considered. An image of size 32 × 32 with a block size of 4 × 4 gives a total of 64 non-overlapping blocks.
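The block counts follow directly from the image and block sizes (overlapping counts below assume a stride of 1, an assumption not stated in the text):

```python
# Counting receptive-field blocks in a 32 x 32 image with 4 x 4 blocks.
img, block = 32, 4

# Non-overlapping blocks tile the image exactly.
non_overlap = (img // block) ** 2
print(non_overlap)                 # 64

# Overlapping blocks (stride 1) share pixels with their neighbours.
overlap = (img - block + 1) ** 2
print(overlap)                     # 841
```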
Each hidden unit is associated with one block of the input image and extracts features from that block. In this way local features are extracted, and the exact location of a feature becomes less important. This is beneficial as long as the feature's relative location with respect to other features is preserved [14]. Figure 1 illustrates an example of local connectivity, where each hidden unit is connected to a block or patch of the image.
Parameter Sharing:
Each computational layer, also known as a convolutional layer, comprises a number of feature maps. The neurons in one feature map are constrained to share the same weights. This constraint guarantees a reduction in parameters and shift invariance [15].
The idea of parameter sharing allows different neurons to share the same parameters [16]. To accomplish this, hidden neurons are organized into feature maps that share parameters. Hidden units within a feature map cover different blocks of an image, share the same parameters and extract the same type of feature from different blocks. Each block of an image is associated with multiple feature maps, and neurons in different feature maps extract different features from the same block [17].
Figure 2 illustrates the process of parameter sharing: each hidden unit in a feature map is connected to a different block of the image and extracts the same type of feature [18], while hidden units in different feature maps extract different features from the same block [19].
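The sharing scheme can be sketched directly (illustrative sizes; the loop-based convolution is written for clarity, not speed):

```python
import numpy as np

# Sketch of parameter sharing: every hidden unit in a feature map applies
# the SAME 3 x 3 kernel to a different block of the image, while different
# feature maps use different kernels on the same blocks.
rng = np.random.default_rng(2)
image = rng.normal(size=(8, 8))
kernels = rng.normal(size=(2, 3, 3))   # 2 feature maps, one shared kernel each

def feature_map(img, k):
    out = np.empty((img.shape[0] - 2, img.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # the same weights k are reused at every location
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out

maps = [feature_map(image, k) for k in kernels]
print(maps[0].shape)                   # (6, 6)
```

Each feature map needs only the 9 weights of its kernel, regardless of how many blocks the image contains.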
In an image, blocks can be overlapping. To obtain the activation value of each hidden unit, the weights connecting the input channel to the feature map are multiplied with the corresponding block of the input. This operation is known as convolution; here we are concerned with discrete convolutions.
Discrete Convolution:
The discrete convolution operation is defined as

h[n] = (f ∗ g)[n] = Σₖ f[k] g[n − k],

where the sum runs over all integers k from −∞ to ∞.
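For finite sequences this definition is exactly what numpy's convolve computes (the sum runs only over indices where both sequences are defined):

```python
import numpy as np

# Discrete convolution h[n] = sum_k f[k] * g[n - k] on finite sequences.
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

h = np.convolve(f, g)              # numpy's built-in discrete convolution
print(h)                           # [0.  1.  2.5 4.  1.5]

# A direct translation of the formula gives the same result:
n_len = len(f) + len(g) - 1
h_manual = np.array([
    sum(f[k] * g[n - k] for k in range(len(f)) if 0 <= n - k < len(g))
    for n in range(n_len)
])
print(np.allclose(h, h_manual))    # True
```

In a CNN the same operation is applied in two dimensions, with the kernel g playing the role of the shared weights of a feature map.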
Model Selection:
Pre-processed images are given as input to the CNN model for training. The cross validation method is used to arrive at the best model. In the experiments we conducted, the following architecture gives the best performance and is considered the best model for the given dataset.
Proposed CNN architectures:
The Convolutional Neural Network is trained using Stochastic Gradient
Descent with Momentum. The network consists of an input layer, followed
3 Experimental Results
In this section the model that gives the best performance is considered and its results are reported. We divided the whole work into 2 different tasks. In task 1, a neural network with 2 hidden layers is used to classify the features extracted by the DCNN. In task 2, ν-SVM is used to classify the features extracted by the DCNN.
Task 1 – Classification of DCNN features using a neural network:
The input image, of size 3 × 32 × 32, consists of 3 feature maps (RGB); 6 kernels are used to transform the 3 input feature maps into 6 feature maps. In each feature map different features are extracted, which is why the image in each feature map looks different. There are many connections from the input layer to the convolutional layer, but the number of parameters is small due to weight sharing. Convolution of the image is performed with a 3 × 3 kernel, which results in 6 feature maps of size 30 × 30. To introduce non-linearity, the rectified linear unit is used as the activation function of the convolutional layer. From Figure 8 we can infer that having several feature maps allows us to look for different patterns at different locations of the input image.
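The shape arithmetic of this first layer can be checked with a short sketch (kernel weights are random placeholders, not the trained values from our experiments; valid convolution is assumed):

```python
import numpy as np

# Shape walk-through of the first convolutional layer: a 3 x 32 x 32 RGB
# input, 6 kernels of size 3 x 3 summed over the 3 input channels
# (valid convolution), followed by ReLU.
rng = np.random.default_rng(3)
image = rng.normal(size=(3, 32, 32))
kernels = rng.normal(size=(6, 3, 3, 3))   # 6 maps x 3 channels x 3 x 3

out = np.empty((6, 30, 30))               # 32 - 3 + 1 = 30 per spatial dim
for m in range(6):
    for i in range(30):
        for j in range(30):
            out[m, i, j] = np.sum(image[:, i:i+3, j:j+3] * kernels[m])

out = np.maximum(out, 0.0)                # ReLU non-linearity
print(out.shape)                          # (6, 30, 30)
```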
After the convolutional layer, average pooling with a window size of 2 × 2 is applied for subsampling. This 2 × 2 subsampling results in 6 feature maps of size 15 × 15, with each input to the pooling window weighted by 1/4.
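The pooling step amounts to replacing each non-overlapping 2 × 2 window by its mean, which can be written compactly with a reshape (feature-map values below are random placeholders):

```python
import numpy as np

# 2 x 2 average pooling on a 6 x 30 x 30 stack of feature maps: each
# non-overlapping 2 x 2 window is replaced by its mean (each of the four
# inputs contributes with weight 1/4), giving 6 maps of size 15 x 15.
rng = np.random.default_rng(4)
maps = rng.normal(size=(6, 30, 30))

pooled = maps.reshape(6, 15, 2, 15, 2).mean(axis=(2, 4))
print(pooled.shape)                       # (6, 15, 15)
```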
Parameters:
4 Conclusion
In the convolutional neural network designed, the number of feature maps in each convolutional layer and the number of pairs of convolutional and sampling layers define the complexity of the network. We used both average pooling and max pooling layers in our model and observed that average pooling results in better performance than max pooling. We can infer that having several feature maps allows us to look for different patterns at different locations of the input image. If we increase the number of feature maps and convolutional layers beyond a limit, the error on the test data starts increasing slowly. If we keep a slightly high learning rate, after several iterations the
References
[1] Jyostna Devi Bodapati and N. Veeranjaneyulu, “Abnormal Network
Traffic Detection Using Support Vector Data Description”, Proceedings
of the 5th International Conference on Frontiers in Intelligent Computing:
Theory and Applications. Springer, Singapore, 2017.
[2] Jyostna Devi Bodapati and N. Veeranjaneyulu, “Performance of different
Classifiers in non-linear subspace” Proceedings of the International
Conference on Signal and Information Processing, 2016.
[3] Veeranjaneyulu N and Jyostna devi Bodapati, “Scene classification using
support vector machines with LDA”, Journal of theoretical and applied
information technology, Vol. 63, pp. 741, 2014.
[4] Tara N. Sainath, Abdel-rahman Mohamed, Brian Kingsbury, Bhuvana Ramabhadran, “Deep convolutional neural networks for LVCSR”, ICASSP, 2013.
[5] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS, 2012.
[6] D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex”, The Journal of Physiology, Vol. 148, pp. 574–591, 1959.
[7] LeCun, Yann, et al. “Gradient-based learning applied to document
recognition”, Proceedings of the IEEE 86.11, pp. 2278–2324, 1998.
[8] Scherer, Dominik, Andreas Müller, and Sven Behnke, “Evaluation of pooling operations in convolutional architectures for object recognition”, International Conference on Artificial Neural Networks, Springer Berlin Heidelberg, pp. 92–101, 2010.
[9] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet classification with deep convolutional neural networks”, Advances in neural information processing systems, 2012.
[10] Simon Haykin, “Neural Network and Learning Machines”, McMaster
University Hamilton, Ontario, Canada.
[11] Zeiler, Matthew D., and Rob Fergus, “Stochastic pooling for reg-
ularization of deep convolutional neural networks”. arXiv preprint
arXiv:1301.3557, 2013.
[12] Ikbal, M. Shajith, Hemant Misra, and Bayya Yegnanarayana, “Anal-
ysis of autoassociative mapping neural networks”, International Joint
Conference Neural Networks, Vol. 5. IEEE, 1999.
[13] Jyostna devi Bodapati & N Veeranjaneyulu, “An Intelligent face recog-
nition system using Wavelet Fusion of K-PCA, R-LDA”, ICCCCT,
pp. 437–441, 2010.
[14] Ciregan, Dan, Ueli Meier, and Jürgen Schmidhuber. “Multi-column deep
neural networks for image classification”. Computer vision and pattern
recognition (CVPR), 2012 IEEE conference on. IEEE, 2012.
[15] Esteva, Andre, et al. “Dermatologist-level classification of skin cancer with deep neural networks”. Nature 542.7639, 2017.
[16] Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. “Bina-
ryconnect: Training deep neural networks with binary weights during
propagations”. Advances in neural information processing systems,
2015.
[17] Cortes, Corinna, et al. “Adanet: Adaptive structural learning of artificial
neural networks”. arXiv preprint arXiv:1607.01097, 2016.
[18] Shin, Hoo-Chang, et al. “Deep convolutional neural networks for
computer-aided detection: CNN architectures, dataset characteristics
and transfer learning”. IEEE transactions on medical imaging, 35.5,
pp. 1285–1298, 2016.
[19] Toshev, Alexander, and Christian Szegedy. “Deeppose: Human pose esti-
mation via deep neural networks”. Proceedings of the IEEE conference
on computer vision and pattern recognition, 2014.
[20] Szegedy, Christian, Dumitru Erhan, and Alexander Toshkov Toshev.
“Object detection using deep neural networks”. U.S. Patent No. 9,275,308,
2016.
[21] Neethu Narayanan, K. Suthendran and FepslinAthishMon, “Recognizing
Spontaneous Emotion From The Eye Region Under Different Head
Poses”, International Journal of Pure and Applied Mathematics, Vol. 118,
pp. 257–263, 2018.
Biographies