Enhancing Alexnet For Arabic Handwritten Words Recognition Using Incremental Dropout
Abstract— Currently, the growth of mobile technologies leads to a need for handwritten recognition applications. While the recognition of handwritten Latin and Chinese has been extensively investigated using various techniques, little work has been done on Arabic handwritten recognition, and none of the existing techniques is accurate enough for practical application. Over the past few years, deeper convolutional neural networks (CNNs) have widely been employed for improving handwritten recognition performance. In this paper, we enhance the popular AlexNet for Arabic Handwritten Words Recognition (HWR). By adopting dropout regularization, we protect our system against the overfitting problem and reduce the recognition error rate. We also investigate the performance of the ReLU and tanH activation functions in the fully connected layers. Through several experimental settings using the benchmark IFN/ENIT database, we achieve new state-of-the-art classification accuracies of 92.13% and 92.55%. Lastly, we compare our best results to those of the previous state of the art.

Keywords—Arabic handwritten; AlexNet; Overfitting; Dropout

I. INTRODUCTION AND STATE OF THE ART

Automatic handwritten word recognition is among the most important axes of Natural Language Processing (NLP). Applications like postal address and zip code recognition, passport validation and check processing demonstrate the need for recognition of handwritten characters. During the last few decades, numerous research results have been reported on handwriting recognition. Although there are promising results for recognizing Latin, Chinese and Japanese scripts, accuracies on recognizing handwritten Arabic scripts fall behind. This is due to the unlimited variation in human handwriting, the large variety of Arabic character shapes, the presence of ligatures between characters and the overlapping of components. The different approaches to handwritten word recognition (HWR) fall into either the on-line or the off-line category. In on-line HWR, the computer recognizes the words as they are written. Off-line recognition is performed after the writing is completed. We here focus on off-line HWR, which has traditionally been tackled by following two main approaches: (i) the analytic approach and (ii) the holistic approach. In the analytic approach [1], [2], a word is decomposed into a set of smaller components (e.g., characters, graphemes, allographs) and then features are extracted for each component. Finally, the word is transformed into sequential feature vectors suited for training and recognition. A large variety of techniques and classifiers has been employed for the analytic approach. El Hajj et al. [3] proposed an offline Arabic handwritten recognition system based on the combination of three HMM classifiers. Baseline-independent and baseline-dependent features were extracted using a sliding window in three directions in order to handle writing inclination, overlapping ascenders and descenders, and the shifted position of some diacritical marks. Then, an HMM classifier was applied in each direction to classify the extracted features. The results showed that this combination gave better results than a single HMM, and the accuracy obtained by the proposed model was higher than 90%. Jayech et al. [4] developed a dynamic hierarchical Bayesian network for Arabic handwriting recognition. After the preprocessing step, an explicit segmentation based on the smoothed vertical histogram projection was applied. Then, a set of statistical features was extracted for each character using invariant moments, such as Hu and Zernike moments. After that, the dynamic hierarchical Bayesian network was used to recognize the Arabic words. Based on the nature of Arabic writing, Parvez et al. [5] proposed an off-line Arabic handwritten text recognition system using structural techniques. A text line was segmented into words and sub-words, and dots were extracted, leaving the Parts of Arabic Words (PAWs). The Arabic characters were modeled by fuzzy polygons, and a fuzzy polygon matching algorithm was later applied to recognize the Arabic word. Khémiri et al. [6] proposed a system based on Probabilistic Graphical Models (PGMs). The system is divided into three stages: preprocessing, feature extraction and word classification. Preprocessing includes baseline estimation. Structural features (ascenders, descenders, loops and diacritic points) and statistical features at the pixel level (pixel density distributions and local pixel configurations) are then extracted from word images. Words are recognized using a variety of PGMs.

Although successful, the performance of such approaches has always been substantially dependent on the selection of the right representative features, which is a difficult task for cursive writing. In the holistic approach, the entire word is recognized without prior decomposition [7], [8]. In this case, the feature vectors are extracted from the word as a whole. Convolutional Neural Networks (CNNs) are the current state-of-the-art model architecture for handwritten recognition tasks. CNNs apply a sequence of filters to the raw image data to extract features automatically.
A. Deep Neural Networks

A DNN is one of the most advanced machine learning techniques. It consists of a succession of convolutional and max-pooling layers, and each layer receives connections from its preceding layer. The most popular image classification structure of a DNN is built from three primary processing layers: the convolutional layer, the pooling layer and the fully connected layer (or classification layer). The DNN units are described below.

Convolutional layer: let $X_i^{l} \in \mathbb{R}^{m \times n}$ represent the $i$th map in the $l$th layer, let $k_{ij}^{l} \in \mathbb{R}^{p \times q}$ denote the $j$th kernel filter in the $l$th layer connected to the $i$th map in the $(l-1)$th layer, and let $M_j = \{\, i \mid \text{the } i\text{th map in the } (l-1)\text{th layer is connected to the } j\text{th map in the } l\text{th layer} \,\}$ be the index map set. The convolution operation can be given by equation (1):

$$X_j^{l} = f\Big(\sum_{i \in M_j} X_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big) \quad (1)$$

where $b_j^{l}$ is the bias of the $j$th map and $f(\cdot)$ is the activation function.

The output layer uses a softmax activation. The fully connected layers simply implement the dot product between the input and weight vectors, where each neuron in layer $l$ is connected to all outputs of the neurons in layer $l-1$. Moreover, the network uses the dropout regularization method to reduce overfitting in the fully connected layers, and applies Rectified Linear Units (ReLUs) as the activation function of the fully connected and convolutional layers to speed up the learning process.

[Figure: AlexNet-style architecture used in this work, with convolution layers C1–C5 (5×5 and 3×3 kernels), max-pooling layers M1–M3 and fully connected layers FC1–FC3.]
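To make equation (1) concrete, the following is a minimal NumPy sketch of computing a single output map of a convolutional layer. The map sizes, kernel size and ReLU activation are illustrative assumptions, not the configuration used in the paper, and the kernel is applied without flipping (cross-correlation), as is standard in CNN implementations.

# Minimal sketch of the convolution operation in equation (1), using plain NumPy.
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D convolution of one input map x with one kernel k (no flipping)."""
    m, n = x.shape
    p, q = k.shape
    out = np.zeros((m - p + 1, n - q + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + p, c:c + q] * k)
    return out

def conv_layer_map(prev_maps, kernels, bias, connected, f=lambda z: np.maximum(z, 0.0)):
    """Compute one output map X_j^l = f(sum_{i in M_j} X_i^{l-1} * k_ij^l + b_j^l)."""
    total = sum(conv2d_valid(prev_maps[i], kernels[i]) for i in connected)
    return f(total + bias)

# Toy example: two 8x8 input maps, 3x3 kernels, M_j = {0, 1}.
rng = np.random.default_rng(0)
prev_maps = [rng.standard_normal((8, 8)) for _ in range(2)]
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
out_map = conv_layer_map(prev_maps, kernels, bias=0.1, connected=[0, 1])
print(out_map.shape)  # (6, 6)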
We first built AlexNet from scratch and called it Net1; then Net2, Net3 and Net4 were built using the baseline Net1. Each of the four networks classifies the input image and produces the same number of classes. All the networks share the same configuration in terms of the number of layers, the number of filters and the dropout after the fully connected layers. The four networks are meant to answer the following two questions: where should the dropout be added, and with which value? And which activation function improves the performance, ReLU or tanH? The network architectures are depicted in Fig. 3, and the description of each network is as follows:

Net1 (fully connected dropout): this net represents the standard AlexNet. In Net1 the dropout value is set to 0.5 and the dropout is applied immediately after the fully connected layers.

Net2 (convolution dropout): the dropout value is set to 0.5 after convolution layer 1 and convolution layer 2.

Net3 (max-pooling dropout): the dropout value is set to 0.5 after max-pool layer 1, max-pool layer 2 and max-pool layer 3.

Net4 (convolution and max-pooling dropout): in this net, the dropout probability is increased with respect to the depth of the network; the dropout rate is increased by 0.1 after each convolution layer and by 0.2 after each max-pool layer. The purpose of doing so is that the bottom layers of a deep network are usually harder to train than the top layers [24]. A sketch of this incremental schedule is given below.
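As a rough illustration of the Net4 idea, the sketch below builds an AlexNet-like model in Keras with dropout inserted after every convolution and max-pooling layer and a rate that grows with depth. The filter counts and the exact rate schedule (0.0 up to 0.5) are assumptions taken loosely from Fig. 3, not the paper's exact configuration.

# A hedged Keras sketch of Net4-style incremental dropout placement.
from tensorflow.keras import layers, models

def build_net4_like(num_classes=937, input_shape=(227, 227, 1)):
    conv_specs = [(96, 0.0), (256, 0.1), (384, 0.2), (384, 0.3), (256, 0.4)]
    pool_rates = {0: 0.1, 1: 0.3, 4: 0.5}        # max-pooling follows conv layers 1, 2 and 5
    model = models.Sequential()
    for idx, (filters, conv_rate) in enumerate(conv_specs):
        kwargs = {'input_shape': input_shape} if idx == 0 else {}
        model.add(layers.Conv2D(filters, (3, 3), padding='same', activation='relu', **kwargs))
        model.add(layers.Dropout(conv_rate))      # dropout rate grows with network depth
        if idx in pool_rates:
            model.add(layers.MaxPooling2D(pool_size=(3, 3), strides=2))
            model.add(layers.Dropout(pool_rates[idx]))
    model.add(layers.Flatten())
    for _ in range(2):                            # FC1 and FC2 with 512 units each
        model.add(layers.Dense(512, activation='relu'))
        model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation='softmax'))
    return model

Calling build_net4_like() and printing model.summary() shows where each Dropout layer sits; the Net1, Net2 and Net3 variants differ only in which of these Dropout layers are kept and at what rate.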
IV. EXPERIMENTAL RESULTS

To examine the performance of the proposed models, we have conducted many experiments. The networks' performance has been measured in terms of Classification Accuracy (CA). The objective of the experimental part was twofold. First, we wanted to compare against the state of the art. Second, we wanted to check whether dropout was better after the convolution or after the max-pooling layers, and with which values.

A. Training Method

Suppose we have $T$ categories and the training data for each category are denoted as $(x_i, y_i)$, where $i \in \{1, \dots, N\}$, with $x_i$ being the feature vector and $y_i$ the corresponding label. The objective of training is to iteratively minimize the following cross-entropy loss function:

$$J(\theta) = -\frac{1}{N}\left[\sum_{i=1}^{N}\sum_{j=1}^{T} 1\{y_i = j\}\,\log\frac{e^{\theta_j^{\top} x_i}}{\sum_{l=1}^{T} e^{\theta_l^{\top} x_i}}\right]$$

where $\theta$ denotes the model parameters, $\sum_{l=1}^{T} e^{\theta_l^{\top} x_i}$ is a normalization factor, and $1\{\cdot\}$ is the indicator function. The loss function $J(\theta)$ can be minimized during the training process using an optimization algorithm such as Stochastic Gradient Descent (SGD), Adam or Adadelta.
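A minimal NumPy sketch of the cross-entropy loss $J(\theta)$ above, assuming a simple linear score $\theta_j^{\top} x_i$ for each of the $T$ categories; the names and dimensions below are illustrative.

# Minimal sketch of the softmax cross-entropy loss J(theta).
import numpy as np

def cross_entropy_loss(theta, X, y):
    """theta: (T, d) parameters, X: (N, d) feature vectors, y: (N,) integer labels."""
    scores = X @ theta.T                                   # (N, T) class scores
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # softmax normalization
    # the indicator 1{y_i = j} simply selects the probability of the true class
    log_likelihood = np.log(probs[np.arange(len(y)), y])
    return -log_likelihood.mean()

# Toy example with T = 3 categories and N = 4 samples of dimension d = 5.
rng = np.random.default_rng(1)
theta = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))
y = np.array([0, 2, 1, 0])
print(cross_entropy_loss(theta, X, y))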
To compare our results with other state-of-the-art methods, we have used the offline IFN/ENIT benchmark database. The IFN/ENIT database contains 32492 binary images of Arabic handwritten words written by more than 400 writers: 19724 words for training and 12768 for testing. The words represent 937 Tunisian town/village names. Before training, we shuffle the training data and normalize the offline word images to a size of 227 x 227.

The experiments were conducted with the open source library Keras [25]. As mentioned before, AlexNet is composed of 5 convolution layers and three max-pooling layers; a max-pooling layer follows convolution layers 1, 2 and 5. The receptive field of each convolutional layer is 3×3. Three fully connected (FC) layers follow the fifth convolutional layer: the first two have 512 channels each. Since the original AlexNet was trained on 1000 classes, its last fully connected layer produces 1000 outputs. We replace this layer with a new fully connected layer that has as many outputs as the number of classes (937 for the IFN/ENIT database).
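A small sketch of the preprocessing just described (shuffling the training data and normalizing each word image to 227 x 227). The Pillow-based loading code and file-path handling are assumptions, since the paper does not detail its input pipeline.

# Hedged sketch of the image preprocessing: resize to 227x227 and shuffle.
import numpy as np
from PIL import Image

def load_and_resize(path, size=(227, 227)):
    img = Image.open(path).convert('L')      # IFN/ENIT word images are binary/greyscale
    img = img.resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0

def shuffled(samples, labels, seed=0):
    idx = np.random.default_rng(seed).permutation(len(samples))
    return [samples[i] for i in idx], [labels[i] for i in idx]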
[Fig. 3: Architectures of Net1–Net4, showing where dropout is inserted and with which rate. Net1: dropout 0.5 after FC1 and FC2; Net2: dropout 0.5 after C1 and C2; Net3: dropout 0.5 after M1, M2 and M3; Net4: dropout rates increasing with depth from 0.0 up to 0.5 across the convolution and max-pool layers; all nets keep dropout 0.5 after FC1 and FC2. Legend: convolution layer, max-pool layer, dropout layer, fully connected layer.]
The final layer is the soft-max layer. As we use the latest version of Keras, we replace the Local Response Normalization layer with a Batch Normalization layer, which is useful and versatile. The training is carried out using Stochastic Gradient Descent (SGD), a highly common optimization algorithm used in deep networks to update the weights. The batch size was set to 64 and the momentum to 0.9. The training was regularized with a weight decay of 5·10⁻⁴ and dropout regularization for the first two fully connected layers (dropout ratio set to 0.5). The learning rate was set to 10⁻². The type of non-linearity used for the convolution layers is the Rectified Linear Unit (ReLU). AlexNet was trained for 200 epochs. The whole training procedure for a single network took at most 3 hours on a desktop PC with an Intel i7 3770 processor, an NVidia GTX 780 graphics card and 16 gigabytes of RAM.
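The training configuration above can be expressed in Keras roughly as follows. Expressing the 5·10⁻⁴ weight decay as an L2 kernel regularizer is an assumption, and model, x_train, y_train, x_test and y_test are placeholders for the network and data prepared earlier.

# Hedged Keras sketch of the training setup: SGD, momentum 0.9, lr 1e-2, batch 64, 200 epochs.
from tensorflow.keras import layers, optimizers, regularizers

def compile_for_training(model):
    """SGD with momentum 0.9 and learning rate 1e-2, as described above."""
    model.compile(optimizer=optimizers.SGD(learning_rate=1e-2, momentum=0.9),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Weight decay of 5e-4 can be expressed per layer as an L2 kernel regularizer, e.g.:
fc = layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(5e-4))

# Training with batch size 64 for 200 epochs (x_train/y_train are placeholders):
# history = compile_for_training(model).fit(x_train, y_train, batch_size=64,
#                                           epochs=200, validation_data=(x_test, y_test))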
B. The impact of tanH and ReLU activation functions

ReLU (shown in Fig. 4) is an effective activation function for use in neural networks. The ReLU function is given by:

f(x) = max(0, x)

Unlike fully connected layers, convolution layers extract features. Feature extraction requires sparsity in the input feature maps, and it should set as many features as possible to 0. Unlike with ReLU, this sparsity does not come into effect with other activation functions, as they can generate small values instead of zeros. The sparsity in the features helps to speed up the computation process by removing the undesired features. In AlexNet, the fully connected layers used tanH (shown in Fig. 4) as the activation function. The tanH function is given by:

f(x) = tanh(x)

The focus of the fully connected layers is to generate new features rather than to extract features as the convolution layers do. Moreover, as the fully connected layers are close to the output layer, they are less affected by the vanishing gradient problem. A short numerical illustration of the sparsity difference is given below.
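The sparsity argument above can be checked numerically: ReLU maps every non-positive input to exactly zero, whereas tanH only squashes it to a small non-zero value. The sketch below is a simple illustration, not code from the paper.

# Counting exact zeros produced by ReLU versus tanh on the same inputs.
import numpy as np

x = np.linspace(-3, 3, 13)
relu = np.maximum(0.0, x)       # f(x) = max(0, x)
tanh = np.tanh(x)               # f(x) = tanh(x)

print('zeros from ReLU:', np.count_nonzero(relu == 0.0))   # 7 of the 13 activations
print('zeros from tanh:', np.count_nonzero(tanh == 0.0))   # only the point x = 0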
[Fig. 4: The ReLU and TanH activation functions.]

The training accuracy reached 100% after 86 epochs. Net2 achieved an accuracy equal to 85.96% when using tanH and 85.32% when using ReLU; Net2 performed the worst among the four nets. Net3 achieved an accuracy equal to 89.55%, and when replacing its activation function with ReLU, the accuracy grows to 90.43%. The best performer among our nets is Net4. This net achieved the highest classification accuracies: 92.13% using ReLU and 92.55% using the tanH activation function. These two results outperform the state-of-the-art results. The results are reported in Table 1 in terms of classification accuracy (CA). The overall performances of the four networks using ReLU are depicted in Fig. 5(a), and using tanH in Fig. 5(b).

C. The impact of dropout

Dropout consists of setting to zero the output of each hidden neuron with probability p. If neurons in a CNN are dropped out, they do not contribute to the forward pass and do not participate in backpropagation. In this paragraph, we provide additional insight into the performance of the proposed method. In Fig. 5(a) and (b), we show the accuracy performance during training and testing of the model. The blue and the orange ascending curves correspond to the classification accuracy values for training and testing, respectively. As shown in Fig. 5, for Net1 (dropout in the fully connected layers) the training accuracy increases very fast and reaches 100% after 26 epochs. However, the test accuracy of Net1 is not promising, due to overfitting on the training data (see Table 1). Net2's performance is the poorest compared with the other networks; this may be because a high drop probability (dropout equal to 0.5) is applied to the first and second convolution layers. With the help of dropout, Net3 performs better than Net1 and Net2: dropout after each max-pool layer significantly improves the accuracy, to 89.55% and 90.43%. Net4 outperforms Net1, Net2 and Net3, achieving the highest accuracies. The very high accuracies of 92.13% and 92.55% prove the effectiveness of incremental dropout in improving the generalization performance of the deep neural network. However, when dropout is applied to every convolution and max-pool layer in a deep CNN, the training process can be slow, since activation signals are dropped exponentially as dropout is repeatedly applied [19]. Note that in the testing process, dropout is no longer used for any of the nets. The results of using dropout in the four networks are reported in Table 1.
Fig. 5. The networks' performance over 200 epochs, plotted as classification accuracy per epoch for Net 1–Net 4: (a) using the ReLU activation function; (b) using the TanH activation function.
D. Comparison with the state of the art

To show the performance of the proposed enhanced AlexNet, we compare the performance of different methods on the IFN/ENIT database. The results are given in Table 2. Elleuch et al. [11] achieved 83.7% recognition accuracy (16.3% error rate). Graves et al. [10] achieved 91.4% recognition accuracy (8.6% error rate); the result of Graves was the highest in the literature. It can be seen that Net4 achieved the highest accuracies on the IFN/ENIT database: AlexNet with Incremental Dropout + ReLU achieved 92.13% classification accuracy (7.87% error rate), and AlexNet with Incremental Dropout + TanH outperformed the state of the art with 92.55% recognition accuracy (7.45% error rate).
TABLE 2. COMPARISON WITH THE STATE OF THE ART

Author              | Model                                         | CA%
Present work        | Net4: AlexNet with Incremental Dropout + TanH | 92.55%
Present work        | Net4: AlexNet with Incremental Dropout + ReLU | 92.13%
Graves et al. [10]  | MDLSTM                                        | 91.4%
Elleuch et al. [11] | CDBN                                          | 83.7%
V. CONCLUSION

In this paper, we enhance the well-known AlexNet for the purpose of Arabic HWR and demonstrate its efficiency on the IFN/ENIT database. In order to protect the network against overfitting, dropout is applied at different positions in Net2, Net3 and Net4. Dropout regularization helps to reach a good accuracy. We show incremental improvements of the word recognition comparable to approaches that used Deep Belief Networks (DBN) or Recurrent Neural Networks (RNN). We have achieved promising performance which is superior to the state-of-the-art systems: the classification accuracy was 92.13% using the ReLU activation function and 92.55% using the tanH activation function. To the best of our knowledge, these two results are the new state-of-the-art records obtained with a deep convolutional neural network. Using incremental dropout, the classification accuracy is further improved consistently and significantly. As future work, we plan to explore the performance of other deep networks such as VGGNet, GoogLeNet and ResNet on the IFN/ENIT database.

ACKNOWLEDGMENT

This research was supported in part by the Science & Technology Pillar Program of Hubei Province under Grant #2014BAA146, the Nature Science Foundation of Hubei Province under Grant #2015CFA059, and the Science and Technology Open Cooperation Program of Henan Province under Grant #152106000048.

REFERENCES

[1] Kim K K, Jin H K, Yun K C, et al. Legal Amount Recognition Based on the Segmentation Hypotheses for Bank Check Processing[C]// International Conference on Document Analysis and Recognition. IEEE, 2001: 964-967.
[2] Vinciarelli A. A survey on off-line Cursive Word Recognition[J]. Pattern Recognition, 2002, 35(7): 1433-1446.
[3] Al-Hajj Mohamad R, Likforman-Sulem L, Mokbel C. Combining slanted-frame classifiers for improved HMM-based Arabic handwriting recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2009, 31(7): 1165.
[4] Jayech K, Trimech N, Mahjoub M A, et al. Dynamic hierarchical Bayesian network for Arabic handwritten word recognition[C]// Fourth International Conference on Information and Communication Technology and Accessibility. IEEE, 2014: 1-6.
[5] Parvez M T, Mahmoud S A. Arabic handwriting recognition using structural and syntactic pattern attributes[J]. Pattern Recognition, 2013, 46(1): 141-154.
[6] Khemiri A, Kacem A, Belaid A. Towards Arabic Handwritten Word Recognition via Probabilistic Graphical Models[C]// International Conference on Frontiers in Handwriting Recognition. IEEE, 2014: 678-683.
[7] Madhvanath S, Govindaraju V. The role of holistic paradigms in handwritten word recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2001, 23(2): 149-164.
[8] Ruiz-Pinales J, Jaime-Rivas R, Castro-Bleda M J. Holistic cursive word recognition based on perceptual features[J]. Pattern Recognition Letters, 2007, 28(13): 1600-1609.
[9] Wu C, Fan W, He Y, et al. Handwritten Character Recognition by Alternately Trained Relaxation Convolutional Neural Network[C]// International Conference on Frontiers in Handwriting Recognition. IEEE, 2014: 291-296.
[10] Graves A. Offline Arabic Handwriting Recognition with Multidimensional Recurrent Neural Networks[J]. Advances in Neural Information Processing Systems, 2012: 545-552.
[11] Elleuch M, Tagougui N, Kherallah M. Deep Learning for Feature Extraction of Arabic Handwritten Script[M]// Computer Analysis of Images and Patterns. Springer International Publishing, 2015: 371-382.
[12] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2009: 248-255.
[13] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.
[14] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015: 1-9.
[15] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[J]. Computer Science, 2014.
[16] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J]. Computer Science, 2012, 3(4): 212-223.
[17] Wan L, Zeiler M, Zhang S, et al. Regularization of neural networks using DropConnect[C]// International Conference on Machine Learning, 2013: 1058-1066.
[18] Aburas A A, Gumah M E. Arabic handwriting recognition: Challenges and solutions[C]// International Symposium on Information Technology. IEEE, 2008: 1-6.
[19] Srihari S N, Ball G. An Assessment of Arabic Handwriting Recognition Technology[M]// Guide to OCR for Arabic Scripts. Springer London, 2012: 3-34.
[20] Lemley J, Bazrafkan S, Corcoran P. Smart Augmentation-Learning an Optimal Data Augmentation Strategy[J]. IEEE Access, 2017.
[21] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// Advances in Neural Information Processing Systems, 2012: 1097-1105.
[22] Fraser-Thomas J, Côté J, Deakin J. Understanding dropout and prolonged engagement in adolescent competitive sport[J]. Psychology of Sport and Exercise, 2008, 9(5): 645-662.
[23] Wu H, Gu X. Towards dropout training for convolutional neural networks[J]. Neural Networks, 2015, 71: 1-10.
[24] Zhang X Y, Bengio Y, Liu C L. Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark[J]. Pattern Recognition, 2016, 61: 348-360.
[25] Chollet F. "Keras," https://github.com/fchollet/keras, 2015.