Arabic OCR Report
Faculty of Engineering
Mechatronics Department
MSD-2
Supervised By:
Adopted By:
In this project we studied Arabic handwritten character recognition using
OpenCV and a CNN (Convolutional Neural Network): a scanned handwritten
image is recognized and converted into individual characters, words, or
text.
Pooling layer – This layer down-samples the output of the previous layer,
producing a structure with smaller dimensions. Pooling keeps only the most
prominent features as we progress through the network. Max pooling is
frequently used in the pooling layer, where we pick the maximum value
in a given K×K window.
Fully Connected layer – This layer computes the output scores at the end
of the network. The resulting output is of size 1×1×L, where L is the
number of classes in the training dataset.
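The K×K max-pooling operation described above can be sketched in plain NumPy; the 4×4 input below is an illustrative example, not data from the project:

```python
import numpy as np

def max_pool2d(x, k=2):
    """Max pooling with a k x k window and stride k (no padding).
    Assumes the input height and width are divisible by k."""
    h, w = x.shape
    # Split the image into non-overlapping k x k tiles and take each tile's max.
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

img = np.array([[1, 2, 0, 1],
                [4, 3, 1, 0],
                [0, 1, 5, 6],
                [2, 2, 7, 8]])
pooled = max_pool2d(img, k=2)
print(pooled)        # each 2x2 window collapses to its maximum value
print(pooled.shape)  # (2, 2): half the height and half the width
```

Note how only the most prominent value in each window survives, which is exactly why pooling preserves the strongest features while shrinking the dimensions.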
Fig.1 (CNN-Layers)
• Data Exploration:
o Import libraries necessary for this project:
o Load Arabic Letters dataset files into data frames:
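The loading and normalization steps might look like the following sketch. The AHCD CSV layout assumed here (one flattened grayscale image per row, no header) and the 0–255 pixel range are assumptions; a tiny inline CSV stands in for the real dataset files:

```python
from io import StringIO
import pandas as pd

# In the real project the AHCD CSV files would be read here; this tiny
# inline CSV (two flattened 2x2 "images") stands in for them.
csv_data = StringIO("0,128,255,64\n255,0,32,200\n")

# Each row is one flattened grayscale image (2x2 here instead of 32x32).
train_images = pd.read_csv(csv_data, header=None)

# Normalize pixel values from [0, 255] to [0, 1] for stable training.
x_train = train_images.values.astype("float32") / 255.0

# Reshape to (num_samples, height, width, channels) for the CNN input.
x_train = x_train.reshape(-1, 2, 2, 1)
print(x_train.shape)  # (2, 2, 2, 1)
```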
o Image Normalization:
▪ From the labels CSV files we can see that the labels are categorical
values, so this is a multi-class classification problem.
▪ The outputs take the form: letters from 'أ' to 'ي' have category
numbers from 0 to 27.
▪ Here we encode these category values using one-hot encoding with
Keras.
▪ One-hot encoding transforms an integer into a binary vector that
contains a single '1', with all remaining elements '0'.
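In Keras this is done with `keras.utils.to_categorical`; a NumPy equivalent makes the transformation explicit:

```python
import numpy as np

def one_hot(labels, num_classes=28):
    """NumPy equivalent of keras.utils.to_categorical for the 28 letters."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1
    return out

# Labels 0..27 correspond to the letters from 'أ' to 'ي'.
encoded = one_hot([0, 5, 27])
print(encoded.shape)     # (3, 28)
print(encoded[0].sum())  # 1.0 -> exactly one '1' per row
```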
o The first hidden layer is a convolutional layer with 16 feature
maps, each produced by a 3×3 kernel, and a ReLU activation function.
This is the input layer, expecting images with the structure outlined
above.
o The second layer is Batch Normalization, which addresses the problem
of feature distributions varying between training and test data,
which breaks the IID assumption. We use it for two reasons: faster
learning and higher overall accuracy.
o The third layer is a MaxPooling layer, which down-samples the input
so the model can make more general assumptions about the features,
reducing overfitting. It also reduces the number of parameters to
learn, shortening the training time.
o The next layer is a Regularization layer using dropout. It is configured
to randomly exclude 20% of neurons in the layer in order to reduce
overfitting.
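The 20% dropout described above can be illustrated with a small NumPy sketch of inverted dropout; the rescaling by 1/(1 − rate) is standard practice at training time, though the report does not state it explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.2):
    """Inverted dropout: zero out roughly `rate` of the units and rescale
    the survivors so the expected activation stays unchanged."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

a = np.ones(10)
d = dropout(a, rate=0.2)
print(d)  # entries are either 0.0 (excluded) or 1.25 (survivor, rescaled)
```

Because different neurons are excluded on every forward pass, no single neuron can dominate, which is what reduces overfitting.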
o Another hidden layer with 32 feature maps of size 3×3 and a ReLU
activation function captures more features from the image.
o Further hidden layers with 64 and 128 feature maps of size 3×3 and
ReLU activation functions capture the more complex patterns that will
later distinguish the letters.
o More MaxPooling, Batch Normalization, Regularization and
GlobalAveragePooling2D layers.
o The last layer is the output layer, with one neuron per output class;
it uses a softmax activation function since this is a multi-class
problem. Each neuron gives the probability of its class.
o We used categorical_crossentropy as the loss function because this is
a multi-class classification problem, and accuracy as the metric to
monitor the performance of our neural network.
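The architecture described above could be assembled in Keras roughly as follows. The filter counts (16, 32, 64, 128), dropout rate, loss, and metric come from the text; the 32×32 grayscale input size (the AHCD image size) and the exact ordering of the pooling, normalization, and dropout layers are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, MaxPooling2D,
                                     Dropout, GlobalAveragePooling2D, Dense)

NUM_CLASSES = 28  # the Arabic letters from 'أ' to 'ي'

model = Sequential([
    # First hidden layer: 16 feature maps, 3x3 kernels, ReLU.
    Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 1)),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.2),
    # Deeper layers capture progressively more complex patterns.
    Conv2D(32, (3, 3), activation="relu"),
    Conv2D(64, (3, 3), activation="relu"),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    BatchNormalization(),
    Dropout(0.2),
    GlobalAveragePooling2D(),
    # Output layer: one probability per class via softmax.
    Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 28)
```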
• Kernel_initializer: uniform
• Activation: relu
Let's create the model with the best parameters obtained.
o A method that prints all metrics (precision, recall, F1-score
and support) for each class in the dataset:
❖ Notes:
o Since there are 28 Arabic letters in total, their labels are
represented by the integers 0–27 (from 'أ' to 'ي').
o From the resulting output we see some errors in the expected
letters or labels.
o Since the dataset is relatively small, more training data may be
needed to predict letter labels reliably, even though the reported
accuracy is high.
❖ References:
o Artificial Intelligence with Python, Second Edition, Alberto
Artasanchez and Prateek Joshi, Packt Publishing (www.packt.com)
o Arabic Handwritten Characters Dataset: https://www.kaggle.com/mloey1/ahcd1