Devanagari Handwritten Character Recognition
A final year project submitted in partial fulfillment of the requirements for the
degree of Bachelor of Science in Computer Science and Information Technology
awarded by Tribhuvan University.
Submitted by
Abhishek Gupta (T.U Exam Roll No. 8328/072)
Arjun Gautam (T.U Exam Roll No. 8332/072)
Kamal Gautam (T.U Exam Roll No. 8348/072)
Ramkrishna Acharya (T.U Exam Roll No. 8367/072)
Submitted to
Department of Computer Science and Information Technology
Birendra Multiple Campus
Institute of Science and Technology
Tribhuvan University
August 30, 2019
Tribhuvan University
Institute of Science and Technology
I hereby recommend that the project prepared under my supervision by Abhishek Gupta,
Arjun Gautam, Kamal Gautam and Ramkrishna Acharya entitled “Devanagari
Handwritten Character Recognition” be accepted as fulfilling in part the requirements
for the degree of Bachelor of Science in Computer Science and Information Technology.
To the best of my knowledge, this is an original work in computer science.
……………………………..
Tribhuvan University
Institute of Science and Technology
Letter of Approval
This is to certify that this project prepared by Abhishek Gupta, Arjun Gautam,
Kamal Gautam and Ramkrishna Acharya entitled "Devanagari Handwritten
Character Recognition" in partial fulfillment of the requirements for the degree of
B.Sc. in Computer Science and Information Technology has been well studied. In our
opinion, it is satisfactory in scope and quality as a project for the required degree.
……………………
Mr.
External Examiner
Tribhuvan University, Kirtipur
Tribhuvan University
Institute of Science and Technology
Letter of Approval
This is to certify that this project prepared by Abhishek Gupta, Arjun Gautam,
Kamal Gautam and Ramkrishna Acharya entitled "Devanagari Handwritten
Character Recognition" in partial fulfillment of the requirements for the degree of
B.Sc. in Computer Science and Information Technology has been well studied. In our
opinion, it is satisfactory in scope and quality as a project for the required degree.
……………………
Er. Binod Sharma
Program Co-ordinator, Department of CSIT
Birendra Multiple Campus, Chitwan
ACKNOWLEDGEMENT
It gives us immense pleasure to express our deepest sense of gratitude and sincere thanks
to our highly respected and esteemed guide Er. Binod Sharma for his valuable guidance
and encouragement in making this project possible. His constructive suggestions
regarding this project work and his consistent support are sincerely acknowledged.
Finally, we would like to express our sincere thanks to all our friends and everyone who
supported us directly or indirectly during this project work and made it a successful one.
ABSTRACT
Devanagari Handwritten Character Recognition is a system that provides the user with
Optical Character Recognition (OCR) functionality for Devanagari handwriting. The
system has five main stages: preprocessing, segmentation, feature extraction, prediction,
and postprocessing. It uses a Convolutional Neural Network to train the model and image
processing techniques to recognize the user's image, taken via camera or from local
storage. An external camera with good image quality is recommended for the best results.
The Devanagari Handwritten Character Recognition system was developed by a team
whose members were each devoted to their respective parts to build the best system
possible. It uses the Python programming language together with different hardware and
software platforms to make the system as a whole. During the development phase,
different modules were built separately by different members and integrated at the end,
along with testing. After the system was developed, it was found that different algorithms
gave different levels of correctness.
Table of Contents
ACKNOWLEDGEMENT .................................................................................................. v
ABSTRACT ....................................................................................................................... vi
List of Figures .................................................................................................................... ix
List of Abbreviations........................................................................................................... x
Chapter 1: Introduction ....................................................................................................... 1
1.1 Introduction ............................................................................................................... 1
1.1.1 Devanagari Characters ........................................................................................ 1
1.1.2 Optical Character Recognition............................................................................ 1
1.1.3 Machine Learning ............................................................................................... 1
1.1.4 Deep Learning..................................................................................................... 2
1.1.5 Convolutional Neural Networks (CNN) ............................................................. 2
1.2 Problem Definition .................................................................................................... 4
1.3 Objectives .................................................................................................................. 4
1.4 Scope and Limitation ................................................................................................. 4
1.5 Significance of the Project ......................................................................................... 4
1.6 Features of Project ..................................................................................................... 5
1.7 Report Organization .................................................................................................. 5
Chapter 2: Literature Review .............................................................................................. 6
2.1 Review of Papers ....................................................................................................... 6
2.1.1 Offline Handwritten English Numerals Recognition using Correlation Method ...... 6
2.1.2 Devanagari Character Recognition Using Neural Networks .............................. 6
2.1.3 Handwritten Devanagari Character Recognition using Neural Network ........... 6
Chapter 3: System Analysis ................................................................................................ 8
3.1 Requirement Analysis ............................................................................................... 8
3.1.1 Functional Requirements .................................................................................... 8
3.1.2 Non-Functional Requirements ............................................................................ 9
3.2 Feasibility Analysis ................................................................................................... 9
3.2.1 Economic Feasibility ........................................................................................ 10
3.2.2 Operational Feasibility ...................................................................................... 10
3.2.3 Technical Feasibility ......................................................................................... 10
3.2.4 Schedule Feasibility .......................................................................................... 10
3.3 Constructing System Requirements ........................................................................ 11
Chapter 4: System Design and Implementation ................................................................ 13
4.1 System Design ......................................................................................................... 13
4.1.1 Block Diagram .................................................................................................. 13
4.1.2 Dataset Preparation ........................................................................................... 14
4.2 Implementation ........................................................................................................ 14
4.2.1 Implementation in Python ................................................................................. 14
4.2.2 Implementation in Jupyter Notebook ................................................................ 15
4.2.3 Implementation on Google Colab ..................................................................... 15
4.3 Neural Network Design and Implementation .......................................................... 15
4.3.1 Feed Forward Neural Network ......................................................................... 15
4.3.2 Convolutional Neural Network ......................................................................... 16
A. CNN-0 .............................................................................................................. 17
B. CNN-1 ............................................................................................................... 19
C. CNN-2 ............................................................................................................... 20
4.4 Image Processing Model ......................................................................................... 21
4.4.1 Image Acquisition ............................................................................................. 21
4.4.2 Image Preprocessing ......................................................................................... 21
4.4.3 Segmentation .................................................................................................... 23
4.4.4 Character Prediction/classification ................................................................... 23
4.4.5 Character Localization ...................................................................................... 25
4.4.6 Character Recognition ...................................................................................... 26
4.5 List of Algorithms ................................................................................................... 26
4.5.1 Create Dataset ................................................................................................... 26
4.5.2 Create CNN...................................................................................................... 27
4.5.3 Train CNN ....................................................................................................... 27
4.5.4 Image Processing .............................................................................................. 27
4.5.5 Recognition/Prediction ..................................................................................... 28
Chapter 5: System Testing ............................................................................................... 29
5.1 Unit Testing ............................................................................................................. 29
5.2 Integration Testing................................................................................................... 34
5.3 System Testing ........................................................................................................ 37
Chapter 6: Maintenance and Support ................................................................................ 38
6.1 Maintenance ............................................................................................................ 38
6.2 Support .................................................................................................................... 38
Chapter 7: Conclusion & Future enhancements................................................................ 39
7.1 Conclusion ............................................................................................................... 39
7.2 Future enhancements ............................................................................................... 39
Bibliography ...................................................................................................................... 40
APPENDIX ....................................................................................................................... 41
List of Figures
Figure 1. 1: CNN ................................................................................................................. 3
Figure 3. 1: Use Case Diagram ............................................................................................ 8
Figure 3. 2: Gantt Chart ..................................................................................................... 10
Figure 3. 3: DFD................................................................................................................ 12
Figure 4. 1: Block Diagram of DCR system ........................................................ 13
Figure 4. 2: FFNN Architecture ........................................................................................ 15
Figure 4. 3: FFNN Training Progress ................................................................................ 16
Figure 4. 4: CNN Architecture .......................................................................................... 17
Figure 4. 5: Loss CNN-0 ................................................................................................... 18
Figure 4. 6: Accuracy CNN-0 ........................................................................................... 18
Figure 4. 7: CNN-1 Loss ................................................................................. 19
Figure 4. 8: CNN-1 Accuracy............................................................................................ 19
Figure 4. 9: CNN-2 Loss ................................................................................................... 20
Figure 4. 10: CNN-2 Accuracy.......................................................................................... 20
Figure 4. 11: Character Creation using paint ..................................................................... 21
Figure 4. 12: False Image Prediction ................................................................................. 24
Figure 4. 13: Template Localization ................................................................................. 25
Figure 4. 14: Improved Template Localization ................................................................. 25
Figure 4. 15: Character Recognition.................................................................................. 26
Figure 5. 1: Test Case for Cropping the character from Background ............................... 30
Figure 5. 2: Test Case for removing top most part of image to do segments .................... 31
Figure 5. 3: Test case for segmentation of the characters ................................................. 31
Figure 5. 4: Making border around the matched template of the segments ...................... 32
Figure 5. 5: Test case for Predicting Character ................................................................. 32
Figure 5. 6: Improved Segmentation process .................................................................... 33
Figure 5. 7: Test case for Initializing Camera ................................................................... 33
Figure 5. 8: Integration testing of camera and preprocessing............................................ 34
Figure 5. 9: Integration testing of camera, preprocessing and segmentation .................... 35
Figure 5. 10: Integration testing of camera, preprocessing, segmentation and detection. 36
Figure 8. 1: Accuracy visualizing using Tensorboard ....................................................... 41
Figure 8. 2: Model Visualization using Tensorboard ........................................................ 42
List of Abbreviations
AI Artificial Intelligence
CNN Convolutional Neural Network
DCR Devanagari Character Recognition
DFD Data Flow Diagram
DL Deep Learning
FFNN Feed Forward Neural Network
ML Machine Learning
NN Neural Network
OCR Optical Character Recognition
Chapter 1: Introduction
1.1 Introduction
1.1.1 Devanagari Characters
Devanagari is the national script of Nepal and is also used throughout various parts of
India. The Devanagari system has 10 numeral characters (०, १, २, ३, ४, ५, ६, ७, ८, ९) and 36
consonants (क, ख, ग, घ, ङ, च, छ, ज, झ, ञ, ट, ठ, ड, ढ, ण, त, थ, द, ध, न, प, फ, ब, भ, म, य,
र, ल, व, श, ष, स, ह, क्ष, त्र, ज्ञ) along with 13 vowels.
Some Devanagari characters have very similar structures. ‘ब’ and ‘व’ differ only in the
cross inside the circle. Similarly, the only difference between ‘ङ’ and ‘ड’ is a dot. The
characters ‘प’, ‘म’, and ‘य’ are almost the same, as are ‘२’ and ‘र’.
Some characters, like ‘क्ष’, ‘त्र’, and ‘ज्ञ’, are conjuncts derived from combinations of other
characters.
Character recognition systems have been extremely popular for decades. They are applied
in various fields; the most common applications are OCR (Optical Character
Recognition), text conversion, text recognition, robotic vision, number plate scanning,
and so on. Character recognition systems are mainly of two types: online and offline.
Online recognition processes characters as they are written, using stroke information
captured in real time, whereas offline recognition works on static images of text that has
already been written. Basically, OCR uses machine learning.
Machine learning (ML) is the scientific study of algorithms and statistical models that
computer systems use to effectively perform a specific task without using explicit
instructions, relying on patterns and inference instead. It is seen as a subset of artificial
intelligence. Machine learning algorithms build a mathematical model based on sample
data, known as “training data”, in order to make predictions or decisions without being
explicitly programmed to perform the task. ML algorithms are used in a wide variety of
applications, such as email filtering and computer vision, where it is infeasible to develop
an algorithm of specific instructions for performing the task. ML is closely related to
computational statistics, which focuses on making predictions using computers.
Some popular terminology used around ML environments are:
1. Supervised Learning: The algorithm builds a mathematical model from a set of
data that contains both the inputs and the desired outputs. For example, if the task
were determining whether an image contained a certain object, the training data for
a supervised learning algorithm would include images with and without that object
(the input), and each image would have a label (the output) designating whether it
contained the object.
5. Underfitting and Overfitting: Underfitting occurs when the model is too simple or
not trained enough, so the loss stays high even on the training set. Overfitting occurs
when there is too little data and the model is trained for a long time, so that it
eventually memorizes the dataset and the test loss becomes high. Both problems
cause high loss and low accuracy. Regularization techniques, a proper loss function,
more training data, and a suitable learning rate can address these problems.
Deep learning is a subset of machine learning in artificial intelligence (AI) with networks
capable of learning, even unsupervised, from data that is unstructured or unlabeled; it is
also known as deep neural learning or deep neural networks. Convolutional Neural
Networks, along with other neural networks, are used in deep learning techniques.
Convolutional neural networks are deep artificial neural networks used primarily to
classify images (e.g., name what they see), cluster them by similarity (photo search), and
perform object recognition within scenes. They are algorithms that can identify faces,
individuals, street signs, tumors, platypuses, and many other aspects of visual data.
Convolutional networks perform optical character recognition (OCR) to digitize text and
make natural-language processing possible on analog and handwritten documents, where
the images are symbols to be transcribed. CNNs can also be applied to sound when it is
represented visually as a spectrogram. More recently, convolutional networks have been
applied directly to text analytics as well as to graph data with graph convolutional
networks.
Figure 1. 1: CNN
2. A Max-Pooling layer keeps only a few pixels from the previous layer. We provide a
pool size, and a window of that size is moved over the entire input, taking the
maximum value within the overlapped region. For example, a pool size of (2, 2)
halves each spatial dimension of the input.
3. Dropout is a layer used to avoid overfitting. It randomly cuts connections between
neurons of adjacent layers during training. For example, a dropout value of 0.5 cuts
half of the input's connections, so the network cannot simply memorize the input
sequence.
4. A Flatten layer converts a multi-dimensional input into a 1-D vector.
5. A Dense layer in a CNN is mostly used for classification after the convolutional
stages.
1.3 Objectives
The objective of this system is to carry out feasible research on Devanagari characters for
computer vision. The system tries to predict characters as accurately as possible from a
user input image, which can be taken in real time or loaded from local storage. It is made
simple to understand and, using simple image processing techniques, kept as abstract as
possible. It aims to serve the following objectives:
1. Predict the characters found in an image from local storage.
2. Predict the characters found in an image taken by the system's camera device.
3. Predict the characters found in video (real time).
This project is based on deep learning techniques to create a model with high accuracy. It
intends to recognize the characters found in an image by using various image processing
techniques on the backend.
This is an academic project, so we lacked many resources, especially high processing
power, but we tried to reach a satisfying level of accuracy by using Google Colaboratory
for model creation; the system was tested on a limited set of examples.
1.6 Features of Project
This system takes an image as input, either from local storage, from a camera device, or
even from video, and returns predictions for the characters present. Supervised learning
was used to train the model, and the system works offline, using the saved trained model
from local storage.
Using only the saved model also hides the complexity behind model creation and
improves recognition speed.
Chapter 2: Literature Review
The purpose of this system is to recognize handwritten Devanagari characters. Several
groups of researchers have worked on this task to make accurate systems. Though this
project aims to recognize simple characters, it is obviously not the first one; it would have
been impossible to proceed without studying previously developed projects and systems of
a similar type. Character recognition systems have been developed as OCR in many
countries, but in our country this project tries to frame the problem in a different way, so
the study and analysis of a few such projects was a valuable addition to our work. Using a
pre-trained model, this system tries to classify the individual characters present in an
image. This project can also be an inspiration for students passionate about AI.
The most popular Devanagari handwritten character recognition system was designed in
2015, and its paper reports that the system achieved 0.98471 accuracy using a CNN.
The dataset has been made publicly available by the authors, from the Computer Vision
Research Group, Nepal.
In this paper, the authors propose a system to efficiently recognize offline handwritten
digits with higher accuracy than previous work. Earlier handwritten number recognition
systems could only recognize single digits and were not capable of recognizing multiple
numbers at one time, so the authors focused on efficiently performing segmentation to
isolate the digits. [8]
In this digital era, the most important thing is to deal with digital documents; organizations
that store their information in handwritten documents can use handwritten character
recognition to convert this information into digital form. Handwritten Devanagari
characters are harder to recognize due to the presence of the header line, conjunct
characters, and the similarity in shape of multiple characters. This paper deals with the
development of a grid-based method that combines the image centroid zone and the zone
centroid zone of an individual character or numeral image. In grid- or zone-based feature
extraction, the individual character or numeral image is divided into n equal-sized grids or
zones, and the average distance of all pixels with respect to the image centroid or grid
centroid is computed. The combined image centroid and zone centroid approach computes
the average distance of all pixels in each grid with respect to both the image centroid and
the zone centroid, giving a feature vector of size 2×n. This feature vector is presented to a
feed forward neural network for recognition. The complete Devanagari character
recognition process works in stages: document preprocessing, segmentation, feature
extraction using the grid-based approach, and recognition using a feed forward neural
network. [3]
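To make the grid-based feature concrete, the following is a minimal sketch of the 2×n
feature described in [3]; the 4×4 grid, binary image format, and function name are
assumptions for illustration, not the paper's actual code.

import numpy as np

# Sketch of the image-centroid + zone-centroid feature from [3]; the 4x4 grid
# and binary character image are assumptions for illustration.
def zone_centroid_features(img, grid=(4, 4)):
    ys, xs = np.nonzero(img)                          # foreground pixel coordinates
    img_centroid = np.array([ys.mean(), xs.mean()])   # centroid of the whole character
    gh, gw = img.shape[0] // grid[0], img.shape[1] // grid[1]
    feats = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            zys, zxs = np.nonzero(img[r*gh:(r+1)*gh, c*gw:(c+1)*gw])
            if len(zys) == 0:                         # an empty zone contributes zeros
                feats += [0.0, 0.0]
                continue
            pts = np.stack([zys + r*gh, zxs + c*gw], axis=1).astype(float)
            zone_centroid = pts.mean(axis=0)
            # average distance of the zone's pixels to each centroid
            feats.append(np.linalg.norm(pts - img_centroid, axis=1).mean())
            feats.append(np.linalg.norm(pts - zone_centroid, axis=1).mean())
    return np.array(feats)                            # feature vector of size 2*n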
Chapter 3: System Analysis
The Use Case Diagram of DCR consists of two actors, the user and the computer. The
actors interact with the DCR system shown in the middle rectangle. The use cases are
represented by ovals, and the include and extend links show the relations between them.
The handwritten Devanagari characters should be drawn by the user. The drawn
characters are given to the DCR system either directly, using the file name where the
image is saved, or through the camera. The DCR system should have the functionality to
scan the image given by the user; the scanning process runs the preprocessing and
segmentation functions serially. The DCR system should also be able to check the pixels
using a neural network for the purpose of recognizing the characters. Finally, the system
should have the functionality to display the recognized character through the user
interface.
The functional requirements described above are mandatory for developing a well-
functioning DCR system.
Performance
DCR system should recognize the character accurately. It should be efficient in resource
utilization like memory, CPU, storage etc.
Scalability
DCR system should be scalable, i.e. it should be able to recognize a large number of
characters as per demand.
Reliability
The DCR system should classify individual character accurately and must use CNN.
Recoverability
In case of wrong prediction it should be able to suggest users a solution.
Usability
The system can be used by anyone who has Python and the required machine learning
environment installed.
Interoperability
The DCR system is built from various ML frameworks and Python, so they must operate
together.
3.2.1 Economic Feasibility
The software used in the development of the DCR system, Python and Anaconda, is
available for free. For model creation, Google Colab is freely available, providing nearly
12 GB of GPU memory. The hardware used in this project, such as a computer and
webcam, is our personal belonging.
The schedule of the DCR system has 7 major activities. Study and analysis is scheduled to
be completed in the first month. Data collection and manipulation is to be done in the
following 10 days. Development of the system and implementation starts in the third
phase and is scheduled to run for 1 month. In the next 7 days, system testing is to be done.
Documentation starts after system development has begun and runs for 50 days. The
system is reviewed in the 6th phase for 10 days. In the last phase, the presentation is
prepared over 5 days.
3.3 Constructing System Requirements
• Process Modeling
Process modeling graphically represents the processes that capture, manipulate,
store, and distribute data between a system and its environment and among system
components. Each process transforms inputs into outputs.
The Data Flow Diagram (DFD) is commonly used in process modeling.
2. Level 1 DFD: The Level 1 DFD is an expansion of the Level 0 DFD. The input
image is given to the DCR system through the user interface. The scanned
image from the user is then pre-processed; pre-processing involves finding the
borders and segmenting the image if it contains multiple characters. The
feature extraction process extracts features from the pre-processed character
image. The extracted features are passed to the character recognition process,
and the recognized characters are displayed to the user.
Figure 3. 3: DFD
Chapter 4: System Design and Implementation
Character Recognition: In the first step, the raw image is acquired with the help of a
camera. This image is pre-processed to extract the features. The image is segmented if it
has more than one character. The segmented image is passed to the saved model, which
predicts the characters, localizes them in the image, and displays the result to the user.
The dataset is publicly available on the web and is divided into two parts, train and test.
After downloading the dataset from the web, it should be converted into the best format to
feed into the NN. There are separate collections of train and test images in grayscale: the
train set has 78,200 examples, 1,700 for each character, and the test set has 200 examples
per character.
The foreground of every image is white and the background is black. Each character is a
28 × 28 pixel grayscale image with a 2-pixel border on each side, making each image
32 × 32 pixels. Each image is in PNG format, and each example sits inside a folder named
after its label.
To train a model, reading the image files by moving back and forth between folders
increases the runtime cost of the process, so to avoid this the entire dataset is saved into a
CSV file by converting every image's pixels, in the range 0 to 255, into one row. In the
CSV file, the first column is the label and the remaining 1,024 columns are pixel values.
Labeling is done serially from 0 to 45, e.g. 0 for ‘०’, 10 for ‘क’ (ka), and 45 for ‘ज्ञ’ (gya).
The train and test images are each saved into a single CSV file, which takes nearly 10
minutes on the local device, although the CSV file is larger than the image files combined.
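As a quick illustration of reading this CSV back into model-ready arrays, here is a
minimal sketch; the file name and helper name are assumptions.

import numpy as np

# Sketch: load the CSV described above into (images, one-hot targets).
# Assumes column 0 = label in 0..45 and columns 1..1024 = pixel values 0..255.
def load_csv_dataset(path):
    data = np.loadtxt(path, delimiter=',', dtype=np.float32)
    labels = data[:, 0].astype(np.int64)
    images = (data[:, 1:] / 255.0).reshape(-1, 32, 32, 1)  # restore 32x32 grayscale
    targets = np.eye(46, dtype=np.float32)[labels]         # one-hot for 46 classes
    return images, targets

# Usage (file name is an assumption): x_train, y_train = load_csv_dataset('train.csv')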
4.2 Implementation
4.2.1 Implementation in Python
All the coding was done in Python 3.6 along with popular libraries such as NumPy,
Matplotlib, OpenCV, Keras, and TensorFlow, plus other built-in libraries. Keras is a
popular neural network framework that uses Theano or TensorFlow as a backend and
supports GPU training, which saves a lot of time. OpenCV is popular for image processing
applications, NumPy is famous for array operations, and Matplotlib provides Matlab-style
plotting. TensorFlow is used as the backend here.
4.2.2 Implementation in Jupyter Notebook
Google Colab provides nearly 12 GB of GPU memory, along with TPU and CPU
runtimes, for training and testing models online. Training the large dataset on our own
system was challenging due to the lack of resources, and Google Colab is nearly 8 times
faster than our system, so Colab was our best choice. However, creating a working
environment on Colab is a genuinely challenging task.
The first network is an FFNN, the simplest of the NN class. Here is the model's
summary:
The model was trained for 30 epochs using the ‘adam’ optimizer and
‘categorical_crossentropy’ as the loss. All internal layers used the ‘sigmoid’ activation,
whereas the last layer used ‘softmax’. 20% of the training data was split off for validation,
and the batch size was 32. Training took 42 minutes on a Dell i5 with 8 GB RAM. Here is
the training progress:
Figure 4. 3: FFNN Training Progress
The validation accuracy did not improve after the 5th epoch, which clearly shows that this
model was suffering from overfitting. Overfitting can be reduced by adding dropout
layers, regularization, and some noise. The test accuracy was 91.77%.
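For reference, a minimal sketch of this training setup is shown below; the hidden layer
widths are assumptions, since the summary figure is not reproduced here, while the
optimizer, loss, activations, epochs, batch size, and validation split follow the text.

from keras.models import Sequential
from keras.layers import Dense

# FFNN sketch; hidden layer widths are assumptions, the rest follows the text.
model = Sequential()
model.add(Dense(512, activation='sigmoid', input_shape=(1024,)))  # flattened 32x32 input
model.add(Dense(256, activation='sigmoid'))
model.add(Dense(46, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# x_train: (N, 1024) pixel rows, y_train: (N, 46) one-hot labels
history = model.fit(x_train, y_train, epochs=30, batch_size=32, validation_split=0.2)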
Multiple CNNs with the same network architecture but different parameters were trained.
There are many advantages to using a CNN, one of which is the relatively small number of
learnable parameters.
The architecture parameters in each layer are as follows (a code sketch follows the list):
• Number of filters: 32
• Filter size: (3, 3)
• Pool size: (2, 2)
• Activation function: ReLU on all layers except the final layer, softmax on the final layer
• Dropout: 0.25 on inner layers and 0.5 on the dense layer
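A sketch assembling these layers in Keras is given below; it mirrors the model code in
Appendix C, with the 256-unit dense layer taken from there.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))   # (2, 2) pooling halves each dimension
model.add(Dropout(0.25))                    # 0.25 dropout on inner layers
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))                     # 0.5 dropout on the dense layer
model.add(Dense(46, activation='softmax'))  # one output per character class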
The architecture is as follows:
A. CNN-0
Figure 4. 5: Loss CNN-0
B. CNN-1
C. CNN-2
4.4 Image Processing Model
With model training finished, the intention was real-world character recognition.
Character detection is the electronic conversion of images of typed, handwritten, or
printed text into machine-encoded text. Here, the input image is provided from a directory
or taken from the computer's camera; it is then processed, segmented, and passed to the
NN for prediction. In many cases the input image contains a lot of noise, which makes it
difficult for our NN to predict the characters in the image. The image may also have low
brightness or poorly written characters, which makes prediction hard or sometimes
impossible. The images provided here are therefore assumed to be of high quality, since
only high-quality images allow reliable character prediction.
4.4.1 Image Acquisition
Image acquisition plays the main role in the recognition problem, because no matter how
accurate the model was in testing, real-world images will never be the same: they contain
noise, blur, and many other quality degradations. The acquisition device, such as the
camera, also affects the properties of the captured image. This project can work with a
laptop camera of roughly 1 MP.
In the beginning phase, we worked with text written using the pencil tool in Paint.
In this stage the image is converted into grayscale, and a NumPy array is prepared to
store the image pixels. After this, the intention is to find the foreground and background
colors. Removing some noise and thresholding makes it easier to recognize the text and
find the foreground color. Here we used a combined Binary and Otsu threshold.
Image thresholding is a simple form of image segmentation: a way to create a binary
image from a grayscale or full-color image. This is typically done to separate “object”
(foreground) pixels from background pixels to aid in image processing.
The binary thresholding function creates a raster output that divides the raster into two
distinct classes. The algorithm behind it, the Otsu method, is designed to distinguish
background from foreground in an image by creating two classes with minimal
intra-class variance. Applied to a raster dataset with a unimodal distribution, binary
thresholding divides the data into two distinct classes: a high-value class, displayed as
white pixels, and a low-value class, displayed as black pixels.
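In OpenCV this combined threshold is a one-liner; the following sketch matches the
preprocessing code in Appendix C, where the inverted variant (THRESH_BINARY_INV)
is used, with the file name and blur kernel size as assumptions.

import cv2

gray = cv2.imread('sample.png', cv2.IMREAD_GRAYSCALE)  # input path is an assumption
blur = cv2.GaussianBlur(gray, (5, 5), 0)               # mild denoising before Otsu
ret, th_img = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)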
The main problem arising during thresholding in this project is that it sometimes fails to
detect the exact foreground and background pixels, resulting in incorrect detection of the
actual text or number. This is due to light reflecting from the paper into the camera. If
light falls unevenly on the paper, the lit part has higher intensity than the rest, and the
thresholding function may separate that part as foreground and the rest as background, or
vice versa, so text detection may fail to recover the actual text. Similarly, excessive
brightness makes it difficult to separate foreground and background pixels, and the text
intensity must be high, meaning it is better to use, for example, bold letters written with a
marker. The paper should be almost noise free and the text clear and of uniform color.
For our project, the image after thresholding is purely binary. The text may be either
black or white, but after thresholding this is not known, so we need a method to obtain
the exact colors. We check the color values of 5 pixels along the diagonal from the
top-left corner; these 5 diagonal pixels determine our foreground and background colors.
This has to be done so that the input matches the training images' properties and our
system can predict with high accuracy.
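A minimal sketch of this diagonal check is given below; the exact pixel offsets and the
majority-vote tie-break are assumptions about the implementation.

import numpy as np

# Sample 5 pixels along the diagonal from the top-left corner of the binary image;
# the majority value is taken as the background, the other as the foreground.
def find_colors(binary_img):
    samples = [int(binary_img[i, i]) for i in range(5)]
    background = max(set(samples), key=samples.count)
    foreground = 255 - background
    return foreground, background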
4.4.3 Segmentation
Segmentation means breaking the entire image into small fragments. In this step, the
image is checked to determine whether multiple characters are present. Segmentation is
done here in a unique way, by defining the possible rows for the topmost part of a
character and the possible percentage of space between characters. Each segment is then
passed to the prediction process.
The segmentation function is used to separate each digit or character from a group of
digits or words so that multiple characters can be detected. Each segment is given as
input to the neural network, which was trained on single characters and digits, so it can
predict each input and finally give the desired output. The input is scanned column by
column from left to right (excluding the top joining line, called the ‘diko’ in Nepali) until
a background column is found; when one is encountered, the function stores the
preceding span as the first character, predicts it, and the process continues. A problem
arises with characters like ग and ण, which contain background columns within a single
character when passed to the segmentation function, so the result fails the predefined
criteria; this can happen with other characters too. We therefore define a minimum gap
between foreground and background columns above which the function treats the span as
a single character, stores those background locations (columns) in an array, and sends
only those parts of the array to the prediction function. As a result, predictions with high
accuracy are obtained.
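The following sketch shows this column-gap idea in isolation; the 18% gap ratio echoes
the constant in the Appendix C segmentation code, while the function shape and names
are assumptions.

import numpy as np

# Split a binary image into characters at background-column gaps wider than min_gap.
def segment_columns(binary_img, bg=0, gap_ratio=0.18):
    h, w = binary_img.shape
    is_bg_col = np.all(binary_img == bg, axis=0)  # columns containing only background
    min_gap = max(1, int(gap_ratio * w))          # minimum gap width between characters
    segments, start, gap = [], 0, 0
    for col in range(w):
        gap = gap + 1 if is_bg_col[col] else 0
        if gap == min_gap:                        # gap wide enough: close the character
            segments.append(binary_img[:, start:col - min_gap + 1])
        if gap >= min_gap:
            start = col + 1                       # next character starts after the gap
    segments.append(binary_img[:, start:])
    return [s for s in segments if s.size and not np.all(s == bg)]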
Each segment is passed to the prediction process. Before the actual prediction, the
segment must be resized to the neural network's input shape. Each segment is therefore
converted into a 30 by 30 image, and a 1-pixel border of the background color is added
around it, so the segment becomes 32 by 32, the input shape of our model. It is then fed to
the neural network. If the segment has a high prediction score, the character is assumed
correct and should be shown. The prediction can still be wrong, depending on the image
quality, due to false segmentation. The segment is then passed to localization.
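A sketch of this resize-and-border step, assuming OpenCV and a scalar background color:

import cv2

# Resize a segment to 30x30, add a 1-pixel background border to get 32x32,
# and shape it as a normalized batch of one for the model.
def prepare_segment(segment, bg_color):
    resized = cv2.resize(segment, (30, 30))
    bordered = cv2.copyMakeBorder(resized, 1, 1, 1, 1,
                                  cv2.BORDER_CONSTANT, value=int(bg_color))
    return bordered.reshape(1, 32, 32, 1) / 255.0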
When the product of the rows and columns of the image to be resized is larger than 2^31,
ssize.area() yields a negative number, so images sometimes cannot be resized to 32 × 32.
This appears to be a bug in OpenCV that will hopefully be fixed in a future release. A
temporary fix is to build OpenCV with this line:
CV_Assert(ssize.area() > 0);
The above applies only to images whose width is larger than their height. For images
whose height is larger than their width, the following line is applied as a temporary fix to
OpenCV:
CV_Assert(dsize.area() > 0);
To find the foreground and background colors, 5 pixels from the top-left corner are
checked. In most cases the foreground and background colors are detectable, but if the
text pixels happen to lie at the positions being checked, the foreground and background
values will be taken as the same.
If the provided image has very little noise, good brightness, and clearly written
characters, the neural network can predict the characters in the image with high
confidence; if the image fails to meet these characteristics, the neural network is unable to
predict the character and shows a NaN (Not a Number) message.
The system will also produce predictions for fake images containing no Devanagari
characters; this is a limitation of the project and, in fact, an inherent property of a NN,
which can only return labels it was trained on. This can be called a sensitive part of our
project because it can return an arbitrary character that is not in the fake image, and the
project has no code to detect "this is a fake image" and return to the user with a "Try
Again" dialog.
4.4.5 Character Localization
Here the intention is to find each segment inside the original image using template
matching: if the template is matched, a rectangle is drawn around the matched location in
the original image. This is done to remove unwanted space between characters, which
removes the chance of false segments between characters and thereby increases the
accuracy of character prediction. It can be seen as an improved version of segmentation.
Here, cropping is done from the borders of the segments.
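A minimal sketch of this localization step with OpenCV's template matching; the
matching method and drawing details are assumptions.

import cv2

# Find a segment inside the original image and draw a rectangle around the best match.
def localize(original, segment):
    result = cv2.matchTemplate(original, segment, cv2.TM_CCOEFF_NORMED)
    _, _, _, top_left = cv2.minMaxLoc(result)     # maxLoc = best match for this method
    h, w = segment.shape[:2]
    cv2.rectangle(original, top_left, (top_left[0] + w, top_left[1] + h), 255, 2)
    return original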
4.4.6 Character Recognition
This is the overall combination of the previous processes. The actual recognition is seen
when a border is drawn around the found character along with its corresponding label.
The final detection takes some time and gives an accurate prediction; if a poor image is
given, it gives a false prediction, so high-quality images are recommended.
Step 1: Start
Step 2: For every directory inside the given directory, if it exists:
i) Open the images inside that directory in grayscale.
ii) Find the label from the folder name and convert it into the appropriate form.
iii) Create an array of size 1025, placing the label in the 1st column and the pixels
in the remaining 1024.
iv) If the CSV file exists, append the array to the corresponding file; otherwise
create a CSV file and append.
Step 3: Stop
4.5.2 Create CNN
Step 1: Start
Step 2: Define input shape to network.
Step 3: For each convolutional block:
a. Define the number of output filters, kernel size, stride, and padding.
b. Define the activation function.
c. Add a max pooling layer.
d. Add a dropout layer.
Step 4: Flatten the output of the convolution blocks.
Step 5: Add a dense layer with its output shape and activation function.
Step 6: Add a dropout layer.
Step 7: Add the output layer.
Step 8: Stop.
4.5.5 Recognition/Prediction
Step 1: Start.
Step 2: Get each segment of the image.
Step 3: Convert each segment into a 32×32 image.
Step 4: Predict the character in the segmented image using the CNN.
Step 5: If the prediction confidence is high, localize the segment in the original image.
Step 6: Display the localized original image.
Step 7: Display the predicted characters.
Step 8: Stop.
Chapter 5: System Testing
The testing phase can be carried out manually or with automated testing tools to ensure
each component works correctly. Once the project was ready, its various components
were tested for quality and performance to make it error free and remove any technical
glitches. Testing also measures the difference between the desired and the developed
system, and is needed throughout the development cycle to ensure that every component
of the system works properly.
Figure 5. 1: Top down extreme position
Test Case 2: Removing the topmost part of characters
Figure 5. 2: Test Case for removing top most part of image to do segments
Test Case 4: Template localization
Test Case 6: Improving segmentation process
Cropping is also done from the borders of the segments, which removes the chance of
false segments between characters and eliminates the large space around the characters.
5.2 Integration Testing
Integration testing combines individual modules and tests them as groups: it takes the
unit-tested modules, groups them into larger aggregates, applies tests, and delivers the
output.
Some of the major test cases are listed below:
Test Case 2: Integration of Camera and Preprocess with Segmentation module
After the integration of the camera and preprocessing modules passed the test, the
segmentation module was integrated. Snapshots of this integration are shown below.
Figure 5. 10: Integration testing of camera, preprocessing, segmentation and detection.
5.3 System Testing
System testing was done after integration testing to ensure that the whole system
functions properly. After integration testing, the working process of the whole system
was checked. The output matched the system specifications, so the system was found to
work properly. There are, however, some important things to be aware of about the
system's recognition time.
Results:
• If the system has been running for a long time and memory usage is high, the same
image takes more time to recognize than under normal conditions.
• Cropping each segment may save recognition time when there are few segments,
but with a large number of segments it increases the recognition time.
• An image taken from the webcam usually has few pixels (low quality), so it takes
less time to recognize; a better-quality image has more pixels, so the recognition
time increases.
Chapter 6: Maintenance and Support
6.1 Maintenance
DCR is implemented on a Windows system with Python 3.6 and specific library
versions. Since the system uses a small, low-quality camera, the captured image will
not be sharp. As recognition time increases, the system becomes slower due to
memory usage. One possible remedy for low-quality images is to shine a light toward
the object or written text, so that the paper reflects more light and a better image is
captured. There is also always a high chance of getting false segments, and tuning
some parameters will reduce this problem. Recognition on video was also done, but
running recognition on every frame of a video can crash the system because it needs
more memory. A typical image requires an average of 4 seconds of recognition time;
this can be decreased either by increasing resources or by optimizing the code.
6.2 Support
An advanced version of the Waterfall model, the V-model, is used in this project. The
V-model focuses on verification and validation, so the requirements of the system can
be changed quickly; any necessary changes can be accommodated in a short period of
time. The recognition process can be made faster by increasing resources, but that
depends on the economic feasibility of the system.
Chapter 7: Conclusion & Future enhancements
7.1 Conclusion
This is the age of Artificial Intelligence, where intelligent programs and systems do
many things. Research has been done in various computer vision fields such as
number plate scanning, autonomous vehicles, text conversion, and OCR. While the
world has already seen OCR for English characters, this project intended to make a
small effort toward Devanagari character recognition. Because this system is still
machine dependent, it will not be that useful for the public yet, but with the
development of a mobile app for this system, a language conversion feature could
also be added. This system needs more researchers to improve its behavior.
During the entire project development period we were able to develop our skills. This
project taught us to manage time and resources despite various constraints, and we
learned how to work in a group and thereby develop a system.
Bibliography
1. Chhabra, A. (2019). Deep Learning Based Real Time Devanagari Character
Recognition. San Jose: SJSU ScholarWorks. doi:https://doi.org/10.31979/etd.3yh5-xs5s
2. Deshmukh, A., Meshram, R., Kendre, S., & Shah, K. (2014). Handwritten Devanagari
Character Recognition. Pune: International Journal of Engineering Research.
3. Dongare, S. A., Kshirsagar, D. B., & Waghchaure, S. V. (2014). Handwritten
Devanagari Character Recognition using Neural Network. Kopargaon: IOSR Journal
of Computer Engineering (IOSR-JCE).
4. Gyawali, K. P., Acharya, S., & Pant, A. (2016). Deep Learning based large scale
handwritten Devanagari Character Recognition. Kathmandu: Computer Vision
Research Group.
5. Narang, V., Roy, S., Murthy, O. V., & Hanmandlu, M. (2013). Devanagari Character
Recognition in Scene Images. Washington, DC: IEEE.
6. Negi, V., Mann, S., & Chauhan, V. (2017). Devanagari Character Recognition Using
Artificial Neural Network. New Delhi: International Journal of Engineering and
Technology (IJET). doi:10.21817/ijet/2017/v9i3/1709030246
7. Sayyad, S. S., Jadhav, A., Jadhav, M., Miraje, S., Bele, P., & Pandhare, A. (2013).
Devnagiri Character Recognition Using Neural Networks. Ashta: International Journal
of Engineering and Innovative Technology (IJEIT).
8. Vats, I., & Singh, S. (2014). Offline Handwritten English Numerals Recognition using
Correlation Method. Chandigarh: IJERT.
APPENDIX
Appendix A:
Appendix B:
Appendix C: Source Code
i. Prepare Dataset
#import libraries
import os
import numpy as np
import csv
import cv2
# define a method to wrap all essential processes; we pass the dataset location to the function
def create_csv_data_file(location):
    # ... (directory and image-reading loop elided in the original listing) ...
    elif (len(image_labels) == 2):
        label = int(image_labels[::-1][0])
    # first column will hold the label of the example; the rest 1024 will hold pixel info
    image_array = np.hstack([np.array(label),
                             np.array(open_image).reshape(1024)])
create_csv_data_file('train/')
print('Done Creating trainset !!!!\n')
create_csv_data_file('test/')
print('Done Creating testset !!!!')
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

plt.style.use('seaborn-whitegrid')
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 1),
                 data_format='channels_last'))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(46, activation='softmax'))
model.summary()
train_link = 'https://drive.google.com/open?id=104DhX-7q-6gVxM6I7EB8C5XTF0YQ5njr'
fluff, id = train_link.split('=')
downloaded = drive.CreateFile({'id': id})
downloaded.GetContentFile('train.csv')
test_link = 'https://drive.google.com/open?id=1zZTF2b6p8aJoAaPGSrvd04Y5f-AP1VPD'
fluff, id = test_link.split('=')
fluff, id = test_link.split('=')
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('test.csv')
for each_number in example_number:
    each_set = current_dataset[each_number]
    all_values = each_set.split(',')
    # ... (per-example input/target construction elided in the original listing) ...
return (inputs, targets)
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
#model.save('cnn2.hdf5')
model_json = model.to_json()
shape = image.shape
bg = np.repeat(thresh, shape[1])
count = 0
rows = np.arange(1, shape[0])
#print(rows)
for row in rows[::-1]:
    if np.equal(bg, image[row]).any():
        count += 1
    else:
        count = 0
    if count >= check:
        bottom = row + count
        break
# print(count)
# plt.imshow(here_img[top:bottom, :])
# plt.show()
d1 = (top - 2) >= 0
d2 = (bottom + 2) < size[0]
d = d1 and d2
if d:
    b = 2
else:
    b = 0
return (top, bottom, b)
# print(bg_keys)
if len(bg_keys) > 1:
    lenkeys = len(bg_keys) - 1
    new_keys = [bg_keys[1], bg_keys[-1]]
    if lenkeys == 1:
        # assumed repair: the original listing read (new_keys - bg_keys), which
        # subtracts two lists; the intent appears to be comparing the gap width
        if (bg_keys[1] - bg_keys[0]) < (18 * shape[1] / 100):
            return [bordered]
    # print(lenkeys)
    for i in range(1, lenkeys):
        if (bg_keys[i + 1] - bg_keys[i]) > check:
            new_keys.append(bg_keys[i])
            # print(i)
    new_keys = sorted(new_keys)
    # print(new_keys)
    segmented_templates = []
    first = 0
    for key in new_keys[1:]:
        segment = bordered.T[first:key]
        # plt.imshow(segment)
        # plt.show()
        segmented_templates.append(segment.T)
        # show middle segments
        # plt.imshow(segment.T)
        # plt.show()
        first = key
    last_segment = bordered.T[new_keys[-1]:]
    segmented_templates.append(last_segment.T)
    return (final_segments)
else:
    return [bordered]
except:
    # print('exception on segmentation')
    return [bordered]
return main_image
ret, th_img = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # converts black to white and inverse
plt.imshow(template)
plt.show()
#print("Process: Segmentation....\n")
segments = segmentation(template, text_color)
#print('Process: Detection.....\n')
return segments, template, th_img, text_color
import cv2
import time
import numpy as np
from recognition import recognition

def camera(flag):
    print("Click spacebar for photo and anything else for video.\n")
    orig = 1
    cap = cv2.VideoCapture(0)
    tr = 0.1
    br = 0.8
    lc = 0.1
    rc = 0.8
    f = 0
    while flag:
        ret, frame = cap.read()
        if ret:
            # key event
            s = cv2.waitKey(2) & 0xFF
            if chr(s) == 'x':
                f = -1
            if chr(s) == 'z':
                f = 1
            if chr(s) == 'a':
                tr = tr + 0.1 * f
            if chr(s) == 'd':
                br = br + 0.1 * f
            if chr(s) == 's':
                lc = lc + 0.1 * f
            if chr(s) == 'w':
                rc = rc + 0.1 * f
print("Doing RT...")
recognition(ogray, 'no')
else:
if(orig != 0):
show = frame[:]
text = "Press 'space' to take a photo and 'enter' to do realtime(slow)."
text1 = "Make sure the character is inside rectangle."
text2 = "a/s/d/w for change rectangle and z/x for inc/dec."
cv2.putText(show, text1, (15, 50), cv2.FONT_HERSHEY_COMPLEX,
0.75, (0, 100, 200))
cv2.putText(show, text2, (0, np.shape(frame)[0] - 10),
cv2.FONT_HERSHEY_COMPLEX, 0.65, (50, 20, 255))
cv2.rectangle(show, (s_x, s_y), (e_x, e_y), (0, 255, 0), 2)
cv2.putText(show, text, (15, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (15,
0, 255), lineType=cv2.LINE_AA)
cv2.imshow('Project DCR', show)
else:
print('Trying.....\n')
continue
if s == 27:
break
cap.release()
cv2.destroyAllWindows()
v. Main file
from recognition import recognition
import cv2
import matplotlib.pyplot as plt
import time
from video_test import camera
try:
    test = input('Please enter the image directory with name.\n')
    test = cv2.imread(test, 0)
    plt.imshow(cv2.cvtColor(test, cv2.COLOR_GRAY2RGB))
    plt.xticks([])
    plt.yticks([])
    plt.show()
    time1 = time.time()
    in_img = recognition(test, 'show')
    print("In %f" % (time.time() - time1), 'sec')
except:
    print("Image not found now turning to video mode.\n")
    # camera(True)
    try:
        camera(True)
    except:
        print('Something is wrong. Try with more stable, less noise and clear picture.\n')
import numpy as np
from keras.models import model_from_json, load_model

def prediction(img):
    # load json and create model
    json_file = open('cnn2/cnn2.json', 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    loaded_model = model_from_json(loaded_model_json)
    loaded_model.save('cnn.hdf5')
    loaded_model = load_model('cnn.hdf5')
    # ... (weight loading and reshaping of img into the network input x elided in the original listing) ...
    characters = '०,१,२,३,४,५,६,७,८,९,क,ख,ग,घ,ङ,च,छ,ज,झ,ञ,ट,ठ,ड,ढ,ण,त,थ,द,ध,न,प,फ,ब,भ,म,य,र,ल,व,श,ष,स,ह,क्ष,त्र,ज्ञ'
    characters = characters.split(',')
    output = loaded_model.predict(x)
    output = output.reshape(46)
    predicted = np.argmax(output)
    devanagari_label = characters[predicted]
    success = output[predicted] * 100