Handwritten Hindi Character Recognition Using Multiple Classifiers in Machine Learning
ISSN No:-2456-2165
Abstract:- Hindi is an official language of India, spoken in many states of the country, such as Bihar, Uttar Pradesh, Madhya Pradesh, Jharkhand, and Delhi. It is the third most popular language globally and is written in the Devanagari script, which consists of 36 primary characters and ten digits. We present a handwritten Hindi character recognition (2HCR) system that uses machine learning techniques to recognize Hindi characters and digits. The dataset consists of ninety-two thousand images of 46 different types of characters and digits in the Hindi language, segmented from handwritten documents. Training on such data has become easier because of the availability of various algorithms and methodologies. We have used several classification algorithms to implement the system and improve its accuracy: Linear Regression (LR), Logistic Regression (LGR), Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB). Handwritten character recognition is still an active area of research because individuals differ in writing style, shape, and size. It is also used in many applications, such as reading license plate numbers, documents, cheque numbers, and postcodes on envelopes, and verifying signatures. The system that we have designed, developed, and implemented is written in Python. After completing it, we analyzed the performance and accuracy of the system.
Keywords:- Machine Learning, Python, 2HCR, OCR, Hindi Character, Devanagari, LR, LGR, SVM, RF, NB.
I. INTRODUCTION
[...] document reading, mail sorting and verifying, bank processing, and postal card address recognition[9][10]. Handwritten character recognition is an application of optical character recognition. When a person encounters text in an unfamiliar language, they can take a picture of the handwritten text or document and pass it to the HCR algorithm[7]. Such applications help people develop reading and writing skills[11]. India is a multi-dialectal country with eighteen official languages. Hindi is an official language of India and is used in many Indian states[12]. It is the third most popular language globally and is written in the Devanagari script[13][14]. Our main challenge is devising the dataset, so we have introduced a new publicly accessible, MNIST-style dataset for 2HCR[15]: the Hindi Handwritten Character Dataset (2HCD), which consists of ninety-two thousand images of 36 Hindi characters and ten digits[16][17]. The system works in stages. Pre-processing removes noise from the data; segmentation converts input images into individual characters; feature extraction derives features from each character image; and classification assigns the image data to classes using the LR, LGR, SVM, RF, and NB algorithms, which are trained on the image data and finally tested to achieve the target[18]. We conducted experiments on the recognition of Hindi characters and used classification algorithms to improve the accuracy of the method. We use machine learning techniques to implement handwritten recognition of Hindi characters and digits.
Hindi    हिन्दी    Meaning    Pronunciation
०        शून्य     0          Shunya
१        एक        1          Ek
D. Classification:
Classification is the decision-making stage of recognition, in which objects are categorized into classes. The extracted features are used to recognize characters. The relevant features can be classified using various techniques, such as neural networks, multilayer perceptrons, fuzzy logic, Logistic Regression, Naïve Bayes, KNN, SVM, and CNN[2][8][14].
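As an illustration of this stage, the following minimal sketch classifies a feature vector with k-nearest neighbours (KNN), one of the techniques listed above. The four-value feature vectors, the class names "ka" and "kha", and the function name knn_predict are hypothetical placeholders, not the features or labels used in this work.

```python
from collections import Counter
import math


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    # Pair every training vector's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical feature vectors standing in for extracted character features.
train_features = [
    (0.9, 0.1, 0.8, 0.2), (1.0, 0.0, 0.9, 0.1), (0.8, 0.2, 0.7, 0.3),  # class "ka"
    (0.1, 0.9, 0.2, 0.8), (0.0, 1.0, 0.1, 0.9), (0.2, 0.8, 0.3, 0.7),  # class "kha"
]
train_labels = ["ka", "ka", "ka", "kha", "kha", "kha"]

predicted = knn_predict(train_features, train_labels, (0.85, 0.15, 0.75, 0.25))
print(predicted)
```

The same interface (feature vectors in, class label out) applies to the other classifiers named in this section; only the decision rule changes.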
E. Algorithms:
Support Vector Machine (SVM): The Support Vector Machine (SVM) is one of the most effective and efficient supervised learning methods used for classification. We applied the SVM algorithm to predict the data, and the model achieved 98.88% accuracy.
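To make the idea concrete, the sketch below trains a linear SVM by stochastic subgradient descent on the regularised hinge loss. It is an illustrative assumption, not the implementation used in this work: the kernel, library, and hyperparameters of the reported SVM are not stated here, the toy data are two-dimensional, and the sketch handles only two classes (labels +1 and -1). A 46-class problem would need an extension such as one-vs-rest, which is likewise an assumption.

```python
import random


def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01, seed=0):
    """Fit weights w and bias b by minimising the L2-regularised hinge loss.

    labels must be +1 or -1. All hyperparameter defaults are illustrative.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wj * (1.0 - learning_rate * reg) for wj in w]
            if margin < 1.0:
                # The sample is misclassified or inside the margin: step towards it.
                w = [wj + learning_rate * y * xj for wj, xj in zip(w, x)]
                b += learning_rate * y
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (placeholders, not character images).
samples = [(2.0, 2.5), (1.5, 2.0), (2.5, 1.5), (3.0, 2.0),
           (-2.0, -2.5), (-1.5, -2.0), (-2.5, -1.5), (-3.0, -2.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
predictions = [svm_predict(w, b, x) for x in samples]
print(predictions)
```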
Random Forest: Random Forest is an effective and efficient supervised machine learning algorithm used for classification problems. We applied the Random Forest algorithm to predict the data, and the model achieved 97.22% accuracy.
Logistic Regression: Logistic Regression is a widely favored supervised learning method. It predicts a categorical dependent variable from a given set of independent variables. We applied a logistic regression algorithm to predict the data, and the model achieved 95.83% accuracy.
Linear Regression: Linear Regression is one of the most popular supervised learning methods in predictive analysis. We applied a linear regression algorithm to train on the data and predict the value, and the model achieved 52.43% accuracy.
Naïve Bayes: The Naïve Bayes algorithm is an effective and efficient supervised learning method based on Bayes' theorem, used to solve classification and prediction problems. We applied the Naïve Bayes algorithm to predict the data, and the model achieved 52.6% accuracy.

Fig 3. Sample of dataset

IV. RESULTS AND ANALYSIS
The dataset contained 92000 images of 46 different types of Hindi characters and digits. The dataset was randomly shuffled before implementation. The comparison of accuracy results is presented in Table 1.

Algorithms             Comparison[18][26]    Accuracy
SVM                    96.25%                98.88%
Random Forest          98.44%                97.22%
Logistic Regression    86.23%                95.83%
Linear Regression      -                     52.43%
Naïve Bayes            89.47%                52.68%
Table 1:- Accuracy achieved by different algorithms

It can be concluded that, among the proposed 2HCR methods, SVM gives the highest accuracy, at 98.88%. In general, SVM performed well on this recognition problem.
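The evaluation steps mentioned in this section (random shuffling, then measuring accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions: the 20% test fraction, the seed, and the function names are hypothetical, since the split used in this work is not given here. The accuracy values in the dictionary are copied from the Accuracy column of Table 1.

```python
import random


def shuffle_and_split(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle samples and labels together, then split into train and test parts.

    test_fraction and seed are assumed values for illustration only.
    """
    paired = list(zip(samples, labels))
    random.Random(seed).shuffle(paired)
    n_test = int(len(paired) * test_fraction)
    return paired[n_test:], paired[:n_test]


def accuracy_percent(true_labels, predicted_labels):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Accuracy column of Table 1, copied from the text.
reported_accuracy = {
    "SVM": 98.88,
    "Random Forest": 97.22,
    "Logistic Regression": 95.83,
    "Linear Regression": 52.43,
    "Naïve Bayes": 52.68,
}
best_algorithm = max(reported_accuracy, key=reported_accuracy.get)
print(best_algorithm, reported_accuracy[best_algorithm])

# Toy usage with placeholder data (not the dataset described above).
train, test = shuffle_and_split(list(range(10)), ["a", "b"] * 5)
print(len(train), len(test))
print(accuracy_percent(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

Fixing the seed makes the shuffle reproducible, so repeated runs compare classifiers on the same split.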