Volume 7, Issue 7, July – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Handwritten Hindi Character Recognition using Multiple Classifiers in Machine Learning

Md Ziaul Haque
Department of CS & IT
Maulana Azad National Urdu University
Hyderabad, India

Mohd Omar
Department of CS & IT
Maulana Azad National Urdu University
Hyderabad, India

Abstract:- Hindi is one of the official languages of India and is spoken in many states, including Bihar, Uttar Pradesh, Madhya Pradesh, Jharkhand, and Delhi. It is the third most widely spoken language in the world and is written in the Devanagari script, which consists of 36 primary alphabets and ten digits. We present handwritten Hindi character recognition (2HCR), which uses machine learning techniques to recognize Hindi characters and digits. The dataset consists of ninety-two thousand images of 46 different characters and digits in the Hindi language, segmented from handwritten documents. Training has become easier because of the availability of various algorithms and methodologies. We used several algorithms to implement the system and improve its accuracy: Linear Regression (LR), Logistic Regression (LGR), Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB). Handwritten character recognition remains an active research area because individuals differ in writing style, shape, and size. It is used in many applications, such as reading license plate numbers, documents, cheque numbers, and postcodes on envelopes, and verifying signatures. The system was designed, developed, and implemented in Python. After completing it, we analyzed its performance and accuracy.

Keywords:- Machine Learning, Python, 2HCR, OCR, Hindi Character, Devanagari, LR, LGR, SVM, RF, NB.

I. INTRODUCTION

Handwritten Character Recognition (HCR), also known as Handwritten Text Recognition, is the ability of a machine to receive and recognize handwritten input from sources such as paper images, screen images, and document scanners[1][2]. Cursive writing often has an upward slant, which makes handwriting more difficult to recognize[3]. Many difficulties arise because individuals write in different styles, shapes, and designs[4]. Handwriting recognition is also a highly active research domain in which machine learning is applied[5][6]. HCR is a current technology that is responsible for the automatic conversion of handwritten text into computerized text[7][8]. HCR systems are commonly used in applications such as document reading, mail sorting and verification, bank processing, and postal card address recognition[9][10]. Handwriting recognition is a type of optical character recognition. When a person is dealing with an unfamiliar language, they can take a picture of a handwritten image or document and pass it to the HCR algorithm[7]. Such applications help people develop reading and writing skills[11].

India is a multilingual country whose Constitution lists 22 scheduled languages. Hindi is one of its official languages and is used in many Indian states[12]. It is the third most widely spoken language globally and is written in the Devanagari script[13][14]. Our main challenge was devising the dataset, so we introduce a new publicly accessible MNIST-style dataset for 2HCR[15]. The Hindi Handwritten Character Dataset (2HCD) contains ninety-two thousand images of 36 Hindi characters and ten digits[16][17]. The work proceeds in stages: pre-processing removes noisy data; segmentation converts input images into individual characters; feature extraction derives features from those characters; and classification uses the LR, LGR, SVM, RF, and NB algorithms to classify the image data, training on part of the data and finally testing to check whether the target is achieved[18]. We conducted experiments on the recognition of Hindi characters and digits and used these classification algorithms to improve the accuracy of the method.

Hindi    हिन्दी    Meaning    Pronunciation
०        शून्य     0          Shunya
१        एक       1          Ek
२        दो       2          Do
३        तीन      3          Teen
४        चार      4          Char
५        पाँच      5          Paanch
६        छै       6          Che
७        सात      7          Saat
८        आठ      8          Aath
९        नौ       9          Nau
Fig 1 Table of Hindi Numeric values

IJISRT22JUL757 www.ijisrt.com 1071

II. RELATED WORKS

Many organizations have developed methods for recognizing handwritten characters and digits, and the field is under intensive experimentation. We studied the following HCR methods.

A large amount of research has been carried out on handwritten character and digit recognition. The research in this field started in 1970. Devanagari text consists of basic, combination, hybrid, and numeral characters[19]. A handwritten Hindi character recognition system has been designed and implemented using a fuzzy logic algorithm; its authors used Hamming neural network techniques in their approach[20][21]. Another method for recognizing handwritten Hindi characters uses a neural network based on Self-Organizing Map (SOM) techniques[22]. That system produces close to accurate results but occasionally makes errors if the handwritten character is not segmented correctly[23]. A few research reports are available on recognizing handwritten Hindi characters[24]. Recognition of handwritten Indian scripts is more complicated than recognition in other languages[25]. 2HCR is as important as character recognition in general, because handwritten characters are found on cheques, envelopes, Optical Mark Recognition (OMR) sheets, etc.[26]. One study observes that the size of handwritten text is unique to every person and proposes a system for identifying different people from their handwriting[27]. A large body of old literature is written in Devanagari characters and digits[28]. The structure of characters in the Hindi language varies from character to character[29]. In earlier work, an SVM with an RBF kernel achieved the highest accuracy at 91.63%, the MLP algorithm the lowest at 86.72%, and the KNN classifier 90%[30][26]. Algorithms have been developed to recognize handwritten characters in languages such as English, Hindi, Gujarati, Tamil, and French. Many researchers in computer science and machine learning have closely examined handwriting recognition[31].

III. PROPOSED WORK

In this section, we discuss how our system is implemented and how its model works. The model performs several functions. Of the 92,000 images, 80% are used for training and the remaining 20% for testing.

The diagram in Figure 2 shows the flow designed to achieve the target accuracy. The work is divided into phases: pre-processing, segmentation, feature extraction, and classification with training and testing data. First, we collect the image data from MNIST-style handwritten datasets of Hindi characters. The dataset consists of ninety-two thousand images of 46 different characters and digits. After collection, the image data is accessed from its memory location and sent to pre-processing. Pre-processing is the first step in processing the images and reduces unwanted noise and redundancy. It is followed by segmentation, which converts input images into individual characters. Next comes feature extraction, in which the characteristics of the images are extracted. Finally, classification is performed: we used machine learning algorithms (Logistic Regression, Naïve Bayes, SVM, and Random Forest) to classify the data. After classification, we check whether the target accuracy has been achieved. If it has, the model is applied to check the accuracy of the results. If not, the process returns to feature extraction and repeats until the target accuracy is reached.

Fig 2 Flow chart of proposed 2HCR Systems

A. Pre-processing:
Pre-processing is an initial stage that improves the input data by reducing unwanted noise and redundancy and by improving image quality, so that the image can be analyzed better[4][14].

B. Segmentation:
After pre-processing, the noise-free image is passed to segmentation, which breaks the cleaned document into its components (i.e., paragraphs, sentences, words, and letters). Segmentation converts input images into individual characters[2][8]. In practice, image segmentation works by classifying image pixels.

C. Feature Extraction:
After segmentation, the next step is feature extraction, the process by which characteristics of the images are extracted. Feature extraction is an important phase in character recognition, because recognition accuracy depends mainly on the extracted features[2][14]. To extract the features of individual characters, techniques such as zoning, histograms, Principal Component Analysis (PCA), and gradient-based features can be applied[32].
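As an illustration of one of the techniques named above, the following is a minimal sketch of zoning features, preceded by a simple threshold-based binarization as a stand-in for pre-processing. It is not the implementation used in this work: the function names `binarize` and `zoning_features`, the threshold of 128, and the grid size are assumptions made for the example, and only the Python standard library is used so that the sketch stays self-contained.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map dark (ink) pixels to 1 and light (background) pixels to 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def zoning_features(binary: Image, zones_per_side: int = 4) -> List[float]:
    """Split the image into a grid of zones and return the ink density of each."""
    height, width = len(binary), len(binary[0])
    if height % zones_per_side or width % zones_per_side:
        raise ValueError("image size must be divisible by zones_per_side")
    zone_h, zone_w = height // zones_per_side, width // zones_per_side
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            ink = sum(
                binary[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# A 4x4 toy "image" with a dark stroke on the left side (hypothetical data).
toy = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [0, 255, 255, 255],
]
toy_features = zoning_features(binarize(toy), zones_per_side=2)
print(toy_features)  # [1.0, 0.0, 0.5, 0.0]
```

Each zone contributes one number, so a character image is reduced to a short feature vector that a classifier can consume. A finer grid keeps more spatial detail at the cost of a longer vector.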

D. Classification:
Classification is the decision-making stage of recognition, in which objects are categorized into classes. The extracted features are used to recognize characters. The relevant characteristics can be classified using various methods, including neural networks, multilayer perceptrons, fuzzy logic, Logistic Regression, Naïve Bayes, KNN, SVM, and CNN[2][8][14].
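To make the idea of assigning a feature vector to a class concrete, here is a minimal sketch of a k-nearest-neighbour (KNN) classifier, one of the methods listed above. It is an illustration only and not one of the classifiers evaluated in this work: the function name `knn_predict`, the two-dimensional feature vectors, and the class labels "ka" and "kha" are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples nearest to query."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two character classes.
training_set = [
    ((0.9, 0.1), "ka"),
    ((0.8, 0.2), "ka"),
    ((0.85, 0.15), "ka"),
    ((0.1, 0.9), "kha"),
    ((0.2, 0.8), "kha"),
    ((0.15, 0.95), "kha"),
]

print(knn_predict(training_set, (0.82, 0.18)))  # ka
print(knn_predict(training_set, (0.12, 0.88)))  # kha
```

The query is labelled by a vote among its closest training samples in feature space, which is why the quality of the extracted features matters so much for recognition accuracy.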

E. Algorithms:
• Support Vector Machine (SVM): SVM is an effective and efficient supervised learning method used for classification. We applied the SVM algorithm to predict the data, and the model achieved 98.88% accuracy.
• Random Forest: Random Forest is an effective and efficient supervised learning algorithm used for classification problems. We applied the Random Forest algorithm to predict the data, and the model achieved 97.22% accuracy.
• Logistic Regression: Logistic Regression is a widely used supervised learning method that predicts a categorical dependent variable from a given set of independent variables. We applied a logistic regression algorithm to predict the data, and the model achieved 95.83% accuracy.
• Linear Regression: Linear Regression is a popular supervised learning method for predictive analysis. We applied a linear regression algorithm to train on the data and predict the value, and the model achieved 52.43% accuracy.
• Naïve Bayes: Naïve Bayes is a supervised learning method based on Bayes' theorem that is used to solve classification and prediction problems. We applied the Naïve Bayes algorithm to predict the data, and the model achieved 52.6% accuracy.

Fig 3. Sample of dataset

IV. RESULTS AND ANALYSIS

The dataset contained 92,000 images of 46 different types of Hindi characters and digits. The dataset was randomly shuffled before implementation. The comparison of accuracy results is presented in Table 1.

Algorithm              Accuracy in prior work[18][26]    Accuracy (this work)
SVM                    96.25%                            98.88%
Random Forest          98.44%                            97.22%
Logistic Regression    86.23%                            95.83%
Linear Regression      -                                 52.43%
Naïve Bayes            89.47%                            52.68%
Table 1:- Accuracy achieved by different algorithms

It can be concluded that, among the algorithms evaluated with the proposed 2HCR method, SVM gives the highest accuracy at 98.88%. In general, SVM performed well on this recognition problem.
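The evaluation protocol described in this paper (random shuffling, an 80%/20% train/test split, and accuracy as the metric) can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names `shuffled_split` and `accuracy` and the fixed seed are assumptions made for the example, and integer indices stand in for the actual images.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffled_split(items: Sequence[T], train_fraction: float = 0.8,
                   seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and split it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels: Sequence[int], predicted: Sequence[int]) -> float:
    """Percentage of predictions that match the true labels."""
    if len(true_labels) != len(predicted):
        raise ValueError("label sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return 100.0 * correct / len(true_labels)


# Image indices stand in for the 92,000 images of the dataset.
train_idx, test_idx = shuffled_split(range(92_000))
print(len(train_idx), len(test_idx))  # 73600 18400

# Toy check of the metric: 3 of 4 predictions are correct.
print(accuracy([0, 1, 2, 3], [0, 1, 2, 0]))  # 75.0
```

With 92,000 images, an 80/20 split leaves 73,600 images for training and 18,400 for testing, so each percentage point of test accuracy corresponds to 184 images.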


Fig 4 Results and Analysis

V. CONCLUSION

Handwritten Hindi character recognition is a challenging task for researchers and is not easily solved. Many developments in character recognition systems are possible in the near future. At present, the system recognizes only characters and digits; recognition of special characters can be added in the future. Recognition accuracy depends directly on the nature of the handwriting of different users. We presented a publicly accessible MNIST-style dataset of handwritten Hindi characters. It consists of 92 thousand images of 36 primary alphabets and ten digits of the Devanagari script. This paper presents handwritten Hindi character recognition based on machine-learning algorithms to improve accuracy.

FUTURE WORK

A lot of research on recognition systems is still needed to exploit new features and improve current performance. In the future, we will develop a deep learning model for recognizing Hindi words and sentences.

REFERENCES

[1]. G. A. Fink, “Handwriting Recognition,” Markov Model. Pattern Recognit., pp. 237–248, 2014, doi: 10.1007/978-1-4471-6308-4_14.
[2]. S. N. R. S and S. Afseena, “Handwritten Character Recognition – A Review,” vol. 5, no. 3, pp. 1–6, 2015.
[3]. S. S. Rosyda and T. W. Purboyo, “A Review of Various Handwriting Recognition Methods,” Int. J. Appl. Eng. Res., vol. 13, no. 2, pp. 1155–1164, 2018, [Online]. Available: http://www.ripublication.com
[4]. R. Dixit, R. Kushwah, and S. Pashine, “Handwritten Digit Recognition using Machine and Deep Learning Algorithms,” Int. J. Comput. Appl., vol. 176, no. 42, pp. 27–33, 2020, doi: 10.5120/ijca2020920550.
[5]. D. Eshwar Reddy, K. V. Pranathi Naidu, M. Kartheek Srinivas, A. Raheem, and S. Sureddy, “Handwritten character recognition using SVM,” Int. J. Adv. Sci. Technol., vol. 29, no. 5, pp. 4001–4007, 2020, doi: 10.55014/pij.v3i2.98.
[6]. D. D. Frp, “Handwritten Hindi Numeric Character Recognition and comparison of algorithms,” pp. 13–16, 2017.
[7]. S. Preetha, I. M. Afrid, K. H. P, and S. K. Nishchay, “Machine Learning for Handwriting Recognition,” vol. 4523, pp. 93–101.
[8]. A. Indian, G. K. Vishvidyalaya, K. Bhatia, and G. K. Vishvidyalaya, “A Survey of Offline Handwritten Hindi Character Recognition,” 2017.
[9]. V. L. Sahu and B. Kubde, “Offline Handwritten Character Recognition Techniques using Neural Network: A Review,” vol. 2, no. 1, pp. 87–94, 2013.
[10]. P. Bojja, N. Sai, S. Teja, G. K. Pandala, and S. D. L. R. Sharma, “Handwritten Text Recognition using Machine Learning Techniques in Application of NLP,” Int. J. Innov. Technol. Explor. Eng., vol. 9, no. 2, pp. 1394–1397, 2019, doi: 10.35940/ijitee.a4748.129219.
[11]. J. Memon, M. Sami, R. A. Khan, and M. Uddin, “Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR),” IEEE Access, vol. 8, no. August, pp. 142642–142668, 2020, doi: 10.1109/ACCESS.2020.3012542.
[12]. D. Singh and M. M. M. E. College, “Neural Network Based Handwritten Hindi Character Recognition System,” pp. 9–12, 2009.
[13]. V. J. Dongre and V. H. Mankar, “Devnagari Document Segmentation Using Histogram Approach,” Int. J. Comput. Sci. Eng. Inf. Technol., vol. 1, no. 3, pp. 46–53, 2011, doi: 10.5121/ijcseit.2011.1305.
[14]. A. Gaur, “Handwritten Hindi Character Recognition using K-Means Clustering and SVM,” pp. 65–70, 2015.
[15]. “Hindi / Devanagari MNIST Data,” p. 1700.
[16]. S. Acharya, A. K. Pant, and P. K. Gyawali, “Deep learning based large scale handwritten Devanagari character recognition,” SKIMA 2015 - 9th Int. Conf. Software, Knowledge, Inf. Manag. Appl., 2016, doi: 10.1109/SKIMA.2015.7400041.
[17]. S. K. Singh and A. Khamparia, “Deep Learning Architecture for Large Scale Hand Written Devanagari Character Recognition,” vol. 5, no. 10, pp. 222–228, 2018.
[18]. P. Chaudhary, “Handwritten Hindi Character Recognition using Machine Learning and Deep Learning,” pp. 48–53.
[19]. S. D. Pande et al., “Digitization of handwritten Devanagari text using CNN Informatics, vol. 2, no. 3, p. 100016, 2022, doi: 10.1016/j.neuri.2021.100016.
[20]. W. Lu, Z. Li, and B. Shi, “v p j”.
[21]. H. Zhan, S. Lyu, U. Pal, and Y. Lu, “CNN-based Hindi numeral string recognition for Indian postal automation,” 2019 Int. Conf. Doc. Anal. Recognit. Work. ICDARW 2019, vol. 5, pp. 77–82, 2019, doi: 10.1109/ICDARW.2019.40085.
[22]. P. Banumathi and G. M. Nasira, “Handwritten Tamil character recognition using artificial neural networks,” Proc. 2011 Int. Conf. Process Autom. Control Comput. PACC 2011, 2011, doi: 10.1109/PACC.2011.5978989.
[23]. B. V. S. Murthy, “Handwriting recognition using supervised neural networks,” Proc. Int. Jt. Conf. Neural Networks, vol. 4, pp. 2899–2902, 1999, doi: 10.1109/ijcnn.1999.833545.
[24]. N. K. Garg, L. Kaur, and M. Jndal, “Recognition of Offline Handwritten Hindi text using middle zone of the words,” 2015 IEEE/ACIS 14th Int. Conf. Comput. Inf. Sci. ICIS 2015 - Proc., pp. 325–328, 2015, doi: 10.1109/ICIS.2015.7166614.
[25]. J. M. R. D and A. V. Reddy, “Recognition of Handwritten Characters using Deep Convolutional Neural Network,” no. 6, pp. 314–317, 2019, doi: 10.35940/ijitee.F1064.0486S419.
[26]. M. Yadav and R. Purwar, “Hindi handwritten character recognition using multiple classifiers,” Proc. 7th Int. Conf. Conflu. 2017 Cloud Comput. Data Sci. Eng., pp. 149–154, 2017, doi: 10.1109/CONFLUENCE.2017.7943140.

[27]. J. Pradeep, E. Srinivasan, and S. Himavathi, “Neural network based handwritten character recognition system without feature extraction,” 2011 Int. Conf. Comput. Commun. Electr. Technol. ICCCET 2011, pp. 40–44, 2011, doi: 10.1109/ICCCET.2011.5762513.
[28]. Y. Gurav, P. Bhagat, R. Jadhav, and S. Sinha, “Devanagari Handwritten Character Recognition using Convolutional Neural Networks,” 2nd Int. Conf. Electr. Commun. Comput. Eng. ICECCE 2020, no. June, pp. 1–6, 2020, doi: 10.1109/ICECCE49384.2020.9179193.
[29]. N. Singh, “An Efficient Approach for Handwritten Devanagari Character Recognition based on Artificial Neural Network,” 2018 5th Int. Conf. Signal Process. Integr. Networks, SPIN 2018, pp. 894–897, 2018, doi: 10.1109/SPIN.2018.8474282.
[30]. A. Sahu and S. N. Mishra, “Odia handwritten character recognition with noise using machine learning,” Proc. - 2020 IEEE Int. Symp. Sustain. Energy, Signal Process. Cyber Secur. iSSSC 2020, pp. 20–23, 2020, doi: 10.1109/iSSSC50941.2020.9358804.
[31]. I. Khandokar, M. Hasan, F. Ernawan, S. Islam, and M. N. Kabir, “Handwritten character recognition using convolutional neural network,” J. Phys. Conf. Ser., vol. 1918, no. 4, 2021, doi: 10.1088/1742-6596/1918/4/042152.
[32]. M. Agarwal, V. Tomar, and P. Gupta, “Handwritten Character Recognition using Neural Network and Tensor Flow,” no. April, pp. 1445–1448, 2019, doi: 10.35940/ijitee.F1294.0486S419.