Applying AI To Biometric Identification For Recognizing Text Using One-Hot Encoding and CNN
ISSN No:-2456-2165
Abstract:- Text on an image often contains important information and directly carries high-level semantics in academic institutions and financial institutions. This makes it an important source of information and a popular research topic. Many studies have shown that CNN-based neural networks are very good at classifying images, which is the foundation of text recognition. By combining AI with the process of biometric identification, a technique for text recognition in academic institutions and financial institutions is performed using a Convolutional Neural Network (CNN). Initially, preprocessing is done to make the document image suitable for feature extraction. One-hot-encoding-based feature extraction is then performed. A two-dimensional CNN is used to classify the final features. Finally, RMSprop is used to optimize the results and improve the accuracy. Results of the proposed method show that the accuracy is 99%, which is higher than that of the existing methods.

Keywords:- Adam optimizer, AI, CNN, RMSprop, One-hot encoding.

I. INTRODUCTION

In today's world, optical character recognition (OCR) technologies are utilised for a wide variety of tasks, including scanner-based data entry, bank checks, business cards, automatic mail sorting, handheld price label scanners, and a variety of document recognition applications [1]. The market now also contains commercial character recognition systems that are available for purchase. As a direct result of this innovation, researchers are able to work on the problem of online handwriting recognition. This was made possible by the development of electronic tablets that collect the x, y coordinates as well as the movement of the pen tip [2].

The quality of the source document supplied to any OCR system has a noteworthy influence on the performance of that system. The original document may be rather old and may have undergone physical wear and tear. It may also be of poor quality because of the variations in toner density present in it [3]. There is a possibility that the scanning process will miss some of the fainter sections of the original document, which can corrupt many characters in the text. Gradations in the text image can be brought about by scanning low-quality paper or by printing low-quality copies. The performance of optical character recognition is also significantly affected by other factors, including accurate line and word segmentation [4].

The method of selecting and extracting the most distinctive characteristics from an image is known as feature extraction and selection. Such a feature has the ability to accentuate or expand the difference between patterns of different classes while remaining invariant to patterns of the same class. This is one of its important characteristics [5].

The class of methods used for character recognition in practice is, in principle, not dissimilar to the class of methods used for any broad pattern recognition problem. Based on the characteristics that are employed, however, the various methodologies for character recognition may be roughly categorized as follows: approaches for matching templates and performing correlations, and techniques for analyzing and matching features [6].

Template matching and correlation techniques: These approaches are high-level machine vision algorithms that identify the entirety or a portion of an image that matches a standard prototype template. These techniques may be applied to either a single picture or several images simultaneously. In this step, the pixels of an input character are compared, one by one, with character prototypes that have been saved beforehand [7].

Feature analysis and matching techniques: At present, the most frequently used character recognition methods involve feature analysis and matching algorithms. This strategy is also sometimes referred to as the structural analysis method. These approaches replicate human reasoning better than template matching, which was the previous standard. Using this methodology, relevant characteristics are first extracted from the input character, and the extracted features are then compared with the feature descriptions of the trained characters. Recognition is achieved by choosing the description that is the closest match [8].

Many of today's OCR systems are built on mathematical models in order to reduce the number of classification errors. Such a character recognition method might make use of either structural or pixel-based information. Some of them are listed below. Using hypersurfaces in multi-dimensional feature spaces, discriminant function classifiers attempt to reduce the mean-squared classification error by separating the feature descriptions of characters that belong to different semantic classes [9]. Bayesian classifiers make use of probability theory to minimize a loss function associated with character misclassification. The theories of human and animal vision are the basis for Artificial Neural Network classifiers.
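The template-matching-and-correlation idea described above can be sketched with normalized cross-correlation. This is an illustrative sketch, not code from the paper; the toy image and template are invented for demonstration.

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return the normalized
    cross-correlation score at every valid position.
    Scores close to 1.0 indicate a near-exact match between the
    template and the underlying image patch."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            scores[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

# Toy example: find a 3x3 diagonal "stroke" inside a 5x5 image.
image = np.zeros((5, 5))
image[1:4, 1:4] = np.eye(3)          # stroke placed at offset (1, 1)
template = np.eye(3)
scores = match_template(image, template)
best = tuple(int(i) for i in np.unravel_index(scores.argmax(), scores.shape))
print(best)                           # (1, 1) - best-matching corner
```

A stored character prototype plays the role of `template`; the position with the highest score is the recognized location.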
In this paper, we discuss the proposed OCR algorithm using geometric and texture features. The algorithm has four main phases: (a) preprocessing, (b) feature extraction, (c) classification, and (d) optimization.

B. Preprocessing
The OCR system can acquire images of the document either by scanning the text or by taking a photograph of it. Before any further processing, images are scaled down to 256 x 256 pixels for normalization purposes. The proposed method then applies an existing three-dimensional median filter iteratively, four times, which removes noise, maintains the image's quality, and preserves edges. It is effective at removing salt-and-pepper noise without introducing a significant level of blurring. After filtering, the RGB images are converted to the HSV color system. The HSV color space is used because it separates chromatic information, so that color visibility can be distinguished. The median filter serves as the foundation for noise elimination: the adjacent pixels are ranked by intensity, and the median value becomes the new value of the central pixel. The output of the filtering step can still contain a background that is faintly textured or coloured, which could disrupt the operation of the subsequent phases. To remedy this, a method known as binarization is applied to the filtered image to produce a binary representation of it. In this method, a threshold value is determined; pixels whose intensity values exceed the threshold are set to white, while those below it are set to black. The threshold value is determined by finding the document's overall average pixel intensity and subtracting that amount from 1.

C. Feature Extraction Using One-hot Encoding
Many significant properties in real-world datasets are categorical rather than numerical. These categorical features must be converted into a numeric format before they can be used for training and fitting machine learning algorithms, because they are crucial for increasing the accuracy of ML models. While there are many ways to convert such attributes into a numerical format, one-hot encoding is the most popular and widely used method.

In the one-hot encoding method, categorical data is converted into numeric data by splitting the column into multiple columns. The numeric data can then be fed into deep learning and machine learning algorithms. The result is a binary vector representation of a categorical variable, with all values in the vector being 0 except for the i-th value, which is set to 1 to mark the category.
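The one-hot transformation described above can be sketched in a few lines of NumPy. The label values are illustrative (0-25 standing for A-Z, matching the dataset used later); this is a minimal sketch, not the paper's exact implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels into one-hot row vectors:
    every entry is 0 except the i-th, which is set to 1."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

# Labels 0-25 stand for the letters A-Z in the handwritten dataset.
labels = np.array([0, 2, 25])          # 'A', 'C', 'Z'
y = one_hot(labels, num_classes=26)
print(y.shape)        # (3, 26)
print(y[1].argmax())  # 2 -> the column for 'C' holds the single 1
```

Each original column of categorical labels thus becomes 26 binary columns, one per letter, which is the format the CNN's softmax output layer expects.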
IV. RESULTS AND DISCUSSION

A. Experimental Setup
The dataset used in the proposed experiment is available at:
https://www.kaggle.com/datasets/sachinpatel21/az-handwritten-alphabets-in-csv-format

The dataset comprises 26 files (A-Z), each of which contains handwritten images with a size of 28 x 28 pixels and a box size of 20 x 20 pixels. Figure 2 illustrates a sample dataset image.
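In the CSV form of this Kaggle dataset, each row holds a class label followed by 784 pixel values that can be reshaped back into a 28 x 28 image. The sketch below uses a synthetic row in place of the real file, so the pixel values are invented; only the row layout reflects the dataset.

```python
import numpy as np

# Each CSV row holds 785 values: the class label (0-25 for A-Z)
# followed by 784 grayscale pixels of a 28x28 image.
# A synthetic row stands in for one line of the real file here.
row = np.zeros(785, dtype=np.uint8)
row[0] = 7                          # label 7 -> the letter 'H'
row[1:] = np.random.randint(0, 256, size=784)

label, pixels = int(row[0]), row[1:]
image = pixels.reshape(28, 28)      # restore the 2-D image

print(chr(ord('A') + label))        # H
print(image.shape)                  # (28, 28)
```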
This research discusses the use of a convolutional neural network model to identify text in academic and financial institution documents. The model has proved successful in recognising characters in real time, expanding its range of use.

Preprocessing plays a key role in ensuring that the model performs well. Image preprocessing approaches improve the features of the image, thereby improving recognition precision. Figure 3 illustrates the pre-processing process.
Fig. 3: Pre-processing
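The filtering and binarization steps of the pre-processing stage can be sketched in pure NumPy. This is a simplified single-channel sketch: the kernel size, the test image, and the mean-intensity thresholding convention are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighborhood
    (edges padded by reflection), suppressing salt-and-pepper noise."""
    pad = k // 2
    padded = np.pad(img, pad, mode='reflect')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

def binarize(img):
    """Threshold at the image's mean intensity (one simple choice;
    the exact threshold rule of the paper is not reproduced here)."""
    return (img > img.mean()).astype(np.uint8)

# Grayscale patch: dark background, one impulse-noise pixel,
# and a bright "text stroke" in the right-hand columns.
img = np.full((5, 5), 40, dtype=np.uint8)
img[2, 2] = 255                      # salt noise
img[:, 3:] = 200                     # bright stroke

smooth = median_filter(img)
binary = binarize(smooth)
print(smooth[2, 2])   # 40 - noise replaced by neighborhood median
print(binary)         # stroke columns become 1, background 0
```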
Figure 4 illustrates the feature extraction process performed using the one-hot encoding technique. The preprocessed features are converted into a format suitable for the CNN classifier.
The data is normalized and then split into training and testing sets in the ratio of 70:30.
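The normalization and 70:30 split can be sketched as follows; the sample count and random stand-in data are illustrative, not taken from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the flattened A-Z images: 100 samples, 784 pixels each.
X = rng.integers(0, 256, size=(100, 784)).astype(np.float32)

X /= 255.0                          # normalize pixel values to [0, 1]

split = int(0.7 * len(X))           # 70:30 train/test split
X_train, X_test = X[:split], X[split:]

print(X_train.shape, X_test.shape)  # (70, 784) (30, 784)
```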
Both the CNN model with RMSprop and the CNN model with the Adam optimizer are run to compare the results.
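A minimal Keras sketch of this comparison is shown below. The layer sizes and counts are illustrative assumptions, since the exact architecture is not listed here; only the 28 x 28 input, the 26-way softmax output, and the two optimizers come from the paper.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(optimizer):
    """A small 2-D CNN for 26-way letter classification;
    layer sizes are illustrative, not taken from the paper."""
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(26, activation='softmax'),  # one unit per letter A-Z
    ])
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# One model per optimizer, trained identically and then compared.
rms_model = build_model('rmsprop')
adam_model = build_model('adam')

probs = rms_model.predict(np.random.rand(4, 28, 28, 1), verbose=0)
print(probs.shape)   # (4, 26): one probability per letter
```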
The prediction results are given in Figure 8. From the figure, it is clear that the proposed model accurately identifies the text from the document image.
The CNN architecture with the RMSprop optimizer produced the highest level of recognition accuracy. This network might, however, function more effectively with alternative optimizers. To further our investigation, we conducted experiments using the Adam optimizer. Compared to the Adam optimizer, the RMSprop optimizer achieves higher accuracy. The training and testing accuracy of the proposed model is also compared with CNN+LSTM [24], SVM [25], the VGG-16 model [26], LSTM and Adaptive Classifier [27], Bi-LSTM [28], and Hybrid PSO-SVM [29]. Compared to all the aforementioned models, the proposed CNN with RMSprop model has the highest accuracy.
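For reference, the standard RMSprop parameter update (not stated explicitly in the paper) keeps an exponential moving average of the squared gradients and scales the learning rate by its root:

```latex
E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
```

with typical defaults $\rho = 0.9$, $\eta = 0.001$, and a small $\epsilon$ for numerical stability; this per-parameter scaling is what distinguishes it from plain gradient descent.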
V. COMPARATIVE ANALYSIS

From Table 1, it is clear that the training and testing accuracy with the 'adam' optimizer is 97% and 98%, respectively, while the training and testing accuracy with the 'rmsprop' optimizer is 99%. The two models are evaluated using two metrics, loss and accuracy. It is observed that the CNN with RMSprop outperforms in terms of accuracy.
[Bar chart: "Comparative Analysis" — accuracy (%) of CNN + Adam optimizer, Hybrid PSO-SVM, Bi-LSTM, VGG-16 model, SVM, and CNN+LSTM, plotted on a scale of 86-100%.]