Handwritten Character Recognition System

Abstract
Handwritten character recognition has been researched for years and remains an area of interest for the pattern recognition community, since it can be put to use in a wide range of fascinating applications across the field. The task is a difficult challenge because each person has their own unique writing style. SVM, ANN, and CNN models are some of the available approaches for handling this problem. HCR is a need in the modern world, since it assists us in a variety of public-domain fields, which makes it all the more vital to study in depth. Handwritten character recognition (HCR) is commonly divided into off-line character recognition and online character recognition. In this study, we review the many existing algorithms that have been implemented, in order to gain a better understanding of the field, and we draw conclusions about the best strategies currently being developed for HCR.

HCR for Devanagari is carried out by a computational device that accepts input from documents, screens, photos, and other responsive devices and produces output by reading those images into ASCII or Unicode text. This has become practical because computers have grown increasingly powerful in recent years. Devanagari is used to write several languages, including Sanskrit, Nepali, Marathi, and Hindi. This implementation is all the more important because the design of Devanagari characters is more complicated than in most other scripts; comparatively speaking, the set of characters and digits used in Devanagari is more complex than the character set of English. Character recognition for Devanagari has also been hampered by the absence of verified datasets, which has made the task more difficult.
1. INTRODUCTION

1.1 Introduction

Machine learning and deep learning play an important role in computer technology and
artificial intelligence. With the use of deep learning and machine learning, human effort
can be reduced in recognition, learning, prediction, and many other areas.

This article presents the recognition of handwritten characters (A to Z) from the famous
MNIST dataset, comparing classifiers such as KNN, PSVM, NN, and convolutional neural
networks on the basis of performance, accuracy, time, sensitivity, positive predictivity, and
specificity, using different parameters with the classifiers.

To make machines more intelligent, developers are diving into machine learning and
deep learning techniques. A human learns to perform a task by practicing and repeating it
again and again, memorizing how to perform it. The neurons in the brain then trigger
automatically, and the learned task can be performed quickly. Deep learning is very
similar to this: it uses different types of neural network architectures for different types
of problems.

For example – object recognition, image and sound classification, object detection,
image segmentation, etc.

Handwritten character recognition is the ability of computers to recognize human
handwritten characters. It is a hard task for the machine because handwritten characters
are not perfect and can be written in many different styles. A handwritten character
recognition system is the solution to this problem: it takes the image of a character and
recognizes the character present in the image.

1.2 Character Recognition System:

A character recognition system trains a machine to recognize characters from different
sources such as emails, bank cheques, papers, and images, and in different real-world
scenarios: online handwriting recognition on computer tablets or systems, recognizing
number plates, reading numeric entries in forms filled out by hand, and so on.

1.3 Problem Statement:

The goal of this project is to create a model that will be able to recognize and determine
handwritten characters from their images using the concepts of Convolutional Neural Networks.
Though the goal is to create a model which can recognize characters, it can be extended to
digits and to an individual's handwriting. The major goal of the proposed system is to understand
Convolutional Neural Networks and apply them to a handwritten character recognition system.

1.4 Relevant theory

The process of Handwritten Character Recognition (HWCR) often consists of many steps. The
first is Image Acquisition: the input is obtained either by taking a picture with a camera or
phone, or by drawing the image on a sheet of paper and then scanning it with a scanner or a
phone camera. Drawing with a light stylus is yet another form of image input that may
potentially be used.

The subsequent step in the process is called pre-processing, and it is at this phase that the quality
of the images is enhanced and improved. Various procedures may be applied to the input images,
including thinning, skeletonization, normalization, skew correction, noise removal, filtering, and
binarization. Through pre-processing, we give the images a higher standard of quality.
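As a rough sketch of two of these operations, binarization and min-max normalization, the snippet below assumes a grayscale NumPy array as input; the function name and the default threshold are illustrative, not taken from the project itself.

```python
import numpy as np

def preprocess(image, threshold=128):
    """Toy pre-processing step: binarization plus min-max normalization.
    `image` is a 2-D grayscale array with intensities in 0..255."""
    img = np.asarray(image, dtype=float)
    # Binarization: dark ink on light paper becomes 1, background becomes 0.
    binary = (img < threshold).astype(np.uint8)
    # Normalization: rescale intensities to [0, 1] for a consistent value range.
    lo, hi = img.min(), img.max()
    normalized = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    return binary, normalized
```

Operations such as skew correction or thinning would be layered on top of this in a full pipeline.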

The third stage of the system is called segmentation, and it is important for decomposing the
input picture into meaningful pieces. This procedure also helps to separate the many objects
that are displayed in the image. A basic example is shown in figure 1.3. We can therefore define
segmentation as the condition in which the input is broken down into subparts, each of which is
defined as an object.
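A minimal sketch of this stage, assuming a binary line image (1 = ink) and using a simple vertical projection profile to cut at blank columns; this is one of many possible segmentation strategies, not necessarily the one used in the project.

```python
import numpy as np

def segment_columns(binary):
    """Split a binary line image into character sub-images by cutting at
    columns that contain no ink (projection-profile segmentation)."""
    ink_per_column = binary.sum(axis=0)      # vertical projection profile
    in_char, start, pieces = False, 0, []
    for col, ink in enumerate(ink_per_column):
        if ink > 0 and not in_char:          # a character region begins
            in_char, start = True, col
        elif ink == 0 and in_char:           # the character region ends
            in_char = False
            pieces.append(binary[:, start:col])
    if in_char:                              # character touching the right edge
        pieces.append(binary[:, start:])
    return pieces
```

Touching or overlapping characters defeat this simple method, which is exactly why learned segmentation approaches are discussed later in the report.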

In the subsequent step, we classify the objects that were produced during segmentation in order
to determine the category to which each individual object belongs. At this stage we may have
multiple object-dependent classes, since numerous classes may be defined for categorization;
as we obtain the properties of an entity, we can assign a class to each object.

For illustration's sake, let's say we have a class called "car" and another class called "plane". Any
item that exhibits car-like qualities is then assigned to the "car" class owing to the traits it
possesses. In the same way, this review investigates the many approaches to HCR that are
already in use and discusses both the benefits and drawbacks associated with each of them.

One of the many approaches that can be taken to address this problem is the use of a deep
neural network (DNN), which is an efficient feature extractor and classifier. However, there is
a catch: the network needs an excessively long period of time to train, because it has a
significant number of nonlinear hidden layers, in addition to many connections. Convolutional
Neural Networks (CNNs) were developed to solve this problem by using far fewer nonlinear
hidden layers than a DNN.

This is the key reason that we use a CNN to extract position-invariant features. A CNN has a
simpler structure than a DNN, which is why we employ it. Because it provides temporal
subsampling and allows for a degree of rotation, shift invariance, and distortion, a map between
the input image or dataset and the output can be constructed with relative ease using a
convolutional neural network.

Practitioners of machine learning and data mining have already put in a significant amount of
work to improve pattern recognition. Bank cheques are a perfect illustration of how dependent
we are on HCR as a medium that takes written information and lets us communicate with others.
HCR has a significant effect on the way we live today because of how dependent we are on it.

A similar issue arises from the variability of handwriting: different locations use a variety of
languages, and those locations also produce a variety of handwriting styles and methods. As a
result, it is somewhat challenging to reliably extract the characters from the language that is
provided.
2. LITERATURE SURVEY
An early notable attempt in the area of character recognition research was made by Grimsdale in
1959. A great deal of research work in the early sixties was based on an approach known as the
analysis-by-synthesis method suggested by Eden in 1968. The great importance of Eden's work
was that he formally proved that all handwritten characters are formed from a finite number of
schematic features, a point that was implicitly included in previous works.

This notion was later used in all syntactic (structural) approaches to character recognition.

1. K. Gaurav and Bhatia P. K.: their paper deals with the various pre-processing
techniques involved in character recognition for different kinds of images, ranging
from simple handwritten form-based documents to documents containing colored and
complex backgrounds and varied intensities. Different preprocessing techniques are
discussed, including skew detection and correction, the image enhancement technique
of contrast stretching, binarization, noise removal techniques, normalization,
segmentation, and morphological processing.
2. Sandhya Arora used four feature extraction techniques, namely intersection,
shadow features, chain code histogram, and straight-line fitting features. Shadow
features are computed globally for the character image, while intersection features,
chain code histogram features, and line fitting features are computed by dividing
the character image into different segments. On experimentation with a dataset of
4900 samples, the overall recognition rate observed was 92.80% for Devanagari
characters.
3. Brakensiek, J. Rottland, A. Kosmala, J. Rigoll: in their paper, a system for off-line
cursive handwriting recognition is described, based on Hidden Markov Models (HMMs)
using discrete and hybrid modelling techniques. Handwriting recognition experiments
using a discrete and two different hybrid approaches, consisting of discrete and
semi-continuous structures, are compared. It is found that the recognition rate can be
improved by a hybrid modelling technique for HMMs that depends on a neural vector
quantizer (hybrid MMI), compared to discrete and hybrid HMMs based on a tied-mixture
structure (hybrid TP), which may be explained by the relatively small data set.
4. R. Bajaj, L. Dey, and S. Chaudhari employed three different kinds of features, namely
density features, moment features, and descriptive component features, for the classification
of Devanagari numerals. They proposed a multi-classifier connectionist architecture for
increasing recognition reliability and obtained 89.6% accuracy for handwritten
Devanagari numerals.
5. G. Pirlo and D. Impedovo, in their work, presented a new class of membership
functions, called fuzzy membership functions (FMFs), for zoning-based classification.
These FMFs can be easily adapted to the specific characteristics of a classification
problem in order to maximize classification performance. In this research, a real-coded
genetic algorithm is presented to find, in a single optimization procedure, the optimal
FMF together with the optimal zoning described by a Voronoi tessellation. The
experimental results, carried out in the field of handwritten digit and character
recognition, indicate that the optimal FMF performs better than other membership
functions based on abstract-level, ranked-level, and measurement-level weighting
models found in the literature.
6. Sushree Sangita Patnaik and Anup Kumar Panda (May 2011): this paper proposes the
implementation of particle swarm optimization (PSO) and bacterial foraging optimization
(BFO) algorithms, intended for optimal harmonic compensation by minimizing the
undesirable losses occurring inside the APF itself. The efficiency and effectiveness of the
two approaches are compared for two different supply conditions. The total harmonic
distortion (THD) in the source current, which is a measure of APF performance, is
reduced drastically to nearly 1% by employing BFO. The results demonstrate that BFO
outperforms the conventional and PSO-based approaches by ensuring excellent
functionality of the APF and quick suppression of harmonics in the source current, even
under an unbalanced supply.
7. M. Hanmandlu, O.V. Ramana Murthy have presented in their study the recognition of
handwritten Hindi and English numerals by representing them in the form of exponential
membership functions which serve as a fuzzy model. The recognition is carried out by
modifying the exponential membership functions fitted to the fuzzy sets. These fuzzy sets
are derived from features consisting of normalized distances obtained using the Box
approach. The membership function is modified by two structural parameters that are
estimated by optimizing the entropy subject to the attainment of membership function to
unity. The overall recognition rate is found to be 95% for Hindi numerals and 98.4% for
English numerals.
8. Renata F. P. Neves has proposed SVM-based offline handwritten digit recognition. The
authors claim that the SVM outperforms the multilayer perceptron (MLP) classifier. The
experiment is carried out on the NIST SD19 standard dataset. An advantage of the MLP is
that it is able to separate non-linearly separable classes. However, an MLP can easily fall
into a local minimum, where training stops on the assumption that an optimal point on the
error surface has been reached. Another hindrance is defining the best network architecture
for the problem, considering the number of layers and the number of perceptrons in each
hidden layer. Because of these disadvantages, a digit recognizer using the MLP structure
may not produce the desired low error rate.

2.1 KEY-DEDUPLICATION WITH IBBE

Identity-Based Broadcast Encryption (IBBE): IBBE is a cryptographic technique that allows a
sender to encrypt a message for a set of identities (such as email addresses or usernames) and
broadcast it to the corresponding recipients. Each recipient can decrypt the message if they
possess the private key associated with their identity.

Key duplication with IBBE generally means creating multiple private keys for the same
identity. Key duplication might be necessary in some scenarios, such as when multiple devices
or users need access to the same encrypted content. To achieve key duplication in an IBBE
system, an authority typically issues multiple private keys for a single identity while ensuring
the security and authentication of those keys.
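A real IBBE scheme is built on pairing-based cryptography and cannot be reproduced in a few lines. Purely as a toy sketch of the idea that an authority derives per-identity keys deterministically from a master secret (and can therefore re-issue the same key for a second device), the following uses an HMAC; every name is illustrative and this is not a secure broadcast-encryption construction.

```python
import hashlib
import hmac

def issue_private_key(master_secret: bytes, identity: str) -> bytes:
    """Toy sketch only, NOT real IBBE: the authority derives a per-identity
    key deterministically, so asking twice for the same identity yields the
    same key (key "duplication" for a user's second device)."""
    return hmac.new(master_secret, identity.encode(), hashlib.sha256).digest()
```

In a production system the authority would also authenticate the requester before re-issuing any key.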

In any case, working with cryptographic keys, especially for encryption or decryption, should be
done with great care and consideration of security implications. Unauthorized key duplication
can lead to security breaches, so it's essential to follow best practices and adhere to security
protocols when managing cryptographic keys in any system.

2.2 SERVERLESS DISTRIBUTED FILE SYSTEM

A "serverless distributed file system" typically refers to a file system architecture that operates
without traditional, dedicated servers for storage and file management. Instead, it relies on a
decentralized and distributed model where various nodes or devices participate in storing and
accessing files. Here are some key aspects of such a system:

1. **Decentralization:** In a serverless distributed file system, there is no central server that


manages all file operations. Instead, multiple nodes (e.g., computers, devices) participate as both
clients and providers of storage resources.

2. **Distributed Storage:** Files are distributed across the network of nodes rather than being
stored on a single, dedicated server. Each node may contribute storage space, and files are
distributed across these nodes for redundancy and fault tolerance.

3. **Load Balancing:** The system typically employs load balancing mechanisms to evenly
distribute file requests and storage across the participating nodes. This helps ensure that no single
node becomes a performance bottleneck.

4. **Data Replication:** To enhance fault tolerance and availability, some serverless


distributed file systems replicate data across multiple nodes. This means that if one node goes
offline or experiences data corruption, there are redundant copies of the data on other nodes.

5. **Dynamic Scaling:** Serverless systems often support dynamic scaling, allowing nodes to
join or leave the network without disrupting the overall file system's operation. This scalability is
a key advantage.
6. **Access Control:** Access control mechanisms are essential to ensure that only authorized
users or nodes can access specific files. These systems often employ encryption and
authentication techniques for security.

7. **P2P Networking:** Many serverless distributed file systems are built on peer-to-peer
(P2P) networking principles, where nodes communicate directly with each other to exchange
data and manage the file system.

8. **Metadata Management:** Managing file metadata, such as file names, permissions, and
file locations, is a crucial aspect of distributed file systems. Metadata may be distributed across
nodes or managed centrally, depending on the design.
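The decentralized placement described in points 1-4 can be sketched with a simplified rendezvous-hashing scheme: every node ranks itself for a file by hashing (node, filename), so any participant can recompute the same placement without a central server. This is an illustration only, not the mechanism of any particular system.

```python
import hashlib

def place_file(filename, nodes, replicas=2):
    """Rank nodes by a hash of (node, filename) and keep the top `replicas`.
    Deterministic, so every node independently agrees on the placement."""
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256((n + filename).encode()).hexdigest(),
        reverse=True,
    )
    return ranked[:replicas]
```

When a node leaves, only the files it held need to move, which is the property that makes hash-based placement attractive for dynamic scaling.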

Examples of serverless distributed file systems and technologies include:

- **IPFS (InterPlanetary File System):** A peer-to-peer hypermedia protocol designed to


create a decentralized method of storing and sharing hypermedia in a distributed file system.

- **Ceph:** A distributed storage platform that provides object storage, block storage, and file
storage capabilities.

- **Tahoe-LAFS (Least Authority File System):** A decentralized, secure, and fault-tolerant


file storage system that uses cryptography to protect data.

- **Storj:** A decentralized cloud storage platform that leverages a global network of nodes to
store and retrieve data.

These systems are designed to address various use cases, including data resilience, privacy, and
scalability, by distributing storage and file management tasks across a network of nodes rather
than relying on traditional, centralized servers.

2.3 THE GOOGLE FILE SYSTEM

The Google File System (GFS) is a distributed file system developed by Google to meet their
storage needs. It was first described in a research paper published by Google in 2003. GFS is
designed to provide high availability, reliability, and scalability for storing and managing large
amounts of data across a distributed infrastructure. Here are some key features and
characteristics of the Google File System:
1. **Scalability:** GFS is designed to handle massive amounts of data, and it is highly scalable.
It is capable of storing petabytes of data across a cluster of commodity hardware.

2. **Fault Tolerance:** GFS is built with fault tolerance in mind. It automatically replicates
data across multiple machines to ensure that data is not lost in the event of hardware failures.

3. **Master-Chunkserver Architecture:** GFS uses a master-slave architecture where the


master server manages metadata and coordinates access to data, while chunk servers store and
serve the actual data chunks. This separation allows for better scalability and reliability.

4. **Large File Support:** GFS is optimized for handling large files, particularly for Google's
applications, like Google Search and Google MapReduce, which require the storage and
processing of large datasets.

5. **Atomic Record Append:** GFS supports an atomic record append operation, which is
crucial for applications like Google's MapReduce, where data needs to be appended to files
reliably.

6. **High Throughput:** GFS is designed for high throughput, making it suitable for
applications that require rapid data access and processing.

7. **Simple Interface:** GFS provides a simple file system interface, which simplifies
application development and integration.

8. **Automatic Data Replication:** Data in GFS is automatically replicated across multiple


chunk servers. This redundancy ensures data availability even if some servers fail.

9. **Consistency Model:** GFS provides a relaxed consistency model. It prioritizes availability


and performance over strong consistency, which is sufficient for many of Google's applications.

10. **Garbage Collection:** GFS includes a garbage collection mechanism to manage the
storage space efficiently by reclaiming space used by deleted or obsolete data.
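The chunking described in points 3 and 8 can be sketched as follows. The 64 MB constant matches the published GFS design; everything else here is an assumption for illustration, not Google's implementation.

```python
CHUNK_SIZE = 64 * 2**20  # GFS fixes chunks at 64 MB

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """The master's view of a file: an ordered list of fixed-size chunks,
    each of which would be replicated on several chunkservers
    (GFS defaults to 3 replicas per chunk)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

The master stores only this chunk index (metadata); clients then talk to chunkservers directly for the chunk contents, which keeps the master off the data path.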

GFS was developed specifically to support Google's internal infrastructure and applications, and
it played a crucial role in the success of services like Google Search and Google MapReduce.
While GFS itself is not available for public use, its design principles and concepts have
influenced the development of other distributed file systems and storage solutions, such as
Hadoop HDFS (Hadoop Distributed File System) and the open-source Ceph file system.

2.4 CONVERGENT KEY MANAGEMENT

Convergent key management is a cryptographic approach that allows multiple parties to derive
the same encryption key from the same data, even if they do not necessarily have access to each
other's data or communicate directly. This approach is particularly useful in scenarios where
different entities need to encrypt and decrypt data consistently, even when they are distributed or
operate independently. Here's how convergent key management works:

1. **Data Deduplication:** Convergent key management relies on data deduplication, a


process where identical pieces of data are identified and stored only once. Deduplication is
commonly used in storage systems to save space and reduce redundancy.

2. **Convergent Hashing:** To generate a consistent encryption key for a specific piece of


data, a convergent hash function is used. This hash function takes the data as input and produces
a unique hash value. The crucial property of this hash function is that it is deterministic, meaning
the same input data will always produce the same hash value.

3. **Key Derivation:** The hash value generated by the convergent hash function is used as the
basis for deriving an encryption key. Multiple parties or systems can independently perform this
process, given the same data and convergent hash function. As long as they use the same data,
they will derive the same encryption key.

4. **Encryption and Decryption:** These parties can then use the derived encryption key to
encrypt and decrypt data securely. Since they all use the same data and convergent hash function,
they can consistently generate the same key, ensuring that they can access the data encrypted by
others.
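Steps 2 and 3 above can be sketched in a few lines, assuming SHA-256 as the convergent hash; the function name is illustrative, and real deployments (e.g. convergent encryption schemes) add further safeguards around this core idea.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    """Derive an encryption key from the data itself: identical plaintexts
    always yield identical keys, which is what lets two independent
    parties deduplicate the same file."""
    return hashlib.sha256(data).digest()
```

Two parties holding the same file derive the same key without communicating, so the ciphertexts match and the storage provider can deduplicate them.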

Convergent key management has several advantages:

- **Consistency:** It ensures that different parties can access the same data consistently, even
in distributed or decentralized systems.
- **Data Deduplication:** By using convergent key management, duplicate data can be
identified and stored only once, which can save storage space.

- **Security:** When properly implemented, convergent key management can provide strong
security for data encryption, as long as the convergent hash function is secure and collision-
resistant.

However, there are some important considerations and potential drawbacks:

- **Privacy:** Convergent key management implies that the same data will result in the same
encryption key. This can raise privacy concerns, as it could allow parties to deduce that they are
working with the same or similar data.

- **Security Risks:** The security of convergent key management relies heavily on the security
of the convergent hash function. If the hash function is compromised, it could lead to the
exposure of encrypted data.

- **Scalability:** In large-scale systems, managing convergent keys for a vast amount of data
can become complex, and it requires careful key management practices.

Convergent key management is a useful technique in certain scenarios, but its suitability depends
on the specific requirements and security considerations of the system in which it is
implemented. It is important to choose a secure and well-vetted convergent hash function and
follow best practices for key management to mitigate potential risks.
3. SYSTEM ANALYSIS AND DESIGN

3.1 Existing System

In most of the existing systems, recognition accuracy is heavily dependent on the quality of the
input document. In handwritten text, adjacent characters tend to be touching or overlapping.
Therefore, it is essential to segment a given string correctly into its character components. In most
of the existing segmentation algorithms, human writing is evaluated empirically to deduce rules,
but there is no guarantee that these heuristic rules are optimal for all styles of writing.
Moreover, handwriting varies from person to person, and even for the same person it varies
depending on mood, speed, etc. This requires incorporating artificial neural networks, hidden
Markov models, and statistical classifiers to extract segmentation rules based on numerical data.

Disadvantages:

1. High complexity.
2. Difficult to analyze.
3. Time consumption is high.

3.2 Proposed System

The user can upload an image from the system's storage. The uploaded image is then
processed by a neural network model (NN model) which identifies the characters, i.e.,
digits, alphabets, or special symbols. After identifying these characters, they are converted
into text (printed text), and this processed document is sent back to the user as output.
Advantages

1. Less complicated.
2. Easy to process.
3. Higher accuracy.

3.3 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out, to ensure that the proposed system is
not a burden to the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.

Three key considerations involved in the feasibility analysis are,

 ECONOMICAL FEASIBILITY
 TECHNICAL FEASIBILITY
 SOCIAL FEASIBILITY

3.3.1 ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into research and development
of the system is limited, and the expenditures must be justified. The developed system was
well within the budget, which was achieved because most of the technologies used are freely
available; only the customized products had to be purchased.

3.3.2 TECHNICAL FEASIBILITY


This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources; otherwise, high demands would be placed on the client. The
developed system must have modest requirements, as only minimal or no changes are
required for implementing this system.

3.3.3 SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of acceptance by
the users depends on the methods employed to educate them about the system and to make
them familiar with it. Their confidence must be raised so that they are also able to offer
constructive criticism, which is welcomed, as they are the final users of the system.

3.4 System Requirements

Functional And Non-Functional Requirements

Functional Requirements

All necessary functional requirements can be extracted from the user stories defined above.
First, a user needs to be able to add new vocabulary. There should not be any restrictions on
what can be added, and vocabulary should not be limited to single words, because in many
cases it is more helpful to add whole phrases instead. Each vocabulary item consists of the
phrase the user tries to memorize and an explanation to help in understanding its meaning.
Next, the chatbot should provide a way to revise vocabulary. There
should be two possible modes for revising; one where users can click a button to tell whether
they remembered the phrase correctly or not, and a second mode whereby users type out the
phrase themselves. In each case the system should keep track of whether users knew the correct
solution or not. Lastly, it is necessary to determine what to study next. A user should not be
required to think about what or when to review vocabulary. The chatbot needs a system to decide
the review time for each vocabulary, and ideally the user is notified when vocabulary is ready to
be reviewed by sending a message to the user. These three main features can be seen as a
sufficient minimum viable product, or MVP. For demonstration purposes it is desirable to keep
the product as simple as possible. The knowledge gained from making implementation decisions
and walking through the process of creating the chatbot is mostly independent of this particular
product and can be applied to the development of other chatbot products.
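The "what to study next" decision described above is commonly handled with spaced repetition. As one hedged illustration, the sketch below uses a Leitner-style scheduler; the box intervals are assumptions for illustration, not a specification from the text.

```python
def next_review_delay(box, correct):
    """Leitner-style scheduling: a correct answer promotes the phrase one
    box up (longer interval); a wrong answer sends it back to box 0.
    Returns (new_box, days_until_next_review)."""
    intervals = [0, 1, 3, 7, 14, 30]  # illustrative review intervals in days
    box = min(box + 1, len(intervals) - 1) if correct else 0
    return box, intervals[box]
```

The chatbot would store each phrase's box, call this after every revision, and notify the user once the computed delay has elapsed.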

Non Functional Requirements

Since this is a simple example, non-functional requirements remain minimal.
Availability of the service is not a priority, but chatbot software can be scaled
similar to other software, and redundancy can be used to ensure availability. Since messaging
platforms act as intermediaries between users and the chatbot software, most platforms also
re-send missed messages in case the chatbot is unavailable. The fact that the platform ensures
availability further lessens the priority of addressing it in the chatbot software itself. Similarly,
security is not a
main focus here, because the messaging platform itself already handles certain security-sensitive
functionality such as authentication and encryption of communication. A production scenario,
though, would require further care for securing the service. Performance is equally not a major
concern. Because the scope of the example application is limited, the domain specific logic
remains inexpensive in computation. The main performance bottleneck is the in 3.7 on page 20
mentioned aspect of networking and involved unknown parties. Employing performance-
improving solutions for networking issues won’t be a part of the example chatbot, but
performance can be improved by choosing geographically strategic located data centers for
deploying the chatbot software.

A more central requirement is reusability. Although the example focuses on solving a specific
task, the software architecture should be designed in a way that appropriate parts can be reused
for other chatbots in the future. To ensure reusability the software should be documented, stable
and extensible. Usability can be seen as the most important non-functional requirement. The
focus of developing the example chatbot is to design a good user experience and to explore how
interface and interaction design can be best accomplished with the given medium.
3.5 System Architecture

3.5.1 CNN ARCHITECTURE

CNNs are a class of Deep Neural Networks that can recognize and classify particular features from images and are widely used for analyzing visual images. Their applications range from image and video recognition and image classification to medical image analysis, computer vision and natural language processing.

The term "Convolution" in CNN denotes the mathematical operation of convolution, a special kind of linear operation wherein two functions are multiplied to produce a third function which expresses how the shape of one function is modified by the other. In simple terms, two images, which can be represented as matrices, are multiplied to give an output that is used to extract features from the image.

Technically, to train and test a deep learning CNN model, each input image is passed through a series of convolution layers with filters (kernels), pooling layers, and fully connected (FC) layers, and a Softmax function is applied to classify the object with probabilistic values between 0 and 1. The figure below shows the complete flow of a CNN processing an input image and classifying objects based on these values.

Fig CNN Architecture

3.5.2 Basic Architecture

There are two main parts to a CNN architecture

 A convolution tool that separates and identifies the various features of the image for
analysis, in a process called Feature Extraction
 A fully connected layer that utilizes the output from the convolution process and predicts
the class of the image based on the features extracted in previous stages.

3.5.3 CNN Layers:

The repeated occurrence of these layers indicates how deep the network is, and this formation is known as a deep neural network.

 Input: raw pixel values are provided as input.


 Convolutional layer: transforms the output of the input layer (or of the previous layer).
The filter to be used must be specified; for example, a 5×5 filter window slides over the
input data and computes a response from the underlying pixels.
 Rectified linear unit (ReLU) layer: applies an activation function to the data taken as an
image. During back propagation, the ReLU function keeps the computation simple, since
it passes positive values through unchanged.
 Pooling layer: performs a down-sampling operation on the volume along the spatial
dimensions (width, height).
 Fully connected layer: computes class scores, and the class with the maximum score for
the input character is selected.

As we go deeper into the layers, the complexity increases considerably. Going deeper may be worthwhile, as accuracy may increase, but unfortunately time consumption also increases.

1. Convolutional Layer

This layer is the first layer that is used to extract the various features from the input
images. In this layer, the mathematical operation of convolution is performed between the input
image and a filter of a particular size MxM. By sliding the filter over the input image, the dot
product is taken between the filter and the parts of the input image with respect to the size of the
filter (MxM).
Figure 11 Convolutional Layer

The output is termed the Feature Map, which gives us information about the image such as
its corners and edges. Later, this feature map is fed to other layers to learn several other
features of the input image.
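The sliding dot product described above can be sketched in plain Python. This is a minimal illustration with made-up pixel and filter values, not the project's actual implementation:

```python
def convolve2d(image, kernel):
    """Slide an MxM kernel over the image and take the dot product
    at each position (valid padding, stride 1)."""
    m = len(kernel)
    out_h = len(image) - m + 1
    out_w = len(image[0]) - m + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # dot product between the kernel and the image patch under it
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(m) for b in range(m))
            row.append(s)
        feature_map.append(row)
    return feature_map

# A 4x4 "image" and a 2x2 edge-like kernel (illustrative values only)
img = [[1, 2, 0, 1],
       [0, 1, 3, 1],
       [2, 1, 0, 0],
       [1, 1, 2, 1]]
ker = [[1, 0],
       [0, -1]]
print(convolve2d(img, ker))  # a 3x3 feature map
```

In a real CNN the kernel values are learned during training rather than hand-picked as here.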

2. Pooling Layer

In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of
this layer is to decrease the size of the convolved feature map to reduce computational
costs. This is achieved by decreasing the connections between layers; pooling operates
independently on each feature map. Depending upon the method used, there are several
types of Pooling operations.

In Max Pooling, the largest element is taken from each region of the feature map. Average
Pooling calculates the average of the elements in a predefined-size image section, while Sum
Pooling computes the total sum of the elements in the section. The Pooling Layer usually
serves as a bridge between the Convolutional Layer and the FC Layer.
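The pooling variants above can be illustrated with a small plain-Python sketch over non-overlapping 2×2 windows (illustrative values, not project code):

```python
def pool2d(fmap, size=2, mode="max"):
    """Down-sample a feature map with non-overlapping size x size windows."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            # collect the window's elements and reduce them
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 2],
        [2, 2, 1, 3]]
print(pool2d(fmap, mode="max"))      # [[4, 2], [2, 5]]
print(pool2d(fmap, mode="average"))  # [[2.5, 1.0], [1.25, 2.75]]
```

Note how a 4×4 feature map shrinks to 2×2, which is exactly the computational saving the text describes.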
Figure 12 Pooling Layer

3. Fully Connected Layer

The Fully Connected (FC) layer consists of the weights and biases along with the neurons and is
used to connect the neurons between two different layers. These layers are usually placed before
the output layer and form the last few layers of a CNN Architecture.
Fully Connected Layer

In this stage, the feature maps from the previous layers are flattened and fed to the FC layer. The
flattened vector then passes through a few more FC layers, where the usual mathematical
operations take place. At this point, the classification process begins.

Dropout
Usually, when all the features are connected to the FC layer, it can cause overfitting on the
training dataset. Overfitting occurs when a particular model works so well on the training
data that it has a negative impact on the model's performance when used on new data.
Dropout layer

To overcome this problem, a dropout layer is utilised, wherein a few neurons are dropped from
the neural network during the training process, resulting in a reduced model size. On passing a
dropout rate of 0.3, 30% of the nodes are dropped out randomly from the neural network.
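A minimal sketch of how such a dropout layer behaves during training, assuming the common "inverted dropout" convention of rescaling the surviving activations (pure Python, for illustration only; frameworks like Keras implement this internally):

```python
import random

def dropout(activations, rate=0.3, seed=0):
    """Randomly zero out roughly `rate` of the activations during training
    and scale the survivors by 1/(1-rate) (inverted dropout)."""
    rng = random.Random(seed)
    out = []
    for a in activations:
        if rng.random() < rate:
            out.append(0.0)              # this neuron is dropped
        else:
            out.append(a / (1 - rate))   # rescale so the expected sum is unchanged
    return out

acts = [0.5, 1.2, 0.8, 0.1, 0.9, 0.4]
print(dropout(acts, rate=0.3))
```

At inference time dropout is disabled and all neurons participate; the rescaling above is what keeps the two regimes consistent.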

Activation Functions

An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.

Sometimes the activation function is called a “transfer function.” If the output range of the
activation function is limited, then it may be called a “squashing function.” Many activation
functions are nonlinear and may be referred to as the “nonlinearity” in the layer or the network
design.

The choice of activation function has a large impact on the capability and performance of the
neural network, and different activation functions may be used in different parts of the model.

Technically, the activation function is used within or after the internal processing of each node
in the network, although networks are designed to use the same activation function for all nodes
in a layer.

A network may have three types of layers: input layers that take raw input from the domain,
hidden layers that take input from another layer and pass output to another layer, and output
layers that make a prediction.

All hidden layers typically use the same activation function. The output layer will typically use
a different activation function from the hidden layers and is dependent upon the type of
prediction required by the model.

Activation functions are also typically differentiable, meaning the first-order derivative can be
calculated for a given input value. This is required given that neural networks are typically
trained using the backpropagation of error algorithm that requires the derivative of prediction
error in order to update the weights of the model.

There are many different types of activation functions used in neural networks, although
perhaps only a small number are used in practice for hidden and output layers.

Finally, one of the most important parameters of the CNN model is the activation function.
They are used to learn and approximate any kind of continuous and complex relationship
between variables of the network. In simple words, it decides which information of the model
should fire in the forward direction and which ones should not at the end of the network.

It adds non-linearity to the network. There are several commonly used activation functions, such
as ReLU, Softmax, tanh and Sigmoid. Each of these functions has a specific usage. For a binary
classification CNN model, the sigmoid and softmax functions are preferred, and for multi-class
classification, softmax is generally used.
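The functions named above are simple to state directly. The following plain-Python sketch is for illustration only (real frameworks apply these element-wise over tensors):

```python
import math

def relu(x):
    # passes positive values through unchanged, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    # subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(sigmoid(0.0))            # 0.5
probs = softmax([2.0, 1.0, 0.1])
print(probs)                   # three probabilities summing to 1
```

Softmax is what turns the final layer's raw scores into the "probabilistic values between 0 and 1" mentioned earlier.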


Applications:
1. Object detection: With CNN, we now have sophisticated models like R-CNN, Fast R-
CNN, and Faster R-CNN that are the predominant pipeline for many object detection
models deployed in autonomous vehicles, facial detection, and more.
2. Semantic segmentation: In 2015, a group of researchers from Hong Kong developed a
CNN-based Deep Parsing Network to incorporate rich information into an image
segmentation model. Researchers from UC Berkeley also built fully convolutional
networks that improved upon state-of-the-art semantic segmentation.
Why ConvNets over Feed-Forward Neural Nets?
An image is nothing but a matrix of pixel values, right? So why not just flatten the image (e.g. a
3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification
purposes?

Flattening of a 3x3 image matrix into a 9x1 vector


In cases of extremely basic binary images, the method might show an average precision score
while performing prediction of classes but would have little to no accuracy when it comes to
complex images having pixel dependencies throughout. A ConvNet is able to successfully
capture the Spatial and Temporal dependencies in an image through the application of relevant
filters. The architecture performs a better fitting to the image dataset due to the reduction in the
number of parameters involved and reusability of weights. In other words, the network can be
trained to understand the sophistication of the image better.

A convolutional neural network is better than a feed-forward network since a CNN offers
parameter sharing and dimensionality reduction. Because of parameter sharing in a CNN, the
number of parameters is reduced and thus the computation is also decreased. The main intuition
is that what is learned from one part of the image is also useful in another part of the image.
Because of the dimensionality reduction in a CNN, the computational power needed is reduced.
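The parameter-sharing argument can be made concrete with some back-of-the-envelope arithmetic. The layer sizes here (128 hidden units, 32 filters of 5×5) are hypothetical, chosen only for illustration:

```python
# Fully connected: every input pixel connects to every hidden unit.
pixels = 28 * 28          # 784 inputs for one MNIST-sized image
hidden = 128
fc_params = pixels * hidden + hidden           # weights + biases

# Convolutional: 32 filters of size 5x5 are shared across the whole image.
filters, k = 32, 5
conv_params = filters * (k * k * 1) + filters  # weights + biases, 1 input channel

print(fc_params)    # 100480
print(conv_params)  # 832
```

The convolutional layer uses two orders of magnitude fewer parameters, precisely because each filter is reused at every spatial position instead of being relearned per pixel.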
The convolutional layers of a CNN have multiple filters scanning the complete feature matrix
and carrying out dimensionality reduction. This makes a CNN a very apt network for image
classification and processing.

3.6 Data Flow Diagram

A DFD graphically represents the functions, or processes, which capture, manipulate, store, and
distribute data between a system and its environment and between components of a system. The
visual representation makes it a good communication tool between the user and the system
designer. The structure of a DFD allows starting from a broad overview and expanding it into a
hierarchy of detailed diagrams. DFDs have often been used for the following reasons:

1. Logical information flow of the system


2. Determination of physical system construction requirements
3. Simplicity of notation
4. Establishment of manual and automated systems requirements

3.7 UML DIAGRAMS

A UML diagram is a diagram based on the UML (Unified Modeling Language), with the purpose
of visually representing a system along with its main actors, roles, actions, artifacts or classes, in
order to better understand, alter, maintain, or document information about the system. Simply
put, UML is a modern approach to modelling and documenting software, and one of the most
popular business process modelling techniques. It is based on diagrammatic representations of
software components. As the old proverb says, "a picture is worth a thousand words". By using
visual representations, we are able to better understand possible flaws or errors in software or
business processes.
Use Case Diagram:

A use case diagram at its simplest is a representation of a user's interaction with the system that
shows the relationship between the user and the different use cases in which the user is involved.
Component diagrams are used to describe the components, and deployment diagrams show how
they are deployed in hardware. UML is mainly designed to focus on the software artifacts of a
system; however, these two diagrams are special diagrams used to focus on software and
hardware components. Most UML diagrams are used to handle the logical components of the
system.

A use case diagram can identify the different types of users of a system and the different use
cases, and will often be accompanied by other types of diagrams as well. The use cases are
represented by either circles or ellipses. A use-case diagram can help provide a higher-level view
of the system, as well as a simplified, graphical representation of what the system must actually
do.

In software and systems engineering, a use case is a list of actions or event steps typically
defining the interactions between a role known in the Unified Modeling Language (UML) as an
actor and a system to achieve a goal. The actor can be a human or other external system. In
systems engineering, use cases are used at a higher level than within software engineering. The
detailed requirements may then be captured in the Systems Modeling Language. Use case
analysis is an important and valuable requirement analysis technique that has been widely used
in modern software engineering.
Use case diagram

3.7.1 Class Diagram:

Class diagrams model class structure and contents using design elements such as classes,
packages and objects. A class diagram describes three perspectives when designing a system:
Conceptual, Specification, and Implementation. Classes are composed of three things: a name,
attributes, and operations. Class diagrams also display relations such as containment, inheritance
and associations. The association relationship is the most common relationship in a class
diagram; an association shows the relationship between instances of classes. The purpose of a
class diagram is to model the static view of an application. Class diagrams are the only diagrams
which can be directly mapped to object-oriented languages, and are thus widely used at the time
of construction. UML diagrams like the activity diagram and sequence diagram can only give the
sequence flow of the application; the class diagram is a bit different. It is the most popular UML
diagram in the coder community.
The purpose of the class diagram can be summarized as −

 Analysis and design of the static view of an application.


 Describe responsibilities of a system.
 Base for component and deployment diagrams.
 Forward and reverse engineering.

Class Diagram
3.7.3 Sequence Diagram

SEQUENCE DIAGRAM

3.7.4 Activity Diagram


3.9 OUTPUT DESIGN:
4. Implementation

4.1 Modules

Basic steps in constructing a Machine Learning model:

4.1.1- Data Collection

 The quantity and quality of your data dictate how accurate the model can be
 The outcome of this step is generally a representation of the data (Guo simplifies this to
specifying a table) which we will use for training
 Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this
step.

4.1.2- Data Preparation

 Wrangle data and prepare it for training


 Clean that which may require it (remove duplicates, correct errors, deal with missing
values, normalization, data type conversions, etc.)
 Randomize data, which erases the effects of the particular order in which we collected
and/or otherwise prepared our data
 Visualize data to help detect relevant relationships between variables or class imbalances
(bias alert!), or perform other exploratory analysis
 Split into training and evaluation sets
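The shuffle-and-split step above can be sketched in plain Python. This is an illustrative stand-in for library utilities such as scikit-learn's train_test_split:

```python
import random

def train_eval_split(samples, labels, train_frac=0.8, seed=42):
    """Shuffle the data (to erase the effects of collection order),
    then split it into training and evaluation sets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    train = [(samples[i], labels[i]) for i in idx[:cut]]
    evaln = [(samples[i], labels[i]) for i in idx[cut:]]
    return train, evaln

# Toy data: ten one-feature samples with alternating labels
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
train, evaln = train_eval_split(X, y)
print(len(train), len(evaln))  # 8 2
```

Fixing the seed makes the split reproducible across runs, which matters when comparing models.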

4.1.3 Choose a Model

Different algorithms are for different tasks; choose the right one.

4.1.4 Train the Model

 The goal of training is to answer a question or make a prediction correctly as often as


possible
 Linear regression example: the algorithm would need to learn values for m (or W) and b
(x is the input, y is the output)
 Each iteration of the process is a training step
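The linear regression example can be made concrete: each call below is one training step of gradient descent that nudges m and b toward better values. This is a toy sketch under made-up data, not the project's code:

```python
def training_step(m, b, xs, ys, lr=0.01):
    """One gradient-descent update of m and b under mean squared error."""
    n = len(xs)
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
    return m - lr * grad_m, b - lr * grad_b

# Fit y = 2x + 1 from a few noiseless points (toy data)
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
m, b = 0.0, 0.0
for _ in range(2000):           # each iteration is one training step
    m, b = training_step(m, b, xs, ys)
print(round(m, 2), round(b, 2))  # 2.0 1.0
```

The loop is the essence of "training": repeat the step until the prediction error stops improving.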
4.1.5 Evaluate the Model

Use some metric, or combination of metrics, to "measure" the objective performance of the
model. Test the model against previously unseen data. This unseen data is meant to be somewhat
representative of model performance in the real world, but still helps tune the model (as opposed
to test data, which does not). What is a good train/eval split? 80/20, 70/30, or similar, depending
on the domain, data availability, dataset particulars, etc.

4.1.6 Parameter Tuning


This step refers to hyperparameter tuning, which is an "artform" as opposed to a science. Tune
the model's hyperparameters for improved performance. Simple hyperparameters may include
the number of training steps, the learning rate, initialization values and their distribution, etc.

4.1.7 Make Predictions


Further (test set) data which have, until this point, been withheld from the model (and for which
class labels are known) are used to test the model, giving a better approximation of how the
model will perform in the real world.

4.1.8 Methodologies for Handwritten Character Recognition System


We used MNIST as the primary dataset to train the model. It consists of 70,000
handwritten raster images from 250 different sources, out of which 60,000 are used for
training and the rest are used for testing and validation. Our proposed method is mainly
separated into four stages: Preprocessing, Model Construction, Training & Validation, and
Model Evaluation & Prediction. Since loading the dataset is necessary for any process, all
the steps come after it.

Steps in System development

Import the libraries:


Libraries required are Keras, TensorFlow, NumPy, Pillow, and Tkinter.

1. Keras:

Keras is a powerful and easy-to-use free open source Python library for
developing and evaluating deep learning models.

It wraps the efficient numerical computation libraries Theano and TensorFlow
and allows you to define and train neural network models in just a few lines of code.
Theano and TensorFlow are very powerful libraries, but they are difficult to work with
directly when creating neural networks.

Keras is based on a minimal structure that provides a clean and easy way to create
deep learning models on top of TensorFlow or Theano. It is designed to let you define
deep learning models quickly, which makes it an optimal choice for deep learning
applications.

2. TensorFlow:

TensorFlow is a Python library for fast numerical computing, created and released by
Google. It is a foundation library that can be used to create deep learning models
directly, or through wrapper libraries that simplify the process on top of TensorFlow.
It supports machine learning and deep learning workloads such as deep neural networks,
image processing and sentiment analysis. TensorFlow is one of the most famous deep
learning frameworks, developed by the Google team; it is free and open source software
with a Python interface, which makes it possible to implement deep learning projects in
an easy and efficient way. Unlike other numerical libraries intended for use in deep
learning, like Theano, TensorFlow was designed for use both in research and development
and in production systems. It can run on single-CPU systems and GPUs, as well as mobile
devices and large-scale distributed systems of hundreds of machines.
3. Numpy:

NumPy is a Python library used for working with arrays. It also has functions for
working in the domains of linear algebra, Fourier transforms, and matrices. NumPy, which
stands for Numerical Python, is a library consisting of multidimensional array objects and
a collection of routines for processing those arrays. Using NumPy, mathematical and
logical operations on arrays can be performed. It is an open source project and you can
use it freely. NumPy aims to provide an array object that is up to 50x faster than
traditional Python lists. The array object in NumPy is called ndarray, and it provides a
lot of supporting functions that make working with ndarrays very easy. Arrays are very
frequently used in data science, where speed and resources are very important.

4. Pillow:

Pillow is a free and open source library for the Python programming language that allows
you to easily create and manipulate digital images. Pillow is built on top of PIL (Python
Image Library). PIL is one of the important modules for image processing in Python;
however, the PIL module has not been supported since 2011 and doesn't support Python 3.
The Pillow module offers more functionality, runs on all major operating systems and
supports Python 3. It supports a wide variety of image formats such as "jpeg", "png",
"bmp", "gif", "ppm" and "tiff". You can do almost anything with digital images using the
Pillow module, from basic image processing functionality, including point operations and
filtering images using built-in convolution kernels, to color space conversions.

5. Tkinter:

Tkinter is the standard GUI library for Python. Python when combined with Tkinter
provides a fast and easy way to create GUI applications. Tkinter provides a powerful
object-oriented interface to the Tk GUI toolkit. We need to import all the modules that
we are going to need for training our model. The Keras library already contains some
datasets and MNIST is one of them. So we can easily import the dataset through Keras.
The mnist.load_data() method returns the training data, its labels along with the testing
data and its labels.

Loading The Data Set:


MNIST Data Set:
The Modified National Institute of Standards and Technology (MNIST) database is a large
computer vision dataset which is extensively used for training and testing different systems.
It was created from two special datasets of the National Institute of Standards and
Technology (NIST) which hold binary images of handwritten digits. The training set
contains handwritten digits from 250 people, half of whom were employees of the Census
Bureau and the rest high school students. It is often used as the first dataset to prove
the effectiveness of a neural network.

The database contains 60,000 images used for training (a few of which can be used for
cross-validation purposes) and 10,000 images used for testing. All the digits are
grayscale and size-normalized to 28×28 pixels, with the digit centered in the image.
Since all the images are 28×28 pixels, each one forms an array which can be flattened
into a 28*28=784-dimensional vector. Each component of the vector is a value which
describes the intensity of the pixel.
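The flattening described above, sketched in plain Python with a dummy image (illustrative only; in practice this is a single reshape call on a NumPy array):

```python
# A 28x28 grayscale image flattened into a 784-dimensional vector
image = [[0] * 28 for _ in range(28)]  # dummy pixel intensities, all black
image[14][14] = 255                     # one bright pixel near the centre

# row-major flattening: row 0 first, then row 1, and so on
vector = [pixel for row in image for pixel in row]
print(len(vector))           # 784
print(vector[14 * 28 + 14])  # 255: row-major index of pixel (14, 14)
```

The index arithmetic `row * 28 + col` is what lets a model treat the 2-D image as a 1-D vector without losing track of which pixel is which.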

Pre-Processing

Data pre-processing plays an important role in any recognition process. Data
preprocessing is a data mining technique used to transform raw data into a useful and
efficient format. Pre-processing is used to shape the input images into a form suitable
for segmentation. It is a necessary step before building a model with these features, and
it usually happens in stages:

 Data quality assessment


 Data cleaning
 Data transformation
 Data reduction
Data quality assessment:

A Data Quality Assessment is a distinct phase within the data quality life-cycle that
is used to verify the source, quantity and impact of any data items that breach pre-defined
data quality rules. The Data Quality Assessment can be executed as a one-off process or
repeatedly as part of an ongoing data quality assurance initiative.

The quality of your data can quickly decay over time, even with stringent data
capture methods cleaning the data as it enters your database. People moving house, changing
phone numbers and passing away all mean the data you hold can quickly become out of
date.

A Data Quality Assessment helps to identify those records that have become
inaccurate, the potential impact that inaccuracy may have caused, and the data's source.
Through this assessment, inaccurate data can be rectified and other potential issues identified.

Data cleaning:

Data cleaning is one of the important parts of machine learning. It plays a significant
part in building a model. It surely isn’t the fanciest part of machine learning and at the same
time, there aren’t any hidden tricks or secrets to uncover. However, proper data cleaning can
make or break your project. Professional data scientists usually spend a very large portion of
their time on this step. Because of the belief that, “Better data beats fancier algorithms”. If
we have a well-cleaned dataset, we can get desired results even with a very simple
algorithm, which can prove very beneficial at times. Obviously, different types of data will
require different types of cleaning. However, this systematic approach can always serve as a
good starting point.

Data transformation
In fact, by cleaning and smoothing the data, we have already performed data modification.
However, by data transformation, we understand the methods of turning the data into an
appropriate format for the computer to learn from. Data transformation is the process in
which data is taken from its raw, siloed and normalized source state and transformed into
data that's joined together, dimensionally modelled, de-normalized, and ready for analysis.
Without the right technology stack in place, data transformation can be time-consuming,
expensive, and tedious. Nevertheless, transforming the data will ensure maximum data
quality which is imperative to gaining accurate analysis, leading to valuable insights that
will eventually empower data-driven decisions.

Building and training models to process data is a brilliant concept, and more enterprises
have adopted, or plan to deploy, machine learning to handle many practical applications. But
for models to learn from data and make valuable predictions, the data itself must be organized
to ensure its analysis yields valuable insights.

Data reduction:
Data reduction is a process that reduces the volume of the original data and represents it in a
much smaller volume. Data reduction techniques ensure the integrity of the data while reducing
it. The time required for data reduction should not overshadow the time saved by data mining
on the reduced data set.

Data Reduction Techniques:


Techniques of data reduction include dimensionality reduction, numerosity reduction and
data compression.

1. Dimensionality Reduction:
a. Wavelet Transform
b. Principal Component Analysis
c. Attribute Subset Selection
2. Numerosity Reduction:
a. Parametric
b. Non-Parametric

3. Data Compression:
When you work with large amounts of data, it becomes harder to come up with
reliable solutions. Data reduction can be used to reduce the amount of data and decrease the
costs of analysis.

After loading the data, we separated it into X and y, where X is the image and y is the
label corresponding to X. The first (input) layer of our model is a convolution.
Convolution takes each pixel as a neuron, so we need to reshape the images such that each
pixel value is in its own cell, converting each 28x28 matrix of greyscale values into a
28x28x1 tensor. With the right dimensions for all the images, we can split them into train
and test sets for the further steps.

Data Encoding:

This is an optional step. Since we are using categorical cross-entropy as the loss function, we
have to tell the network that the given labels are categorical in nature. The raw data can
contain various different types of data, both structured and unstructured, which need to be
processed to bring them into a form usable by machine learning models. Since machine
learning is based on mathematical equations, keeping categorical variables as-is would cause
a problem. Some algorithms support categorical values without further manipulation, but in
those cases it is still a topic of discussion whether to encode the variables or not. After
identifying the data types of the features present in the data set, the next step is to process
the data in a way that is suitable for machine learning models. Three popular techniques for
converting categorical values to numeric values are:

1. Label Encoding.
2. One Hot Encoding.
3. Binary Encoding.
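The first two techniques can be sketched in plain Python. This is a minimal illustration; in the project itself a utility like Keras's to_categorical would typically handle the one-hot step:

```python
def label_encode(labels):
    """Map each category to an integer, in first-seen order."""
    mapping = {}
    for lab in labels:
        mapping.setdefault(lab, len(mapping))
    return [mapping[lab] for lab in labels], mapping

def one_hot(index, num_classes):
    """A vector of zeros with a single 1 at the label's position."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec

labels = ["cat", "dog", "cat", "bird"]
encoded, mapping = label_encode(labels)
print(encoded)                 # [0, 1, 0, 2]
print(one_hot(encoded[1], 3))  # [0, 1, 0]
```

One-hot vectors avoid implying a spurious ordering between categories, which is why they pair naturally with a categorical cross-entropy loss.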

Encoding variability describes the variation of the encoding of individual values inside a
category. When we talk about variability in one-hot encoding, the variability depends on the
implementation, which decides how many categories to take that have sufficient impact on
the target. Other encoding methodologies show significant variability, which is identified at
validation time.
Model Construction

Now, comes the fun part where we finally get to use the meticulously prepared data for
model building. Depending on the data type (qualitative or quantitative) of the target
variable (commonly referred to as the Y variable) we are either going to be building a
classification (if Y is qualitative) or regression (if Y is quantitative) model.

Learning Algorithms :

Machine learning algorithms can be broadly categorised into one of three types:

1. Supervised learning — In supervised learning, each example is a pair consisting of


an input object (typically a vector) and a desired output value (also called the
supervisory signal). A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used for mapping new examples. An
optimal scenario will allow for the algorithm to correctly determine the class labels for
unseen instances.

It is a machine learning task that establishes the mathematical relationship between input X
and output Y variables. Such X, Y pair constitutes the labeled data that are used for model
building in an effort to learn how to predict the output from the input. Supervised learning
problems can be further grouped into regression and classification problems.

 Classification: A classification problem is when the output variable is a category, such


as “red” or “blue” or “disease” and “no disease”.
 Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.

2. Unsupervised learning — is a machine learning task that makes use of only the input X
variables. Such X variables are unlabeled data that the learning algorithm uses in modeling
the inherent structure of the data. Unsupervised learning problems can be further grouped into
clustering and association problems.

 Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior.
 Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
3. Reinforcement learning — Reinforcement learning is an area of machine learning
concerned with taking suitable actions to maximize reward in a particular situation. It
is employed by various software and machines to find the best possible behaviour or
path to take in a specific situation.

Reinforcement learning differs from supervised learning in that, in supervised learning, the
training data comes with the answer key, so the model is trained with the correct answer
itself, whereas in reinforcement learning there is no answer key: the reinforcement agent
decides what to do to perform the given task.

In the absence of a training dataset, it is bound to learn from its experience. It is a
machine learning task that decides on the next course of action, learning through trial and
error in an effort to maximize the reward.

 Input: The input should be an initial state from which the model will start.
 Output: There are many possible outputs, as there are a variety of solutions to a
particular problem.
 Training: The training is based upon the input; the model will return a state, and the
user will decide to reward or punish the model based on its output.
 The model continues to learn.
 The best solution is decided based on the maximum reward.
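The reward-driven loop above can be sketched in miniature (a contrived two-action example, not a full reinforcement learning algorithm; the action names and reward values are invented):

```python
import random

# A minimal reward-driven loop: the agent tries actions, the environment
# rewards or punishes each one, and the agent keeps the action with the
# highest average reward.
rewards = {"left": 0.2, "right": 1.0}   # hypothetical reward per action

def run_episodes(episodes=100, seed=0):
    rng = random.Random(seed)           # seeded for reproducibility
    totals = {a: 0.0 for a in rewards}
    counts = {a: 0 for a in rewards}
    for _ in range(episodes):
        action = rng.choice(list(rewards))   # explore by trying actions
        totals[action] += rewards[action]    # environment gives reward
        counts[action] += 1
    # best solution = action with the maximum average reward
    return max(totals, key=lambda a: totals[a] / max(counts[a], 1))
```

Here the "right" action accumulates the higher average reward, so it is the behaviour the agent settles on.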

4.1.8 MODELS THAT CAN BE USED FOR THE PROJECT:

SUPPORT VECTOR MACHINE:

Support Vector Machine, or SVM, is one of the most popular supervised
learning algorithms, and is used for classification as well as regression problems.
However, it is primarily used for classification problems in machine learning. The
goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put a new data point
in the correct category in the future. This best decision boundary is called a
hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is termed a
Support Vector Machine. The SVM algorithm can be used for face detection, image
classification, text categorization, etc.

SVM can be of two types:

 Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes by a single straight line, it is termed linearly
separable data, and the classifier used is called a linear SVM classifier.
 Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a
dataset cannot be classified by a straight line, it is termed non-linear data,
and the classifier used is called a non-linear SVM classifier.

Example: SVM can be understood with the example that we have used for the
KNN classifier. Suppose we see a strange cat that also has some features of dogs; if
we want a model that can accurately identify whether it is a cat or a dog, such a
model can be created using the SVM algorithm. We will first train our model with
lots of images of cats and dogs so that it can learn the different features of cats and
dogs, and then we test it with this strange creature. The SVM creates a decision
boundary between these two classes (cat and dog) and chooses the extreme cases
(support vectors) of each. On the basis of the support vectors, it will classify the
creature as a cat.
SVM working graph

The following are important concepts in SVM:

 Support Vectors − Data points that are closest to the hyperplane are called
support vectors. The separating line is defined with the help of these data
points.
 Hyperplane − As we can see in the above diagram, it is the decision plane or
space that divides a set of objects having different classes.
 Margin − It may be defined as the gap between the two lines on the closest data
points of different classes. It can be calculated as the perpendicular distance
from the line to the support vectors. A large margin is considered a good
margin and a small margin is considered a bad margin.
 The main goal of SVM is to divide the dataset into classes by finding a maximum
marginal hyperplane (MMH), which is done in the following two steps:
 First, SVM generates hyperplanes iteratively that segregate the classes in the
best way.
 Then, it chooses the hyperplane that separates the classes correctly.
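The margin idea can be made numeric: for a hyperplane w·x + b = 0, the perpendicular distance of a point x to it is |w·x + b| / ||w||. A small sketch with an invented line and invented points (not a full SVM solver):

```python
import math

# Hyperplane w.x + b = 0 in 2D; here the line x + y - 3 = 0 (invented example).
w = (1.0, 1.0)
b = -3.0

def distance(point):
    """Perpendicular distance from a point to the hyperplane."""
    num = abs(w[0] * point[0] + w[1] * point[1] + b)
    return num / math.hypot(w[0], w[1])

# Two classes on opposite sides of the line.
class_pos = [(3.0, 2.0), (4.0, 4.0)]
class_neg = [(0.0, 0.0), (1.0, 1.0)]

# Support vectors = closest point of each class; margin = sum of their distances.
sv_pos = min(class_pos, key=distance)
sv_neg = min(class_neg, key=distance)
margin = distance(sv_pos) + distance(sv_neg)
```

The SVM training procedure chooses w and b to make this margin as large as possible; here they are fixed by hand just to show the geometry.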

Pros of SVM classifiers


SVM classifiers offer great accuracy and work well in high-dimensional spaces.
SVM classifiers use only a subset of the training points (the support vectors), and as a
result use very little memory.

Cons of SVM classifiers

They have a high training time, and hence in practice are not suitable for large datasets.
Another disadvantage is that SVM classifiers do not work well with overlapping classes.

K-NN ALGORITHM:

 K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the
supervised learning technique.
 The K-NN algorithm assumes similarity between the new case/data and the available
cases and puts the new case into the category that is most similar to the available
categories.
 The K-NN algorithm stores all the available data and classifies a new data point based
on similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm.
 The K-NN algorithm can be used for regression as well as for classification, but it is
mostly used for classification problems.
 K-NN is a non-parametric algorithm, which means it does not make any assumption
about the underlying data.
 It is also called a lazy learner algorithm because it does not learn from the training
set immediately; instead it stores the dataset and, at the time of classification,
performs an action on the dataset.
 At the training phase the KNN algorithm just stores the dataset, and when it gets new
data, it classifies that data into the category that is most similar to the new data.
 Example: Suppose we have an image of a creature that looks similar to both a cat and
a dog, but we want to know whether it is a cat or a dog. For this identification we can
use the KNN algorithm, as it works on a similarity measure. Our KNN model will find
the features of the new data point most similar to the cat and dog images, and based
on the most similar features it will put it in either the cat or the dog category.
The K-NN working can be explained on the basis of the below algorithm:

 Step-1: Select the number K of neighbours.
 Step-2: Calculate the Euclidean distance from the new point to each data point.
 Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
 Step-4: Among these k Neighbours, count the number of the data points in each category.
 Step-5: Assign the new data point to the category for which the number of neighbours
is maximum.
 Step-6: Our model is ready.

Advantages of KNN Algorithm:

 It is simple to implement.
 It is robust to the noisy training data.
 It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

 Always needs to determine the value of K, which may be complex at times.
 The computation cost is high because of calculating the distance between the new data
point and all the training samples.

Steps to implement the K-NN algorithm:


 Data Pre-processing step
 Fitting the K-NN algorithm to the Training set
 Predicting the test result
 Test accuracy of the result(Creation of Confusion matrix)
 Visualizing the test set result.

This is pseudocode for implementing the KNN algorithm from scratch:

1. Load the training data.
2. Prepare the data by scaling, missing-value treatment, and dimensionality reduction as
required.
3. Find the optimal value for K (for example, by evaluating several candidate values).
4. Predict a class value for new data:
a. Calculate distance(X, Xi) for i = 1, 2, 3, …, n,
where X = new data point, Xi = training data, and the distance is computed as per
your chosen distance metric.
b. Sort these distances in increasing order with corresponding train data.
c. From this sorted list, select the top ‘K’ rows.
d. Find the most frequent class from these chosen ‘K’ rows. This will be your
predicted class.
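The pseudocode above can be turned into a small from-scratch implementation in plain Python (the tiny 2D dataset and K = 3 are invented for illustration):

```python
import math
from collections import Counter

def knn_predict(train, new_point, k):
    """train: list of ((x, y), label); classify new_point by majority vote
    among the k nearest neighbours (Euclidean distance)."""
    # a. compute distance from the new point to every training point
    distances = [(math.dist(point, new_point), label) for point, label in train]
    # b. sort distances in increasing order
    distances.sort(key=lambda pair: pair[0])
    # c. select the top K rows
    top_k = distances[:k]
    # d. the most frequent class among them is the predicted class
    return Counter(label for _, label in top_k).most_common(1)[0][0]

train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
         ((4.0, 4.0), "dog"), ((4.2, 3.9), "dog"), ((3.8, 4.1), "dog")]
```

For example, `knn_predict(train, (1.1, 1.0), 3)` follows steps a–d and returns the majority label of the three closest points.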

After data encoding, the images and labels are ready to be fitted into our model. We need to
define a Convolutional Neural Network Model.

3. CONVOLUTIONAL NEURAL NETWORK:

In simpler words, a CNN is an artificial neural network that specializes in picking out or
detecting patterns and making sense of them. Thus, CNNs have been most useful for image
classification. A CNN model has various types of filters of different sizes and numbers. These
filters are essentially what help us in detecting patterns. The convolutional neural network, or
CNN for short, is a specialized type of neural network model designed for working with
two-dimensional image data, although it can also be used with one-dimensional and
three-dimensional data.

Central to the convolutional neural network is the convolutional layer that gives the network
its name. This layer performs an operation called a “convolution”. A CNN model generally
consists of convolutional and pooling layers. It works better for data that are represented as
grid structures, which is the reason why CNNs work well for image classification problems.
The dropout layer is used to deactivate some of the neurons during training, which reduces
overfitting of the model. Our model is composed of feature extraction with convolution
followed by classification. Convolution and max pooling are carried out to extract the
features in the image: 32 3x3 convolution filters are applied to a 28x28 image, followed by a
max-pooling layer of 2x2 pooling size, followed by another convolution layer with 64 3x3
filters.

In the end, we obtain 7x7 feature maps to flatten. The Flatten layer flattens them into a
single vector that is mapped to a dense layer of 128 neurons, which is connected
to the categorical output layer of 10 neurons.

The filter is smaller than the input data and the type of multiplication applied between a
filter-sized patch of the input and the filter is a dot product. A dot product is the element-
wise multiplication between the filter-sized patch of the input and filter, which is then
summed, always resulting in a single value. Because it results in a single value, the
operation is often referred to as the “scalar product”. Using a filter smaller than the input is
intentional, as it allows the same filter (set of weights) to be multiplied by the input array
multiple times at different points on the input. Specifically, the filter is applied
systematically to each overlapping part or filter-sized patch of the input data, left to right,
top to bottom.

The output from multiplying the filter with the input array one time is a single value. As the
filter is applied multiple times to the input array, the result is a two-dimensional array of
output values that represent a filtering of the input. As such, the two-dimensional output
array from this operation is called a “feature map”.
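The sliding filter and dot product described above can be written out from scratch (a sketch; the 3x3 vertical-edge filter and the 6x6 input are classic hand-crafted illustrations, not taken from our model):

```python
def conv2d(image, kernel):
    """Valid cross-correlation: slide the filter over the image left to right,
    top to bottom, take the dot product at each position, and collect the
    resulting values into a feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # element-wise multiply the filter-sized patch by the filter, then sum
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# 6x6 image with a vertical edge, and a 3x3 vertical-edge detector.
image = [[0, 0, 0, 1, 1, 1]] * 6
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
feature_map = conv2d(image, kernel)   # 4x4 feature map
```

The non-zero entries in the feature map line up with the position of the edge in the input, which is exactly the "summary of detected features" the text describes.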

Convolution in Computer Vision:

The idea of applying the convolutional operation to image data is not new or unique to
convolutional neural networks; it is a common technique used in computer vision.

Historically, filters were designed by hand by computer vision experts and were then
applied to an image to produce a feature map; the output from applying the filter makes
the analysis of the image easier in some way. In a CNN, by contrast, the network learns what
types of features to extract from the input. Specifically, when trained under stochastic
gradient descent, the network is forced to learn to extract features from the image that
minimize the loss for the specific task the network is being trained to solve, e.g. extract
features that are the most useful for classifying images as dogs or cats.

Worked Example of Convolutional Layers

The Keras deep learning library provides a suite of convolutional layers.

We can better understand the convolution operation by looking at some worked examples
with contrived data and handcrafted filters.

One-dimensional and two-dimensional convolutional layer examples both make the
convolution operation concrete and provide a worked example of using the Keras layers.

 Convolutional neural networks apply a filter to an input to create a feature map that
summarizes the presence of detected features in the input.
 Filters can be handcrafted, such as line detectors, but the innovation of convolutional
neural networks is to learn the filters during training in the context of a specific
prediction problem.

WORKING OF CNN:

Convolutional neural networks are composed of multiple layers of artificial neurons.
Artificial neurons, a rough imitation of their biological counterparts, are mathematical
functions that calculate the weighted sum of multiple inputs and output an activation value.

The behaviour of each neuron is defined by its weights. When fed with the pixel values, the
artificial neurons of a CNN pick out various visual features.

When you input an image into a ConvNet, each of its layers generates several activation
maps. Activation maps highlight the relevant features of the image. Each of the neurons takes
a patch of pixels as input, multiplies their colour values by its weights, sums them up, and
runs them through the activation function.

The first (or bottom) layer of the CNN usually detects basic features such as horizontal,
vertical, and diagonal edges. The output of the first layer is fed as input of the next layer,
which extracts more complex features, such as corners and combinations of edges. As you
move deeper into the convolutional neural network, the layers start detecting higher-level
features such as objects, faces, and more.

The operation of multiplying pixel values by weights and summing them is called
“convolution” (hence the name convolutional neural network). A CNN is usually composed
of several convolution layers, but it also contains other components. The final layer of a
CNN is a classification layer, which takes the output of the final convolution layer as input
(remember, the higher convolution layers detect complex objects).

Based on the activation map of the final convolution layer, the classification layer outputs a
set of confidence scores (values between 0 and 1) that specify how likely the image is to
belong to a “class.” For instance, if you have a ConvNet that detects cats, dogs, and horses,
the output of the final layer is the probability that the input image contains any of those
animals.

After selecting the model the following process is done:

The model type that we will be using is Sequential. Sequential is the easiest way to build a
model in Keras. It allows you to build a model layer by layer. We use the ‘add()’ function to
add layers to our model. Our first 2 layers are Conv2D layers. These are convolution layers
that will deal with our input images, which are seen as 2-dimensional matrices. 64 in the
first layer and 32 in the second layer are the number of nodes in each layer. This number can
be adjusted to be higher or lower, depending on the size of the dataset. In our case, 64 and
32 work well, so we will stick with this for now.

Kernel size is the size of the filter matrix for our convolution. So a kernel size of 3
means we will have a 3x3 filter matrix. Refer back to the introduction and the first image for
a refresher on this. Activation is the activation function for the layer. The activation function
we will be using for our first 2 layers is the ReLU, or Rectified Linear Activation. This
activation function has been proven to work well in neural networks.

Our first layer also takes in an input shape. This is the shape of each input image, 28,28,1 as
seen earlier on, with the 1 signifying that the images are greyscale. In between the Conv2D
layers and the dense layer, there is a ‘Flatten’ layer.

Flatten serves as a connection between the convolution and dense layers. ‘Dense’ is
the layer type we will use for our output layer. Dense is a standard layer type that is used
in many cases for neural networks. We will have 10 nodes in our output layer, one for each
possible outcome (0–9). The activation is ‘softmax’. Softmax makes the output sum up to 1
so the output can be interpreted as probabilities. The model will then make its prediction
based on which option has the highest probability.
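The layer stack described above can be sketched with the Keras Sequential API (a sketch only; it assumes Keras is installed and the inputs are 28x28x1 greyscale arrays):

```python
# Building the model layer by layer with add(), as described in the text.
# Layer sizes (64 and 32 nodes, kernel size 3, ReLU, softmax output) follow
# the description above; this is a sketch, not a tuned architecture.
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu',
                 input_shape=(28, 28, 1)))      # first conv layer, 64 nodes
model.add(Conv2D(32, kernel_size=3, activation='relu'))  # second conv layer
model.add(Flatten())                            # connects conv and dense layers
model.add(Dense(10, activation='softmax'))      # one node per digit 0-9
```

The softmax output sums to 1, so each of the 10 values can be read as the probability of the corresponding digit.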
Training & Validation:

After the construction of the model the model has to be compiled to train it with the
available data set. Optimizers are used to compile the model. Compiling the model takes
three parameters: optimizer, loss and metrics. Optimizers are algorithms or methods used to
change the attributes of the neural network such as weights and learning rate to reduce the
losses. Optimizers are used to solve optimization problems by minimizing the function.

The optimizer controls the learning rate. We will be using ‘adam’ as our optimizer.
Adam is generally a good optimizer to use for many cases. The adam optimizer adjusts the
learning rate throughout training. The learning rate determines how fast the optimal weights
for the model are calculated. A smaller learning rate may lead to more accurate weights (up
to a certain point), but the time it takes to compute the weights will be longer.

We will use ‘categorical_crossentropy’ for our loss function. This is the most
common choice for classification. A lower score indicates that the model is performing
better. To make things even easier to interpret, we will use the ‘accuracy’ metric to see the
accuracy score on the validation set when we train the model. The idea behind training and
testing any data model is to achieve maximum learning rate and maximum validation. Better
Learning rate and better validation can be achieved by increasing the train and test data
respectively.

Once the model is successfully assembled, we can train it with the training
data for 100 iterations; but as the number of iterations increases, there is a chance of
overfitting. Therefore we limit the training to about 98% accuracy. As we are using
real-world data for prediction, test data was used to validate the model.
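The compile and fit steps just described can be sketched as follows (assuming a Keras `model` built as the text describes, and prepared arrays `X_train`, `y_train`, `X_test`, `y_test`; the epoch count here is illustrative):

```python
# Compiling takes the three parameters named above: optimizer, loss and metrics.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train on the training set and report accuracy on the held-out test data;
# too many epochs risks overfitting, as noted above.
model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=10)
```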

Different optimizers used in Neural Networks are:

1. Gradient Descent
2. Stochastic Gradient Descent (SGD)
3. Mini Batch Stochastic Gradient Descent (MB-SGD)
4. SGD with momentum
5. Nesterov Accelerated Gradient (NAG)
6. Adaptive Gradient (AdaGrad)
7. AdaDelta
8. RMSprop
9. Adam

ADAM Optimizer :

Adaptive Moment Estimation is an optimization algorithm for gradient
descent. The method is really efficient when working with large problems involving a lot of
data or parameters. It requires less memory and is efficient. Intuitively, it is a combination of
the ‘gradient descent with momentum’ algorithm and the ‘RMSProp’ algorithm. The Adam
optimization algorithm is an extension of stochastic gradient descent that has recently seen
broader adoption for deep learning applications in computer vision and natural language
processing. Adam is an optimization algorithm that can be used instead of the classical
stochastic gradient descent procedure to update network weights iteratively based on the
training data.

Adam was presented by Diederik Kingma from OpenAI and Jimmy Ba from the
University of Toronto in their 2015 ICLR paper (poster) titled “Adam: A Method for
Stochastic Optimization“.

The authors describe Adam as combining the advantages of two other extensions of
stochastic gradient descent. Specifically:

Adaptive Gradient Algorithm (AdaGrad), which maintains a per-parameter learning rate that
improves performance on problems with sparse gradients (e.g. natural language and computer
vision problems). Adaptive Moment Estimation is most popular today. Adam computes
adaptive learning rates for each parameter. In addition to storing an exponentially decaying
average of past squared gradients vt, like Adadelta and RMSprop, Adam also keeps an
exponentially decaying average of past gradients mt, similar to momentum.

Root Mean Square Propagation (RMSProp), which also maintains per-parameter learning
rates that are adapted based on the average of recent magnitudes of the gradients for the
weight (e.g. how quickly it is changing). This means the algorithm does well on online and
non-stationary problems (e.g. noisy ones).
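A single Adam update for one scalar parameter can be sketched in plain Python, combining the two running averages just described (mt for gradients, vt for squared gradients) with bias correction; the defaults β1 = 0.9, β2 = 0.999, ε = 1e-8 and learning rate 0.001 are the ones proposed in the paper:

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (a sketch);
    t is the 1-based time step used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # decaying average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # decaying average of squared gradients (RMSProp)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

Because the update divides by the square root of v_hat, its size is roughly bounded by the step-size hyper-parameter regardless of the raw gradient magnitude, matching property 1 below.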
Properties of Adam:

1. The actual step size taken by Adam in each iteration is approximately bounded by the
step-size hyper-parameter. This property adds intuitive understanding to the
previously unintuitive learning-rate hyper-parameter.
2. The step size of the Adam update rule is invariant to the magnitude of the gradient,
which helps a lot when going through areas with tiny gradients (such as saddle points
or ravines), where SGD struggles to navigate quickly.
3. Adam was designed to combine the advantages of AdaGrad, which works well with
sparse gradients, and RMSprop, which works well in online settings. Having both of
these enables us to use Adam for a broader range of tasks. Adam can also be looked at
as a combination of RMSprop and SGD with momentum.

Why ADAM?

1. Adam is an optimization algorithm that can be used instead of the classical stochastic
gradient descent procedure to update network weights iteratively based on the training data.
2. Adam combines the best properties of the AdaGrad and RMSProp algorithms to
provide an optimization algorithm that can handle sparse gradients on noisy problems.
3. Adam is relatively easy to configure where the default configuration parameters do
well on most problems.

4.1.19 Model Evaluation & Prediction:

For real-world image classification prediction, we need to do a little image pre-processing on
the real-world images, as model training was done with greyscale raster images. The steps of
image pre-processing are:

1. Loading image
2. Convert the image to greyscale
3. Resize the image to 28x28
4. Converting the image into a matrix form
5. Reshape the matrix into 28x28x1

After pre-processing, we predict the label of the image by passing the pre-processed
image through the neural network. The output we get is a list of 10 activation values, for
digits 0 to 9 respectively. The position having the highest value is the predicted label for the
image.
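Picking the position with the highest activation is an argmax over the 10 outputs; a sketch with made-up activation values (a real run would take these from the network's softmax layer):

```python
# Hypothetical softmax outputs for digits 0-9 (they sum to 1).
activations = [0.01, 0.02, 0.01, 0.05, 0.01,
               0.02, 0.70, 0.10, 0.05, 0.03]

# Predicted label = index of the largest activation; its value is the
# model's confidence in that prediction.
predicted_label = max(range(len(activations)), key=lambda i: activations[i])
confidence = activations[predicted_label]
```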

These structures are called neural networks. Deep learning teaches the computer to do what
comes naturally to humans. In deep learning there are several types of models, such as
Artificial Neural Networks (ANN), autoencoders, Recurrent Neural Networks (RNN) and
reinforcement learning. But there has been one particular model that has contributed a lot to
the field of computer vision and image analysis: the Convolutional Neural Network
(CNN), or ConvNet.

CNNs are a class of deep neural networks that can recognize and classify particular
features from images and are widely used for analyzing visual images. Their applications
range from image and video recognition, image classification, and medical image analysis to
computer vision and natural language processing.

Methods for evaluating a model’s performance are divided into two categories:
holdout and cross-validation. Both methods use a test set (i.e. data not seen by the model) to
evaluate model performance. It’s not recommended to use the data we used to build the
model to evaluate it. This is because our model will simply remember the whole training set,
and will therefore always predict the correct label for any point in the training set. This is
known as overfitting.

Holdout:

The purpose of holdout evaluation is to test a model on different data than it was trained on.
This provides an unbiased estimate of learning performance.

In this method, the dataset is randomly divided into three subsets:

1. Training set is a subset of the dataset used to build predictive models.
2. Validation set is a subset of the dataset used to assess the performance of the model
built in the training phase. It provides a test platform for fine-tuning a model’s
parameters and selecting the best-performing model. Not all modeling algorithms need
a validation set.
3. Test set, or unseen data, is a subset of the dataset used to assess the likely future
performance of a model. If a model fits the training set much better than it fits the
test set, overfitting is probably the cause.

The holdout approach is useful because of its speed, simplicity, and flexibility. However, this
technique is often associated with high variability since differences in the training and test
dataset can result in meaningful differences in the estimate of accuracy.
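The three-way random split can be sketched in plain Python (the 60/20/20 ratios and the seed are illustrative choices, not prescribed by the text):

```python
import random

def holdout_split(data, train=0.6, val=0.2, seed=42):
    """Randomly split data into training / validation / test subsets."""
    rng = random.Random(seed)        # seeded so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],                    # build the model
            shuffled[n_train:n_train + n_val],     # tune / select the model
            shuffled[n_train + n_val:])            # estimate future performance

train_set, val_set, test_set = holdout_split(list(range(100)))
```

Changing the seed changes the split, which is exactly the variability the paragraph above warns about.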

Cross-Validation:

Cross-validation is a technique that involves partitioning the original observation dataset into
a training set, used to train the model, and an independent set used to evaluate the analysis.

The most common cross-validation technique is k-fold cross-validation, where the original
dataset is partitioned into k equal-size subsamples, called folds. The k is a user-specified
number, usually 5 or 10. Training and evaluation are repeated k times, such that each
time one of the k subsets is used as the test set/validation set and the other k-1 subsets are put
together to form a training set. The error estimate is averaged over all k trials to get the total
effectiveness of our model.
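The k-fold partitioning can be sketched in plain Python (here with interleaved folds for simplicity; library implementations such as scikit-learn's KFold use contiguous folds by default):

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k folds; yield (train, test) index lists,
    so each fold serves as the test set exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(kfold_indices(10, 5))   # 5 train/test splits over 10 examples
```

In a full cross-validation run, the model would be trained and scored once per split and the k scores averaged.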

4.2 SOFTWARE ENVIRONMENT:

4.2.1 Python :

Below are some facts about Python.

Python is currently the most widely used multi-purpose, high-level programming language.
Python allows programming in object-oriented and procedural paradigms. Python programs
are generally smaller than those in other programming languages like Java. Programmers
have to type relatively less, and the indentation requirement of the language makes the code
readable all the time. The Python language is being used by almost all tech giants like
Google, Amazon, Facebook, Instagram, Dropbox, Uber, etc.

The biggest strength of Python is its huge collection of libraries, which can be
used for the following:

 Machine Learning
 GUI Applications (like Kivy, Tkinter, PyQt etc. )
 Web frameworks like Django (used by YouTube, Instagram, Dropbox)
 Image processing (like Opencv, Pillow)
 Web scraping (like Scrapy, BeautifulSoup, Selenium)
 Test frameworks
 Multimedia

4.2.2 Why Choose Python

Let’s see how Python dominates over other languages.

1. Extensive Libraries

Python ships with an extensive library containing code for various purposes like
regular expressions, documentation generation, unit testing, web browsers, threading,
databases, CGI, email, image manipulation, and more. So, we don’t have to write the
complete code for that manually.

2. Extensible

As we have seen earlier, Python can be extended to other languages. You can write some of
your code in languages like C++ or C. This comes in handy, especially in projects.

3. Embeddable

Complimentary to extensibility, Python is embeddable as well. You can put your Python code
in your source code of a different language, like C++. This lets us add scripting capabilities to
our code in the other language.

4. Improved Productivity

The language’s simplicity and extensive libraries render programmers more productive than
languages like Java and C++ do. Also, you need to write less to get more things done.

5. IOT Opportunities

Since Python forms the basis of new platforms like the Raspberry Pi, its future looks bright
for the Internet of Things. This is a way to connect the language with the real world.
When working with Java, you may have to create a class to print ‘Hello World’. But
in Python, just a print statement will do. It is also quite easy to learn, understand, and code.
This is why when people pick up Python, they have a hard time adjusting to other more
verbose languages like Java.

6. Readable

Because it is not such a verbose language, reading Python is much like reading English. This
is the reason why it is so easy to learn, understand, and code. It also does not need curly
braces to define blocks, and indentation is mandatory. This further aids the readability of the
code.

7. Object-Oriented

This language supports both the procedural and object-oriented programming paradigms.
While functions help us with code reusability, classes and objects let us model the real world.
A class allows the encapsulation of data and functions into one.

8. Free and Open-Source

Like we said earlier, Python is freely available. But not only can you download Python for
free, but you can also download its source code, make changes to it, and even distribute it. It
downloads with an extensive collection of libraries to help you with your tasks.

9. Portable

When you code your project in a language like C++, you may need to make some changes to
it if you want to run it on another platform. But it isn’t the same with Python. Here, you need
to code only once, and you can run it anywhere. This is called Write Once Run Anywhere
(WORA). However, you need to be careful enough not to include any system-dependent
features.

10. Interpreted

Lastly, we will say that it is an interpreted language. Since statements are executed one by
one, debugging is easier than in compiled languages.

Advantages of Python Over Other Languages :

1. Less Coding

Almost all tasks done in Python require less coding than the same task done in
other languages. Python also has awesome standard library support, so you don’t have to
search for any third-party libraries to get your job done. This is the reason many people
suggest learning Python to beginners.

2. Affordable

Python is free, so individuals, small companies, or big organizations can leverage the
freely available resources to build applications. Python is popular and widely used, so it gives
you better community support.

The 2019 GitHub annual survey showed us that Python has overtaken Java in the most
popular programming language category.

3. Python is for Everyone

Python code can run on any machine whether it is Linux, Mac or Windows. Programmers
need to learn different languages for different jobs but with Python, you can professionally
build web apps, perform data analysis and machine learning, automate things, do web
scraping and also build games and powerful visualizations. It is an all-rounder programming
language.

Disadvantages of Python

So far, we’ve seen why Python is a great choice for your project. But if you choose it, you
should be aware of its consequences as well. Let’s now see the downsides of choosing Python
over another language.

1. Speed Limitations

We have seen that Python code is executed line by line. But since Python is interpreted, it
often results in slow execution. This, however, isn’t a problem unless speed is a focal point
for the project. In other words, unless high speed is a requirement, the benefits offered by
Python are enough to outweigh its speed limitations.

2. Weak in Mobile Computing and Browsers

While it serves as an excellent server-side language, Python is rarely seen on the client
side. Besides that, it is rarely ever used to implement smartphone-based applications. One
such application is called Carbonnelle.

The reason it is not so famous despite the existence of Brython is that it isn’t that secure.

3. Design Restrictions

As you know, Python is dynamically typed. This means that you don’t need to declare the
type of a variable while writing the code. It uses duck typing. But wait, what’s that? Well, it
just means that if it looks like a duck, it must be a duck. While this is easy on
programmers during coding, it can raise run-time errors.

4. Underdeveloped Database Access Layers

Compared to more widely used technologies like JDBC (Java DataBase
Connectivity) and ODBC (Open DataBase Connectivity), Python’s database access layers are
a bit underdeveloped. Consequently, it is less often applied in huge enterprises.

5. Simple

No, we’re not kidding. Python’s simplicity can indeed be a problem. Take my example. I
don’t do Java, I’m more of a Python person. To me, its syntax is so simple that the verbosity
of Java code seems unnecessary.

This was all about the Advantages and Disadvantages of Python Programming Language.

History of Python:

What do the alphabet and the programming language Python have in common? Right, both
start with ABC. If we are talking about ABC in the Python context, it's clear that the
programming language ABC is meant. ABC is a general-purpose programming language and
programming environment, which had been developed in the Netherlands, Amsterdam, at the
CWI (Centrum Wiskunde & Informatica). The greatest achievement of ABC was to influence
the design of Python.

Python was conceptualized in the late 1980s. Guido van Rossum worked at that time on a
project at the CWI called Amoeba, a distributed operating system. In an interview with Bill
Venners, Guido van Rossum said: "In the early 1980s, I worked as an implementer on a team
building a language called ABC at Centrum voor Wiskunde en Informatica (CWI). I don't
know how well people know ABC's influence on Python. I try to mention ABC's influence
because I'm indebted to everything I learned during that project and to the people who
worked on it."

Later on in the same interview, Guido van Rossum continued: "I remembered all my
experience and some of my frustration with ABC. I decided to try to design a simple
scripting language that possessed some of ABC's better properties, but without its problems.
So I started typing. I created a simple virtual machine, a simple parser, and a simple runtime.
I made my own version of the various ABC parts that I liked. I created a basic syntax, used
indentation for statement grouping instead of curly braces or begin-end blocks, and developed
a small number of powerful data types: a hash table (or dictionary, as we call it), a list,
strings, and numbers."

4.3 Sample Code:

1. Import the libraries and load the dataset:

First, we are going to import all the modules that we are going to need for training our model.
The Keras library already contains some datasets and MNIST is one of them. So we can
easily import the dataset and start working with it.

The mnist.load_data() method returns us the training data, its labels and also the testing data
and its labels.

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape, y_train.shape)

Output: the training data shape (60000, 28, 28) and the label shape (60000,).

2. Preprocess the data:

The image data cannot be fed directly into the model so we need to perform some operations
and process the data to make it ready for our neural network. The dimension of the training
data is (60000,28,28). The CNN model will require one more dimension so we reshape the
matrix to shape (60000,28,28,1).

num_classes = 10  # digits 0-9

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
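To make the "binary class matrices" step concrete, here is a small NumPy-only sketch of what keras.utils.to_categorical produces for a few labels (the helper name to_one_hot is illustrative, not part of Keras):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # Each integer label becomes a row with a single 1 in that class's column.
    out = np.zeros((len(labels), num_classes), dtype=int)
    out[np.arange(len(labels)), labels] = 1
    return out

print(to_one_hot([2, 0], 10))
# Label 2 -> a 1 in column 2; label 0 -> a 1 in column 0.
```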

3. Create the model:

Now we will create our CNN model. A CNN model generally consists of convolutional and
pooling layers.

CNNs work well for image classification problems because they are suited to data represented
as grid structures.

The dropout layer deactivates some of the neurons during training, which reduces overfitting
of the model. We will then compile the model with the Adadelta optimizer.
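As a rough sketch of what a dropout rate of 0.25 means during training (a NumPy stand-in, not the actual Keras internals):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(1000)

# Dropout with rate 0.25: each activation is zeroed with probability 0.25.
keep_mask = rng.random(1000) >= 0.25
dropped = activations * keep_mask

# Roughly 75% of the activations survive this training step.
print(round(dropped.mean(), 2))
```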

batch_size = 128
num_classes = 10
epochs = 10

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

4. Train the Model:

The model.fit() function of Keras will start the training of the model. It takes the training data,
validation data, epochs, and batch size. It takes some time to train the model. After training, we
save the weights and model definition in the 'mnist.h5' file.

Epochs Running:

hist = model.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 verbose=1,
                 validation_data=(x_test, y_test))

print("The model has successfully trained")


model.save('mnist.h5')

print("Saving the model as mnist.h5")


5. Evaluate the model:

We have 10,000 images in our dataset which will be used to evaluate how well our model
works. The testing data was not involved in the training, so it is new data for our model. The
MNIST dataset is well balanced, so we can get around 99% accuracy.

score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])

print('Test accuracy:', score[1])


6. Create GUI to predict characters:

Now for the GUI, we have created a new file in which we build an interactive window to draw
characters on a canvas, and with a button, we can recognize the character. The Tkinter library
comes with the Python standard library. We have created a function predict_character() that
takes the image as input and then uses the trained model to predict the character. Then we
create the App class, which is responsible for building the GUI for our app. We create a canvas
where we can draw by capturing the mouse events, and with a button we trigger the
predict_character() function and display the results. Here's the full code for our
gui_character_recognizer.py file:

from keras.models import load_model
from tkinter import *
import tkinter as tk
import win32gui
from PIL import ImageGrab, Image
import numpy as np

model = load_model('mnist.h5')

def predict_character(img):
    # resize image to 28x28 pixels
    img = img.resize((28, 28))
    # convert rgb to grayscale
    img = img.convert('L')
    img = np.array(img)
    # reshaping to support our model input and normalizing
    img = img.reshape(1, 28, 28, 1)
    img = img / 255.0
    # predicting the class
    res = model.predict([img])[0]
    return np.argmax(res), max(res)

class App(tk.Tk):
    def __init__(self):
        tk.Tk.__init__(self)
        self.x = self.y = 0

        # Creating elements
        self.canvas = tk.Canvas(self, width=300, height=300, bg="white", cursor="cross")
        self.label = tk.Label(self, text="Thinking..", font=("Helvetica", 48))
        self.classify_btn = tk.Button(self, text="Recognise", command=self.classify_handwriting)
        self.button_clear = tk.Button(self, text="Clear", command=self.clear_all)

        # Grid structure
        self.canvas.grid(row=0, column=0, pady=2, sticky=W)
        self.label.grid(row=0, column=1, pady=2, padx=2)
        self.classify_btn.grid(row=1, column=1, pady=2, padx=2)
        self.button_clear.grid(row=1, column=0, pady=2)

        # self.canvas.bind("<Motion>", self.start_pos)
        self.canvas.bind("<B1-Motion>", self.draw_lines)

    def clear_all(self):
        self.canvas.delete("all")

    def classify_handwriting(self):
        HWND = self.canvas.winfo_id()  # get the handle of the canvas
        rect = win32gui.GetWindowRect(HWND)  # get the coordinates of the canvas
        im = ImageGrab.grab(rect)
        character, acc = predict_character(im)
        self.label.configure(text=str(character) + ', ' + str(int(acc * 100)) + '%')

    def draw_lines(self, event):
        self.x = event.x
        self.y = event.y
        r = 8
        self.canvas.create_oval(self.x - r, self.y - r, self.x + r, self.y + r, fill='black')

app = App()
mainloop()


5. System Testing and Screenshots

5.1 System Testing

5.1.1 Unit Testing

Unit testing focuses verification efforts on the smallest unit of software design: the module.
This is known as "module testing". The modules are tested separately, and this testing is
carried out during the programming stage itself. In these testing steps, each module is found
to be working satisfactorily as regards the expected output from the module.
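As a hedged example in the spirit of module-level testing, the sketch below unit-tests a hypothetical preprocess() helper that wraps the reshape-and-normalize step used earlier (the helper is illustrative, not lifted from the project code):

```python
import unittest
import numpy as np

def preprocess(img_array):
    # Reshape a 28x28 grayscale image for the CNN and scale pixels to [0, 1].
    return img_array.reshape(1, 28, 28, 1).astype('float32') / 255.0

class TestPreprocess(unittest.TestCase):
    def test_shape_and_range(self):
        img = np.random.randint(0, 256, size=(28, 28))
        out = preprocess(img)
        self.assertEqual(out.shape, (1, 28, 28, 1))
        self.assertTrue(((out >= 0) & (out <= 1)).all())

if __name__ == '__main__':
    unittest.main()
```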

5.1.2 Integration Testing/Function Testing

Integration testing is a systematic technique for constructing tests to uncover errors associated
with the interfaces. In the project, all the modules are combined and then the entire program
is tested as a whole. In the integration-testing step, all the errors uncovered are corrected before
the next testing steps.

5.1.3 White Box Testing

White box testing is a form of application testing that provides the tester with complete
knowledge of the application being tested, including access to source code and design
documents. This in-depth visibility makes it possible for white box testing to identify issues that
are invisible to gray and black box testing.

5.1.4 Black Box Testing

Black box testing involves testing a system with no prior knowledge of its internal workings. A
tester provides an input, and observes the output generated by the system under test. This makes
it possible to identify how the system responds to expected and unexpected user actions, its
response time, usability issues and reliability issues.

Black box testing is a powerful testing technique because it exercises a system end-to-end. Just
like end-users “don’t care” how a system is coded or architected, and expect to receive an
appropriate response to their requests, a tester can simulate user activity and see if the system
delivers on its promises. Along the way, a black box test evaluates all relevant subsystems,
including UI/UX, web server or application server, database, dependencies, and integrated
systems.

5.1.5 Acceptance Testing

When the system has no major problems with its accuracy, it passes through a final acceptance
test. This test confirms that the system meets the original goals, objectives, and requirements
established during analysis. If the system fulfils all the requirements, it is finally accepted and
ready for operation.

5.2 Screenshots
5.2.1 HOMESCREEN
5.2.2 UPLOAD IMAGE
5.2.3 IMAGE UPLOAD
5.2.4 IMAGE COMPRESSION
5.2.5 SIZE COMPRESSION IN DIRECTORY
5.2.6 SIZE COMPARISON AFTER COMPRESSION
5.2.7 IMAGE COMPARISON GRAPH

6. CONCLUSION AND FUTURE WORK

6.1 CONCLUSION

This article takes a comprehensive look at the most efficient, cutting-edge technologies
and methods that have been used in HCR in the past. Although the amount of work that has been
done in this field is substantial, the high demand for HCR necessitates the development of more
effective and accurate algorithms that require less time and storage.

The wide variety of human handwriting and the differing ways characters are written make
this a complicated study topic. This paper presents a concise analysis of all of the significant
algorithms that were addressed, giving researchers in-depth knowledge of the ongoing work
on this issue.

This research focuses on handwritten character recognition using EfficientNet-B2 with
transfer learning and two dense layers. It explores and concentrates on handwritten character
recognition using the DHCD dataset of Devanagari script, which has 92,000 pictures of 46
distinct classes. The EfficientNet models, published in 2019, have become the face of CNNs
because they scale all of the network's factors uniformly, unlike other CNN methodologies,
which makes them superior, and the transfer learning model helps us improve the accuracy of
our result rapidly and progressively. In addition, the two dense layers facilitate the processing
of data through matrix-vector multiplication over all of the neurons in the layer that comes
before them. Even though a vast amount of work has already been done in this sector, there is
still a lot of research that needs to be done in order to achieve a high level of accuracy while
taking the large time factor into account.

6.2 FUTURE SCOPE

This article assists scholars in this sector in examining new methods and comparing them
to one another, which might be useful for further research and development in the years to come.
Our model's calculations are dependent on the size of the pictures contained in the DHCD dataset.
A more complicated dataset may be utilized in the testing of our model, which may lead to
outcomes that are distinct from those predicted by the model.

The accuracy achieved by the other approaches is compared in table 1.1 above, along
with the accuracy achieved by our method, which is superior to that of the other approaches.
In addition, this precision may be enhanced by refining the tuning of a number of factors,
although doing so may require a significant investment of time and may or may not be
technically viable. This work is presented as a starting effort, and the objective is to simplify
the process of identifying handwritten Indic characters. The model yields a validation accuracy
of 99.49 percent.
7. BIBLIOGRAPHY & REFERENCES
[1] H. Zeng, "An Off-line Handwriting Recognition Employing Tensorflow," 2020 International
Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), 2020.

[2] A. Beltrán and S. Mendoza, "Efficient algorithm for real-time handwritten character
recognition in mobile devices," 2011 8th International Conference on Electrical Engineering,
Computing Science and Automatic Control, 2011, pp. 1-6, doi: 10.1109/ICEEE.2011.6106583.

[3] H. A. Shiddieqy, T. Adiono and I. Syafalni, "Mobile Client-Server Approach for
Handwriting Digit Recognition," 2019 International Symposium on Electronics and Smart
Devices (ISESD), 2019, pp. 1-4, doi: 10.1109/ISESD.2019.8909448.

[4] R. Vaidya, D. Trivedi, S. Satra and P. M. Pimpale, "Handwritten Character Recognition
Using Deep Learning," 2018 Second International Conference on Inventive Communication and
Computational Technologies (ICICCT), 2018, pp. 772-775, doi: 10.1109/ICICCT.2018.8473291.

[5] H. Du, P. Li, H. Zhou, W. Gong, G. Luo and P. Yang, "WordRecorder: Accurate Acoustic-
based Handwriting Recognition Using Deep Learning," IEEE INFOCOM 2018 - IEEE
Conference on Computer Communications, 2018, pp. 1448-1456, doi:
10.1109/INFOCOM.2018.8486285.

[6] D. Kavitha and P. Shamini, "Handwritten Document into Digitized Text Using
Segmentation Algorithm," International Journal of Advanced Computer Technology, 2016.
Retrieved from https://ijact.in/index.php/ijact/article/view/465

[7] C. Chammas, C. Mokbel, R. Al Hajj Mohamad, C. Oprean, L. L. Sulem and G.
Chollet, "Reducing language barriers for tourists using handwriting recognition enabled mobile
application," 2012 2nd International Conference on Advances in Computational Tools for
Engineering Applications (ACTEA), 2012, pp. 20-23, doi: 10.1109/ICTEA.2012.6462868.

[8] S. B. K.S., V. Bhat and A. S. Krishnan, "SolveIt: An Application for Automated Recognition
and Processing of Handwritten Mathematical Equations," 2018 4th International Conference for
Convergence in Technology (I2CT), 2018, pp. 1-8, doi: 10.1109/I2CT42659.2018.9058273.

[9] T. Mantoro, A. M. Sobri and W. Usino, "Optical Character Recognition (OCR) Performance
in Server-Based Mobile Environment," 2013 International Conference on Advanced Computer
Science Applications and Technologies, 2013, pp. 423-428, doi: 10.1109/ACSAT.2013.89.

[10] V. V. Mainkar, J. A. Katkar, A. B. Upade and P. R. Pednekar, "Handwritten Character
Recognition to Obtain Editable Text," 2020 International Conference on Electronics and
Sustainable Communication Systems (ICESC), 2020, pp. 599-602, doi:
10.1109/ICESC48915.2020.9155786.
