Handwritten Character Recognition System
Abstract
Handwritten character recognition (HCR) has been researched for years and remains an
area of interest for the pattern recognition community because it can be put to use in
a wide range of applications. The task is difficult because each person has a unique
writing style. Support vector machines (SVM), artificial neural networks (ANN), and
convolutional neural networks (CNN) are among the models available for approaching
the problem. HCR is needed in the modern world because it assists in many
public-domain fields, which makes it all the more important to study in depth. HCR is
divided into two categories: off-line character recognition and online character
recognition. In this study, we review existing algorithms to gain a better
understanding of the field, and we draw conclusions about the best strategies
currently being developed for HCR.
1.1 Introduction
Machine learning and deep learning play an important role in computer technology and
artificial intelligence. With their use, human effort can be reduced in recognition,
learning, prediction, and many other areas.
This article presents the recognition of handwritten characters (A to Z) from the well-known
MNIST dataset, comparing classifiers such as KNN, PSVM, NN, and convolutional neural
networks on the basis of performance, accuracy, time, sensitivity, positive predictive value,
and specificity, using different parameters for each classifier.
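The comparison criteria named above can be made concrete with a short sketch. It assumes each character class is scored one-vs-rest from four counts (true and false positives and negatives); the function name and the counts are hypothetical and are not taken from the study. Precision is the usual name for positive predictive value, and recall for sensitivity.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and positive predictive value."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # also called precision
    }


# Hypothetical counts for one character class treated one-vs-rest.
m = binary_metrics(tp=90, fp=10, tn=880, fn=20)
```

Reporting all four values matters because accuracy alone can look high when one class is rare.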
To make machines more intelligent, developers are turning to machine learning and
deep learning techniques. A human learns to perform a task by practicing and repeating it
until they have memorized how to perform it. The neurons in the brain then fire
automatically, and the person can quickly perform the task they have learned. Deep
learning is very similar. It uses different types of neural network architectures
for different types of problems, for example object recognition, image and sound
classification, object detection, and image segmentation.
The goal of this project is to create a model that can recognize handwritten characters
from their images by using a Convolutional Neural Network (CNN). Although the goal is a
model that recognizes characters, it can be extended to digits and to an individual’s
handwriting. The major goal of the proposed system is to understand Convolutional Neural
Networks and apply them to a handwriting recognition system.
The process of Handwritten Character Recognition (HWCR) usually consists of several steps.
The first is image acquisition. In this stage we obtain the input image either by taking a
picture with a camera or phone, or by drawing the character on a sheet of paper and scanning
it with a scanner or a phone camera. Drawing with a light stylus is another possible form of
image input.
The next step is pre-processing, in which the quality of the input images is enhanced. The
procedures applied to the input images may include thinning, skeletonization, normalization,
skew correction, noise removal, filtering, and binarization.
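Two of the pre-processing operations listed above, normalization and binarization, can be sketched as follows. This is a minimal illustration in plain Python, assuming an 8-bit grayscale image stored as a list of rows with dark ink on a light background; the function names and the fixed threshold of 0.5 are assumptions, and a real pipeline would more likely use an image library with an adaptive threshold.

```python
def normalize(image):
    """Scale 8-bit grayscale pixel values (0-255) to the range [0.0, 1.0]."""
    return [[p / 255.0 for p in row] for row in image]


def binarize(image, threshold=0.5):
    """Map normalized pixels to 1 (ink) or 0 (background); dark pixels count as ink."""
    return [[1 if p < threshold else 0 for p in row] for row in image]


# Hypothetical 3 x 4 grayscale patch: dark stroke on a light page.
gray = [
    [255, 250, 30, 245],
    [252, 20, 25, 240],
    [255, 248, 35, 251],
]
binary = binarize(normalize(gray))
```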
Use case Diagram
The third stage of the system is segmentation, which decomposes the input picture into
meaningful pieces and separates the individual objects it contains. A basic function is shown
in figure 1.3. Segmentation can therefore be defined as breaking the input down into subparts,
each of which is treated as an object.
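One simple way to break a line of text into subparts is a vertical projection profile: count the ink in every column and cut wherever a column is empty. The sketch below assumes a binary image with 1 for ink and is only an illustration, not the segmentation method used in this work; the function name is hypothetical.

```python
def segment_columns(binary):
    """Split a binary image (1 = ink) into (start, end) column ranges, end exclusive.

    A column that contains no ink is treated as a gap between characters.
    """
    width = len(binary[0])
    ink_per_column = [sum(row[c] for row in binary) for c in range(width)]
    segments, start = [], None
    for c, ink in enumerate(ink_per_column):
        if ink > 0 and start is None:
            start = c                      # a character begins
        elif ink == 0 and start is not None:
            segments.append((start, c))    # a character ends
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, width))
    return segments


line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
segments = segment_columns(line)
```

This approach cannot separate characters that touch or overlap, which is one reason learned segmentation rules are used for cursive handwriting.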
In the next step, we classify the objects produced by segmentation to determine the category
to which each one belongs. Several classes may be defined for this purpose, and once the
properties of an object have been obtained, a class can be assigned to it.
For illustration, suppose we have a class called "car" and another called "plane." Any item
that exhibits car-like qualities is then assigned to the "car" class because of the traits it
possesses. With this pipeline in mind, this review paper investigates the approaches to HCR
that are already in use and discusses the benefits and drawbacks of each.
One way to address this problem is a deep neural network (DNN), which is both an efficient
feature extractor and a classifier. The catch is that training the network takes an
excessively long time, because it has a large number of nonlinear hidden layers and
connections. Convolutional Neural Networks (CNN) were developed to address this problem by
using fewer nonlinear hidden layers than a DNN.
This is the key reason we use a CNN to extract position-invariant features. A CNN also has a
simpler structure than a DNN. Because it provides subsampling and tolerates a degree of
rotation, shift, and distortion, a convolutional neural network can build a mapping between
the input images and the output classes with relative ease.
Machine learning and data mining practitioners have already put significant work into
improving pattern recognition. Bank checks are a good illustration of how much we depend on
HCR as a medium that captures information and lets us communicate with others. Because of
this dependence, HCR has a significant effect on the way we live today.
Distortion in handwriting raises the same difficulty. Different regions use different
languages, and each brings its own handwriting styles and methods. As a result, it is
challenging to reliably extract characters from a given language.
2. LITERATURE SURVEY
An early notable attempt in the area of character recognition research is by Grimsdale in
1959. A great deal of research work in the early sixties was based on an approach known as
the analysis-by-synthesis method, suggested by Eden in 1968. The great importance of Eden's
work was that he formally proved that all handwritten characters are formed by a finite
number of schematic features, a point that was implicitly included in previous works.
This notion was later used in all methods in syntactic (structural) approaches to character
recognition.
A "serverless distributed file system" typically refers to a file system architecture that operates
without traditional, dedicated servers for storage and file management. Instead, it relies on a
decentralized and distributed model where various nodes or devices participate in storing and
accessing files. Here are some key aspects of such a system:
2. **Distributed Storage:** Files are distributed across the network of nodes rather than being
stored on a single, dedicated server. Each node may contribute storage space, and files are
distributed across these nodes for redundancy and fault tolerance.
3. **Load Balancing:** The system typically employs load balancing mechanisms to evenly
distribute file requests and storage across the participating nodes. This helps ensure that no single
node becomes a performance bottleneck.
5. **Dynamic Scaling:** Serverless systems often support dynamic scaling, allowing nodes to
join or leave the network without disrupting the overall file system's operation. This scalability is
a key advantage.
6. **Access Control:** Access control mechanisms are essential to ensure that only authorized
users or nodes can access specific files. These systems often employ encryption and
authentication techniques for security.
7. **P2P Networking:** Many serverless distributed file systems are built on peer-to-peer
(P2P) networking principles, where nodes communicate directly with each other to exchange
data and manage the file system.
8. **Metadata Management:** Managing file metadata, such as file names, permissions, and
file locations, is a crucial aspect of distributed file systems. Metadata may be distributed across
nodes or managed centrally, depending on the design.
- **Ceph:** A distributed storage platform that provides object storage, block storage, and file
storage capabilities.
- **Storj:** A decentralized cloud storage platform that leverages a global network of nodes to
store and retrieve data.
These systems are designed to address various use cases, including data resilience, privacy, and
scalability, by distributing storage and file management tasks across a network of nodes rather
than relying on traditional, centralized servers.
The Google File System (GFS) is a distributed file system developed by Google to meet their
storage needs. It was first described in a research paper published by Google in 2003. GFS is
designed to provide high availability, reliability, and scalability for storing and managing large
amounts of data across a distributed infrastructure. Here are some key features and
characteristics of the Google File System:
1. **Scalability:** GFS is designed to handle massive amounts of data, and it is highly scalable.
It is capable of storing petabytes of data across a cluster of commodity hardware.
2. **Fault Tolerance:** GFS is built with fault tolerance in mind. It automatically replicates
data across multiple machines to ensure that data is not lost in the event of hardware failures.
4. **Large File Support:** GFS is optimized for handling large files, particularly for Google's
applications, like Google Search and Google MapReduce, which require the storage and
processing of large datasets.
5. **Atomic Record Append:** GFS supports an atomic record append operation, which is
crucial for applications like Google's MapReduce, where data needs to be appended to files
reliably.
6. **High Throughput:** GFS is designed for high throughput, making it suitable for
applications that require rapid data access and processing.
7. **Simple Interface:** GFS provides a simple file system interface, which simplifies
application development and integration.
10. **Garbage Collection:** GFS includes a garbage collection mechanism to manage the
storage space efficiently by reclaiming space used by deleted or obsolete data.
GFS was developed specifically to support Google's internal infrastructure and applications, and
it played a crucial role in the success of services like Google Search and Google MapReduce.
While GFS itself is not available for public use, its design principles and concepts have
influenced the development of other distributed file systems and storage solutions, such as
Hadoop HDFS (Hadoop Distributed File System) and the open-source Ceph file system.
Convergent key management is a cryptographic approach that allows multiple parties to derive
the same encryption key from the same data, even if they do not necessarily have access to each
other's data or communicate directly. This approach is particularly useful in scenarios where
different entities need to encrypt and decrypt data consistently, even when they are distributed or
operate independently. Here's how convergent key management works:
3. **Key Derivation:** The hash value generated by the convergent hash function is used as the
basis for deriving an encryption key. Multiple parties or systems can independently perform this
process, given the same data and convergent hash function. As long as they use the same data,
they will derive the same encryption key.
4. **Encryption and Decryption:** These parties can then use the derived encryption key to
encrypt and decrypt data securely. Since they all use the same data and convergent hash function,
they can consistently generate the same key, ensuring that they can access the data encrypted by
others.
- **Consistency:** It ensures that different parties can access the same data consistently, even
in distributed or decentralized systems.
- **Data Deduplication:** By using convergent key management, duplicate data can be
identified and stored only once, which can save storage space.
- **Security:** When properly implemented, convergent key management can provide strong
security for data encryption, as long as the convergent hash function is secure and collision-
resistant.
- **Privacy:** Convergent key management implies that the same data will result in the same
encryption key. This can raise privacy concerns, as it could allow parties to deduce that they are
working with the same or similar data.
- **Security Risks:** The security of convergent key management relies heavily on the security
of the convergent hash function. If the hash function is compromised, it could lead to the
exposure of encrypted data.
- **Scalability:** In large-scale systems, managing convergent keys for a vast amount of data
can become complex, and it requires careful key management practices.
Convergent key management is a useful technique in certain scenarios, but its suitability depends
on the specific requirements and security considerations of the system in which it is
implemented. It is important to choose a secure and well-vetted convergent hash function and
follow best practices for key management to mitigate potential risks.
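The key derivation step described above can be sketched in a few lines. This is a minimal illustration that assumes SHA-256 as the convergent hash function and uses the raw digest as the key; the function name is hypothetical, and a real system would add an actual cipher and the key management practices discussed above.

```python
import hashlib


def convergent_key(data: bytes) -> bytes:
    """Derive a 32-byte key from the content itself (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).digest()


# Two parties holding identical data derive identical keys without communicating.
alice_key = convergent_key(b"same file contents")
bob_key = convergent_key(b"same file contents")
other_key = convergent_key(b"different contents")
```

Because the key depends only on the data, anyone who can guess the data can also derive the key, which is the privacy concern noted above.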
3. SYSTEM ANALYSIS AND DESIGN
In most existing systems, recognition accuracy depends heavily on the quality of the input
document. In handwritten text, adjacent characters tend to touch or overlap, so it is
essential to segment a given string correctly into its character components. In most existing
segmentation algorithms, human writing is evaluated empirically to deduce rules, but there is
no guarantee that these heuristic rules give optimum results for all styles of writing.
Moreover, handwriting varies from person to person, and even for the same person it varies
with mood, speed, and so on. This requires incorporating artificial neural networks, hidden
Markov models, and statistical classifiers to extract segmentation rules based on numerical data.
Disadvantages:
1. High complexity.
2. Difficult to analyze.
3. Higher time consumption.
The user can upload an image from the system’s storage. The uploaded image is then
processed by a neural network model (NN model), which identifies the characters, i.e., the
digits, letters, or special symbols. After these characters are identified, they are
converted into printed text, and the processed document is returned to the user as output.
Advantages:
1. Less complicated.
2. Easy to process.
3. Higher accuracy.
The feasibility of the project is analyzed in this phase, and a business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis, the feasibility study of the proposed system is carried out to ensure that the
proposed system is not a burden to the company. For feasibility analysis, some understanding
of the major requirements for the system is essential.
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can put into the research and development
of the system is limited, so the expenditures must be justified. The developed system is well
within the budget, which was achieved because most of the technologies used are freely
available. Only the customized products had to be purchased.
This aspect of the study checks the level of acceptance of the system by the user. It
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of acceptance by
users depends solely on the methods employed to educate them about the system and to make
them familiar with it. Their confidence must be raised so that they are also able to offer
constructive criticism, which is welcomed, as they are the final users of the system.
Functional Requirements
A more central requirement is reusability. Although the example focuses on solving a specific
task, the software architecture should be designed in a way that appropriate parts can be reused
for other chatbots in the future. To ensure reusability the software should be documented, stable
and extensible. Usability can be seen as the most important non-functional requirement. The
focus of developing the example chatbot is to design a good user experience and to explore how
interface and interaction design can be best accomplished with the given medium.
3.5 System Architecture
CNNs are a class of deep neural networks that can recognize and classify particular features
in images and are widely used for analyzing visual imagery. Their applications include
image and video recognition, image classification, medical image analysis, computer vision, and
natural language processing.
The term “convolution” in CNN denotes the mathematical operation of convolution, a special
kind of linear operation in which two functions are combined to produce a third function
that expresses how the shape of one is modified by the other. In simple terms, a small matrix
(the filter) is multiplied element by element with each region of the image matrix and the
products are summed, giving an output that is used to extract features from the image.
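For reference, the discrete two-dimensional operation can be written as follows. The symbols are chosen here for illustration: I is the input image, K is an M x M filter, and S is the resulting feature map. This is the unflipped form, which is strictly a cross-correlation; it is the form that CNN layers commonly compute and still call convolution.

```latex
S(i, j) = \sum_{m=0}^{M-1} \sum_{n=0}^{M-1} I(i + m,\; j + n)\, K(m, n)
```

The textbook convolution differs only in flipping the filter, that is, using I(i - m, j - n) in place of I(i + m, j + n). Since the filter weights are learned, the distinction has no practical effect on what the network can represent.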
Technically, when a deep learning CNN model is trained and tested, each input image passes
through a series of convolution layers with filters (kernels), pooling layers, and fully
connected (FC) layers, and a Softmax function is then applied to classify the object with
probabilistic values between 0 and 1. The figure below shows the complete flow of a CNN as it
processes an input image and classifies the objects based on these values.
A CNN architecture has two main parts:
A convolution tool that separates and identifies the various features of the image for
analysis, in a process called feature extraction.
A fully connected layer that uses the output of the convolution process and predicts
the class of the image based on the features extracted in previous stages.
The number of times these layers are repeated determines how deep the network is, and this
formation is known as a deep neural network.
As we go deeper, complexity increases considerably. The added depth may be worthwhile because
accuracy may increase, but time consumption also increases.
1. Convolutional Layer
This layer is the first layer that is used to extract the various features from the input
images. In this layer, the mathematical operation of convolution is performed between the input
image and a filter of a particular size MxM. By sliding the filter over the input image, the dot
product is taken between the filter and the parts of the input image with respect to the size of the
filter (MxM).
Figure 11 Convolutional Layer
The output is termed as the Feature map which gives us information about the image such as
the corners and edges. Later, this feature map is fed to other layers to learn several other
features of the input image.
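The sliding dot product described above can be sketched in plain Python. The example assumes a square filter, no padding, and a stride of 1; the function name and the tiny image and filter are hypothetical, and a deep learning library would normally perform this step.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k = len(kernel)                               # kernel is k x k
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the k x k patch at (i, j).
            total = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(k)
                for n in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4 x 4 image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, vertical_edge)
```

The feature map is non-zero only in the column where the filter straddles the edge, which is how a filter comes to signal the presence of a feature at a location.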
2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of
this layer is to decrease the size of the convolved feature map to reduce computational
costs. It does this by decreasing the connections between layers, and it operates
independently on each feature map. Depending on the method used, there are several types
of pooling operations.
In Max Pooling, the largest element is taken from each section of the feature map. Average
Pooling calculates the average of the elements in a section of predefined size. Sum Pooling
computes the total sum of the elements in the predefined section. The Pooling Layer usually
serves as a bridge between the Convolutional Layer and the FC Layer.
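The three pooling variants differ only in how each section is reduced to one value, as the following sketch shows. It assumes non-overlapping square sections (the stride equals the section size); the function name and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Apply non-overlapping size x size pooling; mode is 'max', 'avg' or 'sum'."""
    reducers = {
        "max": max,
        "sum": sum,
        "avg": lambda values: sum(values) / len(values),
    }
    reduce_window = reducers[mode]
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + m][j + n]
                for m in range(size)
                for n in range(size)
            ]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled


fm = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 8],
]
max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="avg")
sum_pooled = pool2d(fm, mode="sum")
```

With 2 x 2 sections, a 4 x 4 feature map shrinks to 2 x 2, so the following layers process a quarter of the values.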
Figure 12 Pooling Layer
3. Fully Connected Layer
The Fully Connected (FC) layer consists of the weights and biases along with the neurons and is
used to connect the neurons between two different layers. These layers are usually placed before
the output layer and form the last few layers of a CNN architecture.
Here, the output of the previous layers is flattened and fed to the FC layer. The flattened
vector then passes through a few more FC layers, where the mathematical operations usually
take place. In this stage, the classification process begins.
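Flattening and a single fully connected layer can be sketched as follows. Each output neuron computes a weighted sum of every input plus a bias; the function names, weights, and biases are hypothetical values chosen for illustration, whereas in a trained network they would be learned.

```python
def flatten(feature_maps):
    """Concatenate a list of 2D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


maps = [[[1, 2], [3, 4]]]            # one 2 x 2 feature map
x = flatten(maps)                    # [1, 2, 3, 4]
weights = [
    [1, 0, 0, 1],                    # hypothetical weights for output neuron 0
    [0.5, 0.5, 0.5, 0.5],            # hypothetical weights for output neuron 1
]
biases = [0.0, 1.0]
scores = dense(x, weights, biases)
```

An activation function is normally applied to these scores before they are passed on.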
4. Dropout
Usually, when all the features are connected to the FC layer, the model can overfit the
training dataset. Overfitting occurs when a model fits the training data so closely that its
performance suffers on new data.
To overcome this problem, a dropout layer is used, in which a few neurons are dropped from
the neural network at random during training, so that each training step uses a smaller
network. With a dropout of 0.3, 30% of the nodes are dropped out randomly from the neural network.
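A dropout rate of 0.3 can be illustrated with the sketch below. It assumes the common "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The function name is hypothetical, and the fixed random seed is there only to make the sketch repeatable.

```python
import random


def dropout(activations, rate=0.3, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during training.

    Surviving values are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    At inference time the input is returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)               # fixed seed so the sketch is repeatable
out = dropout([1.0] * 1000, rate=0.3, rng=rng)
dropped = out.count(0.0)             # roughly 30% of the 1000 values
```

Note that neurons are removed only temporarily: the full network is used when making predictions.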
5. Activation Functions
An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.
Sometimes the activation function is called a “transfer function.” If the output range of the
activation function is limited, then it may be called a “squashing function.” Many activation
functions are nonlinear and may be referred to as the “nonlinearity” in the layer or the network
design.
The choice of activation function has a large impact on the capability and performance of the
neural network, and different activation functions may be used in different parts of the model.
Technically, the activation function is used within or after the internal processing of each node
in the network, although networks are designed to use the same activation function for all nodes
in a layer.
A network may have three types of layers: input layers that take raw input from the domain,
hidden layers that take input from another layer and pass output to another layer, and output
layers that make a prediction.
All hidden layers typically use the same activation function. The output layer will typically use
a different activation function from the hidden layers and is dependent upon the type of
prediction required by the model.
Activation functions are also typically differentiable, meaning the first-order derivative can be
calculated for a given input value. This is required given that neural networks are typically
trained using the backpropagation of error algorithm that requires the derivative of prediction
error in order to update the weights of the model.
There are many different types of activation functions used in neural networks, although
perhaps only a small number are used in practice for hidden and output layers.
Finally, one of the most important parameters of the CNN model is the activation function.
Activation functions are used to learn and approximate any kind of continuous and complex
relationship between variables of the network. In simple words, the activation function
decides which information should be passed forward through the network and which should not.
It adds non-linearity to the network. There are several commonly used activation functions,
such as the ReLU, Softmax, tanh, and Sigmoid functions. Each of these functions has a specific
usage. For a binary classification CNN model, sigmoid and softmax functions are preferred, and
for multi-class classification, softmax is generally used.
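The four activation functions named above are short enough to write out directly. This is a plain-Python sketch for single values and small lists; subtracting the maximum inside `softmax` is a standard numerical-stability step and not something the text specifies.

```python
import math


def relu(x):
    """Pass positive values through and clamp negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squash any real value into the range (-1, 1)."""
    return math.tanh(x)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])              # one probability per class
```

Sigmoid gives a single probability, which suits a two-class output, while softmax gives one probability per class, which suits an output layer covering all 26 letters.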
Applications:
1. Object detection: With CNN, we now have sophisticated models like R-CNN, Fast R-CNN,
and Faster R-CNN that are the predominant pipeline for many object detection
models deployed in autonomous vehicles, facial detection, and more.
2. Semantic segmentation: In 2015, a group of researchers from Hong Kong developed a
CNN-based Deep Parsing Network to incorporate rich information into an image
segmentation model. Researchers from UC Berkeley also built fully convolutional
networks that improved upon state-of-the-art semantic segmentation.
Why ConvNets over Feed-Forward Neural Nets?
An image is nothing but a matrix of pixel values. So why not just flatten the image (e.g., a
3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification
purposes?
A convolutional neural network is better suited than a feed-forward network because it offers
parameter sharing and dimensionality reduction. Because of parameter sharing, the number of
parameters is reduced, and so is the amount of computation. The main intuition is that what is
learned from one part of the image is also useful in another part of the image. Because of
dimensionality reduction, the computational power needed is reduced.
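The effect of parameter sharing can be checked with simple arithmetic. The sketch below compares a fully connected layer with a convolutional layer on an assumed 28 x 28 grayscale input; the layer sizes (32 units, 32 filters of size 3 x 3) and the function names are illustrative choices, not values from this project.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_size, in_channels, n_filters):
    """Each filter has kernel_size^2 * in_channels weights plus one bias."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters


# Assumed example: a 28 x 28 grayscale image feeding 32 units or 32 filters.
fc_count = dense_params(28 * 28, 32)
conv_count = conv_params(3, 1, 32)
```

The convolutional layer needs far fewer parameters because the same 3 x 3 weights are reused at every position in the image.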
The convolutional layers of a CNN have multiple filters that scan the complete feature
matrix and carry out the dimensionality reduction. This makes a CNN a very apt fit for
image classification and processing.
A data flow diagram (DFD) graphically represents the functions, or processes, that capture,
manipulate, store, and distribute data between a system and its environment and between
components of a system. The visual representation makes it a good communication tool between
the user and the system designer. The structure of a DFD allows starting from a broad overview
and expanding it into a hierarchy of detailed diagrams. DFDs have often been used for the
following reasons:
A UML diagram is a diagram based on the Unified Modeling Language (UML) whose purpose is to
visually represent a system along with its main actors, roles, actions, artifacts, or classes,
in order to better understand, alter, maintain, or document information about the system.
Simply put, UML is a modern approach to modelling and documenting software. In fact, it is one
of the most popular business process modelling techniques. It is based on diagrammatic
representations of software components. As the old proverb says, “a picture is worth a
thousand words”. By using visual representations, we are able to better understand possible
flaws or errors in software or business processes.
Use Case Diagram:
A use case diagram at its simplest is a representation of a user's interaction with the system that
shows the relationship between the user and the different use cases in which the user is involved.
Component diagrams are used to describe the components, and deployment diagrams show how
they are deployed in hardware. UML is mainly designed to focus on the software artifacts of a
system. However, these two diagrams are special diagrams used to focus on software and
hardware components. Most of the other UML diagrams are used to handle the logical components
of the system.
A use case diagram can identify the different types of users of a system and the different use
cases and will often be accompanied by other types of diagrams as well. The use cases are
represented by either circles or ellipses. A use case diagram can help provide a higher-level
view of the system. Use cases provide a simplified, graphical representation of what the
system must actually do.
In software and systems engineering, a use case is a list of actions or event steps typically
defining the interactions between a role known in the Unified Modeling Language (UML) as an
actor and a system to achieve a goal. The actor can be a human or other external system. In
systems engineering, use cases are used at a higher level than within software engineering. The
detailed requirements may then be captured in the Systems Modeling Language. Use case
analysis is an important and valuable requirement analysis technique that has been widely used
in modern software engineering.
Use case diagram
Class diagrams model class structure and contents using design elements such as classes,
packages, and objects. A class diagram describes three perspectives when designing a system:
conceptual, specification, and implementation. Classes are composed of three things: name,
attributes, and operations. Class diagrams also display relations such as containment,
inheritance, and associations. The association relationship is the most common relationship
in a class diagram; it shows the relationship between instances of classes. The purpose of a
class diagram is to model the static view of an application. Class diagrams are the only
diagrams that can be directly mapped to object-oriented languages and are thus widely used at
the time of construction. UML diagrams such as activity and sequence diagrams can only give
the sequence flow of the application, whereas the class diagram is a bit different. It is the
most popular UML diagram in the coder community.
The purpose of the class diagram can be summarized as −
Class Diagram
3.7.3 Sequence Diagram
SEQUENCE DIAGRAM
4.1 Modules
The quantity and quality of the data dictate how accurate the model is.
The outcome of this step is generally a representation of the data (Guo simplifies this to
specifying a table) which we will use for training.
Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this
step.
Different algorithms suit different tasks; choose the right one.
Evaluation uses some metric or combination of metrics to measure the objective performance of the model. Test
the model against previously unseen data. This unseen data is meant to be somewhat
representative of model performance in the real world, but it still helps tune the model (as opposed
to test data, which does not). A good train/eval split is 80/20, 70/30, or similar, depending on the
domain, data availability, dataset particulars, etc.
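A split such as 80/20 can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_eval_split` and the fixed `seed` are assumptions made for the example, not part of the project code.

```python
import random

def train_eval_split(samples, eval_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into train and eval parts."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

# 100 stand-in samples split 80/20.
train, evaluation = train_eval_split(range(100), eval_fraction=0.2)
print(len(train), len(evaluation))  # 80 20
```

Shuffling before splitting avoids an evaluation set that contains only the last samples of an ordered dataset.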
1. Keras:
Keras is a powerful and easy-to-use free open source Python library for
developing and evaluating deep learning models.
Keras has a minimal structure that provides a clean and easy way to create
deep learning models on top of TensorFlow or Theano. It is designed for defining
deep learning models quickly, which makes it a good choice for deep learning
applications.
2. TensorFlow:
TensorFlow is a Python library for fast numerical computing created and released by
Google. It is a foundation library that can be used to create deep learning models
directly or through wrapper libraries built on top of TensorFlow that simplify the process.
It is free and open source software and one of the best-known deep learning frameworks,
covering deep neural networks for tasks such as image processing and sentiment analysis.
Unlike other numerical libraries intended for use in deep learning, such as Theano,
TensorFlow was designed for use both in research and development and in production
systems. It can run on single CPU systems, GPUs, mobile devices and large-scale
distributed systems of hundreds of machines.
3. NumPy:
NumPy, which stands for Numerical Python, is an open source Python library used for
working with arrays. It consists of multidimensional array objects and a collection of
routines for processing those arrays, and it also has functions for working in the
domains of linear algebra, Fourier transforms and matrices. Using NumPy, mathematical and
logical operations on arrays can be performed. The array object in NumPy is called
ndarray. NumPy aims to provide an array object that is up to 50x faster than traditional
Python lists, together with many supporting functions that make working with ndarray
easy. Arrays are very frequently used in data science, where speed and resources are very
important.
4. Pillow:
Pillow is a free and open source library for the Python programming language that allows
you to easily create and manipulate digital images. Pillow is a fork of PIL (Python
Imaging Library), one of the important modules for image processing in Python.
However, PIL has not been supported since 2011 and does not support Python 3. Pillow
offers more functionality, runs on all major operating systems and supports
Python 3. It supports a wide variety of image formats such as “jpeg”, “png”, “bmp”, “gif”, “ppm” and
“tiff”. You can do almost anything with digital images using the Pillow module. It provides
basic image processing functionality, including point operations, filtering images using
built-in convolution kernels, and color space conversions.
5. Tkinter:
Tkinter is the standard GUI library for Python. Python when combined with Tkinter
provides a fast and easy way to create GUI applications. Tkinter provides a powerful
object-oriented interface to the Tk GUI toolkit.
We need to import all the modules that
we are going to need for training our model. The Keras library already contains some
datasets and MNIST is one of them, so we can easily import the dataset through Keras.
The mnist.load_data() method returns the training data and its labels along with the testing
data and its labels.
The database contains 60,000 images used for training, a few of which can be
used for cross-validation purposes, and 10,000 images used for testing. All the digits are
grayscale and size-normalized, with the centre of intensity lying at the centre of the
28×28 pixel image. Since all the images are 28×28 pixels, each forms an array which can be
flattened into a 28*28=784 dimensional vector. Each component of the vector is a greyscale
value between 0 and 255 which describes the intensity of the pixel.
Pre-Processing
A Data Quality Assessment is a distinct phase within the data quality life-cycle that
is used to verify the source, quantity and impact of any data items that breach pre-defined
data quality rules. The Data Quality Assessment can be executed as a one-off process or
repeatedly as part of an ongoing data quality assurance initiative.
The quality of your data can quickly decay over time, even with stringent data
capture methods cleaning the data as it enters your database. People moving house, changing
phone numbers and passing away all mean the data you hold can quickly become out of
date.
A Data Quality Assessment helps to identify those records that have become
inaccurate, the potential impact that inaccuracy may have caused and the data’s source.
Through this assessment, it can be rectified and other potential issues identified.
Data cleaning:
Data cleaning is one of the important parts of machine learning. It plays a significant
part in building a model. It surely isn’t the fanciest part of machine learning and at the same
time, there aren’t any hidden tricks or secrets to uncover. However, proper data cleaning can
make or break your project. Professional data scientists usually spend a very large portion of
their time on this step because of the belief that “better data beats fancier algorithms”. If
we have a well-cleaned dataset, we can get the desired results even with a very simple
algorithm, which can prove very beneficial at times. Obviously, different types of data will
require different types of cleaning. However, this systematic approach can always serve as a
good starting point.
Data transformation
In fact, by cleaning and smoothing the data, we have already performed data modification.
However, by data transformation, we understand the methods of turning the data into an
appropriate format for the computer to learn from. Data transformation is the process in
which data is taken from its raw, siloed and normalized source state and transformed into
data that’s joined together, dimensionally modelled, de-normalized, and ready for analysis.
Without the right technology stack in place, data transformation can be time-consuming,
expensive, and tedious. Nevertheless, transforming the data will ensure maximum data
quality which is imperative to gaining accurate analysis, leading to valuable insights that
will eventually empower data-driven decisions.
Building and training models to process data is a brilliant concept, and more enterprises
have adopted, or plan to deploy, machine learning to handle many practical applications. But
for models to learn from data and make valuable predictions, the data itself must be organized
to ensure that its analysis yields valuable insights.
Data reduction:
Data reduction is a process that reduces the volume of the original data and represents it in a
much smaller volume. Data reduction techniques preserve the integrity of the data while reducing
its volume. The time required for data reduction should not outweigh the time saved by
mining the reduced data set.
1. Dimensionality Reduction:
a. Wavelet Transform
b. Principal Component Analysis
c. Attribute Subset Selection
2. Numerosity Reduction:
a. Parametric
b. Non-Parametric
3. Data Compression:
When you work with large amounts of data, it becomes harder to come up with
reliable solutions. Data reduction can be used to reduce the amount of data and decrease the
costs of analysis.
After loading the data, we separated the data into X and y, where X is the image and
y is the label corresponding to X. The first layer (input layer) of our model is a convolution layer.
A convolution layer expects each image to carry an explicit channel dimension, so we need to
reshape the images, converting each 28x28 matrix of greyscale values into a
28x28x1 tensor. With the right dimensions for all the images, we can split the images into
train and test sets for further steps.
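The reshape step can be sketched with NumPy as follows. The random array stands in for the loaded images, and the scaling to the range 0 to 1 is a common optional step that is assumed here rather than taken from the project.

```python
import numpy as np

# Stand-in for the loaded data: 5 random greyscale images of 28x28 pixels.
# With real data, X and y would come from the dataset loader instead.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
y = np.array([3, 1, 4, 1, 5])

# Add the trailing channel axis expected by a 2D convolution layer.
X = X.reshape(-1, 28, 28, 1)

# Assumed extra step: scale pixel intensities from [0, 255] to [0, 1].
X = X.astype("float32") / 255.0

print(X.shape)  # (5, 28, 28, 1)
```

The `-1` lets NumPy infer the number of images, so the same line works for the training and the test arrays.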
Data Encoding:
This step is needed because we are using categorical cross-entropy as the loss function: we
have to tell the network that the given labels are categorical in nature. The raw data can
contain various types of data, both structured and unstructured, which
need to be processed in order to bring them into a form that is usable in machine learning
models. Since machine learning is based on mathematical equations, it would cause a
problem if we kept categorical variables as they are. Many algorithms support categorical
values without further manipulation, but in those cases it is still a topic of discussion
whether to encode the variables or not. After identifying the data types of the
features present in the data set, the next step is to process the data in a way that is suitable for
machine learning models. Three popular techniques for converting categorical
values to numeric values are:
1. Label Encoding.
2. One Hot Encoding.
3. Binary Encoding.
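Of these techniques, one hot encoding is the one that matches a 10-class output layer. A minimal sketch in plain Python follows; the helper name `one_hot` is an assumption made for the example.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1 at the label's index and 0 elsewhere."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded

encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded[0])  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Each digit label becomes a vector of length 10, which is the shape that a softmax output layer with 10 nodes is compared against.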
Now comes the fun part, where we finally get to use the meticulously prepared data for
model building. Depending on the data type (qualitative or quantitative) of the target
variable (commonly referred to as the Y variable) we are either going to be building a
classification (if Y is qualitative) or regression (if Y is quantitative) model.
Learning Algorithms :
1. Supervised learning: a machine learning task that establishes the mathematical relationship between input X
and output Y variables. Such X, Y pairs constitute the labeled data that are used for model
building in an effort to learn how to predict the output from the input. Supervised learning
problems can be further grouped into regression and classification problems.
2. Unsupervised learning: a machine learning task that makes use of only the input X
variables. Such X variables are unlabeled data that the learning algorithm uses in modeling
the inherent structure of the data. Unsupervised learning problems can be further grouped into
clustering and association problems.
Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
3. Reinforcement learning: an area of machine learning that
is about taking suitable actions to maximize reward in a particular situation. It is
employed by various software and machines to find the best possible behaviour or
path to take in a specific situation.
Input: The input should be an initial state from which the model will start.
Output: There are many possible outputs, as there is a variety of solutions to a particular
problem.
Training: The training is based upon the input. The model will return a state and the
user will decide to reward or punish the model based on its output.
The model continues to learn.
The best solution is decided based on the maximum reward.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is termed
Support Vector Machine. The SVM algorithm can be used for face detection, image
classification, text categorization, etc.
Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is called a
Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data.
If a dataset cannot be classified by using a straight line, then such
data is termed non-linear data and the classifier used is called a Non-linear
SVM classifier.
Example: SVM can be understood with the example that we have used in the
KNN classifier. Suppose we see a strange cat that also has some features of dogs, and
we want a model that can accurately identify whether it is a cat or a dog. Such a
model can be created by using the SVM algorithm. We first train our model with
lots of images of cats and dogs so that it can learn about the different features of cats and
dogs, and then we test it with this strange creature. The support vector machine creates a
decision boundary between these two classes (cat and dog) and chooses the extreme cases
(support vectors) of cat and dog. On the basis of the
support vectors, it will classify the creature as a cat.
SVM working graph
SVMs have a high training time and hence in practice are not suitable for large datasets. Another
disadvantage is that SVM classifiers do not work well with overlapping classes.
K-NN ALGORITHM:
It is simple to implement.
It is robust to noisy training data.
It can be more effective if the training data is large.
The value of K always needs to be determined, which may be complex at times.
The computation cost is high because the distance between the query point and all
the training samples must be calculated.
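The points above can be seen in a minimal K-NN sketch. It uses Euclidean distance and a majority vote, which are the usual choices but are assumptions here; the function name `knn_predict` and the toy data are also illustrative only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # The distance to every training sample is computed, which is why the
    # cost of a prediction grows with the size of the training set.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two well-separated toy clusters labelled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, query=(1.1, 1.0), k=3))  # a
```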
After data encoding, the images and labels are ready to be fitted into our model. We need to
define a Convolutional Neural Network Model.
In simpler words, a CNN is an artificial neural network that specializes in picking out or detecting
patterns and making sense of them. Thus, CNNs have been most useful for image classification. A
CNN model has various types of filters of different sizes and numbers. These filters are
essentially what help us detect the patterns. The convolutional neural network, or CNN for
short, is a specialized type of neural network model designed for working with two-dimensional
image data, although it can also be used with one-dimensional and three-dimensional data.
Central to the convolutional neural network is the convolutional layer that gives the network
its name. This layer performs an operation called a “convolution”. A CNN model generally
consists of convolutional and pooling layers. It works best for data that are represented as
grid structures, which is why CNNs work well for image classification problems.
The dropout layer deactivates some of the neurons during training, which reduces
overfitting of the model. Our model is composed of feature extraction with convolution followed by
multi-class classification. Convolution and max pooling are carried out to extract the features in
the image: 32 3x3 convolution filters are applied to a 28x28 image, followed by a max-pooling
layer of 2x2 pooling size, followed by another convolution layer with 64 3x3 filters.
In the end, we obtain 7x7 feature maps to flatten. The flatten layer turns the 7x7 feature maps into a
one-dimensional vector that is mapped to a dense layer of 128 neurons, which is connected
to the categorical output layer of 10 neurons.
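The feature map sizes can be checked with a few lines of arithmetic. The sketch below assumes "same" padding in both convolution layers and a second 2x2 max-pooling layer after the second convolution; neither is stated explicitly in the text, but together they are one configuration that yields the 7x7 maps mentioned above. The helper names are illustrative.

```python
def conv_output_size(size, kernel, padding="same", stride=1):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid": no padding

def pool_output_size(size, pool):
    """Spatial output size of non-overlapping max pooling."""
    return size // pool

size = 28
size = conv_output_size(size, kernel=3)   # 32 filters of 3x3 -> 28
size = pool_output_size(size, pool=2)     # 2x2 max pooling   -> 14
size = conv_output_size(size, kernel=3)   # 64 filters of 3x3 -> 14
size = pool_output_size(size, pool=2)     # assumed second pooling -> 7
flattened = size * size * 64              # values entering the dense layer
print(size, flattened)  # 7 3136
```

Under these assumptions the flatten layer produces 7*7*64 = 3136 values, which the dense layer then maps to 128 neurons.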
The filter is smaller than the input data and the type of multiplication applied between a
filter-sized patch of the input and the filter is a dot product. A dot product is the element-
wise multiplication between the filter-sized patch of the input and filter, which is then
summed, always resulting in a single value. Because it results in a single value, the
operation is often referred to as the “scalar product”. Using a filter smaller than the input is
intentional as it allows the same filter (set of weights) to be multiplied by the input array
multiple times at different points on the input. Specifically, the filter is applied
systematically to each overlapping part or filter-sized patch of the input data, left to right,
top to bottom.
The output from multiplying the filter with the input array one time is a single value. As the
filter is applied multiple times to the input array, the result is a two-dimensional array of
output values that represent a filtering of the input. As such, the two-dimensional output
array from this operation is called a “feature map”.
The idea of applying the convolutional operation to image data is not new or unique to
convolutional neural networks; it is a common technique used in computer vision.
Historically, filters were designed by hand by computer vision experts and then
applied to an image to produce a feature map, the output of applying the filter, which makes
the analysis of the image easier in some way. In a convolutional neural network, by contrast, the network learns what types of features
to extract from the input. Specifically, when training under stochastic gradient descent, the
network is forced to learn to extract features from the image that minimize the loss for the
specific task the network is being trained to solve, e.g. extract features that are the most
useful for classifying images as dogs or cats.
We can better understand the convolution operation by looking at some worked examples
with contrived data and handcrafted filters.
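One such worked example is sketched below in plain Python. The 5x5 image and the 3x3 vertical line detector are contrived for illustration, and the function name `convolve2d` is an assumption; the filter is applied without padding and with a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Dot product of the filter with the filter-sized patch at (r, c).
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# Contrived 5x5 image containing a single vertical line in the middle column.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Handcrafted 3x3 vertical line detector.
vertical_filter = [[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]

for row in convolve2d(image, vertical_filter):
    print(row)  # each row is [0, 3, 0]
```

The feature map is strongest (value 3) exactly where the filter is centred on the line, and zero elsewhere.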
Convolutional neural networks apply a filter to an input to create a feature map that
summarizes the presence of detected features in the input.
Filters can be handcrafted, such as line detectors, but the innovation of convolutional
neural networks is to learn the filters during training in the context of a specific
prediction problem.
WORKING OF CNN:
The behaviour of each neuron is defined by its weights. When fed with the pixel values, the
artificial neurons of a CNN pick out various visual features.
When you input an image into a ConvNet, each of its layers generates several activation
maps. Activation maps highlight the relevant features of the image. Each of the neurons takes
a patch of pixels as input, multiplies their colour values by its weights, sums them up, and
runs them through the activation function.
The first (or bottom) layer of the CNN usually detects basic features such as horizontal,
vertical, and diagonal edges. The output of the first layer is fed as input of the next layer,
which extracts more complex features, such as corners and combinations of edges. As you
move deeper into the convolutional neural network, the layers start detecting higher-level
features such as objects, faces, and more.
The operation of multiplying pixel values by weights and summing them is called
“convolution” (hence the name convolutional neural network). A CNN is usually composed
of several convolution layers, but it also contains other components. The final layer of a
CNN is a classification layer, which takes the output of the final convolution layer as input
(remember, the higher convolution layers detect complex objects).
Based on the activation map of the final convolution layer, the classification layer outputs a
set of confidence scores (values between 0 and 1) that specify how likely the image is to
belong to a “class.” For instance, if you have a ConvNet that detects cats, dogs, and horses,
the output of the final layer is the probability that the input image contains each of those
animals.
The model type that we will be using is Sequential. Sequential is the easiest way to build a
model in Keras. It allows you to build a model layer by layer. We use the ‘add()’ function to
add layers to our model. Our first 2 layers are Conv2D layers. These are convolution layers
that will deal with our input images, which are seen as 2-dimensional matrices. 64 in the
first layer and 32 in the second layer are the number of nodes in each layer. This number can
be adjusted to be higher or lower, depending on the size of the dataset. In our case, 64 and
32 work well, so we will stick with this for now.
Kernel size is the size of the filter matrix for our convolution, so a kernel size of 3
means we will have a 3x3 filter matrix. Activation is the activation function for the layer. The activation function
we will be using for our first 2 layers is ReLU, or Rectified Linear Activation. This
activation function has been shown to work well in neural networks.
Our first layer also takes in an input shape. This is the shape of each input image, 28,28,1 as
seen earlier on, with the 1 signifying that the images are greyscale. In between the Conv2D
layers and the dense layer, there is a ‘Flatten’ layer.
Flatten serves as a connection between the convolution and dense layers. ‘Dense’ is
the layer type we will use for our output layer. Dense is a standard layer type that is used
in many cases for neural networks. We will have 10 nodes in our output layer, one for each
possible outcome (0–9). The activation is ‘softmax’. Softmax makes the output sum up to 1
so the output can be interpreted as probabilities. The model will then make its prediction
based on which option has the highest probability.
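The softmax step and the final prediction can be sketched in plain Python as follows. The raw scores are hypothetical values chosen for illustration, not outputs of the trained model.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the 10 output nodes for one image.
scores = [1.2, 0.3, -0.8, 4.1, 0.0, 2.2, -1.5, 0.7, 0.1, 1.0]
probabilities = softmax(scores)

# The prediction is the index (digit) with the highest probability.
predicted_digit = max(range(10), key=lambda i: probabilities[i])
print(predicted_digit)  # 3
```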
Training & Validation :
After the model is constructed, it has to be compiled before it can be trained with the
available data set. Compiling the model takes
three parameters: optimizer, loss and metrics. Optimizers are algorithms or methods used to
change the attributes of the neural network, such as weights and learning rate, to reduce the
losses. They solve the optimization problem by minimizing the loss function.
The optimizer controls the learning rate. We will be using ‘adam’ as our optimizer.
Adam is generally a good optimizer to use for many cases. The adam optimizer adjusts the
learning rate throughout training. The learning rate determines how fast the optimal weights
for the model are calculated. A smaller learning rate may lead to more accurate weights (up
to a certain point), but the time it takes to compute the weights will be longer.
We will use ‘categorical_crossentropy’ for our loss function. This is the most
common choice for multi-class classification. A lower score indicates that the model is performing
better. To make things even easier to interpret, we will use the ‘accuracy’ metric to see the
accuracy score on the validation set when we train the model. The idea behind training and
testing any model is to learn as much as possible from the training data and to validate the
result as reliably as possible. Better learning and better validation can be achieved by
increasing the amount of training and test data respectively.
Once the model is successfully compiled, we can train it with the training
data for 100 iterations. As the number of iterations increases, however, there is a chance of
overfitting, so we limit the training to 98% accuracy. As we are using real-world
data for prediction, the test data was used to validate the model.
1. Gradient Descent
2. Stochastic Gradient Descent (SGD)
3. Mini Batch Stochastic Gradient Descent (MB-SGD)
4. SGD with momentum
5. Nesterov Accelerated Gradient (NAG)
6. Adaptive Gradient (AdaGrad)
7. AdaDelta
8. RMSprop
9. Adam
ADAM Optimizer :
Adam was presented by Diederik Kingma from OpenAI and Jimmy Ba from the
University of Toronto in their 2015 ICLR paper (poster) titled “Adam: A Method for
Stochastic Optimization“.
The authors describe Adam as combining the advantages of two other extensions of
stochastic gradient descent. Specifically:
Adaptive Gradient Algorithm (AdaGrad), which maintains a per-parameter learning rate that
improves performance on problems with sparse gradients (e.g. natural language and computer
vision problems).
Root Mean Square Propagation (RMSProp), which also maintains per-parameter learning
rates that are adapted based on the average of recent magnitudes of the gradients for the
weight (e.g. how quickly it is changing). This means the algorithm does well on online and
non-stationary problems (e.g. noisy).
Adaptive Moment Estimation (Adam) is the most popular of these optimizers today. Adam computes
adaptive learning rates for each parameter. In addition to storing an exponentially decaying
average of past squared gradients vt, like Adadelta and RMSprop, Adam also keeps an
exponentially decaying average of past gradients mt, similar to momentum.
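The two decaying averages mt and vt can be sketched for a single scalar parameter as follows. The default values for `beta1`, `beta2` and `eps` are the commonly used ones; the function name `adam_step` and the quadratic test function are assumptions made for the example.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # decaying average of gradients (mt)
    v = beta2 * v + (1 - beta2) * grad * grad   # decaying average of squared gradients (vt)
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.01)
print(round(x, 2))
```

Because the update divides the averaged gradient by the root of the averaged squared gradient, the size of each step stays close to `lr` regardless of how large the raw gradient is.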
Properties of Adam:
1. The actual step size taken by Adam in each iteration is approximately bounded by the
step size hyper-parameter. This property adds intuitive understanding to the previously
unintuitive learning rate hyper-parameter.
2. The step size of the Adam update rule is invariant to the magnitude of the gradient, which
helps a lot when going through areas with tiny gradients (such as saddle points or
ravines). SGD struggles to navigate through these areas quickly.
3. Adam was designed to combine the advantages of Adagrad, which works well with
sparse gradients, and RMSprop, which works well in on-line settings. Having both of
these enables us to use Adam for broader range of tasks. Adam can also be looked at
as the combination of RMSprop and SGD with momentum.
Why ADAM?
1. Adam is an optimization algorithm that can be used instead of the classical stochastic
gradient descent procedure to update network weights iteratively based on training data.
2. Adam combines the best properties of the AdaGrad and RMSProp algorithms to
provide an optimization algorithm that can handle sparse gradients on noisy problems.
3. Adam is relatively easy to configure where the default configuration parameters do
well on most problems.
1. Loading image
2. Convert the image to greyscale
3. Resize the image to 28x28
4. Converting the image into a matrix form
5. Reshape the matrix into 28x28x1
After pre-processing, we predict the label of the image by passing the pre-processed
image through the neural network. The output we get is a list of 10 activation values, one for
each of the digits 0 to 9. The position having the highest value is the predicted label for the image.
These structures are called neural networks. Deep learning teaches the computer to do what
comes naturally to humans. In deep learning, there are several types of models such as
Artificial Neural Networks (ANN), Autoencoders, Recurrent Neural Networks (RNN) and
Reinforcement Learning. But one particular model has contributed a lot to
the field of computer vision and image analysis: the Convolutional Neural Network
(CNN), or ConvNet.
CNNs are a class of deep neural networks that can recognize and classify particular
features from images and are widely used for analyzing visual images. Their applications
include image and video recognition, image classification, medical image analysis,
computer vision and natural language processing.
Methods for evaluating a model’s performance are divided into two categories:
holdout and cross-validation. Both methods use a test set (i.e. data not seen by the model) to
evaluate model performance. It is not recommended to evaluate the model on the data we used to
build it, because the model may simply memorize the training set
and then predict the correct label for nearly every point in it, which says nothing about how
it performs on new data. This memorization is known as overfitting.
Holdout:
The purpose of holdout evaluation is to test a model on different data than it was trained on.
This provides an unbiased estimate of learning performance.
The holdout approach is useful because of its speed, simplicity, and flexibility. However, this
technique is often associated with high variability since differences in the training and test
dataset can result in meaningful differences in the estimate of accuracy.
Cross-Validation:
Cross-validation is a technique that involves partitioning the original observation dataset into
a training set, used to train the model, and an independent set used to evaluate the analysis.
The most common cross-validation technique is k-fold cross-validation, where the original
dataset is partitioned into k equal-size subsamples, called folds. The k is a user-specified
number, usually with 5 or 10 as its preferred value. Training and evaluation are repeated k times, such that each
time, one of the k subsets is used as the test set/validation set and the other k-1 subsets are put
together to form a training set. The error estimate is averaged over all k trials to get the total
effectiveness of our model.
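The fold construction can be sketched in plain Python as follows. The generator name `k_fold_indices` is an assumption made for the example, and no shuffling is done, so in practice the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 10 samples and k=5: every sample lands in the test set exactly once.
for fold, (train, test) in enumerate(k_fold_indices(10, k=5)):
    print(fold, test)
```

Training a model on each `train` list, scoring it on the matching `test` list and averaging the k scores gives the cross-validated estimate described above.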
4.2.1 Python :
Python is currently one of the most widely used multi-purpose, high-level programming languages.
Python allows programming in object-oriented and procedural paradigms. Python programs
are generally smaller than equivalent programs in other languages such as Java. Programmers have to
type relatively little, and the indentation requirement of the language keeps programs
readable. Python is used by almost all tech-giant companies, such as Google,
Amazon, Facebook, Instagram, Dropbox and Uber.
The biggest strength of Python is its huge collection of standard and third-party libraries, which can be
used for the following:
Machine Learning
GUI Applications (like Kivy, Tkinter, PyQt etc. )
Web frameworks like Django (used by YouTube, Instagram, Dropbox)
Image processing (like Opencv, Pillow)
Web scraping (like Scrapy, BeautifulSoup, Selenium)
Test frameworks
Multimedia
1. Extensive Libraries
Python ships with an extensive standard library that contains code for various purposes like
regular expressions, documentation generation, unit testing, web browsers, threading,
databases, CGI, email, image manipulation, and more. So, we don’t have to write the
complete code for that manually.
2. Extensible
As we have seen earlier, Python can be extended to other languages. You can write some of
your code in languages like C++ or C. This comes in handy, especially in projects.
3. Embeddable
Complementary to extensibility, Python is embeddable as well. You can put your Python code
in the source code of a different language, like C++. This lets us add scripting capabilities to
our code in the other language.
4. Improved Productivity
The language’s simplicity and extensive libraries make programmers more productive than
languages like Java and C++ do. You also need to write less code to get more things
done.
5. IOT Opportunities
Since Python forms the basis of new platforms like Raspberry Pi, its future in
the Internet of Things looks bright. This is a way to connect the language with the real world.
When working with Java, you may have to create a class to print ‘Hello World’. But
in Python, just a print statement will do. It is also quite easy to learn, understand, and code.
This is why when people pick up Python, they have a hard time adjusting to other more
verbose languages like Java.
6. Readable
Because it is not such a verbose language, reading Python is much like reading English. This
is the reason why it is so easy to learn, understand, and code. It also does not need curly
braces to define blocks, and indentation is mandatory. This further aids the readability of the
code.
7. Object-Oriented
This language supports both the procedural and object-oriented programming paradigms.
While functions help us with code reusability, classes and objects let us model the real world.
A class allows the encapsulation of data and functions into one.
8. Free and Open Source
Like we said earlier, Python is freely available. Not only can you download Python for
free, but you can also download its source code, make changes to it, and even distribute it. It
comes with an extensive collection of libraries to help you with your tasks.
9. Portable
When you code your project in a language like C++, you may need to make some changes to
it if you want to run it on another platform. But it isn’t the same with Python. Here, you need
to code only once, and you can run it anywhere. This is called Write Once Run Anywhere
(WORA). However, you need to be careful enough not to include any system-dependent
features.
10. Interpreted
Lastly, we will say that it is an interpreted language. Since statements are executed one by
one, debugging is easier than in compiled languages.
Advantages of Python Over Other Languages:
1. Less Coding
Almost every task requires less code in Python than the same task does in other languages.
Python also has extensive standard library support, so you often don’t have to search for
third-party libraries to get your job done. This is the reason that many people suggest
learning Python to beginners.
2. Affordable
Python is free, so individuals, small companies, and big organizations alike can leverage the
freely available resources to build applications. Python is popular and widely used, which
also gives you better community support.
GitHub’s 2019 State of the Octoverse report showed that Python had overtaken Java to become
the second most popular language on the platform.
Python code can run on any machine, whether it runs Linux, macOS, or Windows. Programmers
often need to learn different languages for different jobs, but with Python you can
build web apps, perform data analysis and machine learning, automate tasks, do web
scraping, and also build games and powerful visualizations. It is an all-rounder programming
language.
Disadvantages of Python
So far, we’ve seen why Python is a great choice for your project. But if you choose it, you
should be aware of its consequences as well. Let’s now see the downsides of choosing Python
over another language.
1. Speed Limitations
We have seen that Python code is executed line by line. Because Python is interpreted, this
often results in slower execution than in compiled languages. This, however, isn’t a problem
unless speed is a focal point for the project. In other words, unless high speed is a
requirement, the benefits offered by Python usually outweigh its speed limitations.
2. Weak in Mobile Computing and Browsers
While it serves as an excellent server-side language, Python is rarely seen on the client
side. Besides that, it is rarely used to implement smartphone-based applications. One
such application is called Carbonnelle.
The reason Python is not common in the browser, despite the existence of Brython, is that it
is not considered that secure.
3. Design Restrictions
As you know, Python is dynamically typed. This means that you don’t need to declare the
type of a variable while writing the code. It uses duck typing. But wait, what’s that? Well, it
just means that if it looks like a duck, it must be a duck: an object is judged by the methods
it supports rather than by a declared type. While this is easy on programmers during
coding, it can lead to run-time errors.
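A short sketch may make the trade-off concrete. The classes and the `make_it_speak` function below are hypothetical names used only for illustration. Any object that has a `speak()` method is accepted, and an object without one fails only when the call is actually executed.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


class Stone:
    pass  # no speak() method


def make_it_speak(thing):
    # No type is declared or checked: any object with speak() is accepted.
    return thing.speak()


print(make_it_speak(Duck()))
print(make_it_speak(Robot()))

try:
    make_it_speak(Stone())
except AttributeError as err:
    # The mistake is only discovered when this line actually runs.
    print("Run-time error:", err)
```

In a statically typed language, passing a `Stone` here would typically be rejected before the program runs, which is the design restriction this section refers to.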
4. Simple
No, we’re not kidding. Python’s simplicity can indeed be a problem. Take my example. I
don’t do Java, I’m more of a Python person. To me, its syntax is so simple that the verbosity
of Java code seems unnecessary.
This was all about the Advantages and Disadvantages of Python Programming Language.
History of Python
What do the alphabet and the programming language Python have in common? Right, both
start with ABC. In the Python context, ABC refers to the programming language ABC. ABC is a
general-purpose programming language and programming environment that was developed in
Amsterdam, the Netherlands, at the CWI (Centrum Wiskunde & Informatica). The greatest
achievement of ABC was to influence the design of Python.
Python was conceptualized in the late 1980s. At that time, Guido van Rossum was working at
the CWI on a project called Amoeba, a distributed operating system. In an interview with
Bill Venners, Guido van Rossum said: "In the early 1980s, I worked as an implementer on a
team building a language called ABC at Centrum voor Wiskunde en Informatica (CWI). I don't
know how well people know ABC's influence on Python. I try to mention ABC's influence
because I'm indebted to everything I learned during that project and to the people who
worked on it."
Later on in the same interview, Guido van Rossum continued: "I remembered all my experience
and some of my frustration with ABC. I decided to try to design a simple scripting language
that possessed some of ABC's better properties, but without its problems. So I started
typing. I created a simple virtual machine, a simple parser, and a simple runtime. I made my
own version of the various ABC parts that I liked. I created a basic syntax, used indentation
for statement grouping instead of curly braces or begin-end blocks, and developed a small
number of powerful data types: a hash table (or dictionary, as we call it), a list, strings,
and numbers."
First, we import all the modules that we need for training our model, starting with the
keras package itself. The Keras library already contains some datasets, and MNIST is one of
them, so we can easily import the dataset and start working with it.
The mnist.load_data() method returns the training data and its labels, and also the testing
data and its labels.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, y_train.shape)
The image data cannot be fed directly into the model so we need to perform some operations
and process the data to make it ready for our neural network. The dimension of the training
data is (60000,28,28). The CNN model will require one more dimension so we reshape the
matrix to shape (60000,28,28,1).
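num_classes = 10  # assumption: the ten MNIST label classes
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
print('x_train shape:', x_train.shape)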
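The effect of these steps can be checked on a small synthetic batch without downloading MNIST. The sketch below uses only NumPy. The image contents are random stand-ins, `one_hot` is a hypothetical helper that mimics what `keras.utils.to_categorical` does, and `num_classes = 10` is an assumption based on MNIST having ten labels.

```python
import numpy as np

num_classes = 10  # assumption: MNIST has ten labels

# Random stand-in for a batch of five 28x28 grayscale images with values 0..255.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
y = np.array([3, 0, 9, 1, 3])


def one_hot(labels, n):
    # Row i is all zeros except for a 1 at column labels[i].
    encoded = np.zeros((len(labels), n), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


# Add the trailing channel axis the CNN expects, then scale pixels to [0, 1].
x = x.reshape(x.shape[0], 28, 28, 1).astype("float32") / 255
y = one_hot(y, num_classes)

print(x.shape, y.shape)
print(x.min() >= 0.0, x.max() <= 1.0)
```

Converting to `float32` before dividing matters: the raw images are 8-bit integers, and an in-place division on that type would not produce fractional pixel values.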
Now we will create our CNN model. A CNN model generally consists of convolutional and
pooling layers.
It works better for data that are represented as grid structures, which is the reason why CNNs
work well for image classification problems.
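To make the two layer types concrete, here is a minimal NumPy sketch of a single 3x3 convolution (computed as a cross-correlation without kernel flipping, as is usual in deep learning libraries) followed by 2x2 max pooling on a toy 6x6 image. The function names, the image, and the kernel are illustrative assumptions, and the loops are far slower than a framework implementation.

```python
import numpy as np


def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely ("valid" padding).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0)


def max_pool_2x2(feature_map):
    # Keep the largest value in each non-overlapping 2x2 block.
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Toy 6x6 image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = relu(conv2d_valid(image, kernel))  # shape (4, 4)
pooled = max_pool_2x2(features)               # shape (2, 2)
print(features)
print(pooled)
```

The convolution produces large values only where the window straddles the edge, and pooling then halves each spatial dimension while keeping the strongest response in each block.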
The dropout layer deactivates some of the neurons during training, which reduces overfitting
of the model. We will then compile the model with the Adadelta optimizer.
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
input_shape = (28, 28, 1)  # one 28x28 grayscale image
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
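As a sanity check on the architecture above, the following sketch traces the tensor size and the number of trainable parameters through each layer by hand. It assumes the Keras defaults (stride 1, 'valid' padding, bias terms), a 28x28x1 input, and `num_classes = 10`; the helper names are hypothetical.

```python
def conv_out(size, kernel):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input.
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units


num_classes = 10  # assumption: ten MNIST labels

side = conv_out(28, 3)       # Conv2D(32, 3x3): 28 -> 26
side = conv_out(side, 3)     # Conv2D(64, 3x3): 26 -> 24
side = side // 2             # MaxPooling2D(2x2): 24 -> 12
flat = side * side * 64      # Flatten: 12 * 12 * 64 = 9216

total = (
    conv_params(3, 1, 32)              # 320
    + conv_params(3, 32, 64)           # 18,496
    + dense_params(flat, 256)          # 2,359,552
    + dense_params(256, num_classes)   # 2,570
)
print(flat, total)  # 9216 2380938
```

Almost all of the parameters sit in the first dense layer, which is one reason dropout is applied around it.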
The model.fit() function of Keras starts the training of the model. It takes the training data,
validation data, epochs, and batch size. It takes some time to train the model. After training,
we save the weights and model definition in the ‘mnist.h5’ file.
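The relationship between the dataset size, the batch size, and the number of epochs is simple arithmetic. The sketch below assumes a batch size of 128 and 10 epochs purely for illustration, since the text does not state the values that were used.

```python
import math

num_samples = 60000   # size of the MNIST training set
batch_size = 128      # assumption for illustration
epochs = 10           # assumption for illustration

# One epoch is a full pass over the training data, split into mini-batches.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)  # 469 96 4690
```

A smaller batch size means more weight updates per epoch, while a larger one means fewer but more expensive updates.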
Epochs Running:
The dataset includes 10,000 test images, which are used to evaluate how well our model
works. The test data was not involved in training, so it is new data for
our model. The MNIST dataset is well balanced, so we can get around 99% accuracy.
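Accuracy here means the fraction of test images whose most probable predicted class matches the true label. The sketch below shows that calculation on made-up softmax outputs; the `accuracy` helper and the numbers are hypothetical and are not results from our model.

```python
def accuracy(probabilities, labels):
    # The predicted class is the index of the largest probability in each row.
    predicted = [row.index(max(row)) for row in probabilities]
    correct = sum(p == t for p, t in zip(predicted, labels))
    return correct / len(labels)


# Made-up softmax outputs for four images and three classes.
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
true_labels = [0, 1, 2, 1]

print(accuracy(probs, true_labels))  # 0.75
```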
Now for the GUI, we have created a new file in which we build an interactive window to draw
characters on a canvas and recognize the character with a button. The Tkinter library
comes with the Python standard library. We have created a function predict_character() that
takes the image as input and then uses the trained model to predict the character. Then we
create the App class, which is responsible for building the GUI for our app. We create a canvas
where we can draw by capturing the mouse event, and with a button we trigger the
predict_character() function and display the results. Key excerpts from our
gui_character_recognizer.py file:
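from keras.models import load_model
import tkinter as tk
import numpy as np

model = load_model('mnist.h5')

def predict_character(img):
    # resize image to 28x28 pixels
    img = img.resize((28, 28))
    # convert rgb to grayscale
    img = img.convert('L')
    img = np.array(img)
    # ... (remaining steps are not shown in the source)

class App(tk.Tk):  # class header assumed; only fragments appear in the source
    def __init__(self):
        super().__init__()
        # Creating elements
        # ...
        # Grid structure
        # ...
        # self.canvas.bind("<Motion>", self.start_pos)
        self.canvas.bind("<B1-Motion>", self.draw_lines)
    def clear_all(self):
        self.canvas.delete("all")
    def classify_handwriting(self):
        ...  # body not shown in the source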
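The remaining steps of a prediction function of this kind typically reshape the 28x28 array into a batch of one, scale it to the range 0 to 1, and pick the most probable class. The sketch below illustrates this with NumPy only. `fake_predict` is a hypothetical stand-in for the trained model's `predict` method so that the example runs without Keras or a saved `mnist.h5` file, and `classify` is an assumed name, not the project's function.

```python
import numpy as np


def fake_predict(batch):
    # Hypothetical stand-in for model.predict: returns one row of class
    # probabilities per image. Here class 7 is always the most likely.
    probs = np.full((batch.shape[0], 10), 0.02)
    probs[:, 7] = 0.82
    return probs


def classify(gray_image, predict=fake_predict):
    # gray_image: 28x28 array of pixel values in 0..255.
    batch = np.asarray(gray_image, dtype="float32").reshape(1, 28, 28, 1) / 255.0
    probs = predict(batch)[0]
    best = int(np.argmax(probs))
    return best, float(probs[best])


label, confidence = classify(np.zeros((28, 28)))
print(label, round(confidence, 2))
```

Passing the prediction function in as a parameter keeps the preprocessing testable on its own; in the real application the trained model's `predict` method would take the place of `fake_predict`.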
Unit testing focuses verification efforts on the smallest unit of software design, the module.
This is known as “Module Testing”. The modules are tested separately. This testing is
carried out during the programming stage itself. In this testing step, each module was found
to be working satisfactorily with regard to its expected output.
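As an illustration of module testing in this project's setting, the sketch below uses Python's built-in `unittest` to check a small preprocessing function in isolation. The function `normalize_pixels` and the test names are hypothetical and are not taken from the project code.

```python
import unittest


def normalize_pixels(pixels):
    # Scale 8-bit pixel values (0..255) to floats in the range 0..1.
    if any(p < 0 or p > 255 for p in pixels):
        raise ValueError("pixel values must be in 0..255")
    return [p / 255 for p in pixels]


class NormalizePixelsTest(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize_pixels([0, 255]), [0.0, 1.0])

    def test_rejects_out_of_range(self):
        with self.assertRaises(ValueError):
            normalize_pixels([256])


# Run the test case programmatically and report whether every test passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizePixelsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Each test exercises one module with a known input and compares the result against the expected output, which is exactly the check described above.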
Integration testing is a systematic technique for constructing tests to uncover errors associated
with the interfaces between modules. In the project, all the modules are combined and then the
entire program is tested as a whole. In the integration-testing step, all the errors uncovered
are corrected before the next testing steps.
White box testing is a form of application testing that provides the tester with complete
knowledge of the application being tested, including access to source code and design
documents. This in-depth visibility makes it possible for white box testing to identify issues that
are invisible to gray and black box testing.
Black box testing involves testing a system with no prior knowledge of its internal workings. A
tester provides an input, and observes the output generated by the system under test. This makes
it possible to identify how the system responds to expected and unexpected user actions, its
response time, usability issues and reliability issues.
Black box testing is a powerful testing technique because it exercises a system end-to-end. Just
like end-users “don’t care” how a system is coded or architected, and expect to receive an
appropriate response to their requests, a tester can simulate user activity and see if the system
delivers on its promises. Along the way, a black box test evaluates all relevant subsystems,
including UI/UX, web server or application server, database, dependencies, and integrated
systems.
Acceptance testing
When the system has no major problem with its accuracy, it passes through a final
acceptance test. This test confirms that the system meets the original goals, objectives, and
requirements established during analysis. If the system fulfils all the requirements, it is finally
acceptable and ready for operation.
5.2 Screenshots
5.2.1 HOMESCREEN
5.2.2 UPLOAD IMAGE
5.2.3 IMAGE UPLOAD
5.2.4 IMAGE COMPRESSION
5.2.5 SIZE COMPRESSION IN DIRECTORY
5.2.6 SIZE COMPARISON AFTER COMPRESSION
5.2.7 IMAGE COMPARISON GRAPH
6.1 CONCLUSION
This article takes a comprehensive look at the most efficient, cutting-edge technologies
and methods that have been used in HCR in the past. Although the amount of work that has been
done in this field is substantial, the high demand for HCR necessitates the development of more
effective and accurate algorithms that require less time and storage.
The wide variety of human handwriting and of the ways characters are written makes this a
complicated research topic. This paper presents a concise analysis of all of the significant
algorithms that were addressed. As a result of this study, researchers will gain in-depth
information about the ongoing work on this problem.
This research focuses on handwritten character recognition using EfficientNet-B2 with
transfer learning and two dense layers. It concentrates on the DHCD dataset of Devanagari
script, which has 92,000 images in 46 distinct classes. Since their publication in 2019,
EfficientNets have become prominent among CNN architectures because, unlike other CNN
methodologies, they scale network depth, width, and input resolution uniformly, which makes
them superior. The transfer learning model helps the accuracy of our result improve rapidly.
In addition, the two dense layers process the data through matrix-vector multiplication over
the outputs of all of the neurons in the preceding layer. Even though a vast amount of work
has already been done in this area, a lot of research is still needed to achieve a high level
of accuracy while also taking the time required into account.
This article assists scholars in this area in examining new methods and comparing them
to one another, which might be useful for further research and development in the years to come.
Our model's calculations depend on the size of the images in the DHCD dataset.
Testing the model on a more complex dataset may lead to results that differ from those
reported here.
The accuracy achieved by the other approaches is compared in table 1.1 above with
the accuracy achieved by our method, which is higher than that of the
other approaches. This accuracy might be improved further by tuning a number of
parameters, although doing so may require a significant investment of time and may or may not be
technically viable. This work is presented as a first effort, and the objective is to
simplify the process of identifying handwritten Indic characters. The model reaches a
validation accuracy of 99.49 percent.