A PROJECT REPORT ON
REAL TIME HAND DETECTION AND RECOGNITION USING MACHINE LEARNING
IN PARTIAL FULFILLMENT OF
BACHELOR OF ENGINEERING
IN
INFORMATION TECHNOLOGY
UNDER THE GUIDANCE OF
PROF. V. P. TONDE
CERTIFICATE
Submitted by
This is a bonafide work carried out by them under the supervision of Prof. V. P. Tonde, and it is
approved in partial fulfillment of the requirements of Savitribai Phule Pune University, Pune,
for the award of the degree of Bachelor of Engineering (Information Technology) in the academic
year 2020-2021.
This project report has not been submitted earlier to any other institute or university for the
award of any degree or diploma.
ACKNOWLEDGEMENT
We would like to thank our project guide, Prof. K. S. Karnekar, Assistant Professor in
Information Technology, Sinhgad Institute of Technology, Lonavala, for the continuous support
and valuable suggestions throughout this work. The authors are also grateful to the reviewer for
critically going through the manuscript and giving valuable suggestions for its improvement. We
would also like to thank the Department of Information Technology, Sinhgad Institute of
Technology, Lonavala for providing us with the facilities for carrying out the simulations.
I express my thanks to all staff members and friends for all the help and coordination
extended in bringing out this project successfully in time. I would be failing in my duty if I
did not acknowledge with grateful thanks the authors of the references and other literature
referred to in this project. Last but not least, I am very much thankful to my parents, who
guided me in every step I took.
ABSTRACT
Hand gesture recognition is a system that can detect hand gestures in real-time video.
The hand gesture is classified within a certain region of interest. Designing a hand gesture
recognition system is a complicated task that involves two major problems. The first is the
detection of the hand itself. The second is creating signs that are suitable for use with one
hand at a time. This project concentrates on how a system can detect, recognize and interpret
hand gestures through computer vision, despite challenging factors such as variability in pose,
orientation, location and scale.
For this project to perform well, different types of gestures, such as numbers and sign
language signs, need to be created in the system. Each image taken from the real-time video is
analysed with a Haar cascade classifier to detect the hand before further image processing is
done, or in other words, to detect the appearance of a hand in a frame. [1]
In this project, hand detection is done by applying the concept of a Region of Interest
(ROI) in Python. The explanation of the results focuses on the simulation part, since the only
difference for a hardware implementation is the source code that reads the real-time input
video.
Hand gesture recognition with Python and OpenCV can be implemented by applying the
theories of hand segmentation and hand detection using the Haar cascade classifier. In future
work, we will incorporate complex video, such as disaster footage, for detection and
classification of the objects present in the scene, enabling more sophisticated vision-based
applications such as fire-accident and earthquake-disaster monitoring. As a result, our
algorithm identifies objects by class, assigns each object a tag, and reports the dimensions of
each detection in the image. [1]
LIST OF FIGURES
CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
Hand gestures are a form of non-verbal communication. They can be used in
several fields such as communicating with deaf-mute people, robot control (gesture-
controlled robotics), HCI, home automation and medical applications.
A central requirement that this implementation depends on is the need for the system
to provide feedback in real time. Ideally, when communication is to occur between someone who
understands sign language and someone who does not, the translation should be available as soon
as the signer finishes signing.
These constraints make communication very ambiguous. The system uses a VGG16 neural
network model trained to predict hand gestures. It also allows users to add their own hand
gestures and associated labels to the database, after which the top layers of the neural network
are retrained using transfer learning.
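A minimal Keras sketch of this retraining step, assuming 224x224 RGB input; NUM_CLASSES is a
hypothetical gesture-label count, not the project's exact setting:

# Illustrative only: freeze the VGG16 convolutional base and retrain a new
# classification head on the user-supplied gesture images (transfer learning).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # hypothetical number of gesture labels

# Load VGG16 pretrained on ImageNet, without its original classifier.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one output per gesture
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # retrain on the new gestures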
This project uses static hand gestures because they can be identified more accurately
than dynamic hand gestures, such as those for the letters J and Z. The proposed study tries to
increase accuracy by combining a Convolutional Neural Network (CNN) with Open Computer Vision
(OpenCV).
The motivation of the project is to build a cost-efficient system that can help people
use hand gestures to give commands to machines. In India alone, over 50 lakh people suffer from
hearing disabilities as per the 2011 Census, making up about 19 percent of the disabled
population. Close to 20 lakh people suffer from speech impairments, roughly 7 percent of the
disabled population. In order to bridge this communication gap and ease the lives of these
minorities, artificial-intelligence-aided technology is introduced.
This project aims at creating a system to make their communication with other people one
step easier by converting sign language into text and audio output. The aim of this thesis is to
evaluate the classification performance of suitable deep learning models for real-time hand
gesture recognition and translation into text or speech.
The following objectives have been identified to fulfil the aim of this thesis work:
● To identify suitable and highly efficient deep learning models for real-time
object recognition.
● To evaluate the classification performance of the selected deep learning models.
● To compare the classification performance of the selected models among each other
and present the results.
● To produce a model that can recognize fingerspelling-based hand gestures, so that a
complete word can be formed by combining individual gestures.
● To gain the ability to convert text to speech, and to sense and react to the environment
so as to operate without the help or involvement of a human.
The system implements a very generic way of communication, not restricted to any particular
sign language. But while users have the ability to add new gestures to the system, the number of
gestures that can be added is limited, because the accuracy of the CNN may vary greatly as more
gestures are added. Algorithms could therefore be built that produce a flexible model which can
adjust, be modified, and be retrained with good accuracy on the periodic addition of a good
amount of images, and a system with higher computational power could be used to hold the maximum
number of gestures.
Sentiment analysis could also be introduced to potentially extend the capabilities of the
architecture.
The system draws different geometric shapes on receiving commands from the user. In a similar
way, systems that perform multiple tasks such as playing music or video, sending email, playing
games, or home automation using gestures as input can be developed.
CHAPTER 2
LITERATURE SURVEY
The following inclusion and exclusion criteria have been followed while collecting
articles for the literature review:
● Only articles that discuss sign language detection/recognition and deep learning
models have been included.
● Only articles published in the years 2020 and 2021 have been included, as they
reflect the most recent research conducted in this area.
● Only journal articles, conference papers, magazines and reviews have been included.
● Only articles written in the English language have been included, for
understandability purposes.
● Abstracts and PowerPoint presentations have been excluded.
CHAPTER 3
PROBLEM STATEMENT
With the growth of ubiquitous computing, the use of software is not limited to computers
or CPUs; it has reached everywhere in one form or another. Interaction between user and machine
is no longer limited to the mouse or keyboard, and each orthodox input device has its
limitations for receiving commands. Using hand gestures as an input device provides natural and
efficient interaction. Our project is based on the concepts of image processing and neural
networks. A lot of research is based on gesture recognition using Kinect sensors or HD cameras,
but such cameras and Kinect sensors are costly. To reduce cost and improve the robustness of the
proposed system, we used a simple web camera.
CHAPTER 4
Steps to be followed:
1) Download and install Python version 3 from the official Python language website,
https://www.python.org/.
2) Install the following Python libraries:
i. TensorFlow:
While the reference implementation runs on single devices, TensorFlow can run on multiple
CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose computing on graphics
processing units). TensorFlow is available on various platforms such as 64-bit Linux, macOS and
Windows, and on mobile computing platforms including Android and iOS.
The architecture of TensorFlow allows the easy deployment of computation across a variety of
platforms (CPUs, GPUs, TPUs), from desktops and clusters of servers to mobile and edge devices.
TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives
from the operations that such neural networks perform on multidimensional data arrays, which are
referred to as tensors.
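A small illustrative sketch of these tensor operations, assuming TensorFlow 2.x with eager
execution:

# Illustrative only: a few operations on multidimensional arrays (tensors).
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # 2x2 tensor
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

c = tf.matmul(a, b)           # matrix multiplication
d = tf.reduce_sum(c, axis=0)  # sum down each column

print(c.numpy())  # [[19. 22.] [43. 50.]]
print(d.numpy())  # [62. 72.]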
ii. NumPy:
NumPy is a library for the Python programming language that adds support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level mathematical
functions that operate on these arrays. The ancestor of NumPy, Numeric, was originally created
by Jim Hugunin with contributions from several developers. In 2005, Travis Oliphant created
NumPy by incorporating features of the competing Numarray into Numeric, with extensive
modifications. NumPy is open-source software and has many contributors.
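A brief illustrative sketch of the kind of array operations NumPy provides:

# Illustrative only: vectorized operations on a multidimensional array.
import numpy as np

img = np.array([[0, 128], [255, 64]], dtype=np.uint8)  # toy 2x2 "image"

normalized = img / 255.0   # element-wise division over the whole array
print(normalized.mean())   # ~0.438 (average pixel intensity)
print(img.T)               # transpose: [[0 255] [128 64]]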
iii. SciPy:
SciPy contains modules for optimization, linear algebra, integration, interpolation,
special functions, FFT, signal and image processing, ODE solvers and other tasks common in
engineering. SciPy builds mainly on the NumPy array object and is part of the NumPy stack, which
includes tools like Matplotlib, pandas and SymPy, and an expanding set of scientific computing
libraries. The NumPy stack has similar uses to applications such as MATLAB, Octave and Scilab,
and is sometimes also referred to as the SciPy stack.
The SciPy library is currently distributed under the BSD license, and its development is
sponsored and supported by an open community of developers. It is also supported by NumFOCUS, a
community foundation for supporting reproducible and accessible science.
iv. OpenCV:
OpenCV is a library of programming functions mainly aimed at real-time computer vision.
Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The
library is cross-platform and free to use under the open-source BSD license.
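Since this project reads real-time input video, a minimal OpenCV capture loop of the kind it
relies on might look like this (an illustrative sketch, not the project's exact code):

# Illustrative only: read frames from the default web camera in real time.
import cv2

cap = cv2.VideoCapture(0)  # device 0 = default webcam
while True:
    ret, frame = cap.read()      # grab one frame
    if not ret:
        break
    cv2.imshow("Live", frame)    # display it
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()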
v. Pillow :
Python Imaging Library is a free Python programming language library that provides support
to open, edit and save several different formats of image files. Windows, Mac OS X and Linux
are available for this.
vi. Matplotlib:
Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into
applications that use general-purpose GUI toolkits such as Tkinter, wxPython, Qt or GTK+.
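A short illustrative sketch of this object-oriented API:

# Illustrative only: plot with the object-oriented API rather than pyplot state.
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()            # Figure and Axes objects
x = np.linspace(0, 2 * np.pi, 100)
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("amplitude")
ax.legend()
fig.savefig("sine.png")             # the same Figure can also be embedded in a GUI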
vii. h5py:
The h5py package includes both a high-level and a low-level interface to the HDF5 library
for Python. The low-level interface is intended to be a complete wrapping of the HDF5 API, while
the high-level component uses established Python and NumPy concepts to support access to HDF5
files, datasets and groups.
A strong emphasis on automatic conversion between Python (NumPy) datatypes and data structures
and their HDF5 equivalents vastly simplifies the process of reading and writing data from
Python.
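An illustrative sketch of that high-level interface (the file and dataset names are
hypothetical):

# Illustrative only: write and read a NumPy array through h5py's high-level API.
import h5py
import numpy as np

data = np.random.rand(100, 64, 64)  # e.g. 100 small grayscale frames

with h5py.File("gestures.h5", "w") as f:   # hypothetical file name
    f.create_dataset("frames", data=data)

with h5py.File("gestures.h5", "r") as f:
    frames = f["frames"][:]   # reads back as a NumPy array
    print(frames.shape)       # (100, 64, 64)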
viii. Keras:
Keras is a high-level neural-network API written in Python, capable of running on top of
TensorFlow. It focuses on enabling fast experimentation with deep neural networks.
Hardware Requirements:
Memory: 6 GB
Display Memory: 4 GB
CHAPTER 5
2. Segmentation: Segmentation is the method of separating objects or signs from the
context of a captured image. Background subtraction, skin-color detection and edge detection are
all used in the segmentation process. The motion and location of the hand must be detected and
segmented in order to recognise gestures. (Figure: thresholded hand image.) A small sketch of
skin-colour segmentation follows.
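An illustrative OpenCV sketch of skin-colour segmentation by thresholding; the HSV bounds are
rough assumptions, not tuned values from this project:

# Illustrative only: segment skin-coloured regions with an HSV threshold.
import cv2
import numpy as np

frame = cv2.imread("frame.png")                       # hypothetical captured frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

lower = np.array([0, 30, 60], dtype=np.uint8)         # rough skin-tone bounds (assumed)
upper = np.array([20, 150, 255], dtype=np.uint8)
mask = cv2.inRange(hsv, lower, upper)                 # white where skin-like, black elsewhere

segmented = cv2.bitwise_and(frame, frame, mask=mask)  # keep only the hand region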
3. Feature Extraction: Predefined features such as form, contour, geometrical features
(position, angle, distance, etc.), colour features and histograms are extracted from the
preprocessed images and used later for sign classification or recognition. Feature extraction is
a step in the dimensionality-reduction process that divides and organises a large collection of
raw data into smaller, easier-to-manage classes, so that processing becomes simpler. The most
important characteristic of these large data sets is that they have a large number of variables,
and processing these variables requires a large amount of computational power. Feature
extraction therefore helps extract the best features from large data sets by selecting and
combining variables into features, reducing the size of the data. These features are simple to
use while still accurately and uniquely describing the actual data collection.
4. Preprocessing: Each picture frame is preprocessed to eliminate noise using a variety of
filters including erosion, dilation and Gaussian smoothing, among others. The size of an image
is reduced when a colour image is transformed to grayscale; converting an image to grayscale is
a common method of reducing the amount of data to be processed. The phases of preprocessing are
as follows.
Morphological operations apply a structuring element to an input image to create an output image
of similar size. The value of each pixel in the output image is determined by comparing the
corresponding pixel in the input image with its neighbours. There are two basic kinds of
morphological transformations, erosion and dilation, as sketched below.
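A minimal OpenCV sketch of these two operations on a binary mask; the input file and kernel size
are assumptions for illustration:

# Illustrative only: erosion and dilation on a binary (thresholded) image.
import cv2
import numpy as np

mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
kernel = np.ones((5, 5), np.uint8)   # 5x5 structuring element (assumed size)

eroded = cv2.erode(mask, kernel, iterations=1)    # shrinks white regions, removes specks
dilated = cv2.dilate(mask, kernel, iterations=1)  # grows white regions, fills small holes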
5. Recognition:
Classifiers are used at this stage. Classifiers are the methods or algorithms used to interpret
the signs. Popular classifiers for identifying or understanding sign language include the Hidden
Markov Model (HMM), K-Nearest Neighbour classifiers, the Support Vector Machine (SVM),
Artificial Neural Networks (ANN) and Principal Component Analysis (PCA), among others. In this
project, however, the classifier is a CNN. Because of their high precision, CNNs are widely used
for image classification and recognition. A CNN uses a hierarchical model that builds a network,
similar to a funnel, and ends in a fully-connected layer in which all neurons are connected to
each other and the output is processed.
Image classification is the process of taking an input (like a picture) and outputting its
class, or the probability that the input belongs to a particular class. Neural networks are
applied in the following steps:
1) One-hot encode the data: A one-hot encoding can be applied to the integer
representation. The integer-encoded variable is removed and a new binary variable is
added for each unique integer value, as in the sketch below.
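A small illustrative sketch of one-hot encoding gesture labels (a Keras utility is shown; the
label values are hypothetical):

# Illustrative only: integer labels -> one-hot binary vectors.
import numpy as np
from tensorflow.keras.utils import to_categorical

labels = np.array([0, 2, 1, 2])        # hypothetical gesture class indices
one_hot = to_categorical(labels, num_classes=3)
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]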
2) Define the model: A model, said in a very simplified form, is nothing but a function that
takes certain input, performs certain operations to its best on the given input (learning,
then predicting/classifying) and produces the suitable output.
3) Compile the model: The optimizer controls the learning rate. We will be using 'adam' as
our optimizer. Adam is generally a good optimizer to use for many cases; it adjusts the
learning rate throughout training. The learning rate determines how fast the optimal
weights for the model are calculated. A smaller learning rate may lead to more accurate
weights (up to a certain point), but the time it takes to compute the weights will be longer.
4) Train the model: Training a model simply means learning (determining) good values for
all the weights and the bias from labeled examples. In supervised learning, a machine
learning algorithm builds a model by examining many examples and attempting to find a
model that minimizes loss; this process is called empirical risk minimization.
5) Test the model. These steps are sketched in code below.
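A condensed illustrative sketch of steps 2-5 in Keras; the layer sizes, image shape and class
count are assumptions, not the project's exact settings:

# Illustrative only: define, compile, train and test a small CNN classifier.
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # hypothetical number of gesture classes

# 2) Define the model.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# 3) Compile the model with the adam optimizer.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# 4) Train on one-hot encoded labels, then 5) test on held-out data.
# model.fit(x_train, y_train_one_hot, epochs=10, validation_split=0.1)
# model.evaluate(x_test, y_test_one_hot)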
A convolutional neural network convolves learned features with the input data, using 2D
convolution layers.
Convolution Operation:
Three elements enter into the convolution operation:
● Input image
● Feature detector
● Feature map
You place the feature detector over the input image, beginning from the top-left corner,
and count the number of cells in which the feature detector matches the input image.
The number of matching cells is then written into the top-left cell of the feature map.
You then move the feature detector one cell to the right and do the same thing. This
movement is called a stride, and since we are moving the feature detector one cell at a time,
this would be a stride of one pixel.
In this example, the feature detector's middle-left cell, which contains the number 1,
matches the cell it is standing over in the input image. That is the only matching cell, so you
write "1" in the next cell of the feature map, and so on.
After you have gone through the whole first row, you move the feature detector down to the
next row and repeat the process.
Deriving a feature map has several uses, the most important of which is reducing the size of
the input image; note that the larger your stride (the movement across pixels), the smaller your
feature map.
ReLU Layer: The rectified linear unit is used to clamp values to be non-negative. Pixel
values can come out of the convolution as negative values, and in this layer we set them to 0.
The purpose of applying the rectifier function is to increase the non-linearity in our images,
because images are naturally non-linear. The rectifier serves to break up the linearity even
further, to make up for the linearity we might impose on an image when we put it through the
convolution operation. What the rectifier function does to an image is remove all the black
(negative) elements from it, keeping only those carrying a positive value (the grey and white
colours). The essential difference between the non-rectified version of the image and the
rectified one is the progression of colours: after we rectify the image, the colours change more
abruptly. The gradual change is no longer there, which indicates that the linearity has been
disposed of.
Pooling Layer:
The pooling (POOL) layer reduces the height and width of the input. It helps reduce
computation, and it helps make feature detectors more invariant to their position in the input;
this process is what provides the convolutional neural network with its "spatial variance"
capability. In addition, pooling minimizes the size of the images as well as the number of
parameters, which in turn prevents "overfitting". Overfitting, in a nutshell, is when you create
an excessively complex model to account for the idiosyncrasies we just mentioned. The result of
using a pooling layer and creating downsampled (pooled) feature maps is a summarized version of
the features detected in the input. These maps are useful because small changes in the location
of a feature in the input will still result in a pooled feature map with the feature in the same
location. This capability added by pooling is called the model's invariance to local
translation, and it aids generalization. A small sketch follows.
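An illustrative NumPy sketch of 2x2 max pooling with stride 2 (the input values are made up):

# Illustrative only: 2x2 max pooling with stride 2 on a small feature map.
import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 0, 1]])

pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max of each 2x2 block
print(pooled)
# [[4 5]
#  [6 3]]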
CONVOLUTION IN CNN:
Kernel convolution is not only used in CNNs; it is also a key element of many other computer
vision algorithms. It is a process where we take a small matrix of numbers (called a kernel or
filter), pass it over our image, and transform the image based on the values of the filter.
Subsequent feature map values are calculated according to the following formula, where the input
image is denoted by f and our kernel by h, and the indexes of the rows and columns of the result
matrix are marked with m and n respectively:

G[m, n] = (f * h)[m, n] = Σj Σk h[j, k] · f[m − j, n − k]

After placing our filter over a selected pixel, we take each value from the kernel, multiply
them in pairs with the corresponding values from the image, and sum the products.
In a CNN, the kernels can be considered analogous to nodes in an ANN: the kernel matrix values
are weights that are updated in each epoch. A sketch follows.
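An illustrative NumPy implementation of this sliding-kernel computation (valid padding, stride
1, using the cross-correlation form common in CNN layers; the image and filter values are made
up):

# Illustrative only: slide a kernel over an image and sum elementwise products
# at each position (the cross-correlation form used in most CNN layers).
import numpy as np

def convolve2d(f, h):
    kh, kw = h.shape
    out_h = f.shape[0] - kh + 1   # "valid" output size, stride 1
    out_w = f.shape[1] - kw + 1
    G = np.zeros((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            # multiply kernel and image patch in pairs, then sum
            G[m, n] = np.sum(f[m:m + kh, n:n + kw] * h)
    return G

image = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0], [0, -1]], dtype=float)  # made-up 2x2 filter
print(convolve2d(image, kernel))
# [[-4. -4.]
#  [-4. -4.]]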
CHAPTER 6
A level 1 DFD notates each of the main sub-processes that together form the complete
system. We can think of a level 1 DFD as an "exploded view" of the context diagram; it shows the
input and output flows of the proposed system.
The purpose of the use case diagram is simply to provide a high-level view of the system
and convey the requirements in laypeople's terms to the stakeholders.
The class diagram is the main building block of object-oriented modeling. It is used for
general conceptual modeling of the structure of the application, and for detailed modeling that
translates the models into programming code. Class diagrams can also be used for data modeling.
Fig. 8: Class Diagram
A sequence diagram is a type of interaction diagram that describes how, and in what
order, a group of objects works together. These diagrams are used by software developers and
business professionals to understand the requirements for a new system or to document an
existing process.
CHAPTER 7
SYSTEM DOCUMENTATION
Sample Code:

import argparse
import os
import shutil
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

# Model loading and helper utilities from the detection framework's repository.
from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import (
    check_img_size, non_max_suppression, apply_classifier, scale_coords,
    xyxy2xywh, plot_one_box, strip_optimizer, set_logging)
from utils.torch_utils import select_device, load_classifier, time_synchronized


def detect(save_img=False):
    # Unpack the command-line options into local variables.
    out, source, weights, view_img, save_txt, imgsz = \
        opt.save_dir, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    # Treat numeric sources, stream URLs and .txt lists of sources as webcam-style streams.
    webcam = source.isnumeric() or source.startswith(('rtsp://', 'rtmp://', 'http://')) or \
        source.endswith('.txt')
CHAPTER 9
OTHER SPECIFICATIONS
9.1 ADVANTAGES
9.2 LIMITATIONS
● Class imbalance.
● Speed for real-time objects.
● Multiple spatial scales and aspect ratios.
● The user must have large memory storage.
● The user must have all the software required to run the application.
9.3 APPLICATIONS
In this section, we provide an overview of real-world use cases for real-time hand
detection. We have mentioned several of them in previous sections, but here we dive a bit deeper
and explore the impact this computer vision technique can have across industries.
Specifically, real-time hand detection can be used in the areas introduced earlier:
communicating with deaf-mute people, gesture-controlled robotics, HCI, home automation and
medical applications.
CHAPTER 10
CONCLUSIONS
Gesture recognition is a budding field of computer science and AI. Using hand gestures as input
to a system can enhance the way the user interacts with it. This system acts as a medium of
communication for the deaf-mute community, and thus tries to reduce the barriers faced by the
aforementioned minorities. The system also allows the user to add new gestures and associated
labels to the system for translation.
The primary advantage of the system is that it is designed as an interface that functions in
real time and would be available to the masses. If developed as a mobile application, it could
be used effectively by the targeted audience.
Like systems in general, this system does not promise 100% accuracy in translating gestures. And
since it is made to be used in real time, the chances of error increase due to randomness in the
behaviour of the user and noise in the captured images. But as the user would be in the midst of
a communication or task, the chance of the user flagging an error is considerably low; however,
this does not rule out the user ignoring all the errors. Built-in machine learning algorithms
can be used to avoid the errors flagged by the user.
Since the user has the privilege to add new gestures to the database for gesture translation, on
which the model is trained again, this process may affect the accuracy of the system. To
overcome this, we have limited the number of new gestures that a user can add; but still,
depending on the nature of a gesture, the model accuracy may be affected slightly.
Furthermore, this system does not translate the mood or emotions of the user; it is made for
simple translation of gestures to text.
REFERENCES
1. Dr. V. Subedha, Sandhya, Shree Lakshmi, Swathi (IRJMETS, 2021) - Sign Language
Recognition to Aid Physically Challenged Using OpenCV and CNN
2. Shruti Chavan, Xinrui Yu and Jafar Saniie (IEEE, 2021) - Convolutional Neural Network Hand
Gesture Recognition for American Sign Language
3. Dr. J. Rethna Virgil Jeny, A. Anjana, Karnati Monica, Thandu Sumanth, A. Mamatha (IEEE,
2021) - Hand Gesture Recognition for Sign Language Using Convolutional Neural Network
4. Shagun Gupta, Riya Thakur, Vinay Maheshwari and Namita Pulgam (IEEE, 2020) - Sign
Language Converter Using Hand Gestures
5. Manasi Agrawal, Rutuja Ainapur, Shrushti Agrawal, Simran Bhosale, Dr. Sharmishta
Desai (IEEE, 2020) - Models for Hand Gesture Recognition using Deep Learning
6. Aishwarya Sharma, Dr. Siba Panda, Prof. Saurav Verma (IEEE, 2020) - Sign Language
to Speech Translation
7. Ritika Bharti, Sarthak Yadav, Sourav Gupta and Rajitha Bakthula (SSRN, 2019) - Automated
Speech to Sign Language Conversion using Google API and NLP