HAND GESTURE RECOGNITION
SYSTEM
FINAL YEAR PROJECT REPORT
AFNAN UR REHMAN (P11-6053)
HASEEB ANSER IQBAL (p11-6106)
ANWAAR UL HAQ (p11-6001)
SESSION 2011-2015
SUPERVISED BY
Dr. NAVEED ISLAM
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF COMPUTER & EMERGING SCIENCES,
PESHAWAR CAMPUS
(MAY 2015)
STUDENT’S DECLARATION
We declare that this project entitled “HAND GESTURE RECOGNITION SYSTEM”,
submitted as a requirement for the award of the BS (CS) degree, does not contain any
material previously submitted for a degree in any university; and that to the best of our
knowledge it does not contain any materials previously published or written by another
person except where due reference is made in the text.
AFNAN UR REHMAN ________________________
HASEEB ANSER IQBAL ________________________
ANWAAR UL HAQ ________________________
HAND GESTURE RECOGNITION SYSTEM
THE DEPARTMENT OF COMPUTER SCIENCE, NATIONAL UNIVERSITY OF
COMPUTER & EMERGING SCIENCES, ACCEPTS THIS THESIS SUBMITTED BY
AFNAN UR REHMAN, HASEEB ANSER IQBAL AND ANWAAR UL HAQ IN ITS
PRESENT FORM AS SATISFYING THE DISSERTATION REQUIREMENTS
FOR THE AWARD OF A BACHELOR'S DEGREE IN COMPUTER SCIENCE.
SUPERVISOR
Dr. NAVEED ISLAM
ASSISTANT PROFESSOR ________________________
FYP COORDINATOR
Mr. SHAKIR ULLAH
ASSISTANT PROFESSOR ________________________
HEAD OF DEPARTMENT
FAZL-E-BASIT
ASSISTANT PROFESSOR ________________________
DATED:
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF COMPUTER & EMERGING SCIENCES,
PESHAWAR CAMPUS
ACKNOWLEDGEMENT
Through this acknowledgment, we express our sincere gratitude to all those people
who have been associated with this project, helped us with it, and made it a
worthwhile experience.
Firstly, we extend our thanks to the Final Year Project coordinator, who arranged
and managed all the presentations and handled all of our problems effectively and
with understanding. Without his management skills we might have faced a lot of
problems.
Secondly, we would like to thank the Final Year Project committee, who attended each
and every presentation, listened to our project-related problems, and offered solutions
and opinions. They raised pointed questions about the limitations of our system in
different phases and advised us to use better and more effective techniques where we
could. It was due to their judgment that we improved our project to overcome those
limitations, so they were crucial to this project.
Lastly, we would like to take this opportunity to express a deep sense of gratitude
to our Final Year Project supervisor for his cordial support, exemplary guidance,
monitoring and constant encouragement. Whenever we needed his help, he was there
to help us.
We are obliged to our batch fellows and parents for their valuable guidance and
cooperation during the period of this task. Their blessings, help and guidance were
a deep inspiration to us.
ABSTRACT
We propose a method for real-time hand gesture recognition and feature extraction
using a web camera. In this approach, the image is captured through a webcam attached
to the system. First the input image is preprocessed: thresholding is used to remove
noise and smooth the image. After this, region filling is applied to fill holes in the
gesture (the object of interest), which helps in the classification and recognition
step. Then the biggest blob (biggest binary linked object) in the image is selected and
all small objects are removed; this eliminates extra unwanted objects and noise from the
image. When preprocessing is complete, the image is passed to the feature extraction
phase. For feature extraction, Hu moments are used because of their distinct properties
of rotation, scale and translation invariance. The extracted features are normalized and
matched against the training dataset features using the KNN (k-nearest neighbor)
algorithm. Euclidean distance is used in KNN to calculate the distance and find the
nearest neighbor. The test image is classified into its nearest neighbor's class in the
training set. The classification results are displayed to the user and, through the
Windows text-to-speech API, the gesture is translated into speech as well. The training
dataset contains 5 gestures, each with 50 variations captured under different lighting
conditions. The purpose of this is to improve the accuracy of classification.
Keywords
Hand gestures, gesture recognition, contours, Hu invariant moments, sign language
recognition, Matlab, k-nearest neighbor classifier, human-computer interface, text-to-speech
conversion and machine learning.
Disclaimer
This report is submitted as a partial requirement for the Bachelor's degree in Computer
Science at FAST NU Peshawar. It is substantially the result of Afnan Ur Rehman, Anwaar
Ul Haq and Haseeb Anser Iqbal's own work except where explicitly indicated in the text.
The report will be distributed to the FYP supervisor and FYP coordinator for
examination, but thereafter may not be copied or distributed.
Table of Contents
1 Introduction
2 Background
   Literature
   Image sensing
3 Method
   Proposed Method
   Steps chart
   Flow chart
4 Image Acquisition
5 Preprocessing
   Flow chart of steps
   RGB to Grayscale
   Binarize
   Grayscale filtering using value
   Noise removal and smoothing
   Remove small objects other than hand
   Region filling
   Canny edge detection (Additional step)
6 Hand Detection
7 Hand cropping
8 Feature extraction
9 Hand Gesture Training (Machine learning)
   Machine Learning
   Training Dataset
   Feature Extraction
   Normalization
   Inter class difference
10 Classification
11 Text to speech
12 UML Diagrams
   Use Case Diagram
   Sequence Diagram
   Flow Diagram
13 Conclusion
   Future work
   Potential applications
14 Project poster
15 References
16 Turnitin Originality Report
1 Introduction
Hands are the human organs used to manipulate physical objects, and for this very
reason they are also what human beings use most frequently to communicate and interact
with machines. The mouse and keyboard are the basic input devices for computers, and
using both of them requires the hands. The most immediate information exchange between
man and machine is through visual and aural channels, but this communication is largely
one-sided: computers of this age can present humans with 1024 × 768 pixels at a rate of
15 frames per second, while a good typist can type only about 60 words per minute, with
each word containing on average 6 letters. The mouse remedies this imbalance somewhat,
but it has its limitations as well. Although hands are most commonly used for day-to-day
physical manipulation tasks, in some cases they are also used for communication. Hand
gestures support our daily communications and help convey our messages clearly. Hands
are most important for mute and deaf people, who depend on their hands and gestures to
communicate, so hand gestures are vital for communication in sign language.
If computers had the ability to understand and translate hand gestures, it would be a
leap forward in the field of human-computer interaction. The difficulty is that images
are information-rich, so achieving this task requires extensive processing. Every
gesture has some distinct features which differentiate it from other gestures; Hu
invariant moments are used to extract these features, and the gestures are then
classified using the KNN algorithm. Real-life applications of gesture-based
human-computer interaction include interacting with virtual objects, controlling robots,
translating body and sign language, and controlling machines using gestures.
2 Background
Literature
Several methods have been proposed for both dynamic and static hand gestures. [1] Pujan
Ziaie proposed a technique of first computing the similarity of different gestures and
then assigning probabilities to them using the Bayesian inference rule. Invariant
classes were estimated using a modification of KNN (k-nearest neighbor). These classes
consist of Hu moments, whose geometrical invariance to rotation, translation and scale
makes them suitable features for classification. This technique performed very well,
giving 95% accurate results. [2] Pujan Ziaie also proposed a similar technique which
uses Hu moments along with a modified KNN algorithm for classification, called a
Locally Weighted Naive Bayes Classifier. Classification results of this technique were
93% accurate under different lighting conditions with different users. [3] Rajat
Shrivastava proposed a method in which he used Hu moments and hand orientation for
feature extraction. The Baum-Welch algorithm was used for recognition. The method has an
accuracy of 90%. [4] The technique proposed by Neha S. Chourasia, Kanchan Dhote and
Supratim Saha used a hybrid feature descriptor combining Hu invariant moments and SURF.
They used KNN (k-nearest neighbors) and SVM for classification and achieved 96%
accuracy. [5] Joyeeta Singha proposed a hand gesture recognition system based on the
K-L transform. This system consisted of five steps: skin filtering (image acquisition,
converting RGB to HSV, filtering the image, smoothing, binarizing, finding the biggest
BLOB), palm cropping, hand edge detection using the Canny edge detector, feature
extraction using the K-L transform, and classification. [6] Hunter proposed a system
that uses Zernike moments to extract image features and a Hidden Markov Model for
recognition. [7] Raheja proposed a technique that scanned the image in all directions to
find the edges of the fingertips. [8] Segan proposed a technique that used edges for
feature extraction, which reduces time complexity and also helps remove noise.
Image sensing
An image is a two-dimensional function f(x, y), where x and y are spatial coordinates,
and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray
level of the image at that point.
Image creation is based on two main factors: the illumination source, and the reflection
or absorption of energy by the object being imaged. The illumination source can be
electromagnetic energy such as infrared or X-rays, or sources like ultrasound, sunlight
or a computer-generated illumination pattern. In some cases the transmitted or reflected
energy is focused onto a photo converter, which converts the energy into visible light.
An arrangement of sensors is then used to convert the energy into digital images. The
incoming energy is converted into a voltage by means of input electrical power and a
sensor material responsive to the particular type of energy being detected. In response,
each sensor produces an output waveform, from which a digital quantity is obtained. The
result is only an approximation of the real scene.
Cameras in computers usually include a lens and an image sensor, and may also include a
microphone to capture sound. Computer image sensors are of one of two available types:
CCD (charge-coupled device) or CMOS (complementary metal oxide semiconductor). Most
consumer web cameras provide VGA resolution at a rate of 30 frames per second, while
more modern devices are capable of multi-megapixel resolutions. In this project an
ordinary web camera is used to capture the scene.
3 Method
Proposed Method
In order to extract features and recognize a gesture, the following method is proposed:
1. A GUI allows the user to capture the scene. This phase is called image
acquisition.
2. After capturing the image, the next step is to detect the hand and separate it
from the scene, because only the hand gesture is needed for accurate classification. If
the hand is not separated from the scene, it will affect the accuracy of the system
while extracting and matching the features.
3. Crop the hand out of the scene.
4. Preprocessing steps, which are:
a. Convert RGB to grayscale.
b. Gray filtering using value.
c. Noise removal and smoothing.
d. Remove small objects other than the hand.
5. Feature extraction using Hu invariant moments.
6. Classification using the KNN algorithm, with the Euclidean distance formula for
calculating distances and a threshold for better results.
7. Translation (conversion) into speech.
The proposed method is given in Figure 3.1.
Steps chart:
Figure 3.1 Proposed steps
(Image acquisition → Hand detection → Crop hand → Preprocessing → Feature extraction → Classification → Gesture to speech)
Flow chart:
Figure 3.2 Proposed flow chart
(Detection: capture scene (image), preprocessing, hand detection, contour detection, feature extraction for gesture. Learning: training set of hand gestures, feature extraction. Recognition: feature matching, gesture recognition, conversion to speech.)
4 Image Acquisition
In this step a GUI is made which shows the video stream of the scene. When the capture
button is clicked, an image of the scene is taken. The problem is that this scene
includes the whole body and other unwanted objects as well. The figure below shows the
GUI-based front end of the system through which the user can capture the image:
Figure 4.1 System GUI
5 Preprocessing
Flow chart of steps:
Figure 5.1 Steps of preprocessing
RGB to Grayscale:
RGB stands for red, green and blue. It is a color system in which these three colors are
added in different quantities to produce other colors. Human vision can distinguish
between many different colors, intensities and shades, but when it comes to shades of
gray it can only distinguish approximately 100. It is evident from this fact that
colored images contain more information; converting to grayscale (typically a weighted
sum of the channels, about 0.299R + 0.587G + 0.114B) discards color information that the
later steps do not need.
Figure 5.2 This is RGB Image
Figure 5.3 This is a grayscale Image
Binarize
Binarization is a process which converts a gray-level image to a binary image. A
gray-level image has 256 levels (0 to 255), whereas a binary image has only two values,
0 and 1 (black and white).
Grayscale filtering using value
There are many different types of filters in the field of digital image processing, and
the gray-level filter is one of them. This filter works on the gray-level image. The aim
is to reduce noise in order to increase accuracy and get better results out of the
system. Here a threshold is used to filter out noise in the grayscale image. The
threshold used in this project was 75, which gave better results.
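As a concrete illustration, these first preprocessing steps are a few lines in MATLAB (a minimal sketch assuming the Image Processing Toolbox; the input file name is hypothetical and the cutoff of 75 follows the text above):

    img  = imread('gesture.jpg');   % hypothetical captured image
    gray = rgb2gray(img);           % RGB to grayscale
    gray(gray < 75) = 0;            % gray-level filtering with the project's threshold of 75
    bw   = im2bw(gray, 75/255);     % binarize: gray levels above the cutoff become white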
Figure 5.4 Grayscale image
Figure 5.5 Image after Grayscale filtering
Noise removal and smoothing
What is noise? Noise is a variation in an image: unwanted and undesired changes in the
color or brightness of the image. Noise needs to be removed, because it will affect the
results. If features extracted from a noisy image are used for classification, they will
be misleading and will produce bad results, so to avoid this the image is preprocessed
by removing the noise. This increases the accuracy of the system.
In the field of digital image processing, smoothing is used as a preprocessing step. It
is a process which applies different types of filters to the image. Smoothing gives an
approximation: the important portion or pattern in the image is preserved while the
noise is reduced significantly, hence improving the results. In the figure below there
is a small unwanted dot, which is noise and needs to be removed; if left in, this dot
would participate in the feature extraction process and could deviate the classification
of the image into a labeled class and give wrong results.
Figure 5.6 Image with Noise
Figure 5.7 Filtered image with noise being removed.
To remove noise from this image a 3x3 median filter is used. It creates a small window
of dimensions 3x3 and moves this window over the image pixel by pixel. At each position
it calculates the median of all the covered pixels and replaces the middle (current)
pixel with the median of its neighborhood. It also keeps edges sharp. The result of this
filter is evident in the figures above.
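This step is a single call in MATLAB, assuming the Image Processing Toolbox (a sketch; the report does not name the exact routine it used):

    smoothed = medfilt2(gray, [3 3]);   % slide a 3x3 window, replacing each pixel with its neighborhood median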
Remove small objects other than hand
In figure 5.7 it can be seen that the biggest object in the image is the hand. The
object of interest is the hand, not the other small objects or the noise acting as small
objects in the image. This biggest object, in this case the hand, is called the biggest
BLOB. In this step a threshold of 50 was used: all connected components with fewer than
50 pixels were removed. As a result only the biggest object, the hand, remains. This
step uses 8-connected neighbors.
Figure 5.8 Image before Applying BLOB
Figure 5.9 Image after removing small objects other than hand
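A minimal MATLAB sketch of this removal, assuming the Image Processing Toolbox; bwareaopen performs exactly this drop-small-components operation and defaults to 8-connectivity, though the report does not name the function it used:

    bw = bwareaopen(bw, 50);   % remove 8-connected components with fewer than 50 pixels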
Region filling
To improve accuracy, region filling is applied. This completes the hand portion where,
due to bad lighting conditions, an erroneous or incomplete image of the gesture was
captured. It fills the holes left in the gesture and improved the accuracy of the
project considerably.
X_k = (X_{k-1} ⊕ B) ∩ A^c,   k = 1, 2, 3, ...
Here X_0 is a point inside the hole, B is the structuring element, and A^c is the
complement of the image A. The algorithm repeatedly applies the equation above, which
involves a dilation operation, until X_k stops changing; at that stage the result is the
whole inside area of the shape, and its union is then taken with the original image.
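In MATLAB this morphological hole filling is available directly, assuming the Image Processing Toolbox (a sketch, not necessarily the project's exact call):

    bw = imfill(bw, 'holes');   % fill holes not reachable from the image border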
Canny edge detection (Additional step)
One additional step that can be performed is to extract the contours (edges) of the
hand. Edge detection is a technique which extracts the boundaries of an object in an
image, in this case the hand. It finds edges using the discontinuities in brightness in
the image. There are many edge detection algorithms, such as Sobel, Prewitt, fuzzy-logic
methods, Canny, and even eroding the image and subtracting the result from the original.
Canny is an algorithm designed to detect edges in the best possible way. What sets Canny
apart from the others? Canny uses a double threshold, one for strong edges and one for
weak edges, which lets it detect edges more reliably. Another major plus point of Canny
over simpler operators is that it uses the first derivative in the horizontal and
vertical directions and even diagonally, while simple operators respond mainly in one
direction, either horizontally or vertically.
Canny takes an image as input and outputs an image with the edges of the object, found
on the basis of discontinuities in brightness. First it applies a Gaussian convolution
to smooth the image. Then it applies derivatives, which produce ridges (mountain-top-like
profiles in the gradient magnitude); it then thresholds these so that everything that is
not an edge becomes 0 (black) and only the edges remain. In figures 5.10 and 5.11 the
effect of Canny and another algorithm can be seen, and it is understandable why Canny is
better.
Figure 5.10 Image after applying canny edge detection.
Figure 5.11 Image after applying Sobel edge detection.
It is evident from figures 5.10 and 5.11 that Canny is the better technique for edge
detection.
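Both detectors are available through MATLAB's edge function, assuming the Image Processing Toolbox; the [low high] double threshold below is illustrative, not the project's values:

    cannyEdges = edge(gray, 'canny', [0.1 0.3]);   % double-threshold Canny edge detection
    sobelEdges = edge(gray, 'sobel');              % Sobel, for comparison as in figures 5.10 and 5.11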
6 Hand Detection
First, the colored image captured in the image acquisition step is read. Once the image
is obtained, its dimensions are calculated. The number of color bands should be one, so
if the image is not grayscale it is converted to grayscale by taking only the green
channel. Next the biggest blobs are found. This technique yields two biggest blobs; the
first (largest) one is ignored, and the second biggest blob is taken to be the hand.
Boxes are drawn around the blobs and the second biggest blob is separated from the
image. The limitation of this technique is that the color of clothes and other objects
in the scene might affect it. It is demonstrated in the following figure.
Figure 6.1 hand detection
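A minimal MATLAB sketch of this blob ranking, assuming the Image Processing Toolbox; the file name is hypothetical and the Otsu threshold stands in for whatever binarization the project used:

    rgb   = imread('scene.jpg');            % hypothetical captured scene
    gray  = rgb(:, :, 2);                   % take only the green channel
    bw    = im2bw(gray, graythresh(gray));  % binarize (Otsu threshold as a stand-in)
    stats = regionprops(bw, 'Area', 'BoundingBox');
    [~, order] = sort([stats.Area], 'descend');
    handBox = stats(order(2)).BoundingBox;  % the second biggest blob is taken to be the hand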
7 Hand cropping
Once the portion of the hand is separated from the image, the hand is cropped out; a
certain threshold is used for this. In binarizing the image, a threshold value is chosen
which gives out only the portion of the image containing the hand, and the hand can then
be cropped out. This image of the hand is stored and passed to the next phase.
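Given the bounding box from the detection step, cropping and storing the hand is two calls in MATLAB (a sketch reusing the hypothetical handBox from the previous snippet):

    hand = imcrop(bw, handBox);   % crop the detected hand region
    imwrite(hand, 'hand.png');    % store it for the feature extraction phase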
8 Feature extraction
What are features? To understand this, consider a scenario. An image is acquired and the
user wants to classify it. Without features, the user would have to store a large number
of images, which takes a lot of space, and compare images pixel by pixel, which is
computationally expensive and also has a large space complexity. This is not a realistic
approach; both factors need to be reduced. Moreover, any rotation, translation or change
of position of the object (in this case the hand) would result in a bad classification
if that variation is not already present in the images being compared against. To avoid
this dilemma, feature extraction is used. Now, back to what features are: feature is a
term from the field of computer vision. A feature is a small piece of information, a
prominent and important detail. These details can be edges (contours) or objects.
There are various algorithms used for feature extraction, such as Zernike moments and
Fourier descriptors. In general, descriptors are a set of numbers produced to describe a
given shape. A few simple descriptors are:
 Area: The number of pixels in the shape.
 Perimeter: The number of pixels in the boundary of the shape.
 Elongation: Rotate a rectangle so that it is the smallest rectangle in which the
shape fits, then compare its height to its width.
 Rectangularity: How rectangular a shape is, i.e. how much of its minimal bounding
box the shape fills.
 Orientation: The overall direction of the shape.
Moments are common in statistics and physics. The basic statistical moments are:
1) Mean
2) Variance
3) Skew
4) Kurtosis
A moment of an image is a weighted average of the image's pixel intensities, usually
chosen to have some attractive property. Moments are useful for describing shapes in a
(binary) image after segmentation. Using image moments one can find simple properties of
an image such as area (intensity), centroid and the orientation of an object inside the
image.
Raw Moments
For an image with pixel intensities I(x, y), the raw moments are
M_pq = Σ_x Σ_y x^p y^q I(x, y).
Raw moments of a simple image include:
1) Sum of gray levels, or area in the case of a binary image: M00
2) Centroid: x̄ = M10/M00, ȳ = M01/M00
Central Moments
Central moments are taken about the centroid of the input digital image f(x, y):
μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y),
where x̄ = M10/M00 and ȳ = M01/M00.
Scale Invariant Moments
Moments η_ij with i + j >= 2 can be constructed to be invariant to both translation and
changes in scale by dividing the corresponding central moment by a power of the 00th
moment:
η_ij = μ_ij / μ_00^(1 + (i + j)/2).
Rotation Invariant Moments (Hu set of invariant moments)
The Hu set of invariant moments is the most frequently used; these moments are invariant
under translation, rotation and scale.
The seven values I1 to I7 form the feature set stored as the descriptor for each image.
The usefulness of these moments in this application is that they make the extracted
features invariant to scale, translation and rotation. Hu moments are used in this
project. They are also called invariant statistical moments because they are not
affected by rotation, scaling and translation.
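The report does not list its implementation, so the following is a self-contained MATLAB sketch that computes the seven Hu invariants from normalized central moments, following the standard definitions given above:

    function phi = huMoments(bw)
    % Seven Hu invariant moments of a binary image.
    bw = double(bw);
    [rows, cols] = size(bw);
    [x, y] = meshgrid(1:cols, 1:rows);
    m00  = sum(bw(:));                  % area (M00)
    xbar = sum(x(:) .* bw(:)) / m00;    % centroid M10/M00
    ybar = sum(y(:) .* bw(:)) / m00;    % centroid M01/M00
    % Normalized central moment: eta_pq = mu_pq / mu_00^(1+(p+q)/2)
    eta = @(p, q) sum(((x(:) - xbar).^p) .* ((y(:) - ybar).^q) .* bw(:)) / m00^(1 + (p + q)/2);
    n20 = eta(2,0); n02 = eta(0,2); n11 = eta(1,1);
    n30 = eta(3,0); n03 = eta(0,3); n21 = eta(2,1); n12 = eta(1,2);
    phi = zeros(1, 7);
    phi(1) = n20 + n02;
    phi(2) = (n20 - n02)^2 + 4*n11^2;
    phi(3) = (n30 - 3*n12)^2 + (3*n21 - n03)^2;
    phi(4) = (n30 + n12)^2 + (n21 + n03)^2;
    phi(5) = (n30 - 3*n12)*(n30 + n12)*((n30 + n12)^2 - 3*(n21 + n03)^2) + ...
             (3*n21 - n03)*(n21 + n03)*(3*(n30 + n12)^2 - (n21 + n03)^2);
    phi(6) = (n20 - n02)*((n30 + n12)^2 - (n21 + n03)^2) + ...
             4*n11*(n30 + n12)*(n21 + n03);
    phi(7) = (3*n21 - n03)*(n30 + n12)*((n30 + n12)^2 - 3*(n21 + n03)^2) - ...
             (n30 - 3*n12)*(n21 + n03)*(3*(n30 + n12)^2 - (n21 + n03)^2);
    end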
9 Hand Gesture Training (Machine learning)
Machine Learning
Machine Learning involves two basic Steps:
 Collecting Training Set.
 Feature Extraction.
Figure 9.1 Machine learning
Training Dataset
A dataset with variations was captured for the training step. The training dataset
consists of 5 gestures, with 50 variations of each gesture, so that the system is
trained for greater accuracy across variations of the same gesture. This helps to
recognize a gesture under different conditions.
A few samples from the proposed dataset are:
Gesture 1:
First Variation Second Variation Third Variation
Figure 9.2 Punch gesture
Gesture 2:
First Variation Second Variation Third Variation
Figure 9.3 Left gesture
Gesture 3:
First Variation Second Variation Third Variation
Figure 9.4 Well done gesture
Gesture 4:
First Variation Second Variation Third Variation
Figure 9.5 Drop gesture
Gesture 5:
First Variation Second Variation Third Variation
Figure 9.6 Catch gesture
The following 5 gestures are included.
Figure 9.7 Gestures included in the system
Feature Extraction:
Feature extraction in the training step is the same as explained in chapter 8.
In the training/learning step, the features of each image are extracted using the Hu set
of invariant moments, and the result for each image of the training set is stored in a
file so that this work need not be done again during the classification step. The file
contains a matrix holding the descriptor values of each image from the training dataset
together with its class label. This saves time and makes classification robust, because
training is the most time-consuming operation.
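A sketch of this step in MATLAB, reusing the huMoments function from chapter 8; the folder layout and the classOf helper that maps a file name to its class label are hypothetical:

    files  = dir('train/*.png');                 % hypothetical training images
    F      = zeros(numel(files), 7);             % one 7-value Hu descriptor per image
    labels = zeros(numel(files), 1);
    for i = 1:numel(files)
        img       = imread(fullfile('train', files(i).name));
        bw        = im2bw(rgb2gray(img));        % preprocessing abbreviated for brevity
        F(i, :)   = huMoments(bw);
        labels(i) = classOf(files(i).name);      % hypothetical helper: file name -> class label
    end
    save('features.mat', 'F', 'labels');         % stored so training need not be redone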
Normalization:
In the stored feature matrix, each row represents one image and each column represents a
specific feature (attribute); one attribute does not depend on another. Therefore the
values of each column are normalized independently of the other columns. The maximum of
each column is stored in a file, to be used later in the classification step.
Each value in a particular attribute (feature) column is divided by the maximum value of
that attribute over the whole matrix; this is repeated for all records. This normalizes
the values so that the resulting values lie in the range 0 to 1. It vastly improves the
classification results and reduces bias, since each attribute then carries the same
weight in classification.
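In MATLAB this column-wise max normalization is a few lines (a sketch over the hypothetical F matrix from the previous snippet):

    colMax = max(F, [], 1);                  % per-column maxima, saved for the classification step
    Fnorm  = bsxfun(@rdivide, F, colMax);    % scale every column into the range [0, 1]
    save('colmax.mat', 'colMax');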
Inter class difference:
The average descriptor of each class is calculated from the matrix of descriptors. One
class is chosen and the distance between it and each of the other classes is calculated;
the same is done for all the classes and the results are stored. From these results
three values are found: maximum, minimum and median. These values can be used as a
threshold, depending on the desired strictness of classification. This is an adaptive
threshold whose purpose is to prevent under-fitting; the level of strictness controls
the level of under-fitting.
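A sketch of this inter-class computation in MATLAB, reusing the hypothetical Fnorm and labels from the previous snippets:

    classes = unique(labels);
    means   = zeros(numel(classes), size(Fnorm, 2));
    for c = 1:numel(classes)
        means(c, :) = mean(Fnorm(labels == classes(c), :), 1);   % average descriptor of each class
    end
    D = zeros(numel(classes));
    for i = 1:numel(classes)
        for j = 1:numel(classes)
            D(i, j) = norm(means(i, :) - means(j, :));           % Euclidean distance between class means
        end
    end
    d = D(D > 0);                                                % off-diagonal (inter-class) distances
    candidates = [min(d) median(d) max(d)];                      % candidate thresholds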
10 Classification
Classification involves two basic steps:
 Machine Learning
 Recognition
Machine Learning:
Recognition:
Figure 10.1 Classification steps
(Machine learning: training set images → features from invariant Hu moments → feature set. Recognition: test image → features → classification against the feature set → classified result.)
Recognition:
Recognition involves the following steps:
 First, the features of the test image are calculated using Hu moments.
 These features are compared with the training feature set.
 The algorithm used for classification is KNN (k-nearest neighbor).
 This algorithm uses distances to neighbors, and on the basis of these distances it
classifies the current record into one of the predefined classes.
 Euclidean distance is used for the comparison:
Euclidean distance((X, Y), (A, B)) = [(X − A)^2 + (Y − B)^2]^(1/2)
 The gesture is classified into the class with which it has the minimum distance.
 A value of K is selected, which is the number of neighbors taken into account for
every calculation.
 The value of K must be chosen carefully: if K is too small the classifier is
sensitive to noise, and if K is too large the neighbors may include points from other
classes. So a moderate value of K is selected.
 One limitation of this method is that it will always classify the input gesture into
at least one of the training classes with minimum distance, which can result in
incorrect classification. So a threshold is applied.
 After the distance is calculated, its value is compared with the threshold. If it
passes the threshold the gesture is classified; otherwise it is identified as a new
gesture. A sketch of this thresholded KNN follows.
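The following MATLAB sketch puts the bullet points together; Fnorm, labels, colMax and huMoments are the hypothetical pieces introduced earlier, and testBw, K and threshold are illustrative:

    f = huMoments(testBw) ./ colMax;                    % extract and normalize the test features
    dists = sqrt(sum(bsxfun(@minus, Fnorm, f).^2, 2));  % Euclidean distance to every training row
    [sorted, idx] = sort(dists);
    K = 5;                                              % illustrative neighbor count
    if sorted(1) > threshold                            % threshold chosen from the inter-class distances
        disp('New gesture: no training class within threshold');
    else
        predicted = mode(labels(idx(1:K)));             % majority vote among the K nearest neighbors
    end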
Test results:
Figure 10.2 Punch gesture test
Figure 10.3 Drop gesture test
Figure 10.4 Catch gesture test
Figure 10.5 Left gesture test
Figure 10.6 Well done gesture test
11 Text to speech
Once the gesture is recognized, the class of the gesture given at run time is obtained.
In the speech function, the available voice types are first enumerated and the first
available voice is picked by default. The user passes the text and the voice type as
parameters. The function then sets the speed of the speech; the speed (pace) of the
voice can range from -10 to 10, with a default of 0. After that it sets the sampling
rate of the speech. The function is based on the Microsoft Windows (Win32) Speech API.
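A minimal sketch of driving the Windows speech engine from MATLAB through the .NET wrapper around SAPI (Windows only; the report does not show its exact call):

    NET.addAssembly('System.Speech');                    % load the .NET speech assembly
    synth = System.Speech.Synthesis.SpeechSynthesizer;   % picks the first available voice by default
    synth.Rate = 0;                                      % speaking rate, valid range -10 to 10
    Speak(synth, 'Punch');                               % speak the recognized gesture's label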
12 UML Diagrams
Use Case Diagram
Figure 12.1 Use case diagram
Sequence Diagram
Figure 12.2 Sequence diagram
Flow Diagram
Figure 12.3 Flow diagram
13 Conclusion
Future work
There are some aspects of the project which can be improved in the future.
 Instead of a webcam, a better and more accurate acquisition device can be used, one
that even uses infrared for accuracy, e.g. Kinect.
 The mechanism for hand detection is not accurate.
 The Hu set of invariant moments is a very basic image descriptor and limits the
achievable accuracy. A better descriptor could give better results, but the
classification mechanism might have to change.
Potential applications
Image recognition concepts have vital applications in various fields, such as:
 Robotics.
 Artificial Intelligence.
 Controlling the Computer through hand gestures.
14 Project poster
The poster for this project was created using Adobe InDesign, a desktop publishing
application by Adobe that is well suited to poster design. The poster is of standard
size and uses vector graphics, so it can be zoomed to any level without pixelation.
Figure 14.1 Project poster in Adobe InDesign.
Figure 14.2 Project poster.
15 References
[1] Pujan Ziaie, Thomas Müller and Alois Knoll. A Novel Approach to Hand-Gesture
Recognition in a Human-Robot Dialog System. Robotics and Embedded Systems Group,
Department of Informatics, Technische Universität München.
[2] Pujan Ziaie and Alois Knoll. An Invariant-Based Approach to Static Hand-Gesture
Recognition. Technical University of Munich.
[3] Rajat Shrivastava. A Hidden Markov Model based Dynamic Hand Gesture Recognition
System using OpenCV. Dept. of Electronics and Communication Engineering, Maulana Azad
National Institute of Technology, Bhopal-462001, India.
[4] Neha S. Chourasia, Kanchan Dhote, Supratim Saha. Analysis on Hand Gesture Spotting
using Sign Language through Computer Interfacing. International Journal of Engineering
Science and Innovative Technology (IJESIT), Volume 3, Issue 3, May 2014.
[5] Joyeeta Singha, Karen Das. Hand Gesture Recognition Based on Karhunen-Loeve
Transform. Department of Electronics and Communication Engineering, Assam Don Bosco
University, Guwahati, Assam, India.
[6] Hunter, E. Posture estimation in reduced model gesture input systems. Proceedings of
the International Workshop on Automated Face and Gesture Recognition, June 1995.
[7] Chaudhary, A., Raheja, J. L., Das, K., Raheja, S. A Vision based Geometrical Method
to find Fingers Positions in Real Time Hand Gesture Recognition. Journal of Software,
Academy Publisher, Vol. 7, 2012.
[8] Segan, J. Controlling computers with gloveless gestures. In Virtual Reality Systems,
1993.
[9] Gastaldi, G. et al. A man-machine communication system based on the visual analysis
of dynamic gestures. International Conference on Image Processing, Genoa, Italy,
September 2005, pp. 397-400.
16 Turnitin Originality Report
HAND GESTURE RECOGNITION SYSTEM by Afnan Ur Rehman, Haseeb Ansar Iqbal,
Anwaar ul Haq
From HAND GESTURE RECOGNITION SYSTEM (Research)
 Processed on 30-Jun-2015 08:29 PKT
 ID: 553340821
 Word Count: 3906
Similarity Index: 10%
Similarity by Source: Internet Sources 8%, Publications 6%, Student Papers 6%
Sources:
1. 2% match (Internet from 11-Dec-2007): http://www.forbes.com/lists/2007/10/07billionaires_The-Worlds-Billionaires_NameHTML_36.html
2. 1% match (Internet from 12-Jul-2013): http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MORSE/region-props-and-moments.pdf
3. 1% match (Internet from 12-Oct-2014): http://www.ijsret.org/pdf/120374.pdf
4. 1% match (publications): A. Musso. "Structural dynamic monitoring on Vega platform: an example of Industry and University collaboration", Proceedings of European Petroleum Conference EUROPEC, 10/1996
5. 1% match (student papers from 16-Dec-2014): Submitted to iGroup on 2014-12-16
6. 1% match (student papers from 03-Aug-2010): Submitted to Universiti Teknikal Malaysia Melaka on 2010-08-03
7. < 1% match (Internet from 01-Jul-2003): http://www.discovery.mala.bc.ca/web/bandalia/digital/work.htm
8. < 1% match (publications): Yeo, Hangu, Vadim Sheinin, Yuri Sheinin, and Benoit M. Dawant. Medical Imaging 2009 Image Processing, 2009.
9. < 1% match (student papers from 16-Dec-2013): Submitted to Universiti Malaysia Perlis on 2013-12-16
10. < 1% match (Internet from 05-Jun-2012): http://www.csjournals.com/IJCSC/PDF1-1/16.pdf
11. < 1% match (student papers from 27-Oct-2012): Submitted to VIT University on 2012-10-27
12. < 1% match (Internet from 30-Apr-2003): http://www.goodstaff.com/jobseekers/articles/sat/Sat14.html
13. < 1% match (Internet from 29-Jul-2010): http://ethesis.nitrkl.ac.in/1459/1/Removal_of_RVIN.pdf
14. < 1% match (Internet from 08-Oct-2013): http://www.lifesciencesite.com/lsj/life1009s/041_20339life1009s_289_296.pdf
15. < 1% match (publications): Henke, Daniel, Padhraic Smyth, Colene Haffke, and Gudrun Magnusdottir. "Automated analysis of the temporal behavior of the double Intertropical Convergence Zone over the east Pacific", Remote Sensing of Environment, 2012.
16. < 1% match (Internet from 07-Mar-2015): http://en.wikipedia.org/wiki/Image_moment
17. < 1% match (Internet from 05-Dec-2013): http://eventos.spc.org.pe/inns-iesnn/papers/Jimenez-Oliden-Huapaya-Cardenas-Neurocopter.pdf
18. < 1% match (Internet from 25-Dec-2014): http://ijcsn.org/IJCSN-2014/3-4/A-Fast-and-Robust-Hybridized-Filter-for-Image-De-Noising.pdf
19. < 1% match (Internet from 26-Nov-2002): http://aips2.nrao.edu/released/docs/user/Utility/node248.html
20. < 1% match (publications): Sungsik Huh. "A Vision-Based Automatic Landing Method for Fixed-Wing UAVs", Selected papers from the 2nd International Symposium on UAVs, Reno, Nevada, USA, June 8-10 2009, 2009
21. < 1% match (publications): Kong, Fan zhi, Xing zhou Zhang, Yi zhong Wang, Da wei Zhang, Jun lan Li, Shanhong Xia, Chih-Ming Ho, and Helmut Seidel. 2008 International Conference on Optical Instruments and Technology: MEMS/NEMS Technology and Applications, 2008.

More Related Content

What's hot

eye phone technology
eye phone technologyeye phone technology
eye phone technology
Naga Dinesh
 
Haptic technology ppt
Haptic technology pptHaptic technology ppt
Haptic technology ppt
Mohammad Sabouri
 
Human Activity Recognition in Android
Human Activity Recognition in AndroidHuman Activity Recognition in Android
Human Activity Recognition in Android
Surbhi Jain
 
Virtual keyboard seminar ppt
Virtual keyboard seminar pptVirtual keyboard seminar ppt
Virtual keyboard seminar ppt
Shruti Maheshwari
 
gesture-recognition
gesture-recognitiongesture-recognition
gesture-recognition
Venkat RAGHAVENDRA REDDY
 
SRS for online examination system
SRS for online examination systemSRS for online examination system
SRS for online examination system
lunarrain
 
Online doctor appointment
Online doctor appointmentOnline doctor appointment
Online doctor appointment
Amna Nawazish
 
Screenless Display PPT
Screenless Display PPTScreenless Display PPT
Screenless Display PPT
Vikas Kumar
 
20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies
Seminar Links
 
Human activity recognition
Human activity recognitionHuman activity recognition
Human activity recognition
Randhir Gupta
 
Final Year Project-Gesture Based Interaction and Image Processing
Final Year Project-Gesture Based Interaction and Image ProcessingFinal Year Project-Gesture Based Interaction and Image Processing
Final Year Project-Gesture Based Interaction and Image Processing
Sabnam Pandey, MBA
 
Virtual Mouse
Virtual MouseVirtual Mouse
Virtual Mouse
Vivek Khutale
 
Gesture Recognition Technology-Seminar PPT
Gesture Recognition Technology-Seminar PPTGesture Recognition Technology-Seminar PPT
Gesture Recognition Technology-Seminar PPT
Suraj Rai
 
Haptic Technology ppt
Haptic Technology pptHaptic Technology ppt
Haptic Technology ppt
Arun Sivaraj
 
Face recognition attendance system
Face recognition attendance systemFace recognition attendance system
Face recognition attendance system
Naomi Kulkarni
 
FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT
VaishaliSrigadhi
 
CSE Final Year Project Presentation on Android Application
CSE Final Year Project Presentation on Android ApplicationCSE Final Year Project Presentation on Android Application
CSE Final Year Project Presentation on Android Application
Ahammad Karim
 
Hand Gesture Recognition Using OpenCV Python
Hand Gesture Recognition Using OpenCV Python Hand Gesture Recognition Using OpenCV Python
Hand Gesture Recognition Using OpenCV Python
Arijit Mukherjee
 
Object and pose detection
Object and pose detectionObject and pose detection
Object and pose detection
AshwinBicholiya
 
Sensor Cloud
Sensor CloudSensor Cloud
Sensor Cloud
Debjyoti Ghosh
 

What's hot (20)

eye phone technology
eye phone technologyeye phone technology
eye phone technology
 
Haptic technology ppt
Haptic technology pptHaptic technology ppt
Haptic technology ppt
 
Human Activity Recognition in Android
Human Activity Recognition in AndroidHuman Activity Recognition in Android
Human Activity Recognition in Android
 
Virtual keyboard seminar ppt
Virtual keyboard seminar pptVirtual keyboard seminar ppt
Virtual keyboard seminar ppt
 
gesture-recognition
gesture-recognitiongesture-recognition
gesture-recognition
 
SRS for online examination system
SRS for online examination systemSRS for online examination system
SRS for online examination system
 
Online doctor appointment
Online doctor appointmentOnline doctor appointment
Online doctor appointment
 
Screenless Display PPT
Screenless Display PPTScreenless Display PPT
Screenless Display PPT
 
20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies
 
Human activity recognition
Human activity recognitionHuman activity recognition
Human activity recognition
 
Final Year Project-Gesture Based Interaction and Image Processing
Final Year Project-Gesture Based Interaction and Image ProcessingFinal Year Project-Gesture Based Interaction and Image Processing
Final Year Project-Gesture Based Interaction and Image Processing
 
Virtual Mouse
Virtual MouseVirtual Mouse
Virtual Mouse
 
Gesture Recognition Technology-Seminar PPT
Gesture Recognition Technology-Seminar PPTGesture Recognition Technology-Seminar PPT
Gesture Recognition Technology-Seminar PPT
 
Haptic Technology ppt
Haptic Technology pptHaptic Technology ppt
Haptic Technology ppt
 
Face recognition attendance system
Face recognition attendance systemFace recognition attendance system
Face recognition attendance system
 
FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT
 
CSE Final Year Project Presentation on Android Application
CSE Final Year Project Presentation on Android ApplicationCSE Final Year Project Presentation on Android Application
CSE Final Year Project Presentation on Android Application
 
Hand Gesture Recognition Using OpenCV Python
Hand Gesture Recognition Using OpenCV Python Hand Gesture Recognition Using OpenCV Python
Hand Gesture Recognition Using OpenCV Python
 
Object and pose detection
Object and pose detectionObject and pose detection
Object and pose detection
 
Sensor Cloud
Sensor CloudSensor Cloud
Sensor Cloud
 

Viewers also liked

Hand gesture recognition
Hand gesture recognitionHand gesture recognition
Hand gesture recognition
Muhammed M. Mekki
 
Gesture recognition
Gesture recognitionGesture recognition
Gesture recognition
PrachiWadekar
 
Gesture recognition adi
Gesture recognition adiGesture recognition adi
Gesture recognition adi
aditya verma
 
Gesture Recognition Technology
Gesture Recognition TechnologyGesture Recognition Technology
Gesture Recognition Technology
Nikith Kumar Reddy
 
Gesture recognition technology
Gesture recognition technology Gesture recognition technology
Gesture recognition technology
Nagamani Gurram
 
Hand Gesture Recognition Based on Shape Parameters
Hand Gesture Recognition Based on Shape ParametersHand Gesture Recognition Based on Shape Parameters
Hand Gesture Recognition Based on Shape Parameters
Nithinkumar P
 
GESTURE RECOGNITION TECHNOLOGY
GESTURE RECOGNITION TECHNOLOGYGESTURE RECOGNITION TECHNOLOGY
GESTURE RECOGNITION TECHNOLOGY
jinal thakrar
 
My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! 
My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! 
My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! 
Chris Gledhill
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
Sarang Afle
 
Deaf Culture and Sign Language Writing System – a Database for a New Approac...
Deaf Culture and Sign Language Writing System – a Database for a New  Approac...Deaf Culture and Sign Language Writing System – a Database for a New  Approac...
Deaf Culture and Sign Language Writing System – a Database for a New Approac...
Jeferson Fernando Guardezi
 
Final year project on Remote Infrastructure Management
Final year project on Remote Infrastructure ManagementFinal year project on Remote Infrastructure Management
Final year project on Remote Infrastructure Management
jairaman
 
Real time gesture recognition
Real time gesture recognitionReal time gesture recognition
Real time gesture recognition
Jaison2636
 
Human machine interaction using Hand gesture recognition
Human machine interaction using Hand gesture recognitionHuman machine interaction using Hand gesture recognition
Human machine interaction using Hand gesture recognition
Manoj Harsule
 
Real time gesture recognition of human hand
Real time gesture recognition of human handReal time gesture recognition of human hand
Real time gesture recognition of human hand
Vishnu Kudumula
 
Hand Gesture recognition
Hand Gesture recognitionHand Gesture recognition
Hand Gesture recognition
Nimishan Sivaraj
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
Chiranjeevi Adi
 
Deaf and dumb
Deaf and dumbDeaf and dumb
Deaf and dumb
Mariam Khalid
 
Speech Recognition , Noise Filtering and Content Search Engine , Research Do...
Speech Recognition , Noise Filtering and  Content Search Engine , Research Do...Speech Recognition , Noise Filtering and  Content Search Engine , Research Do...
Speech Recognition , Noise Filtering and Content Search Engine , Research Do...
Gayan Kalanamith Mannapperuma
 
Deaf and Dump Gesture Recognition System
Deaf and Dump Gesture Recognition SystemDeaf and Dump Gesture Recognition System
Deaf and Dump Gesture Recognition System
Praveena T
 
Voice Recognition Service (VRS)
Voice Recognition Service (VRS)Voice Recognition Service (VRS)
Voice Recognition Service (VRS)
Shady A. Alefrangy
 

Viewers also liked (20)

Hand gesture recognition
Hand gesture recognitionHand gesture recognition
Hand gesture recognition
 
Gesture recognition
Gesture recognitionGesture recognition
Gesture recognition
 
Gesture recognition adi
Gesture recognition adiGesture recognition adi
Gesture recognition adi
 
Gesture Recognition Technology
Gesture Recognition TechnologyGesture Recognition Technology
Gesture Recognition Technology
 
Gesture recognition technology
Gesture recognition technology Gesture recognition technology
Gesture recognition technology
 
Hand Gesture Recognition Based on Shape Parameters
Hand Gesture Recognition Based on Shape ParametersHand Gesture Recognition Based on Shape Parameters
Hand Gesture Recognition Based on Shape Parameters
 
GESTURE RECOGNITION TECHNOLOGY
GESTURE RECOGNITION TECHNOLOGYGESTURE RECOGNITION TECHNOLOGY
GESTURE RECOGNITION TECHNOLOGY
 
My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! 
My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! 
My old 2002 Thesis on Hand Gesture Recognition using a Web Cam! 
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
 
Deaf Culture and Sign Language Writing System – a Database for a New Approac...
Deaf Culture and Sign Language Writing System – a Database for a New  Approac...Deaf Culture and Sign Language Writing System – a Database for a New  Approac...
Deaf Culture and Sign Language Writing System – a Database for a New Approac...
 
Final year project on Remote Infrastructure Management
Final year project on Remote Infrastructure ManagementFinal year project on Remote Infrastructure Management
Final year project on Remote Infrastructure Management
 
Real time gesture recognition
Real time gesture recognitionReal time gesture recognition
Real time gesture recognition
 
Human machine interaction using Hand gesture recognition
Human machine interaction using Hand gesture recognitionHuman machine interaction using Hand gesture recognition
Human machine interaction using Hand gesture recognition
 
Real time gesture recognition of human hand
Real time gesture recognition of human handReal time gesture recognition of human hand
Real time gesture recognition of human hand
 
Hand Gesture recognition
Hand Gesture recognitionHand Gesture recognition
Hand Gesture recognition
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Deaf and dumb
Deaf and dumbDeaf and dumb
Deaf and dumb
 
Speech Recognition , Noise Filtering and Content Search Engine , Research Do...
Speech Recognition , Noise Filtering and  Content Search Engine , Research Do...Speech Recognition , Noise Filtering and  Content Search Engine , Research Do...
Speech Recognition , Noise Filtering and Content Search Engine , Research Do...
 
Deaf and Dump Gesture Recognition System
Deaf and Dump Gesture Recognition SystemDeaf and Dump Gesture Recognition System
Deaf and Dump Gesture Recognition System
 
Voice Recognition Service (VRS)
Voice Recognition Service (VRS)Voice Recognition Service (VRS)
Voice Recognition Service (VRS)
 

Similar to Hand gesture recognition system(FYP REPORT)

Apeksha Resume -1-
Apeksha Resume -1-Apeksha Resume -1-
Apeksha Resume -1-
Apeksha Lokare
 
Synopsis of Facial Emotion Recognition to Emoji Conversion
Synopsis of Facial Emotion Recognition to Emoji ConversionSynopsis of Facial Emotion Recognition to Emoji Conversion
Synopsis of Facial Emotion Recognition to Emoji Conversion
IRJET Journal
 
VTU final year project report
VTU final year project reportVTU final year project report
VTU final year project report
athiathi3
 
Major File On web Development
Major File On web Development Major File On web Development
Major File On web Development
Love Kothari
 
Obj report
Obj reportObj report
Obj report
Manish Raghav
 
Preliminry report
 Preliminry report Preliminry report
Preliminry report
Jiten Ahuja
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
sneha penmetsa
 
A Facial Expression Recognition System A Project Report
all small objects; this removes extra unwanted objects and residual noise from the image. Once preprocessing is complete, the image is passed to the feature extraction phase. Hu invariant moments are used for feature extraction because of their distinct properties: invariance to rotation, scale and translation. The extracted features are normalized and matched against the training dataset features using the KNN (k-nearest neighbor) algorithm, with the Euclidean distance used to find the nearest neighbors. The test image is classified into its nearest neighbor's class in the training set. The classification result is displayed to the user and, through the Windows text-to-speech API, the gesture is translated into speech as well. The training dataset contains 5 gestures, each with 50 variations captured under different lighting conditions, which improves the accuracy of classification.

Keywords: hand gestures, gesture recognition, contours, Hu invariant moments, sign language recognition, MATLAB, k-nearest neighbor classifier, human-computer interface, text-to-speech conversion, machine learning.
Disclaimer

This report is submitted in part requirement for the Bachelor's degree in Computer Science at FAST-NU Peshawar. It is substantially the result of Afnan Ur Rehman, Anwaar Ul Haq and Haseeb Anser Iqbal's own work, except where explicitly indicated in the text. The report will be distributed to the FYP supervisor and FYP coordinator for examination, but thereafter may not be copied or distributed.
Table of Contents

1 Introduction
2 Background
   Literature
   Image sensing
3 Method
   Proposed Method
   Steps chart
   Flow chart
4 Image Acquisition
5 Preprocessing
   Flow chart of steps
   RGB to Grayscale
   Binarize
   Grayscale filtering using value
   Noise removal and smoothing
   Remove small objects other than hand
   Region filling
   Canny edge detection (additional step)
6 Hand Detection
7 Hand Cropping
8 Feature Extraction
9 Hand Gesture Training (Machine Learning)
   Machine Learning
   Training Dataset
   Feature Extraction
   Normalization
   Inter-class difference
10 Classification
11 Text to Speech
12 UML Diagrams
   Use Case Diagram
   Sequence Diagram
   Flow Diagram
13 Conclusion
   Future work
   Potential applications
14 Project Poster
15 References
16 Turnitin Originality Report
1 Introduction

Hands are the human organs used to manipulate physical objects, and for this very reason they are the organs human beings use most frequently to communicate and interact with machines. The mouse and keyboard are the basic input devices of a computer, and both require the use of hands. The most important and immediate information exchange between man and machine is through visual and aural channels, but this communication is largely one-sided: a computer of this age can present 1024 x 768 pixels at 15 frames per second, while a good typist writes about 60 words per minute, each word containing 6 letters on average. The mouse remedies this imbalance somewhat, but it too has limitations.

Although hands are most commonly used for day-to-day physical manipulation, in some cases they are also used for communication. Hand gestures support our daily communication and help convey our messages clearly. Hands are most important for mute and deaf people, who depend on their hands and gestures to communicate, so hand gestures are vital in sign language. If the computer had the ability to understand and translate hand gestures, it would be a leap forward in the field of human-computer interaction. The difficulty is that today's images are information-rich, and extensive processing is required to achieve this task. Every gesture has distinct features that differentiate it from other gestures; Hu invariant moments are used to extract these features, and the gestures are then classified using the KNN algorithm. Real-life applications of gesture-based human-computer interaction include interacting with virtual objects, controlling robots, translating body and sign language, and controlling machines with gestures.
2 Background

Literature

Several methods have been proposed for both dynamic and static hand gestures. [1] Pujan Ziaie proposed a technique of first computing the similarity of different gestures and then assigning probabilities to them using the Bayesian inference rule; invariant classes were estimated using a modification of KNN (k-nearest neighbor). These classes consist of Hu moments, which have geometrical attributes such as invariance to rotation, translation and scale, used as features for classification. The technique performed very well, giving 95% accurate results. [2] Pujan Ziaie also proposed a similar technique using Hu moments along with a modified KNN algorithm called the Locally Weighted Naive Bayes Classifier; classification results with this technique were 93% accurate under different lighting conditions with different users. [3] Rajat Shrivastava proposed a method using Hu moments and hand orientation for feature extraction, with the Baum-Welch algorithm for recognition; the method has an accuracy of 90%. [4] The technique proposed by Neha S. Chourasia, Kanchan Dhote and Supratim Saha used a hybrid feature descriptor combining Hu invariant moments and SURF, with KNN and SVM for classification, achieving 96% accuracy. [5] Joyeeta Singha proposed a hand gesture recognition system based on the K-L transform. The system consists of five steps: skin filtering (image acquisition, RGB-to-HSV conversion, filtering, smoothing, binarization, finding the biggest BLOB), palm cropping, hand edge detection using the Canny edge detector, feature extraction using the K-L transform, and classification. [6] Hunter proposed a system that uses Zernike moments to extract image features and a Hidden Markov Model for recognition. [7] Raheja proposed a technique that scans the image in all directions to find the edges of the fingertips. [8] Segan proposed a technique that uses edges for feature extraction, which reduces time complexity and also helps remove noise.

Image sensing

An image is a two-dimensional function f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. Image creation depends on two main factors: the illumination source, and the reflection or absorption of energy by the object being imaged. The illumination source can be electromagnetic energy such as infrared or X-rays, or sources such as ultrasound, sunlight or a computer-generated illumination pattern. In some cases the transmitted or reflected energy is focused onto a photo converter, which converts the energy into visible light. An arrangement of sensors is used to convert energy into digital images: the incoming energy is converted into a voltage using input electrical power and a sensor material responsive to the particular type of energy being detected. Each sensor produces an output waveform, which is digitized; the result is only an approximation of the real scene.

Computer cameras usually include a lens and an image sensor, and may also include a microphone to capture sound. The image sensor is one of two available types: CCD (charge-coupled device) or CMOS (complementary metal-oxide semiconductor). Most consumer web cameras provide VGA resolution at a rate of 30 frames per second; the newest devices are capable of multi-megapixel resolutions. In this project an ordinary web camera is used to capture the scene.
3 Method

Proposed Method

In order to extract features and recognize a gesture, the following method is proposed:

1. A GUI which allows the user to capture the scene. This phase is called image acquisition.
2. After capturing the image, the next step is to detect the hand and separate it from the scene, because only the hand gesture is needed for accurate classification. If the hand is not separated from the scene, it will affect the accuracy of the system when extracting and matching features.
3. Crop the hand out of the scene.
4. Preprocessing steps, which are:
   a. Convert RGB to grayscale.
   b. Grayscale filtering using a threshold value.
   c. Noise removal and smoothing.
   d. Remove small objects other than the hand.
5. Feature extraction using Hu invariant moments.
6. Classification using the KNN algorithm, with the Euclidean distance formula for calculating distances and a threshold for better results.
7. Translation (conversion) into speech.

The proposed method is given in figure 3.1, and an end-to-end sketch follows the flow chart below.
Steps chart:

Figure 3.1 Proposed steps: image acquisition, hand detection, hand cropping, preprocessing, feature extraction, classification, gesture to speech.
Flow chart:

Figure 3.2 Proposed flow chart. Detection: capture the scene (image), preprocessing, hand detection, contour detection, feature extraction for the gesture. Learning: training set (hand gestures), feature extraction. Recognition: feature matching, gesture recognition, conversion to speech.
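Under the assumption that the system is implemented in MATLAB (as the keywords indicate), the overall pipeline can be summarized by the following minimal sketch. All helper names here (capture_scene, detect_and_crop_hand, preprocess_hand, hu_moments, knn_classify, speak_gesture) are hypothetical placeholders for the steps detailed in the following chapters, not the project's actual source code:

    % Hypothetical end-to-end sketch of the proposed pipeline.
    img   = capture_scene();            % image acquisition (chapter 4)
    hand  = detect_and_crop_hand(img);  % hand detection and cropping (chapters 6-7)
    bw    = preprocess_hand(hand);      % preprocessing (chapter 5)
    feats = hu_moments(bw);             % feature extraction (chapter 8)
    label = knn_classify(feats, trainF, trainLabels, 3, thresh);  % k = 3 is
                                        % an arbitrary choice here (chapter 10)
    speak_gesture(label);               % text to speech (chapter 11)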
4 Image Acquisition

In this step a GUI is built which shows the video stream of the scene. When the capture button on the GUI is clicked, it takes an image of the scene. The problem is that this scene includes the whole body and other unwanted objects as well; these are dealt with in the following steps. The figure below shows the GUI-based front end of the system through which the user captures the image:

Figure 4.1 System GUI
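As an illustration only, a scene could be grabbed from an ordinary webcam in MATLAB roughly as follows; this assumes the Image Acquisition Toolbox and a standard 'winvideo' adaptor, and omits the actual GUI code. The file name is a hypothetical placeholder:

    vid = videoinput('winvideo', 1);   % open the first attached webcam
    preview(vid);                      % live video stream, as in the GUI
    img = getsnapshot(vid);            % grab one frame when "capture" is clicked
    imwrite(img, 'scene.png');         % keep the captured scene for later steps
    delete(vid);                       % release the device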
5 Preprocessing

Flow chart of steps:

Figure 5.1 Steps of preprocessing: RGB to grayscale, grayscale filtering using a value, binarization, noise removal and smoothing, removal of small objects other than the hand, region filling.

RGB to Grayscale:

RGB stands for red, green and blue. It is a color system in which these three colors are added in different quantities to produce other colors. Human vision can distinguish between many different colors, intensities and shades, but when it comes to shades of gray it can only distinguish approximately 100 of them. It is evident from this fact that colored images contain more information; converting to grayscale discards color information that the later steps do not need.

Figure 5.2 An RGB image
Figure 5.3 A grayscale image
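A minimal conversion sketch, assuming MATLAB's Image Processing Toolbox; the explicit weights shown are the standard luminance coefficients that rgb2gray uses:

    rgb  = imread('scene.png');   % captured scene (hypothetical file name)
    gray = rgb2gray(rgb);         % toolbox one-liner
    % Equivalent explicit weighted sum of the R, G and B channels:
    r = double(rgb(:,:,1)); g = double(rgb(:,:,2)); b = double(rgb(:,:,3));
    gray2 = uint8(0.2989*r + 0.5870*g + 0.1140*b);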
Binarize

Binarization is a process which converts a gray-level image to a binary image. A gray-level image has 256 levels (0 to 255), whereas a binary image has only two values, 0 and 1 (black and white).

Grayscale filtering using value

There are many different types of filters in the field of digital image processing, and the gray-level filter is one of them. This filter works on the gray-level image; the aim is to reduce noise in order to increase the accuracy of the system and get better results out of it. A threshold is used to filter out noise in the grayscale image; the threshold used in this project was 75, which gave better results.

Figure 5.4 Grayscale image
Figure 5.5 Image after grayscale filtering
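A sketch of these two steps, using the threshold of 75 reported above (a minimal illustration, not the project's exact code):

    gray(gray < 75) = 0;        % gray-level filtering: suppress dim noise pixels
    bw = im2bw(gray, 75/255);   % binarize: pixels above the level become 1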
Noise removal and smoothing

What is noise? Noise is a variation in an image: unwanted and undesired changes in its color or brightness. Noise in the image needs to be removed because it affects the results: if features extracted from a noisy image are used for classification, they will be misleading and will result in bad classification. To avoid this, the image is preprocessed by removing the noise, which increases the accuracy of the system.

In the field of digital image processing, smoothing is used as a preprocessing step. It applies different types of filters to the image to produce an approximation: the important portion or pattern in the image is kept while the noise is reduced significantly, improving the results considerably. In the figure below there is a small dot, which is unwanted noise that needs to be removed; if left in, this dot would participate in the feature extraction process and could cause the image to be classified into the wrong class.

Figure 5.6 Image with noise
Figure 5.7 Filtered image with the noise removed
To remove noise from the image, a 3x3 median filter is used. It creates a small 3x3 window that moves over the image pixel by pixel, computes the median of all covered pixels, and replaces the current (center) pixel with the median of its neighborhood. It also makes the edges clearer. The result of this filter is evident in the example figure above.

Remove small objects other than hand

In figure 5.7 it can be seen that the biggest object in the image is the hand. The object of interest is the hand, not other small objects or noise acting as small objects. The biggest object, in this case the hand, is called the biggest BLOB. In this step a threshold of 50 was used: all connected components with fewer than 50 pixels were removed, using 8-connected neighbors. As a result only the biggest object, the hand, remains.

Figure 5.8 Image before applying BLOB selection
Figure 5.9 Image after removing small objects other than the hand
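Continuing the sketch on the binary image bw (assuming the Image Processing Toolbox):

    bw = medfilt2(bw, [3 3]);     % 3x3 median filter; on a binary image this
                                  % acts as a majority vote, removing specks
    bw = bwareaopen(bw, 50, 8);   % drop 8-connected components smaller than
                                  % 50 pixels, leaving the hand (biggest blob)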
Region filling

To improve accuracy, region filling is applied. This completes the hand portion where, due to bad lighting conditions, an erroneous or degraded image of the gesture was captured; it fills the holes left in the gesture and improved the accuracy of the project considerably. The filling iteration is

    X_k = (X_{k-1} ⊕ B) ∩ A^c,   k = 1, 2, 3, ...

where X_0 is a starting point inside the hole, B is the structuring element, ⊕ denotes dilation, and A^c is the complement of the image A. The algorithm moves through all the pixels inside the hole, applying the dilation above step by step; at the final X_k the result is the whole interior of the shape, and its union with the original image is then taken.

Canny edge detection (additional step)

One additional step that can be performed is to extract the contours (edges) of the hand. Edge detection is a technique which extracts the boundaries of an object in an image, in this case the hand, by finding discontinuities in brightness. There are many edge detection algorithms, such as Sobel, Prewitt, fuzzy-logic methods, Canny, and even erosion followed by subtraction from the original image. Canny is an algorithm designed to detect edges in the best possible way. What sets Canny apart from the others? It uses a double threshold, one for sharp edges and one for weak edges, which means it detects edges better. Its major advantage over other algorithms is that it takes the first derivative in the horizontal, vertical and even diagonal directions, while the others work in only one direction, either horizontal or vertical.

Canny takes an image as input and outputs an image with the object edges found from the brightness discontinuities. It first applies Gaussian convolution to smooth the image, then applies the derivatives, which produces ridges (mountain-top or hill-top shaped responses); finally it uses a threshold to set everything else to 0, making everything except the edges black. In figures 5.10 and 5.11 the effects of Canny and of another algorithm can be seen, and it is understandable why Canny is better.

Figure 5.10 Image after applying Canny edge detection
Figure 5.11 Image after applying Sobel edge detection

It is evident from figures 5.10 and 5.11 that Canny is the better technique for edge detection.
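Both of these steps map onto standard toolbox calls; a minimal sketch:

    bw = imfill(bw, 'holes');    % hole filling via the iterative dilation
                                 % X_k = (X_{k-1} ⊕ B) ∩ A^c described above
    edges = edge(bw, 'canny');   % optional contour extraction with Canny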
6 Hand Detection

First the colored image captured in the image acquisition step is read. Once the image is obtained, its dimensions are calculated; the number of color bands should be one, so if the image is not grayscale it is converted to grayscale by taking only the green channel. Then the biggest blobs are found. This technique yields the two biggest blobs; the first (largest) blob is ignored, and the second biggest blob is taken as the hand. A box is drawn around the blobs, and the second biggest blob is separated from the image. The limitation of this technique is that the color of clothes and other objects in the scene may affect it. It is demonstrated in the following figure.

Figure 6.1 Hand detection
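A sketch of the detection-plus-cropping logic described above (the second-largest-blob heuristic), assuming the Image Processing Toolbox and a binary image bw from the previous steps:

    [L, n]  = bwlabel(bw, 8);                        % label connected components
    stats   = regionprops(L, 'Area', 'BoundingBox');
    [~, ix] = sort([stats.Area], 'descend');         % rank blobs by area
    handIdx = ix(min(2, n));                         % second-biggest blob, if any
    handBW  = (L == handIdx);                        % keep only the hand blob
    hand    = imcrop(handBW, stats(handIdx).BoundingBox);  % crop (chapter 7)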
7 Hand cropping

Once the portion of the hand is separated from the image, the hand is cropped out; for this a certain threshold is used. In binarizing the image, a threshold value is used which yields only the portion of the image containing the hand, and the hand can then be cropped out, as in the sketch above. This image of the hand is stored and passed to the next phase.
8 Feature extraction

What are features? To understand this, consider a scenario. An image is acquired and the user wants to classify it. Without features, the user would have to store a large number of images, which takes a lot of space, and compare images pixel by pixel, which is computationally expensive; both the time and space complexity would be large. This is not a realistic approach, and both factors need to be reduced. Moreover, if the object, in this case the hand, is rotated, translated or repositioned, the result will be a bad classification unless that variation is already present in the stored images. Feature extraction avoids this dilemma.

So what are features? The term comes from the field of computer vision. A feature is a small piece of information, the prominent and important detail of an image; such details can be edges (contours) or objects. Various algorithms are used for feature extraction, such as Zernike moments and Fourier descriptors. In general, descriptors are a set of numbers produced to describe a given shape. A few simple descriptors are:

 - Area: the number of pixels in the shape.
 - Perimeter: the number of pixels in the boundary of the shape.
 - Elongation: rotate a rectangle so that it is the smallest rectangle in which the shape fits, then compare its height to its width.
 - Rectangularity: how rectangular a shape is, i.e. how much of its minimal bounding box the shape's area fills.
 - Orientation: the overall direction of the shape.

Moments are common in statistics and physics; the statistical moments are 1) mean, 2) variance, 3) skew and 4) kurtosis. A moment of an image is a weighted average of the image's pixel intensities, usually chosen to have some attractive property. Moments are useful for describing shapes in a (binary) image after segmentation: using image moments one can find simple properties of an image such as area (intensity), centroid and orientation of the object inside the image.

Raw Moments

For an image with pixel intensities I(x, y), the raw moments are

    M_ij = Σ_x Σ_y x^i y^j I(x, y).

Raw moments of a simple image include:

1) Sum of gray levels, or area in the case of a binary image: M_00.
2) Centroid: (M_10 / M_00, M_01 / M_00).
Central Moments

The central moments of an input digital image f(x, y) are

    μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y),

where x̄ = M_10 / M_00 and ȳ = M_01 / M_00 are the centroid coordinates.

Scale Invariant Moments

Moments η_ij with i + j >= 2 can be constructed to be invariant to both translation and changes in scale by dividing the corresponding central moment by a power of the 00th moment:

    η_ij = μ_ij / μ_00^(1 + (i + j)/2).

Rotation Invariant Moments (Hu set of invariant moments)

The Hu set of invariant moments is the most frequently used; these moments are invariant under translation, rotation and scale. They are seven combinations of the normalized central moments, the first two being

    I1 = η_20 + η_02,
    I2 = (η_20 − η_02)^2 + 4 η_11^2,

and so on up to I7. These seven values, I1 to I7, form the feature set stored as the descriptor of each image. The usefulness of these moments in this application is that they make the image features invariant to scale, translation and rotation. Hu moments are used in this project; they are also called invariant statistical moments because they are not affected by rotation, scaling and translation.
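The seven invariants can be computed directly from the definitions above; a minimal sketch (the standard Hu formulas, not the project's exact implementation):

    function I = hu_moments(bw)
    % Seven Hu invariant moments of a binary image, built from the
    % normalized central moments eta(p, q) defined in this chapter.
    bw = double(bw);
    [h, w] = size(bw);
    [X, Y] = meshgrid(1:w, 1:h);
    m00  = sum(bw(:));                      % area (sum of gray levels)
    xbar = sum(sum(X .* bw)) / m00;         % centroid x
    ybar = sum(sum(Y .* bw)) / m00;         % centroid y
    eta = @(p, q) sum(sum((X - xbar).^p .* (Y - ybar).^q .* bw)) ...
                  / m00^(1 + (p + q)/2);    % scale-invariant central moment
    n20 = eta(2,0); n02 = eta(0,2); n11 = eta(1,1);
    n30 = eta(3,0); n03 = eta(0,3); n21 = eta(2,1); n12 = eta(1,2);
    I = zeros(1, 7);
    I(1) = n20 + n02;
    I(2) = (n20 - n02)^2 + 4*n11^2;
    I(3) = (n30 - 3*n12)^2 + (3*n21 - n03)^2;
    I(4) = (n30 + n12)^2 + (n21 + n03)^2;
    I(5) = (n30 - 3*n12)*(n30 + n12)*((n30 + n12)^2 - 3*(n21 + n03)^2) ...
         + (3*n21 - n03)*(n21 + n03)*(3*(n30 + n12)^2 - (n21 + n03)^2);
    I(6) = (n20 - n02)*((n30 + n12)^2 - (n21 + n03)^2) ...
         + 4*n11*(n30 + n12)*(n21 + n03);
    I(7) = (3*n21 - n03)*(n30 + n12)*((n30 + n12)^2 - 3*(n21 + n03)^2) ...
         - (n30 - 3*n12)*(n21 + n03)*(3*(n30 + n12)^2 - (n21 + n03)^2);
    end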
9 Hand Gesture Training (Machine learning)

Machine Learning

Machine learning here involves two basic steps:

 - Collecting the training set.
 - Feature extraction.

Figure 9.1 Machine learning: training set images are passed through the invariant Hu moment extractor to produce the feature set.

Training Dataset

A dataset with variations is captured for the training step. The training dataset consists of 5 gestures, with 50 variations for each gesture, so that the system is trained for higher accuracy across variations of the same gesture. This helps to recognize a gesture under different conditions. A few samples from the proposed dataset are:

Gesture 1: first variation, second variation, third variation
Figure 9.2 Punch gesture

Gesture 2: first variation, second variation, third variation
Figure 9.3 Left gesture

Gesture 3: first variation, second variation, third variation
Figure 9.4 Well done gesture

Gesture 4: first variation, second variation, third variation
Figure 9.5 Drop gesture

Gesture 5: first variation, second variation, third variation
Figure 9.6 Catch gesture

The following 5 gestures are included in the system.

Figure 9.7 Gestures included in the system
Feature Extraction:

Feature extraction in the training step is the same as explained in chapter 8. In the training/learning step the features of each image are extracted using the Hu set of invariant moments, and the result for each training image is stored in a file so that it does not need to be recomputed during classification. The file contains a matrix holding the descriptor values of each image from the training dataset together with its class label. This saves time and makes classification robust, because training is the most time-consuming operation.

Normalization:

In the stored feature matrix, each row represents one image and each column represents a specific feature (attribute); one attribute does not depend on another, so the values of each column are normalized independently. The maximum of each column is stored in a file for later use in the classification step. The value in each row of a particular attribute is divided by the maximum value of that attribute over the whole matrix, and this is repeated for all records. This normalizes the values into the range 0 to 1, which vastly improves the classification results: it decreases bias, giving each attribute the same weight in classification. A sketch is given at the end of this chapter.

Inter-class difference:

The average of each class is calculated from the matrix of descriptors. One class is chosen and the distance between it and each of the other classes is calculated; the same step is repeated for all classes and the results are stored. From these results three values are found: maximum, minimum and median. These values can be used as a threshold, depending on the desired level of strictness for classification. This is an adaptive threshold whose purpose is to prevent under-fitting; the stricter the threshold, the less under-fitting is allowed.
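A minimal sketch of the normalization described above, where F is the stored training matrix (rows = images, columns = Hu features) and the file name is a hypothetical placeholder:

    colMax = max(abs(F), [], 1);          % per-column maxima
    Fn = bsxfun(@rdivide, F, colMax);     % scale each column into [0, 1]
    save('feature_max.mat', 'colMax');    % reuse the same scaling on test images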
10 Classification

Classification involves two basic steps:

 - Machine learning
 - Recognition

Figure 10.1 Classification steps. Machine learning: training set images, features from invariant Hu moments, feature set. Recognition: test image, features, classification, classified result.
Recognition:

Recognition involves the following steps (a sketch follows the list):

 - First the features of the test image are calculated using Hu moments.
 - These features are compared with the training feature set.
 - The algorithm used for classification is KNN (k-nearest neighbor).
 - This algorithm uses the distances to neighbors to classify the current record into one of the predefined classes.
 - The Euclidean distance is used for the comparison:

       EuclideanDistance((X, Y), (A, B)) = [(X − A)^2 + (Y − B)^2]^(1/2)

 - The gesture is classified into the class with which it has the minimum distance.
 - A value of K is selected, which is the number of neighbors taken into account for every decision.
 - The value of K must be chosen carefully: if K is too small the method is sensitive to noise, and if K is too large the neighbors may include points from other classes. So a moderate value of K is selected.
 - One limitation of this method is that it will always assign the input gesture to the training class with minimum distance, even when the gesture is new, which results in incorrect classification. So a threshold is applied.
 - After calculating the distance, the value is compared with the threshold.
 - If it passes the threshold the gesture is classified; otherwise it is identified as a new gesture.
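A minimal sketch of this procedure, assuming numeric class labels and feature vectors already normalized as in chapter 9 (function and variable names are illustrative):

    function label = knn_classify(f, trainF, trainLabels, k, thresh)
    % k-nearest-neighbor vote with Euclidean distance and a rejection
    % threshold for gestures that match no training class.
    d = sqrt(sum(bsxfun(@minus, trainF, f).^2, 2));  % distance to each row
    [ds, order] = sort(d);                           % nearest first
    if ds(1) > thresh
        label = -1;                                  % new, untrained gesture
    else
        label = mode(trainLabels(order(1:k)));       % majority vote of k nearest
    end
    end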
Test results:

Figure 10.2 Punch gesture test
Figure 10.3 Drop gesture test
Figure 10.4 Catch gesture test
Figure 10.5 Left gesture test
Figure 10.6 Well done gesture test
11 Text to speech

Once the gesture has been recognized, the class of the gesture determined at run time is obtained. In the speech function, first the available voice types are enumerated and the first available voice is picked by default. The caller passes the text and the voice type as parameters. The function then sets the speed of the speech; the speed (pace) of the voice can range from -10 to 10, with a default of 0. It then sets the sampling rate of the speech. The function is based on the Speech API (SAPI) of 32-bit MS Windows.
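A minimal Windows-only sketch of this step from MATLAB, using the SAPI COM automation object (the spoken string is illustrative):

    sp = actxserver('SAPI.SpVoice');   % Microsoft Speech API voice object
    sp.Rate = 0;                       % speaking pace, valid range -10..10
    sp.Speak('punch');                 % speak the recognized gesture class
    delete(sp);                        % release the COM object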
12 UML Diagrams

Use Case Diagram

Figure 12.1 Use case diagram

Sequence Diagram

Figure 12.2 Sequence diagram

Flow Diagram

Figure 12.3 Flow diagram
13 Conclusion

Future work

Some aspects of the project can be improved in the future:

 - Instead of a webcam, a better and more accurate acquisition device can be used, one that even uses infrared for accuracy, e.g. Kinect.
 - The mechanism for hand detection is not accurate and should be improved.
 - The Hu set of invariant moments is a very basic image feature descriptor and limits accuracy. A better descriptor could give better results, though the classification mechanism may need to change with it.

Potential applications

Image recognition concepts have vital applications in various fields, such as:

 - Robotics.
 - Artificial intelligence.
 - Controlling the computer through hand gestures.
14 Project poster

The poster for this project was created using Adobe InDesign, desktop publishing software by Adobe that is well suited to poster design. The poster is of standard size and uses vector graphics, so no matter how far it is zoomed, it will not pixelate.

Figure 14.1 Project poster in Adobe InDesign
Figure 14.2 Project poster
15 References

[1] Pujan Ziaie, Thomas Müller and Alois Knoll. A Novel Approach to Hand-Gesture Recognition in a Human-Robot Dialog System. Robotics and Embedded Systems Group, Department of Informatics, Technische Universität München.
[2] Pujan Ziaie and Alois Knoll. An Invariant-Based Approach to Static Hand-Gesture Recognition. Technical University of Munich.
[3] Rajat Shrivastava. A Hidden Markov Model based Dynamic Hand Gesture Recognition System using OpenCV. Dept. of Electronics and Communication Engineering, Maulana Azad National Institute of Technology, Bhopal-462001, India.
[4] Neha S. Chourasia, Kanchan Dhote and Supratim Saha. Analysis on Hand Gesture Spotting using Sign Language through Computer Interfacing. International Journal of Engineering Science and Innovative Technology (IJESIT), Volume 3, Issue 3, May 2014.
[5] Joyeeta Singha and Karen Das. Hand Gesture Recognition Based on Karhunen-Loeve Transform. Department of Electronics and Communication Engineering, Assam Don Bosco University, Guwahati, Assam, India.
[6] Hunter, E. Posture estimation in reduced model gesture input systems. Proceedings of the International Workshop on Automated Face and Gesture Recognition, June 1995.
[7] Chaudhary, A., Raheja, J. L., Das, K. and Raheja, S. A Vision based Geometrical Method to find Fingers Positions in Real Time Hand Gesture Recognition. Journal of Software, Academy Publisher, Vol. 7, 2012.
[8] Segan, J. Controlling computers with gloveless gestures. In Virtual Reality Systems, 1993.
[9] Gastaldi, G. et al. A man-machine communication system based on the visual analysis of dynamic gestures. International Conference on Image Processing, Genoa, Italy, September 2005, pp. 397-400.
16 Turnitin Originality Report

HAND GESTURE RECOGNITION SYSTEM by Afnan Ur Rehman, Haseeb Ansar Iqbal, Anwaar ul Haq
From HAND GESTURE RECOGNITION SYSTEM (Research)

Processed on: 30-Jun-2015 08:29 PKT
ID: 553340821
Word Count: 3906

Similarity Index: 10%
Similarity by Source: Internet Sources 8%, Publications 6%, Student Papers 6%

Sources:

1. 2% match (Internet from 11-Dec-2007) http://www.forbes.com/lists/2007/10/07billionaires_The-Worlds-Billionaires_NameHTML_36.html
2. 1% match (Internet from 12-Jul-2013) http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MORSE/region-props-and-moments.pdf
3. 1% match (Internet from 12-Oct-2014) http://www.ijsret.org/pdf/120374.pdf
4. 1% match (publications) A. Musso. "Structural dynamic monitoring on Vega platform: an example of Industry and University collaboration", Proceedings of European Petroleum Conference EUROPEC, 10/1996
5. 1% match (student papers from 16-Dec-2014) Submitted to iGroup on 2014-12-16
6. 1% match (student papers from 03-Aug-2010) Submitted to Universiti Teknikal Malaysia Melaka on 2010-08-03
7. <1% match (Internet from 01-Jul-2003) http://www.discovery.mala.bc.ca/web/bandalia/digital/work.htm
8. <1% match (publications) Yeo, Hangu, Vadim Sheinin, Yuri Sheinin, and Benoit M. Dawant. Medical Imaging 2009: Image Processing, 2009
9. <1% match (student papers from 16-Dec-2013) Submitted to Universiti Malaysia Perlis on 2013-12-16
10. <1% match (Internet from 05-Jun-2012) http://www.csjournals.com/IJCSC/PDF1-1/16.pdf
11. <1% match (student papers from 27-Oct-2012) Submitted to VIT University on 2012-10-27
12. <1% match (Internet from 30-Apr-2003) http://www.goodstaff.com/jobseekers/articles/sat/Sat14.html
13. <1% match (Internet from 29-Jul-2010) http://ethesis.nitrkl.ac.in/1459/1/Removal_of_RVIN.pdf
14. <1% match (Internet from 08-Oct-2013) http://www.lifesciencesite.com/lsj/life1009s/041_20339life1009s_289_296.pdf
15. <1% match (publications) Henke, Daniel, Padhraic Smyth, Colene Haffke, and Gudrun Magnusdottir. "Automated analysis of the temporal behavior of the double Intertropical Convergence Zone over the east Pacific", Remote Sensing of Environment, 2012
16. <1% match (Internet from 07-Mar-2015) http://en.wikipedia.org/wiki/Image_moment
17. <1% match (Internet from 05-Dec-2013) http://eventos.spc.org.pe/inns-iesnn/papers/Jimenez-Oliden-Huapaya-Cardenas-Neurocopter.pdf
18. <1% match (Internet from 25-Dec-2014) http://ijcsn.org/IJCSN-2014/3-4/A-Fast-and-Robust-Hybridized-Filter-for-Image-De-Noising.pdf
19. <1% match (Internet from 26-Nov-2002) http://aips2.nrao.edu/released/docs/user/Utility/node248.html
20. <1% match (publications) Sungsik Huh. "A Vision-Based Automatic Landing Method for Fixed-Wing UAVs", Selected papers from the 2nd International Symposium on UAVs, Reno, Nevada, USA, June 8-10, 2009
21. <1% match (publications) Kong, Fan zhi, Xing zhou Zhang, Yi zhong Wang, Da wei Zhang, Jun lan Li, Shanhong Xia, Chih-Ming Ho, and Helmut Seidel. 2008 International Conference on Optical Instruments and Technology: MEMS/NEMS Technology and Applications, 2008