REAL TIME MALAYALAM SIGN LANGUAGE DETECTION USING DEEP LEARNING
A PROJECT REPORT
Submitted By
AMRITHA M L (TVE19MCA012)
ANJU V H (TVE19MCA017)
FEJINA TERESA A F (TVE19MCA029)
to the APJ Abdul Kalam Technological University
in partial fulfillment of the requirements for the award of the
Degree of Master of Computer Applications
DECEMBER 2021
Declaration
We, the undersigned, hereby declare that the project report titled "Real Time Malayalam Sign Language Detection Using Deep Learning", submitted in partial fulfillment of the requirements for the award of the degree of Master of Computer Applications of the APJ Abdul Kalam Technological University, Kerala, is a bonafide work done by us under the supervision of Smt. Baby Syla L, Asst. Professor. This submission represents our ideas in our own words; where the ideas or words of others have been included, we have adequately and accurately cited and referenced the original sources. We also declare that we have adhered to the ethics of academic honesty and integrity as directed in the ethics policy of the college, and have not misrepresented or fabricated any data, idea, fact, or source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and/or the University, and can also evoke penal action from the sources which have not been properly cited or from whom proper permission has not been obtained. This report has not previously formed the basis for the award of any degree, diploma, or similar title.
COLLEGE OF ENGINEERING
TRIVANDRUM
CERTIFICATE
This is to certify that the report entitled Real Time Malayalam Sign Language Detection Using Deep Learning, submitted by Amritha M L (TVE19MCA012), Anju V H (TVE19MCA017), and Fejina Teresa A F (TVE19MCA029) to the APJ Abdul Kalam Technological University in partial fulfillment of the requirements for the award of the Degree of Master of Computer Applications, is a bonafide record of the project work carried out by them under my guidance and supervision. This report, in any form, has not been submitted to any other University or Institute for any purpose.
Acknowledgement
First and foremost, we thank God Almighty and our parents for the success of this project. We owe sincere gratitude and heartfelt thanks to everyone who shared their precious time and knowledge for the successful completion of this project.

We express our sincere thanks to Prof. Baby Syla L, Asst. Professor, Department of Computer Applications, College of Engineering Trivandrum, for her valuable guidance, support, and advice that aided in the successful completion of this project.

We profusely thank the other Asst. Professors in the department and all other staff of CET for their guidance and inspiration throughout our course of study.

We owe our thanks to our friends and all others who have directly or indirectly helped us in the successful completion of this project. No words can express our humble gratitude to our beloved parents and relatives who have been guiding us in all walks of our journey.
Amritha M L
Anju V H
Fejina Teresa A F
Abstract
Deafness does not restrict its negative effects to a person's hearing; it touches every aspect of daily life. Moreover, hearing people aggravate the issue through their reluctance to learn sign language. This results in a constant need for human translators to assist deaf persons, which represents a real obstacle to their social lives. Therefore, automatic sign language translation has emerged as an urgent need for the community. The availability and widespread use of computers equipped with web cameras has promoted the design of real-time Malayalam Sign Language (MSL) recognition systems. In this work, we introduce a new MSL recognition system that is able to localize and recognize the alphabet of the Malayalam sign language using a Faster Region-based Convolutional Neural Network (Faster R-CNN). Specifically, the Faster R-CNN is designed to extract and map the image features and learn the position of the hand in a given image. Additionally, the proposed approach alleviates two challenges: the choice of the relevant features used to encode the sign's visual descriptors, and the segmentation task intended to determine the hand region. For the implementation and assessment of the proposed Faster R-CNN based sign recognition system, we exploited a pretrained model from the TensorFlow Model Zoo and collected a real MSL image dataset. The proposed approach yielded 92% accuracy and confirmed the robustness of the model against drastic background variations in the captured scenes.
Contents
1 Introduction
2 Motivation
3 Literature Review
4 Requirement Analysis
    4.1 Purpose
    4.2 Overall Description
        4.2.1 Software Requirements
        4.2.2 Hardware Requirements
    4.3 Technologies Used
        4.3.1 Python
        4.3.2 TensorFlow
        4.3.3 OpenCV
        4.3.4 LabelImg
    4.4 Functional Requirements
    4.5 Non-Functional Requirements
5 System Design
6 Coding
7 Implementation
    7.1 Pseudocode
        7.1.1 Capturing Image
        7.1.2 Training and Testing
        7.1.3 Detection
8 Testing
    8.1 Testing and the Types of Testing Used
        8.1.1 Unit Testing
        8.1.2 Integration Testing
        8.1.3 System Testing
9 Results and Discussion
10 Conclusion
Bibliography
Chapter 1
Introduction
Gesturing is one of the earliest forms of human communication. Nowadays, Deaf and Hard of Hearing (DHH) people are the predominant users of officially recognized sign languages, which consist of alphabets, numbers, and words typically used to communicate within and outside their community. Typically, a sign language consists of (i) manual components and (ii) non-manual components. Specifically, the configuration, the position, and the movement of the hands form the manual components, while facial expression and body movement compose the non-manual components. Sign language is perceived as a non-verbal mode of communication mainly intended to ease communication for DHH persons. However, communication between a Deaf person and a hearing individual remains an open challenge for the community. In fact, approximately 466 million people who suffer from moderate to profound hearing loss struggle with communication daily. In other words, deaf people cannot be treated as a linguistic minority whose language can be neglected.

A sign language includes designated hand gestures for each letter of the alphabet. These gestures are used to spell people's names, places, and other words without a predefined sign. It is also common for a sign's formation to resemble the shape of the written letter. Although hand gestures exhibit some similarities due to the limited number of possible hand shapes, sign language is not universal. There are some 150 sign languages around the world, varying by region and country rather than by spoken language alone. The Malayalam Sign Language (MSL) includes 62 alphabet signs.
People who are unable to communicate verbally use sign language to communicate with others. Sign language recognition is one of the most important fields of research in computer vision. This real-time recognition system was created to recognise Malayalam Sign Language gestures; as yet, no sign language recognition system for understanding gestures in Malayalam has been created. There have been various technological improvements, as well as much research, to assist deaf and hard of hearing people, and deep learning and computer vision can also be utilised to further this cause. Such a system can be extremely useful for deaf people when interacting with others, since not everyone understands sign language. Furthermore, this work can be expanded to build automated editors, where a person can write using only their hand movements.
Chapter 2

Motivation
There are many people who are unable to communicate verbally and therefore cannot convey their messages to hearing people. NISH (National Institute of Speech and Hearing) supports the education of such individuals through sign language. However, the problem remains that hearing people are often unable to comprehend what they are attempting to communicate. This is the driving force behind the development of this system. For image collection, labelling, training and testing, and detection, we use OpenCV, LabelImg, and TensorFlow; the core technology behind the system is deep learning.

Because deaf individuals communicate via hand signs, hearing people have a hard time understanding what they are saying. As a result, systems that identify the various signs and convey that information to hearing people are required.
Chapter 3
Literature Review
Sign Language Recognition (SLR) is a prominent research topic; despite its popularity, however, it is rarely used on a daily basis, owing to its complexity and the resources it requires. In this literature study, we examined the methodologies and models used to create functional sign-language translators from multiple sources, and surveyed strategies that may be used to develop an automated translator. The goal of this review is to examine different approaches that apply Artificial Intelligence to build an automated Malayalam Sign Language translator, which is currently unavailable. We identified a number of relevant research articles. The review revealed that each of the selected studies produced reasonable findings; nevertheless, none is flawless, since each has its own strengths and shortcomings. Several approaches may suit our goal of developing a usable sign language translator, such as employing a regular video camera to collect data and either a Convolutional Neural Network or a Support Vector Machine to classify the input.

This study also covers recent technological advances that allow persons with speech impairments to communicate readily with hearing people. The work done thus far includes smart gloves, Android applications, and techniques such as Convolutional Neural Networks, Gaussian filtering, Hidden Markov Models, voice to text, and video to text and then to speech. One reviewed system was tested in real time with several sample gestures, and its average recognition rate was reported to be 99 percent. Based on the accuracy of these reported results, we decided to proceed with this effort.
Chapter 4
Requirement Analysis
4.1 Purpose
In this work, we aim to recognize the hand gestures of the Malayalam sign language from two-dimensional images and translate them into text. The proposed system is intended to support non-hearing people in their communication with others, whether or not the other party has mastered MSL. This would lessen the social hardship this community withstands daily. Moreover, the proposed system is not bothersome for the user, since it does not require any accessory, sophisticated sensor, or special camera. Specifically, we propose a Faster R-CNN based approach to localize and classify the letters of the Malayalam sign language. In particular, a deep learning network designed as a typical CNN architecture is utilized as a feature extractor. The rationale behind the choice of the Region-based CNN (R-CNN) is its noticeable impact on the object recognition field: generating region proposals using an intelligent selective search relaxes the need for a separate image segmentation stage. Nevertheless, the method has notable efficiency limitations; in particular, the large number of proposals conveyed to the network represents a major drawback. The more recent Fast R-CNN was therefore introduced to enhance performance by integrating a Region of Interest (RoI) pooling layer, reducing the processing time required by the network. Despite this enhancement, the main issue persists, lying within the time-consuming selective search used for proposal generation. Consequently, the latest incarnation of the region-based CNN, namely Faster R-CNN, was adopted in this research to exploit its Region Proposal Network (RPN).
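To make the RoI pooling idea concrete, the following is a simplified numpy sketch (illustrative only, not the library implementation): it reduces a variable-sized region of a feature map to a fixed-size output by max-pooling over a grid of cells.

    import numpy as np

    def roi_pool(feature_map, box, output_size=(2, 2)):
        """Max-pool one region of a 2-D feature map to a fixed output size.

        feature_map: 2-D array; box: (x1, y1, x2, y2) in feature-map coordinates.
        """
        x1, y1, x2, y2 = box
        region = feature_map[y1:y2, x1:x2]
        out_h, out_w = output_size
        # Split the region into an out_h x out_w grid and take the max of each cell.
        pooled = np.zeros(output_size)
        h_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
        w_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
        for i in range(out_h):
            for j in range(out_w):
                cell = region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
                pooled[i, j] = cell.max()
        return pooled

    # A 5x7 region pooled down to a fixed 2x2 grid.
    fm = np.arange(100).reshape(10, 10).astype(float)
    print(roi_pool(fm, (1, 2, 8, 7)))

Whatever the size of the proposal, the pooled output has the same shape, which is what lets the subsequent fully connected layers accept proposals of arbitrary size.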
4.2 Overall Description

4.2.1 Software Requirements

• Python
• TensorFlow
• OpenCV
• LabelImg
• Numpy

4.2.2 Hardware Requirements

• Memory: 8 GB RAM
• Webcam: 2 MP

4.3 Technologies Used
4.3.1 Python
Python is a high-level, interpreted programming language that can be used for a wide variety of tasks. Python's design philosophy prioritises code readability, as seen in its significant use of whitespace. Its language elements and object-oriented approach are aimed at helping programmers write clear, logical code for both small and large-scale projects. Python is garbage-collected and dynamically typed, and it supports procedural, object-oriented, and functional programming paradigms.
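As a small illustration of these properties (not part of the project code):

    # Dynamic typing: a name can be rebound to values of different types.
    count = 3
    count = "three"

    # Whitespace-delimited blocks keep the structure readable.
    def describe(sign):
        """Return a short, readable description of a detected sign."""
        return f"Detected sign: {sign}"

    print(describe("A"))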
4.3.2 TensorFlow

TensorFlow is a free and open-source software library for machine learning. It can be used for a variety of applications, but it focuses on the training and inference of deep neural networks. TensorFlow is a symbolic maths framework based on dataflow and differentiable programming. It is used at Google for both research and production. The Google Brain team created TensorFlow for internal Google use, and in 2015 it was released under the Apache License 2.0.
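As a small illustration of the differentiable-programming style (separate from the project code), TensorFlow can record operations on a variable and compute gradients automatically:

    import tensorflow as tf

    # Record operations on a variable and differentiate through them.
    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x * x + 2.0 * x   # y = x^2 + 2x

    # dy/dx = 2x + 2, which is 8.0 at x = 3.0
    print(tape.gradient(y, x).numpy())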
4.3.3 OpenCV
OpenCV is used in this project to detect signs in real time. It is a massive open-source library for image processing, computer vision, and machine learning, and it supports a broad range of programming languages, including C++ and Python.
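A minimal sketch of reading frames from a webcam with OpenCV, the pattern used throughout this project (device index 0 is an assumption):

    import cv2

    cap = cv2.VideoCapture(0)          # open the default webcam
    while True:
        ret, frame = cap.read()        # grab one frame
        if not ret:
            break
        cv2.imshow('webcam', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):   # quit on 'q'
            break
    cap.release()
    cv2.destroyAllWindows()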
4.3.4 LabelImg
LabelImg is a free, open-source program for visually annotating images. It is written in Python and uses Qt for its graphical user interface. It is a quick and painless way to label a few hundred photographs for an object detection project.
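By default, LabelImg saves each annotation as a Pascal VOC style XML file. A small sketch of reading one such file in Python (the file path is illustrative):

    import xml.etree.ElementTree as ET

    # Parse a LabelImg (Pascal VOC) annotation file; the path is illustrative.
    tree = ET.parse('collected_images/sign_A.xml')
    root = tree.getroot()

    for obj in root.findall('object'):
        name = obj.find('name').text          # class label, e.g. the sign name
        box = obj.find('bndbox')
        xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
        xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
        print(name, (xmin, ymin, xmax, ymax))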
Chapter 5
System Design
Deep learning technology is used to detect sign language in real time. The system works by capturing the user's sign gestures through a camera and translating them into text. It entails the following procedures:

1. Gathering images using a webcam.
2. Labelling the captured images using LabelImg.
3. Training the model. The TensorFlow pretrained model uses the MobileNet architecture, a single convolutional network that learns to predict and classify bounding box positions in a single pass. A bounding box is an imaginary rectangle that acts as a point of reference for object detection.
4. Detecting signs in real time.

The majority of the coding follows the technique named "Object Detection: Faster R-CNN"; only a few coding tasks are required. A sketch of how the trained model and label map can be loaded for detection is given below.
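As a sketch of preparing the trained model for detection, assuming the TensorFlow Object Detection API is installed; the paths and the checkpoint name 'ckpt-11' are illustrative. This builds the detect_fn and category_index used by the detection loop in Chapter 7:

    import os
    import tensorflow as tf
    from object_detection.builders import model_builder
    from object_detection.utils import config_util, label_map_util

    # Build the model from the training pipeline config; paths are illustrative.
    configs = config_util.get_configs_from_pipeline_file('models/my_msl_model/pipeline.config')
    detection_model = model_builder.build(model_config=configs['model'], is_training=False)

    # Restore the latest fine-tuned checkpoint ('ckpt-11' is illustrative).
    ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
    ckpt.restore(os.path.join('models/my_msl_model', 'ckpt-11')).expect_partial()

    @tf.function
    def detect_fn(image):
        """Run preprocessing, prediction, and postprocessing on one image batch."""
        image, shapes = detection_model.preprocess(image)
        prediction_dict = detection_model.predict(image, shapes)
        return detection_model.postprocess(prediction_dict, shapes)

    # Map numeric class ids back to sign names.
    category_index = label_map_util.create_category_index_from_labelmap(
        'annotations/label_map.pbtxt')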
Chapter 6
Coding
Faster R-CNN receives the CNN's feature maps and forwards them to the Region Proposal Network (RPN). The RPN runs a sliding window over these feature maps and, at each window position, constructs k anchor boxes of various shapes and sizes. Anchor boxes are fixed-size bounding boxes that are placed throughout the picture and come in a variety of forms and sizes. For each anchor, the RPN predicts two things:

• The first is the likelihood that the anchor contains an object (it ignores the object's class).
• The second is the bounding box regressor, which adapts the anchor to better fit the object.

Bounding boxes of various shapes and sizes are then passed on to the RoI pooling layer. After the RPN stage, some proposals may have no class assigned to them. Each proposal is cropped so that it contains an object; the RoI pooling layer does this, extracting a fixed-size feature map for each proposal.
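As an illustration of how the k anchors at one sliding-window position can be formed, here is a simplified numpy sketch; the scales and aspect ratios are illustrative, not those used by the library:

    import numpy as np

    def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
        """Generate k = len(scales) * len(ratios) anchors centred at (cx, cy).

        Each anchor is (xmin, ymin, xmax, ymax) in pixels.
        """
        anchors = []
        for s in scales:
            for r in ratios:
                w = s * np.sqrt(r)   # width grows with the aspect ratio
                h = s / np.sqrt(r)   # height shrinks correspondingly
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        return np.array(anchors)

    # Nine anchors (3 scales x 3 ratios) at one sliding-window position.
    print(make_anchors(100, 100).shape)  # (9, 4)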
Chapter 7
Implementation
7.1 Pseudocode
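7.1.1 Capturing Image

A minimal sketch of the image-capture step, assuming OpenCV and the default webcam; the sign labels, image count, and paths are illustrative:

    import os
    import time
    import cv2

    labels = ['A', 'Ba', 'Ka']   # illustrative sign names
    images_per_label = 15

    cap = cv2.VideoCapture(0)
    for label in labels:
        os.makedirs(os.path.join('collected_images', label), exist_ok=True)
        print(f'Collecting images for {label}')
        time.sleep(3)                    # time for the signer to get ready
        for i in range(images_per_label):
            ret, frame = cap.read()
            if not ret:
                continue
            path = os.path.join('collected_images', label, f'{label}_{i}.jpg')
            cv2.imwrite(path, frame)     # save the frame for labelling
            time.sleep(2)                # gap between captures for pose variation
    cap.release()

7.1.2 Training and Testing

After the captured images are labelled with LabelImg and the annotations are converted to TFRecords, the model is fine-tuned with the TensorFlow Object Detection API. A sketch of the training invocation, assuming the API's standard model_main_tf2.py script; the paths and step count are illustrative:

    python model_main_tf2.py \
        --model_dir=models/my_msl_model \
        --pipeline_config_path=models/my_msl_model/pipeline.config \
        --num_train_steps=10000

Running the same script with an additional --checkpoint_dir argument pointing at the training directory evaluates the model on the test split.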
7.1.3 Detection
    import cv2
    import numpy as np
    import tensorflow as tf
    from object_detection.utils import visualization_utils as viz_utils

    # Assumes detect_fn and category_index were built as sketched in Chapter 5.
    cap = cv2.VideoCapture(0)

    while True:
        ret, frame = cap.read()
        image_np = np.array(frame)
        input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
        detections = detect_fn(input_tensor)

        num_detections = int(detections.pop('num_detections'))
        detections = {key: value[0, :num_detections].numpy()
                      for key, value in detections.items()}
        detections['num_detections'] = num_detections
        # detection_classes should be ints.
        detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

        label_id_offset = 1
        image_np_with_detections = image_np.copy()
        viz_utils.visualize_boxes_and_labels_on_image_array(
            image_np_with_detections,
            detections['detection_boxes'],
            detections['detection_classes'] + label_id_offset,
            detections['detection_scores'],
            category_index,
            use_normalized_coordinates=True,
            max_boxes_to_draw=5,
            min_score_thresh=.5,
            agnostic_mode=False)

        cv2.imshow('object detection', cv2.resize(image_np_with_detections, (800, 600)))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            cap.release()
            cv2.destroyAllWindows()
            break
Chapter 8
Testing
The primary goal of testing the application is to detect and uncover all flaws and associated risks, and to ensure that every identified issue is properly addressed before release. The best available strategies, approaches, work processes, and frameworks were utilised to configure, create, execute, and manage the testing of the project "Real Time Sign Language Detection", as recorded in this test approach document.
Chapter 9

Results and Discussion
It is observed that the system performs all of its functionalities as expected: it detects the sign gestures shown by the user and displays the corresponding label in real time. The main aim behind this project was to ease the process of communication between hearing people and speech-impaired people. The proposed system provides several advantages over the existing situation and solves many of its problems.
9.0.1 Advantages
In addition to alphabet signs, some words such as "yes", "no", and "thanks" can also be detected.
9.0.2 Limitations
Chapter 10

Conclusion
In this project, we propose a system for communication between speech-impaired and hearing people with the aid of OpenCV, TensorFlow, and Python. The accuracy of this proposed work is reported to be 99 percent.
Bibliography
[1] TensorFlow. https://www.tensorflow.org
[2] OpenCV. https://opencv.org
[3] Wikipedia. https://en.wikipedia.org
[5] Real time sign language recognition based on neural network architecture. IEEE, April 2011.
[6] Real time Bangla Sign Language Detection. ICCIT, April 2020.
[7] Project Jupyter. https://jupyter.org
[8] Python. https://www.python.org
[9] W3Schools. https://www.w3schools.com