Sign Language Recognition Using Convolutional Neural Networks
Abstract. Sign language is a lingua franca among the speech- and hearing-impaired community. It is hard
for most people who are not familiar with sign language to communicate without an interpreter.
Sign language recognition pertains to tracking and recognizing meaningful gestures made with the
head, arms, hands, fingers, etc. The technique implemented here transcribes gestures from sign
language into a spoken language that is easily understood by hearing people. The translated gestures
include alphabets and words from static images. This becomes especially important when a person who
relies entirely on gestural sign language for communication tries to communicate with a person who
does not understand sign language. Most systems currently in use face a recognition problem with
skin tone; by introducing a filter, the proposed system identifies the symbols irrespective of skin tone.
The aim is to represent features that will be learned by a system known as a convolutional neural
network (CNN), which contains four types of layers: convolution layers, pooling/subsampling layers,
nonlinear layers, and fully connected layers.
1. Introduction
A sign language interpreter is a significant step toward improving communication between the deaf and the
general population. Sign language is a natural language used by hearing- and speech-impaired people to
communicate. It uses hand gestures instead of sound to convey messages or information. Sign language
can vary from one part of the world to another. Because of this, signers find it difficult to communicate with
hearing people, who generally cannot understand sign languages. There arises a need for sign
language translators, which can translate sign language into spoken language. However, the availability
of human sign language translators is limited, and they have many limitations. This led to the
development of sign language recognition systems, which can automatically translate sign language
into text as well as speech through effective pre-processing and accurate classification of the signs.
According to recent developments in the area of deep learning, neural networks may have far-reaching
implications and applications for sign language analysis. In the proposed system, a Convolutional
Neural Network (CNN) is used to classify images of sign language because convolutional networks
are faster at feature extraction and image classification than other classifiers.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
ICCCEBS 2021 IOP Publishing
Journal of Physics: Conference Series 1916 (2021) 012091 doi:10.1088/1742-6596/1916/1/012091
The environment may also treat a sign as a compressed encoding of information for transmission,
which is then reconstructed by the receiver. Signs are divided into two categories: static and dynamic.
Dynamic signs frequently involve the movement of body parts and, depending on the meaning of the
gesture, may also include emotions. Depending on the context, gestures may be broadly classified as:
• Arm gestures
• Facial / Head gestures
• Body gestures
2. Existing System
This research suggests the use of filters in the sign language translation algorithm because the existing
system has low accuracy, as it faces issues with skin tone identification. Sign language conversion can
reach a maximum of about 96% accuracy, but achieving that can be a tedious task. The current system
fails to reach this accuracy because it struggles to identify skin tone in low-light areas.
3. Objectives
The key goal is to develop a system that recognizes signs with maximum accuracy regardless of
varying light and dark conditions.
4. Proposed System
In this article, the filtering of images plays an important role: it improves the accuracy of identifying the
symbols even in low-light areas. Before saturation and grey-scaling, the image is sent to the filtering
system, which tries to find the symbol shown by the hands. After the symbol is recognized, the image
is processed further and the final result, the corresponding word, is obtained.
6. Image Recognition
The software requirements of this system are Python, the Open Source Computer Vision Library
(OpenCV), TensorFlow, and NumPy. Python is used because of its rich ecosystem of machine learning
and computer vision libraries. TensorFlow is an open-source machine learning tool that is used to train
on the sign images from start to finish. The system also makes use of OpenCV, a free and open-source
software library for computer vision and machine learning. NumPy is a Python library that adds
support for large, multi-dimensional arrays and matrices, along with a wide range of high-level
mathematical functions that operate on them [1]. The proposed system recognizes not only digits and
alphabets but also words of sign language. Background elimination is done by specifying a range that
covers the colour range of the human hand, so the hand region is recognized dynamically. The lighting-
condition problem is corrected by using the colour of the human hand.
7.3 Rescaling
Some existing systems do not include a resizing step in pre-processing. Image resizing is
important for increasing or decreasing the total number of pixels. In the proposed system, the images are
resized by decreasing the number of pixels: as the size increases, the processing time also increases,
so size reduction is done in the first step of pre-processing.
y = g(wx + b) (4.1)

In Equation 4.1, x is the input vector, w is the weight, b is the bias, and g is the activation function [5]. This
calculation is repeated for each layer. To calculate the most reliable weights, the fully connected part of
the CNN goes through the backpropagation process. Weights are allocated to each neuron so that it
prioritises the most suitable label. Finally, the neurons "vote" for one of the labels, and the classification
decision is made depending on the winner of that vote. The final layer is used to obtain the probability
of the input belonging to a specific class after it has passed through the fully connected layers.
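The per-layer computation g(wx + b) and the final class-probability step can be sketched in NumPy. The shapes and the tanh activation here are illustrative assumptions, not the paper's stated choices:

```python
import numpy as np

def dense_forward(x, w, b, g=np.tanh):
    """One fully connected layer: y = g(w @ x + b), as in Equation 4.1."""
    return g(w @ x + b)

def softmax(z):
    """Turn the final layer's raw scores into class probabilities."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()
```

Stacking `dense_forward` calls repeats Equation 4.1 layer by layer; applying `softmax` to the last layer's output and taking the largest probability is the "vote" that picks the winning label.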
8 Implementation
The diagram above shows the basic architecture proposed for the system. It consists of the basic
system modules: image acquisition, pre-processing, feature extraction, and finally output production
based on pattern recognition.
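A network with the four layer types named in the abstract (convolution, pooling/subsampling, nonlinearity, fully connected) could be assembled in Keras roughly as follows. This is a sketch under assumptions: the filter counts, the 64x64 grayscale input, and the 125-class output (from the word count in the conclusion) are illustrative, not the paper's exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_sign_cnn(input_shape=(64, 64, 1), num_classes=125):
    """Illustrative CNN with the four layer types described in the paper."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),  # convolution + nonlinearity
        layers.MaxPooling2D(2),                   # pooling / subsampling
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),     # fully connected
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])
```

The softmax output of the last fully connected layer gives the per-class probability described in the previous section.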
8.4 Cropping:
Cropping is the process of removing unnecessary sections of an image in order to better frame the subject
matter or adjust the aspect ratio[3].
8.5 Resizing:
Photos are resized to fit the available or reserved space. Resizing photographs is a technique for
preserving the original image's quality [5]. Changing the resolution affects the physical size, but
changing the physical size does not affect the resolution.
9 Evaluation Metrics
• True Positive (TP) [7-8]: A true positive is when the algorithm correctly predicts the positive class. (If
the predicted and actual signs are the same, the outcome is a true positive.)
• False Positive (FP): In a false positive, the algorithm wrongly predicts the positive class. (A false
positive occurs when the predicted sign differs from the actual sign.)
• True Negative (TN): A true negative is when the algorithm correctly predicts the negative class. (If a
sign is correctly identified as not being the target sign, it is a true negative.)
• False Negative (FN): A false negative is a result in which the algorithm incorrectly predicts the
negative class. (If the algorithm wrongly rejects a sign that actually matches, the outcome is a false
negative.)
• Precision: Precision (P) equals the number of true positives (TP) divided by the number of true positives
plus the number of false positives (FP). The percentage of relevant results is referred to as precision.
Precision = TP/(TP+FP)
• Recall: Recall (R) equals the number of true positives (TP) divided by the number of true positives
plus the number of false negatives (FN). The proportion of all relevant findings correctly classified is
referred to as recall.
Recall = TP/(TP+FN)
• Sensitivity: In certain fields, sensitivity (also known as the true positive rate, recall, or probability of
detection) refers to the proportion of true positives that are correctly identified as such (e.g., the
percentage of predicted signs that correctly match the actual sign).
Sensitivity = TP/(TP+FN)
• Specificity: Specificity (also known as the true negative rate) is the proportion of actual negatives that
are correctly identified as such (e.g., the percentage of signs correctly identified as not being the
target sign).
Specificity = TN/(FP+TN)
• F-Score: The F score is used to measure a test's accuracy by combining precision and recall. By
combining the two, the F score offers a more balanced assessment of a test's results.
F = 2 × (Precision × Recall)/(Precision + Recall)
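The metric formulas above translate directly into code. A small sketch, with illustrative counts (not values from the paper's tables):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall/sensitivity, specificity, and F-score
    from raw TP/FP/TN/FN counts, per the definitions above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to sensitivity
    specificity = tn / (tn + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f_score": f_score}
```

For example, with tp=90, fp=10, tn=80, fn=20, precision is 90/100 = 0.9 and recall is 90/110, about 0.818.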
For each sign, a total of 200 images were taken for evaluation. Recall and Precision are calculated
for each sign. Table 1 shows the proposed system's test findings: the numbers of True Positives (TP),
False Positives (FP), True Negatives (TN), and False Negatives (FN) generated while measuring these
signs. Thirteen signs are shown as an example. Table 2 shows the Recall and Precision calculated for
different signs of the proposed system using the formulae above; the calculated values are shown for
the same 13 signs. Thus, for each sign, the output precision can be read from this table, and its
sensitivity is also given.
10 Result
The system was put to the test with real-time video input, and the results were tallied. Overall, the
SL recognition system performs well: across all of the signs, the highest precision is 90%. This means
that the system is capable of recognizing the majority of signs. Figures 1-9 show the results. Tables
1 and 2 show the sign names.
Software:
Operating System: Windows 7/8/10
Language: Python 3.7
Tools: TensorFlow, OpenCV, pyttsx3, NumPy, Keras
Browser: Firefox / Chrome / Internet Explorer
Hardware:
Processor: Intel Core i7
RAM: 8 GB
Hard Disk: 1 TB
Mouse: logical optical mouse
Keyboard: logical 107 keys
Motherboard: Intel
Speed: 3.3 GHz
11 Visual Results:
12. Conclusion
The proposed system successfully predicts sign language alphabets and some common words under
different lighting conditions and at different speeds. Accurate masking of the images is done by
specifying a range of values that can detect the human hand dynamically. The proposed system uses a
CNN for the training and classification of images. For classification and training, informative features
are finely extracted from the images and used. A total of 1750 static images per sign are used for
training to obtain accurate output. Finally, the recognized sign is output as text and also converted
into speech. The system is capable of recognizing 125 words including alphabets. Thus, this is a user-
friendly system that can be easily used by deaf and hearing people alike.
References
[1] S. C. W. Ong and S. Ranganath, "Automatic sign language analysis: A survey and the future
beyond lexical meaning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 873-891,
Jun. 2005.
[2] L. Ding and A. M. Martinez, "Modelling and recognition of the linguistic components in
American sign language," Image Vis. Comput., vol. 27, no. 12, pp. 1826-1844, Nov. 2009.
[3] D. Kelly, R. Delannoy, J. Mc Donald, and C. Markham, "A framework for continuous
multimodal sign language recognition," in Proc. Int. Conf. Multimodal Interfaces, Cambridge,
MA, 2009, pp. 351-358.
[4] G. Fang, W. Gao, and D. Zhao, "Large vocabulary sign language recognition based on fuzzy
decision trees," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 34, no. 3, pp. 305-314,
May 2004.
[5] Haldorai, A. Ramu, and S. Murugan, "Social Aware Cognitive Radio Networks," in Social
Network Analytics for Contemporary Business Organizations, pp. 188-202. doi:10.4018/978-1-
5225-5097-6.ch010
[6] R. Arulmurugan and H. Anandakumar, "Region-based seed point cell segmentation and detection
for biomedical image analysis," International Journal of Biomedical Engineering and Technology,
vol. 27, no. 4, p. 273, 2018.
[7] N. Purva and K. Vaishali, "Indian Sign Language Recognition: A Review," in Proc. IEEE Int.
Conf. on Electronics and Communication Systems, pp. 452-456, 2014.
[8] F. Pravin and D. Rajiv, "HASTA MUDRA: An Interpretation of Indian Sign Hand Gestures," in
Proc. 3rd Int. Conf. on Electronics Computer Technology, vol. 2, pp. 377-380, 2011.