
Sign language recognition using deep learning


2024 15th International Conference on Computing, Communication and Networking Technologies (ICCCNT) | 979-8-3503-7024-9/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICCCNT61001.2024.10725481

Swapnil Shinde, AI & DS, Vishwakarma Institute of Information Technology, Pune, India, swapnil.shinde@viit.ac.in
Parikshit Mahalle, AI & DS, Vishwakarma Institute of Information Technology, Pune, India, parikshit.mahalle@viit.ac.in
Sayee Panchal, AI & DS, Vishwakarma Institute of Information Technology, Pune, India, sayee.22110420@viit.ac.in
Shreya Mahalle, AI & DS, Vishwakarma Institute of Information Technology, Pune, India, shreya.22110637@viit.ac.in
Atharva Pandit, AI & DS, Vishwakarma Institute of Information Technology, Pune, India, atharva.22110449@viit.ac.in
Parag Tonpe, AI & DS, Vishwakarma Institute of Information Technology, Pune, India, parag.22110305@viit.ac.in

Abstract— This study aims to develop a robust deep learning-based solution for accurate detection and recognition of Indian Sign Language (ISL) motions. The proposed model integrates advanced techniques such as MediaPipe Holistic for feature extraction and Long Short-Term Memory (LSTM) networks for sign language detection and translation. The overall process is divided into several key phases: data collection, feature extraction, and model training using essential Python modules. In the data collection phase, a comprehensive dataset of ISL motions is compiled in the form of video clips. These videos are meticulously annotated with corresponding textual representations of the signs to facilitate accurate training and evaluation. The preprocessing step involves extracting significant hand landmarks from the video frames, which are crucial for distinguishing between different sign language motions. Feature extraction is carried out using the MediaPipe Holistic algorithm, which is renowned for its high accuracy and efficiency in detecting hand and body landmarks. The procedure for converting videos to landmark coordinates leverages several libraries, including OpenCV for video processing, MediaPipe for landmark detection, and NumPy for efficient data manipulation. The extracted landmarks serve as critical inputs to the subsequent deep learning model. In the deep learning phase, an LSTM network, a type of Recurrent Neural Network (RNN), is implemented using the TensorFlow and Keras frameworks, which are used to specify the model's architecture, train it on the preprocessed data, and deploy it for real-time sign language recognition. Because the LSTM network can capture temporal dependencies in sequential data, it is especially well suited to the dynamic character of sign language motions. The goal of this research is to develop a reliable and accurate system for recognizing sign language, thereby facilitating inclusive communication by removing obstacles faced by the hearing-impaired community. The project's objectives of precise and dependable ISL recognition depend on the effective fusion of cutting-edge deep learning approaches with effective landmark extraction strategies.

Keywords: Sign Language Recognition, Deep Learning, MediaPipe Holistic, RNN, LSTM, Indian Sign Language, Feature Extraction, TensorFlow, Keras, OpenCV, NumPy.

I. INTRODUCTION
In light of a report from the World Health Organization highlighting that India is home to approximately 63 million deaf individuals, whether completely or partially hearing-impaired, the significance of addressing communication barriers within this demographic becomes increasingly apparent. Sign language, serving as the primary means of communication for the deaf community, stands as a vital component of their cultural identity. Employing hand, body, and eye gestures, sign language possesses its own distinctive vocabulary, grammatical structures, and rules, analogous to any spoken language used by the hearing population. Despite its importance, a considerable gap exists in the understanding of sign language by non-deaf individuals, thereby contributing to pervasive communication challenges.

Sign language, being a visual and physical mode of communication, is integral to the interaction of deaf or hard-of-hearing individuals. However, limited fluency in sign language among the broader population exacerbates the existing communication divide, hindering effective interaction. In response to this pressing need, recent years have witnessed a surge in interest and efforts to develop technological solutions aimed at bridging the communication gap between deaf and hearing communities. One promising avenue involves the automatic conversion of sign language into written or spoken language, and vice versa, through the utilization of advanced technologies. This paper explores the application of MediaPipe Holistic and LSTM (Long Short-Term Memory) networks for the conversion of sign language, an endeavor that holds immense potential for fostering seamless communication and inclusivity between these distinct linguistic communities.


II. LITERATURE SURVEY
The primary goal of the proposed system in [1] is to create a feature vector capable of representing dynamic hand movements and achieving adequate recognition accuracy using just the Leap Motion Controller (LMC). A feature vector with depth information is computed and fed into a Hidden Conditional Neural Field (HCNF) classifier. For the LeapMotion-Gesture 3D dataset and the Handicraft-Gesture dataset, the system achieved recognition accuracies of 89.5% and 95.0%, respectively. The system's main advantage is the LMC's superior localization precision compared to other depth sensors.

[2] proposes using the Google API and NLP to automate sign-to-text language conversion. The solution entails acquiring the input sign and converting it to text with the Google API, removing the infected parts with NLP concepts, and matching each word/character in the processed text against a visual sign word library to retrieve the matched videos. These videos are then concatenated to create a single video on the final display that depicts the entire text in sign language. In terms of sign interpretation, the proposed model achieved 90% accuracy.

[3] The paper titled "Advancements in English to Regional Machine Translation" discusses approaches and challenges in English-to-regional machine translation, an essential field for bridging language barriers.

[4] The research paper "Text-to-Speech Synthesis: An Overview" provides a thorough overview of text-to-speech technology, which is essential for converting written text into spoken language.

[5] The research paper "Neural Machine Translation and Sequence-to-Sequence Models: A Tutorial" offers a tutorial on neural machine translation models, which play a vital role in enhancing translation capabilities.

[6] introduces metrics like BLEU, which are commonly used in machine translation research and aid in the evaluation of translation quality.

[7] explores the development of a system that can instantly translate sign language into spoken language and text, aiming to enhance communication accessibility for the deaf and hard-of-hearing community.

[8] This thesis by Daniel Varab aims to enhance automatic text summarization by improving language support and designing more pragmatic systems for generating concise and context-aware summaries.

The goal of reference [9] was to create a system that could translate ISL gestures into both text and speech. To accomplish this, they proposed converting gesture images into text/speech using the K-Nearest Neighbor algorithm. Their strategy consisted of four steps: A) using captured ISL gestures as input, B) extracting features from the segmented images, C) analyzing multiple images using unsupervised feature learning (UFL) and classification, and D) synthesizing text and speech from the classified images. With unsupervised feature learning, the system achieved an accuracy rate of 78%.

The goal of [10] is to create a sign language interpreter using 2D/3D sensing and AI/ML neural network algorithms to bridge the communication gap between individuals who are deaf or mute and those who are not. To generate text from sign language, this system employs a vision-based approach and neural network algorithms.

Similarly, [11] aims to eliminate the communication barrier between individuals who are deaf or mute and those who are not. They propose a system that converts text to a gloss network and then maps the gloss to a skeleton pose using pose estimation and decision tree algorithms. They create datasets using tf-pose-estimation, and the system can recognize multiple sign language gestures in sequence and output the corresponding words.

The aim of [12] is to create a hand-detection-based learning tool for individuals who are new to sign language. They proposed a solution that utilizes a CNN algorithm to identify and translate static sign gestures into their corresponding words. The system achieved an accuracy rate of 93.44% in recognizing numbers, with an average time of 3.93 seconds.

III. RELATED WORK
A. Gesture Recognition Using Deep Learning
Deep learning techniques have resulted in significant advancements in gesture recognition, which involves detecting and interpreting human gestures and movements. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), specifically, have been critical in developing models that automatically learn to recognize gestures.

CNNs (Convolutional Neural Networks): CNNs have been extensively used to recognize hand gestures. Hand shape recognition, an essential component of gesture recognition, depends on correctly identifying features such as finger positions and hand orientation. CNNs can efficiently learn to detect these features and classify hand shapes in real time.

RNNs (Recurrent Neural Networks): RNNs, particularly Long Short-Term Memory (LSTM) networks, have been utilized for continuous gesture recognition. Recognizing a sequence of gestures, rather than individual gestures, is complex, and RNNs are well suited to this challenge due to their ability to capture temporal dependencies between gestures.

Other Deep Learning Techniques: Besides CNNs and RNNs, other deep learning methods have been applied to gesture recognition. For example, the Transformer model, originally designed for natural language processing, has been adapted for gesture recognition. It treats gestures as a sequence of symbols, effectively capturing complex relationships between them.

The ability of deep learning to learn directly from raw data eliminates the need for manual feature engineering in gesture recognition, allowing models to uncover complex patterns and relationships. However, deep learning models require a large amount of labeled data for training, which can be difficult to obtain due to the variability and complexity of gestures. While deep learning shows immense promise in gesture recognition, challenges remain, such as enhancing model robustness to factors like lighting, orientation, and background variations, and adapting models to diverse types of gestures.

B. Computer Vision in Education
Computer vision has found applications in education, revolutionizing how students learn and interact with educational content.

Gesture-Based Learning: Computer vision enables gesture-based learning systems that interpret students' hand gestures to interact with educational software and content. This technology makes learning more engaging and interactive, allowing students to control educational applications through gestures and sign language.

Sign Language Instruction: Computer vision can be employed to create sign language teaching tools. These tools can recognize and interpret sign language gestures, helping both deaf and hearing individuals learn and practice sign language more effectively. The system can provide feedback and corrections, making sign language education more accessible.

Interactive Learning Environments: Computer vision creates interactive learning environments where students can use gestures to control virtual elements. For example, students can manipulate 3D models or conduct virtual experiments using hand gestures, enhancing their understanding of complex concepts.


Accessible Educational Resources: Computer vision can be integrated into e-learning platforms to make educational resources accessible to students with varying needs. This technology allows for adaptive content delivery, ensuring that students can access educational materials through sign language or gesture-based interaction.

Computer vision in education holds the potential to revolutionize the learning experience, making it more inclusive and interactive for students of diverse backgrounds and abilities. This application of computer vision contributes to ongoing advancements in the field of education.

IV. GAP ANALYSIS
Limitations in existing systems:
Accuracy: Current sign-to-text translation systems often struggle with accuracy, particularly when dealing with:
1. Complex sequences of gestures: Existing systems may struggle to capture the nuances of how signs flow together, leading to errors in translating the complete message conveyed through sign language.
2. Variations in signing styles: Sign language can vary due to regional dialects or individual signing habits. Current systems may misinterpret signs if they have not been trained on a diverse dataset of signing styles.
Data Dependence: The performance of these systems relies heavily on the quality and diversity of their training data.
1. Limited data: Systems trained on limited datasets may not recognize unseen signs or variations in signing styles, hindering their real-world usability.
Generalizability: Existing systems do not generalize well to real-world scenarios with environmental variations such as:
1. Lighting changes: Fluctuations in lighting can affect how the system interprets hand shapes and positions, leading to translation errors.
2. Background noise: Background clutter or movement can cause the system to focus on irrelevant information, impacting translation accuracy.

Addressing the Gaps and Potential for Improved Accuracy:
Enhanced Accuracy Through LSTMs: Our approach utilizes Long Short-Term Memory (LSTM) networks. LSTMs excel at capturing temporal dependencies within sequences, allowing our system to potentially achieve higher accuracy in recognizing sequences of signs compared to methods relying solely on static sign recognition. This translates to a more accurate understanding of the complete message conveyed through sign language.
Rigorous Data Collection: Unlike some existing works that lack details about data collection, our project employs a rigorous approach. We have collected a diverse dataset of ISL gestures encompassing a significant number of signs, ranging from alphabets to words. This dataset incorporates both single-handed and double-handed gestures and considers variations in signer demographics (background, clothing, ethnicity) to minimize bias and enhance the robustness of our sign-to-text translation. We detail the collection methods used to ensure data representativeness. This comprehensive data collection strategy can lead to a model that generalizes better to unseen scenarios and signer variations, potentially achieving higher accuracy than systems trained on less diverse datasets.
Detailed Feature Extraction: Our system leverages MediaPipe to extract specific landmark points from video frames. These points capture critical aspects such as hand orientation, fingertip positions, and the relative positions between hands. By extracting these key features, our approach can effectively capture the nuances of gestures for accurate sign recognition, leading to improved translation fidelity and potentially surpassing systems that rely on less informative features.
Real-World Impact (Improved Communication Accessibility): By addressing these limitations and focusing on improved accuracy, our system has the potential to significantly enhance communication accessibility for deaf and hard-of-hearing individuals. More seamless conversations: with a more accurate sign-to-text translation system, deaf and hard-of-hearing individuals can engage in more natural and fluid conversations, reducing communication barriers in everyday situations. Increased confidence and independence: a higher level of translation accuracy can empower deaf and hard-of-hearing individuals to participate more actively in various social and professional settings, fostering greater confidence and independence.

V. PROPOSED APPROACH
The proposed gesture recognition approach entails the creation of a deep learning model that combines computer vision techniques with recurrent neural networks (RNNs) to accurately interpret and understand a wide range of human gestures.

Data Collection
Recording Device: We utilized a smartphone (Nothing Phone 2) for recording the videos of the various sign language gestures. This device was selected due to its high-quality camera and ease of use. An example of some of the words that contribute to our dataset is shown in the figure below.

Fig: dataset

Actions: The dataset includes a comprehensive set of common words and phrases in Indian Sign Language (ISL). A wide range of gestures was recorded to ensure the dataset covers a broad spectrum of ISL vocabulary; in total it is about 27.9 GB in size.
Recording Process: Each sign was recorded in video format using the Nothing Phone 2. The subsequent data processing steps, including frame extraction and keypoint extraction, were performed on a laptop computer equipped with the appropriate software tools.
Frame Extraction: Frames were extracted from each video using the OpenCV library on the laptop to create a consistent dataset. These frames were saved in a structured directory format to facilitate easy access and processing.
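The frame-extraction step can be illustrated with a short sketch. This is not the authors' code (the paper only describes the step); it is a minimal example using OpenCV's VideoCapture, with hypothetical directory names such as dataset/hello/clip_01.mp4 and frames/hello/clip_01.

    import os
    import cv2  # OpenCV for reading videos and writing frames

    def extract_frames(video_path, out_dir):
        # Save every frame of one recorded sign video as a numbered image.
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:          # end of video (or unreadable file)
                break
            cv2.imwrite(os.path.join(out_dir, "frame_%04d.jpg" % idx), frame)
            idx += 1
        cap.release()
        return idx

    # Hypothetical layout: one folder per ISL word, one clip per recording.
    n_frames = extract_frames("dataset/hello/clip_01.mp4", "frames/hello/clip_01")
    print("Extracted", n_frames, "frames")

Saving frames with zero-padded indices keeps them in temporal order when the directory is listed later for keypoint extraction.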
Keypoint Extraction
MediaPipe Hands: We used the MediaPipe Hands model to detect and track hand landmarks in each frame. The model complexity was set to 0, with a minimum detection confidence of 0.5 and a minimum tracking confidence of 0.5.
Key Points: For each frame, the key points representing the 3D coordinates of the 21 hand landmarks were extracted and saved as a NumPy array. This array was flattened to create a single feature vector for each frame.

Experimental Setup
Data Processing: The data processing pipeline includes the following steps:
1. Setup Directories: Create directories to store the processed data.
2. Load and Process Frames: Read each frame from the dataset, convert the image color space, and process it using the MediaPipe Hands model.
3. Extract Key Points: Extract and save the key points for each frame.
Implementation Details: The pipeline was implemented in Python using the OpenCV and MediaPipe libraries on the laptop. The processed data was then used for training and testing the deep learning models for sign language recognition.
By structuring the dataset in this manner and using consistent methods for keypoint extraction, we ensure that the data is ready for the subsequent model training and evaluation stages. This setup facilitates reproducibility and scalability for further research in sign language recognition using deep learning techniques.
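The abstract refers to a code snippet for converting videos to landmark coordinates, but it is not reproduced in the text. The sketch below follows the three processing steps listed above, using the stated MediaPipe Hands settings (model complexity 0, detection and tracking confidence 0.5) and flattening the 21 landmarks into a 63-value vector per frame; the folder names are hypothetical.

    import os
    import cv2
    import numpy as np
    import mediapipe as mp

    mp_hands = mp.solutions.hands

    def keypoints_from_result(result):
        # Flatten 21 hand landmarks (x, y, z) into a 63-dim vector; zeros if no hand is detected.
        if result.multi_hand_landmarks:
            hand = result.multi_hand_landmarks[0]   # first detected hand
            return np.array([[p.x, p.y, p.z] for p in hand.landmark],
                            dtype=np.float32).flatten()
        return np.zeros(21 * 3, dtype=np.float32)

    frame_dir = "frames/hello/clip_01"              # hypothetical input directory
    out_dir = "keypoints/hello/clip_01"             # hypothetical output directory
    os.makedirs(out_dir, exist_ok=True)             # 1. set up directories

    with mp_hands.Hands(model_complexity=0,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        for name in sorted(os.listdir(frame_dir)):  # 2. load and process frames
            frame = cv2.imread(os.path.join(frame_dir, name))
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
            result = hands.process(rgb)
            vec = keypoints_from_result(result)            # 3. extract key points, shape (63,)
            np.save(os.path.join(out_dir, os.path.splitext(name)[0] + ".npy"), vec)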


Data Preparation: The extracted landmarks are used to create a comprehensive dataset for training and validation. This dataset is divided into suitable training and validation sets to facilitate model development.
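The paper does not show how the per-frame keypoint files are assembled into training sequences. A plausible sketch, assuming one folder per word, 30-frame windows (the sequence length reported in the model summary), and a scikit-learn train/validation split with an assumed 80/20 ratio, is:

    import os
    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.utils import to_categorical

    SEQ_LEN = 30                                   # frames per sequence (from the model summary)
    actions = sorted(os.listdir("keypoints"))      # hypothetical: one sub-folder per ISL word

    sequences, labels = [], []
    for class_id, action in enumerate(actions):
        action_dir = os.path.join("keypoints", action)
        for clip in sorted(os.listdir(action_dir)):
            clip_dir = os.path.join(action_dir, clip)
            frame_files = sorted(os.listdir(clip_dir))[:SEQ_LEN]
            window = [np.load(os.path.join(clip_dir, f)) for f in frame_files]  # each (63,)
            if len(window) == SEQ_LEN:             # keep only full-length windows
                sequences.append(window)
                labels.append(class_id)

    X = np.array(sequences)                        # shape: (num_clips, 30, 63)
    y = to_categorical(labels)                     # one-hot label per clip
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=labels)      # assumed 80/20 split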
Model Training: The proposed approach relies on recurrent neural networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, to train a deep learning model. These networks excel at capturing temporal dependencies, making them ideal for recognizing gesture sequences.
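A Keras definition consistent with the model summary reported below, three LSTM layers of 64, 128, and 64 units over 30-frame sequences of 63 keypoint features followed by dense layers of 64, 32, and 8 units, is sketched here. The activation functions, optimizer, and loss are assumptions, but these layer shapes reproduce the 187,496 trainable parameters stated in the paper.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    model = Sequential([
        # 30 time steps x 63 features (21 hand landmarks x 3 coordinates each)
        LSTM(64, return_sequences=True, activation="relu", input_shape=(30, 63)),
        LSTM(128, return_sequences=True, activation="relu"),
        LSTM(64, return_sequences=False, activation="relu"),
        Dense(64, activation="relu"),
        Dense(32, activation="relu"),
        Dense(8, activation="softmax"),   # output layer: one unit per sign class (assumed)
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    model.summary()   # 187,496 trainable parameters with these layer sizes

    # model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=200)  # illustrative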

Fig: Model Architecture

Model Summary:
Our sign language recognition model utilizes a sequential architecture built with Long Short-Term Memory (LSTM) layers for effective recognition of sign sequences. The model summary provides a detailed breakdown of the network structure and its trainable parameters.
Key Points from the Model Summary:
Sequential Architecture: The model employs a sequential structure, where data flows through each layer in a single sequence. This is well suited to tasks involving ordered data such as sign language sequences.
LSTM Layers: The core of the model consists of three LSTM layers (lstm, lstm_1, and lstm_2). LSTMs are known for their ability to capture temporal dependencies within sequences, which is crucial for recognizing the order of signs in sign language.
Layer Outputs: The first two LSTM layers (lstm and lstm_1) have an output shape of (None, 30, units), where None represents the variable batch size, 30 is the sequence length, and units is the number of hidden units in the layer (64 for the first LSTM and 128 for the second). The final LSTM layer (lstm_2) has an output shape of (None, units), where units is the number of hidden units in the layer (64 in this case).
Dense Layers: Following the LSTM layers, three fully connected dense layers (dense, dense_1, and dense_2) are used. These layers perform non-linear transformations on the extracted features to classify the sign sequence into one of the possible output categories. The number of units in each dense layer progressively decreases (64, 32, and 8) as the network approaches the final output layer.
Total Parameters: The model has a total of 187,496 trainable parameters, indicating its capacity to learn complex relationships between the input sign sequences and their corresponding labels.

Model Evaluation: The trained model is rigorously evaluated using a separate test dataset containing a wide array of ISL gestures. The model's accuracy is assessed by comparing its predicted interpretations with the actual gestures, ensuring its proficiency in understanding diverse human gestures.
The proposed approach integrates computer vision and deep learning to revolutionize gesture recognition, with potential applications in various domains, including human-computer interaction, virtual and augmented reality, and more. It signifies a significant advancement in the understanding and interpretation of human gestures, opening doors to exciting possibilities in the realm of technology.

VI. RESULTS
● The sign language recognition system, developed through the integration of the MediaPipe library and LSTM, demonstrated exceptional performance in its evaluation. The accuracy achieved on the train-test split dataset was a perfect 100.0%. This high accuracy level indicates the system's robustness in accurately recognizing and interpreting various sign language gestures under controlled conditions. The Matthews Correlation Coefficient (MCC) for the system was 1.0000, emphasizing its capability to maintain high precision and recall across multiple classes.
Overall Metrics Summary:
● Accuracy: 100.0%
● Matthews Correlation Coefficient (MCC): 1.0000
These metrics demonstrate the high precision and recall achieved across specific words in our dataset, highlighting the effectiveness of our sign language recognition system.
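The evaluation code is not shown in the paper; continuing the earlier sketches, the reported accuracy and MCC can be computed from held-out predictions with scikit-learn as follows (variable names hypothetical):

    import numpy as np
    from sklearn.metrics import accuracy_score, classification_report, matthews_corrcoef

    y_true = np.argmax(y_val, axis=1)                   # one-hot labels -> class indices
    y_pred = np.argmax(model.predict(X_val), axis=1)    # probabilities -> predicted classes

    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("MCC:", matthews_corrcoef(y_true, y_pred))
    print(classification_report(y_true, y_pred, target_names=actions))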
Fig: Classification Report

Real-time evaluation of the system in practical scenarios, however, showed a slightly lower accuracy of 96.4%. This discrepancy between train-test split and real-time performance underscores the challenges posed by real-world factors such as varying lighting conditions, background noise, and different hand orientations.
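The real-time evaluation setup is not described in detail. One plausible arrangement, sketched below under the same assumptions as the earlier snippets, keeps a rolling window of the last 30 keypoint vectors from a webcam stream and classifies the window on every frame:

    import collections
    import cv2
    import numpy as np
    import mediapipe as mp

    window = collections.deque(maxlen=30)            # rolling buffer of keypoint vectors
    cap = cv2.VideoCapture(0)                        # webcam stream

    with mp.solutions.hands.Hands(model_complexity=0,
                                  min_detection_confidence=0.5,
                                  min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            window.append(keypoints_from_result(result))     # helper from the earlier sketch
            if len(window) == 30:
                probs = model.predict(np.expand_dims(window, axis=0), verbose=0)[0]
                cv2.putText(frame, actions[int(np.argmax(probs))], (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.imshow("ISL recognition", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    cap.release()
    cv2.destroyAllWindows()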

Fig: Output for word “hi”


Fig: Output for word “your”

Fig: Output for word “is”

● In the domain of sign language recognition, CNN and LSTM models are used [1]. The model's robustness was enhanced by the use of both static and gesture sign languages [1]. Moreover, the recognition accuracy for single- and double-hand gestures reached 97.85% and 94.55%, respectively, underscoring the system's proficiency.

Fig: Accuracy score and training losses

These encouraging outcomes demonstrate the potential for advanced technology, particularly deep learning models, to enhance communication for the deaf and hard-of-hearing communities. With further research and development, the integration of such systems into diverse devices and applications could significantly contribute to creating more inclusive and accessible communication options for individuals with disabilities.

VII. CONCLUSIONS
In conclusion, the advancement of technology, particularly the integration of MediaPipe Holistic and LSTM, holds immense promise in breaking down communication barriers between the deaf and hearing communities. With approximately 63 million deaf individuals in India alone, the need for effective sign language conversion systems is critical.
According to the information presented, a variety of approaches and strategies for sign language recognition have been developed using machine learning and computer vision algorithms. These strategies try to close the communication gap between the hearing-impaired and the general community. The employment of various algorithms such as CNN, K-NN, decision trees, and neural networks, in conjunction with image and gesture recognition techniques, has yielded promising results in understanding sign language gestures and turning them into text or voice. Furthermore, deep learning approaches like LSTM and the MediaPipe library have been shown to dramatically enhance recognition accuracy. These technologies have the potential to be further refined and applied in real-world circumstances, providing effective and efficient communication for the deaf and mute community.

VIII. REFERENCES
[1] Lu, Wei, Zheng Tong, and Jinghui Chu. "Dynamic hand gesture recognition with a leap motion controller." IEEE Signal Processing Letters 23.9 (2016): 1188-1192.
[2] Bharti, Ritika, Sarthak Yadav, and Sourav Gupta. "Automated Speech to Sign Language Conversion Using Google API and NLP." International Conference on Advances in Electronics, Electrical, and Computational Intelligence (ICAECC), 2019.
[3] Schindler, Konrad, and Luc Van Gool. "Action snippets: How many frames does human action recognition require?" IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008). IEEE, 2008.
[4] Papastratis, Ilias, et al. "Continuous sign language recognition through cross-modal alignment of video and text embeddings in a joint-latent space." IEEE Access 8 (2020): 91170-91180.
[5] Siming, He. "Research of a sign language translation system based on deep learning." 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM). IEEE, 2019.
[6] Ezhumalai, P. "Speech to sign language translator for hearing impaired." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12.10 (2021): 1913-1919.
[7] Zheng, et al. "An improved sign language translation model with explainable adaptations for processing long sign sentences." Computational Intelligence and Neuroscience 2020 (2020).
[8] Garcia, Brandon, and Sigberto Alarcon Viesca. "Real-time American sign language recognition with convolutional neural networks." Convolutional Neural Networks for Visual Recognition 2 (2016): 225-232.
[9] Patil, Rachana, et al. "Indian sign language recognition using convolutional neural network." ITM Web of Conferences. Vol. 40. EDP Sciences, 2021.
[10] Pigou, Lionel, et al. "Sign language recognition using convolutional neural networks." Computer Vision - ECCV 2014 Workshops: Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part I. Springer International Publishing, 2015.
[11] Haider, Iram, et al. "A Hand Gesture Recognition based Communication System for Mute People." 2020 IEEE 23rd International Multitopic Conference (INMIC). IEEE, 2020.
[12] Singh, Sukhwinder, and Sakshi Sharma. "Deep learning-based vision-based hand gesture recognition for sign language interpretation." Expert Systems with Applications 182 (2021): 115657.
