
Volume 10, Issue 3, March 2025    International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25mar920

Visual Language Interpreter


Aniket Jadhav1; Tejas Ulawekar2; Shubham Kondhare3; Nirbhay Mokal4; Rupali Patil5
1-5Dept. of Computer Engineering, Pillai HOC College of Engineering and Technology
(University of Mumbai)

Publication Date: 2025/03/27

Abstract: Meaningful communication is a basic human need, yet people who rely on sign language face serious obstacles when interacting with the spoken-language world. This disconnect can leave them feeling isolated and alienated. Our project addresses this issue by creating a system that recognizes a set of hand signs and converts them in real time into both spoken and written text. Our aim is to build a solution that combines efficient natural language processing with efficient gesture recognition, based on Convolutional Neural Networks (CNNs) and deep learning. A text prediction component improves the accuracy and relevance of the generated translations while shortening processing time and speeding up communication. CNNs are a class of deep models designed to process structured data represented as 2D grids or multidimensional arrays, such as digital images. They operate by extracting and understanding features from visual inputs through a hierarchy of filters that automatically recognize patterns at increasing levels of abstraction. Sign language is a critical example of the nuanced gestures these learned features help interpret: our system identifies different hand movements with high accuracy and translates them effortlessly into both speech and text, improving communication for people who depend on sign language. In addition, our solution includes leading-edge text prediction techniques that optimize translation, increasing the accuracy and relevance of the output while decreasing processing time, making communication quicker and more natural.

Keywords: Sign Language, Convolutional Neural Networks (CNN), Deep Learning, Gesture Recognition, Text Prediction, Machine
Learning, Artificial Intelligence.

How to Cite: Aniket Jadhav; Tejas Ulawekar; Shubham Kondhare; Nirbhay Mokal; Rupali Patil (2025). Visual Language
Interpreter. International Journal of Innovative Science and Research Technology, 10(3), 1085-1091.
https://doi.org/10.38124/ijisrt/25mar920

I. INTRODUCTION

Sign language (SL) is an essential communication tool for millions of individuals around the globe. According to the World Health Organization, an estimated 430 million people worldwide are completely deaf, while 1.5 billion individuals are affected by partial hearing impairment, and many of them prefer to communicate in sign language. Despite its importance, SL has historically been overlooked by research in comparison with spoken languages because of its unique structural characteristics. This difference creates distinctive requirements for effective translation and recognition, as well as a compelling need for solutions. Al Abdullah et al.'s [2] assessment of 58 research papers on SL translation highlights the use of deep learning methods, including convolutional and recurrent neural networks, to improve recognition accuracy. That review argues for integrating manual and non-manual features (face and body language) into SL recognition systems and shows that both contribute significantly to improved recognition. Challenges remain, however, such as dynamic sign complexity and the need for robust feature extractors. The authors recommend further examination of deep learning models and application of these systems in real-world settings.

II. BACKGROUND

Sign translation is one of the most anticipated study areas, emphasizing the necessity of creating a communication channel between hearing and non-hearing cultures. In an effort to close the gap between the hearing-challenged and the general population, the last few decades have seen a great deal of research in this field. The topic lies at the intersection of artificial intelligence, linguistics, machine learning, computer vision, and human-computer interaction. The greatest difficulty is creating reliable systems that can translate the visual, kinetic, and mouthing components of sign language into written form. Many techniques, methodologies, and technologies have been investigated. In the early days, researchers concentrated on building static systems capable of recognizing a small number of gestures with hardware-heavy approaches. So-called data gloves fitted with sensors were commonly deployed to trace hand movements and placements; for example, the "Power Glove" project in the 1990s made a pioneering attempt at recognizing static hand gestures. However, such systems were limited by their reliance on specific hardware and did not scale. The incorporation of machine learning was a monumental shift in this space. SIFT (Scale-Invariant Feature Transform) and HOG (Histogram of Oriented Gradients) were the first feature extraction methods researchers applied to hand gesture images to derive representations from them. These movement and pose patterns were then converted into sign language words or phrases using classification models such as Support Vector Machines (SVMs) and Hidden Markov Models (HMMs). However, such systems under-performed in real-time settings and failed to generalize well across datasets.
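As a rough, hedged illustration of that classical HOG-plus-SVM pipeline (this is not the present paper's implementation; the image size, dataset paths, and labels below are hypothetical):

# Sketch of the classical pipeline: HOG features from gesture images, classified by an SVM.
import cv2
import numpy as np
from sklearn.svm import SVC

hog = cv2.HOGDescriptor()                        # default 64x128 detection window

def hog_features(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 128))             # match the descriptor window
    return hog.compute(img).flatten()

# Hypothetical training set: one HOG vector per image, labelled with its letter.
train_paths = ["gestures/A/001.png", "gestures/B/001.png"]
train_labels = ["A", "B"]
X = np.array([hog_features(p) for p in train_paths])
clf = SVC(kernel="linear").fit(X, train_labels)

print(clf.predict([hog_features("gestures/test.png")]))   # predicted letter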

Fig 1 Alphabetic Representation using Hand Gestures

III. MOTIVATION

Although communication is a primary human function, society remains poorly adapted to the millions of people who have speech and hearing impairments. The deaf and hard-of-hearing community uses sign language as its primary means of communication, but it is not widely understood because most people never learn it. This communication breakdown leaves needs unmet and produces barriers that cause social isolation and difficulties at school and at work. The problem of digital literacy has been worsened by the widening digital divide, and inclusive technologies are desperately needed to close this gap. By utilizing cutting-edge developments in artificial intelligence, machine learning, and computer vision, we hope to create a sign language translation system that not only translates signs into text or speech but is also able to comprehend and preserve the subtleties of each sign language and dialect while producing accurate and scalable results. Such a mechanism can facilitate participation by people with disabilities.

IV. EXISTING SYSTEM

Current sign language recognition technologies and system architectures for real-time sign language processing identify and convert sign language into text or speech. They implement both manual features (e.g., hand shape, motion, location, and orientation) and non-manual features (e.g., facial expressions and body postures) for accurate recognition. Commonly, deep learning models (CNNs or RNNs) are used to extract the information contained in visual data provided by a camera, sensor, or similar device. Some systems also incorporate wearable devices or motion capture technology to track hand movements and body posture more accurately. These devices offer rich spatial and temporal information, expanding the range of classes the system can handle and facilitating discrimination between similar signs. Despite these advancements, existing systems still struggle to recognize overlapping movements, handle the intricacy of dynamic signs, and scale to sign languages worldwide. Because of changes in illumination, background noise, and individual signing styles, many systems remain restricted to controlled laboratory settings and degrade when applied in the real world. Recent developments in deep neural networks and motion tracking methods have nonetheless led to notable advancements in the field of sign language recognition.

V. FUNDAMENTAL CONCEPTS

• Sign Language Recognition:
The topic of this research is sign language recognition (SLR), the challenging problem of recognizing, understanding, and interpreting the visual gestures and expressions people use to communicate in sign language. This is demanding because facial expressions, hand shapes, and hand movements and their directions (such as up, down, left, and right) are all essential to comprehending the meaning. SLR aims to help a system recognize gestures and convert them into a representation of human language, which can then be spoken or written.
• Feature Extraction:
Feature extraction is an essential stage in converting sign language into a form the recognition system can work with. When employing conventional approaches, key features are found by scanning hand gestures with methods like SIFT and HOG. Modern techniques employ deep learning models to learn the pertinent characteristics automatically, improving accuracy and adaptability compared with traditional methods that rely on hand-crafted algorithms for feature extraction from images.

• Classification:
Classification takes the features extracted from video and identifies which sign language symbols, words, or phrases they correspond to. Older systems used rule-based algorithms or simple statistical models; newer approaches apply machine learning and neural networks for greater accuracy. CNNs are suited to static gesture recognition, while RNNs/LSTMs address the sequential nature of dynamic signing.

• Temporal Dynamics:
The temporal properties of sign language are key to interpreting continuous signing. This means looking at the order and timing of gestures to obtain contextual information. Sequence models such as LSTMs and transformers are well suited to capturing these dynamics.

• Multimodal Integration:
Multimodal systems that combine visual information with other sensory inputs, such as accelerometers or motion sensors, can enhance sign language recognition. This adds robustness, particularly in noisy or changing environments, and provides richer context for accurate interpretation.

• Real-Time Processing:
Real-time processing is necessary for practical sign language translation applications. Methods used to attain low-latency performance without compromising accuracy include edge computing, hardware acceleration, and model optimization.

By understanding and building on these fundamental concepts, our research seeks to develop a better approach that not only learns the structure of sign language but also bridges the gap between sign language and human-readable language.
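As a rough sketch of how the split described above (CNNs for per-frame appearance, LSTMs for temporal dynamics) can be wired together, the following Keras example stacks a small frame-level CNN inside an LSTM; the frame size, sequence length, layer widths, and class count are illustrative assumptions, not this paper's architecture.

# Illustrative only: a per-frame CNN feeding an LSTM for dynamic-sign sequences.
import tensorflow as tf
from tensorflow.keras import layers, models

frame_cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),          # one feature vector per frame
])

model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(30, 64, 64, 3)),
    layers.LSTM(64),                          # aggregates the 30 per-frame vectors
    layers.Dense(8, activation="softmax"),    # one probability per sign class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()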
VI. LITERATURE REVIEW OF THE RESEARCH PAPERS

• Real-Time Sign Language Translation System
The article "Real-Time Sign Language Translation Systems: A review study" by Maria Papatsimouli et al. [1] is a review focusing on real-time sign language translation. The study pursues the idea of closing the information gap between people who are deaf or hard-of-hearing and non-deaf people through an examination of the technologies and approaches adopted in sign language recognition and translation.

• Advantage: Cost-effective real-time conversation between deaf and non-deaf users through low-cost, low-power vision-based and sensor-based systems.
• Limitation: Environmental resilience and vocabulary coverage are limited, and some solutions are expensive.

• Research of a Sign Language Translation System Based on Deep Learning
This paper by Siming He (Ridley College, St. Catharines, Canada) [5] addresses deep learning-based sign language recognition for human-robot interaction. Machine learning models are combined so that Faster R-CNN localizes the hands, a 3D CNN extracts relevant features, and an LSTM-based sequence-to-sequence model performs recognition. By combining these technologies, the authors validate sign language translation with a 99% recognition rate.

• Advantage: The model combines Faster R-CNN, 3D CNN, and LSTM in an efficient manner to achieve enhanced recognition accuracy.
• Limitation: Limited dataset coverage; not all sign language words are included in the dataset, which limits the scalability of the system.

• Advancements in Sign Language Recognition
This paper by Bashaer A. Al Abdullah, Ghada A. Amoudi, and Hanan S. Alghamdi [2] reviews contemporary work on Sign Language Recognition (SLR) systems, with a special focus on integrating Artificial Intelligence (AI) and machine learning techniques for the development of automated Sign Language Translation Systems (SLTS). The review systematically examines 58 research papers involving deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) that have obtained high precision in distinguishing hand gestures.

• Advantage: Deep learning based approaches, e.g. CNNs and RNNs, have reported high accuracies for sign language recognition.
• Limitation: Few large or diverse datasets are available, especially for less common sign languages.

• Sign Language Conversion to Speech with the Application of KNN Algorithm
This paper [4] presents the implementation of an application converting American Sign Language (ASL) gestures in real time to both text and speech. The study is a first step towards an inclusive society in which the hearing-impaired and the general population can communicate better. The system combines Convolutional Neural Networks (CNNs) with K-Nearest Neighbors (KNN): after hand gestures are captured via webcam, they are recognized, identified, and classified through KNN.

• Advantage: The application implements real-time translation, converting sign language to text and audio for effective communication.
• Limitation: Recognition accuracy may suffer when images are not captured in a good lighting environment.

VII. METHODOLOGY

The Visual Language Interpreter (VLI) aims to play the role of a translator between users who communicate in sign language and users who are not familiar with it. Our pipeline has a stepwise architecture through which hand gestures flow to text output and, if requested, speech. The following stages outline the methodology adopted in this research:

• Selecting a Language:
The system allows users to select their preferred language for the text output, making it accessible to everyone. In this manner, the translated text corresponds to the user's preferred reading language. A simple method is provided to choose the language before gesture recognition begins.

• Preprocessing (a brief sketch follows this list):

• Noise Reduction: Removes background noise and irrelevant details from the video frames, helping to isolate the signer's gestures.
• Frame Extraction: Grabs single frames from the video stream at a defined frame rate for smooth processing.
• Normalization: Standardizes the input data by adjusting the brightness, contrast, and resolution of the frames.
• Background Subtraction: Removes the background to focus on the signer's hand movements and body gestures, which carry the information needed to recognize the sign. In this phase, the background is removed from the frames, leaving only the signer's hand and body movements visible; this is a crucial step for identifying the gestures.
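A minimal sketch of what this preprocessing could look like with OpenCV; the frame-skip rate, target resolution, brightness/contrast values, and the MOG2 background subtractor are assumptions for illustration, not the authors' exact settings.

# Illustrative preprocessing: frame extraction, normalization, background subtraction.
import cv2

cap = cv2.VideoCapture(0)                                   # webcam stream
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=100)

frame_skip, count = 2, 0                                    # assumed frame rate reduction
while True:
    ok, frame = cap.read()
    if not ok:
        break
    count += 1
    if count % frame_skip:                                  # keep every second frame
        continue
    frame = cv2.resize(frame, (320, 240))                   # normalize resolution
    frame = cv2.GaussianBlur(frame, (5, 5), 0)              # suppress visual noise
    frame = cv2.convertScaleAbs(frame, alpha=1.1, beta=10)  # adjust brightness/contrast
    mask = bg_subtractor.apply(frame)                       # background subtraction
    signer = cv2.bitwise_and(frame, frame, mask=mask)       # keep only the signer
    cv2.imshow("signer", signer)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()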
• Classification:
Much of the current implementation revolves around the classification module, the hub of the system where models trained on the UTD dataset are deployed to recognize sign language gestures. This module involves feature extraction, picking out important features such as hand shape, movement, orientation, and facial expressions from the preprocessed frames. Various machine learning models are put into practice, including CNN and RNN networks that capture sequential dependencies in the data.

• Feature Extraction:
After frames are processed, key features that characterize the gesture are computed, for example:

• Hand shape and position
• Finger orientations
• Relative motion patterns

Convolutional Neural Networks (CNNs) and computer vision tools such as MediaPipe Hand Tracking are used to extract these features, producing a structured representation of the gesture.
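A hedged sketch of landmark-based feature extraction with MediaPipe Hands; the wrist-relative feature vector and the test image path are illustrative assumptions rather than the paper's exact representation.

# Illustrative sketch: build a structured feature vector from MediaPipe hand landmarks.
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def landmark_features(bgr_frame):
    # Return the 21 hand landmarks as wrist-relative coordinates, or None.
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    pts = np.array([[p.x, p.y, p.z] for p in result.multi_hand_landmarks[0].landmark])
    pts -= pts[0]                       # make coordinates relative to the wrist
    return pts.flatten()                # 21 x 3 = 63 features per frame

frame = cv2.imread("sample_gesture.png")        # hypothetical test image
print(landmark_features(frame))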

• Classification of Gestures:
A trained classifier is then used to link the predicted gesture class to the appropriate sign language symbol. For this purpose, the model is trained on a dataset of sign language motions so that classification is accurate. The system relies on a sign-to-text alignment that specifies which text should be displayed for each sign.

• Generating Textual and Speech Outputs:
After a gesture is classified, it is transformed into readable text. Recognized text appears in real time in the interface, forming complete words and sentences determined by successive gestures. Additionally, the system may generate speech output through Text-to-Speech (TTS) synthesis, which makes it easier to communicate with people who speak rather than sign.
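A minimal sketch of the TTS step, assuming an offline engine such as pyttsx3; the engine choice and the example sentence are assumptions, not necessarily the paper's implementation.

# Illustrative only: speak a recognized sentence with an offline TTS engine.
import pyttsx3

def speak(text):
    engine = pyttsx3.init()             # platform's default voice
    engine.setProperty("rate", 150)     # speaking speed (words per minute)
    engine.say(text)
    engine.runAndWait()                 # block until speech has finished

speak("HELLO HOW ARE YOU")              # hypothetical output of the recognizer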


• Algorithm: Learning Process

• Input: x, a d-dimensional feature vector
• Output: y, the output decision
• Target function f : X → Y, the ideal (unknown) mapping
• Data: (x1, y1), (x2, y2), ..., (xn, yn), the training examples
• Hypothesis g : X → Y, the formula actually used
• Learning algorithm: selects g ≈ f as the final hypothesis
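A toy illustration of this formulation (the data below are random placeholders and k-nearest neighbours is just one possible hypothesis class, not necessarily the paper's final model): a classifier g is fitted to the examples (xi, yi) so that it approximates the unknown target f.

# Toy sketch of the learning process: fit a hypothesis g to examples (x_i, y_i).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 63))           # x_i: 63-dimensional feature vectors (placeholders)
y = (X[:, 0] > 0).astype(int)            # y_i: labels produced by the unknown target f

g = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # learning algorithm picks g ≈ f
x_new = rng.normal(size=(1, 63))
print(g.predict(x_new))                  # g's decision for an unseen x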


Fig 2 System Architecture of Visual Language Interpreter

Fig 3 MediaPipe Landmark System

VIII. RESULT ANALYSIS

We found a way to boost how well we recognize sign language by changing how we classify the signs. At first, when we trained a CNN model on 26 different alphabet signs, we did not get the results we hoped for because some hand gestures look very much alike. We therefore decided to group these similar signs into eight broader categories, which made them easier to tell apart and reduced confusion within each group. We construct a probability distribution over the groups, and the predicted sign is the one with the highest likelihood. We also apply simple geometric calculations to the hand landmarks, which allows us to distinguish signs more accurately within each group (a brief sketch of this two-stage idea follows below). This step-by-step process substantially boosts our recognition accuracy.

Following extensive testing, we found that our model achieves 97% accuracy across a variety of background and lighting conditions. Accuracy can reach 99% under ideal circumstances, such as clean backgrounds and bright lighting. This demonstrates the strength and dependability of our approach to real-time sign language interpretation.

• Insight: When compared to other approaches, the proposed system exhibits the best balance between precision and recall, as seen by its highest F1 score of 99.24%.
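A hedged sketch of the two-stage decision just described (coarse group classifier, then a landmark-geometry rule inside the group); the group contents, the thumb-index distance rule, and the threshold are illustrative assumptions, not the authors' exact logic.

# Illustrative two-stage decision: pick the most likely group, then use simple
# hand-landmark geometry to separate look-alike signs inside that group.
import numpy as np

def classify_sign(group_model, landmarks):
    # landmarks: (21, 3) array of MediaPipe hand points.
    # group_model: any classifier exposing predict_proba over 8 coarse groups (assumed).
    probs = group_model.predict_proba(landmarks.reshape(1, -1))[0]
    group = int(np.argmax(probs))                         # group with the highest likelihood

    if group == 0:                                        # hypothetical group of look-alike letters
        d = np.linalg.norm(landmarks[4] - landmarks[8])   # thumb tip vs. index fingertip
        return "A" if d > 0.1 else "S"                    # toy disambiguation rule
    return f"group-{group}"                               # other groups handled similarly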


Fig 4 Convolution layer

The convolution layer uses a small window (typically 5x5) that extends through the full depth of the input volume. Sliding this window across the input produces a 2-dimensional activation map that records the filter's response at each spatial location.

• Insight: The proposed system shows the highest success rate, indicating superior reliability and accuracy.
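A small worked example of that arithmetic, assuming a 5x5 kernel in Keras (the input size and filter count are illustrative): a 64x64x3 input convolved with eight 5x5 filters and no padding yields 60x60 activation maps, since 64 - 5 + 1 = 60.

# Illustrative shape check for a 5x5 convolution producing 2-D activation maps.
import tensorflow as tf

x = tf.random.normal((1, 64, 64, 3))                      # one 64x64 RGB input
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=5)   # eight 5x5 filters, no padding
print(conv(x).shape)                                      # (1, 60, 60, 8): 64 - 5 + 1 = 60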

Fig 5 20-point palm detection

The MediaPipe library and OpenCV were a major help in obtaining these landmark points, which were subsequently drawn on a plain white background. Doing this sidestepped problems with background and lighting, because MediaPipe returns the landmark points against almost any background and in most lighting conditions.
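A hedged sketch of that rendering step (canvas size and output path are assumptions): the detected landmarks are redrawn on a blank white image so the downstream classifier never sees the original background.

# Illustrative sketch: redraw detected hand landmarks on a plain white canvas.
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

frame = cv2.imread("sample_gesture.png")                  # hypothetical input frame
with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

canvas = np.full((400, 400, 3), 255, dtype=np.uint8)      # plain white background
if result.multi_hand_landmarks:
    mp_draw.draw_landmarks(canvas, result.multi_hand_landmarks[0],
                           mp_hands.HAND_CONNECTIONS)
cv2.imwrite("landmarks_on_white.png", canvas)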

Fig 6 Real-Time Sign Language Recognition Interface

IX. CONCLUSION

Convolutional Neural Networks (CNNs), a potent deep learning technique, allowed us to create a system for real-time sign language detection and translation, bringing the hearing-impaired community and those who depend on spoken language closer together by easing communication and reducing social isolation. By combining modern technologies such as MediaPipe for hand and landmark tracking, the system maintains a recognition rate higher than 97% across multiple environmental settings (illumination, background noise, etc.). The system's ability to preprocess video frames, extract key features, and classify gestures into contextual text and speech outputs shows promise for real-world applications. The method of classifying comparable motions into more general categories significantly improves identification accuracy by removing ambiguity and bolstering the system's resilience. The findings indicate that a second round of sequence ranking was conducted after a 98.7% accuracy rate was attained in a range of settings.

REFERENCES

[1]. Maria Papatsimouli, Konstantinos-Filippos Kollias, Lazaros Lazaridis, George Maraslidis, Herakles Michailidis, Panagiotis Sarigiannidis and George F. Fragulis, "Real-Time Sign Language Translation Systems: A review study," 2022 11th International Conference on Modern Circuits and Systems Technologies (MOCAST), IEEE, 2022, doi: 10.1109/MOCAST54814.2022.9837666.
[2]. Bashaer A. Al Abdullah, Ghada A. Amoudi, and Hanan S. Alghamdi, "Advancements in Sign Language Recognition: A comprehensive review and future prospects," IEEE Access, vol. 12, 2024, doi: 10.1109/ACCESS.2024.3457692.
[3]. Y. Cheng, W. Shang, L. Zhu and D. Zhang, "Design and implementation of ATM alarm data analysis system," 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan, 2016, pp. 1-3.
[4]. Rajanishree M, Nadeem Ahmed N, Yashvi Panchani, Shreyaa Aravindan and Viraj Jadhav, Department of Computer Science and Engineering, School of Engineering and Technology, Bengaluru, Karnataka, India, "Sign Language Conversion to Speech with the Application of KNN Algorithm," 2022 Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), IEEE, 2022, doi: 10.1109/I-SMAC55078.2022.998742.
[5]. Siming He, Ridley College, St. Catharines, Canada, "Research of a Sign Language Translation System Based on Deep Learning," 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), IEEE, 2019, doi: 10.1109/AIAM48774.2019.00083.
[6]. Gartner, C. (2021). "The Future of Sign Language Recognition: Bridging the Communication Gap." Technology Review.
[7]. Kumar, P., Singh, A., & Mahato, H. (2020). "A Comprehensive Study on Sign Language Recognition Systems." Journal of King Saud University - Computer and Information Sciences.
[8]. J. M. Power, G. W. Grimm, and J.-M. List, "Evolutionary dynamics in the dispersal of sign languages," Roy. Soc. Open Sci., vol. 7, no. 1, Jan. 2020, Art. no. 191100.
[9]. J. Rowe, "Artificial intelligence and the future of sign language," LinkedIn, 2017. Accessed: Nov. 3, 2021.
[10]. U. Farooq, M. S. M. Rahim, N. Sabir, A. Hussain, and A. Abid, "Advances in machine translation for sign language: Approaches, limitations, and challenges," Neural Computing and Applications, vol. 33, no. 21, pp. 14357-14399, Nov. 2021.
[11]. M. A. Abdel-Fattah, "Arabic sign language: A perspective," J. Deaf Stud. Deaf Educ., vol. 10, no. 2, pp. 212-221, Apr. 2005.
[12]. B. S. Parton, "Sign language recognition and translation: A multidisciplined approach from the field of artificial intelligence," J. Deaf Stud. Deaf Educ., vol. 11, no. 1, pp. 94-101, Oct. 2005.
