Journal of Soft Computing and Computational Intelligence
Received Date: March 25, 2024; Published Date: April 06, 2024
Abstract
Addressing the imperative of bridging communication barriers between the deaf and non-verbal
communities, this project centres on advancing automated American Sign Language (ASL)
recognition through key-point detection-based methodologies. A comprehensive analysis of the
model's efficacy is conducted, employing rigorous testing methodologies and metrics like F1
score, precision, and recall to ascertain optimal performance. By delving into the nuances of ASL
recognition, the project seeks to enhance the accuracy and reliability of machine learning models
in deciphering sign language gestures. Additionally, the implementation of a user-friendly
graphical user interface (GUI) facilitates seamless interaction, empowering users to effortlessly
engage with the system and generate predictions utilizing the most proficient machine learning
algorithms. Through this endeavour, the aim is not only to enhance accessibility for the deaf and
non-verbal communities but also to foster inclusivity by providing a platform for effective
communication between individuals utilizing ASL and those who rely on verbal communication.
This interdisciplinary approach merges technological innovation with social responsibility, paving
the way for a more inclusive and connected society.
Keywords- American Sign Language (ASL), Effective communication, Graphical User Interface
(GUI), Implementation, Key-point
networks (CNN) to statically model recognition [1]. This work specifically uses CNNs for Hand Gesture Recognition (HGR) while taking into account the letters and numbers of American Sign Language (ASL). The article carefully discusses the advantages and disadvantages of using CNNs in the context of HGR. The CNN architecture was developed by modifying the AlexNet and VGG16 models for classification. Feature extraction uses pre-trained modified AlexNet and VGG16 architectures, which are then fed into a multi-modal support vector machine (SVM) classifier. Measurement tools are based on different methods to demonstrate performance. Validity evaluation of the HGR schemes involved leave-one-out and random 70–30 cross-validation. In addition, this study investigates the recognition of unique characters and explores similarities between identical movements. To highlight the strength of the proposed method, it is worth noting that the experiment was performed on a simple CPU system and the use of a high-end GPU was avoided. More importantly, the proposed method achieved a recognition accuracy of 99.82%, outperforming some state-of-the-art methods.

Sandrine Tornay et al. proposed a technique that delves into the realm of sign language recognition, focusing on the challenge of resource scarcity in the field [2]. The primary obstacle lies in the diversity of sign languages, each with its own vocabulary and grammar, creating a limited user base. The paper proposes a multilingual approach, drawing inspiration from recent advancements in hand-shape modelling. By leveraging resources from various sign languages and integrating hand movement information through Hidden Markov Models (HMMs), the study aims to develop a comprehensive sign language recognition system. The research builds upon prior work that demonstrated the language independence of discrete hand movement subunits. The validation of this approach is conducted on the Swiss German Sign Language (DSGS) corpus SMILE, the German Sign Language (DGS) corpus, and the Turkish Sign Language corpus HospiSign, paving the way for a more inclusive and versatile sign language recognition technology.

Shirbhate et al. proposed a technique in their research paper that addresses the vital role of sign language as a communication medium for the deaf and dumb community, emphasizing its significance for the 466 million people worldwide with hearing loss [3]. Focusing on Indian Sign Language (ISL), the study highlights the challenges faced in developing countries, such as limited educational resources and high unemployment rates among adults with hearing loss. The research aims to bridge the gap in sign language recognition technology, specifically for ISL, by utilizing computer vision and machine learning algorithms instead of high-end technologies like gloves or Kinect. The project's primary objective is to identify alphabets in Indian Sign Language through gesture recognition, contributing to the broader accessibility and understanding of sign languages in the context of Indian communication and education.

D.M.M.T. et al. in their paper study the problem of vision-based Sign Language Translation (SLT), which bridges the communication gap between the deaf-mute community and hearing people [4]. It is related to several video understanding topics that aim to interpret video into understandable text and language. Sign language is a form of communication that uses visual gestures to convey meaning. It involves using hand shapes, movements, facial expressions, and lip patterns to communicate instead of relying on sound. There are many different sign languages around the world, each with its own set of gestures and vocabulary. For instance, ASL (American Sign Language), GSL (German Sign Language), and BSL (British Sign Language) are some examples.

Merlin Huro et al. proposed a system for sign language recognition using convolutional neural networks and computer vision [5]. He also developed a similar algorithm using 2D CNN models from the TensorFlow library. The convolution technique is used to extract the main features from the input image: the image is scanned with a 3 x 3 filter, and the dot product of the filter weights and the frame pixels is calculated. A pooling layer is then used to reduce the activation map of the previous layer and integrate all learned features into the feature map of that layer. This helps reduce overfitting to the training data and helps generalize the features represented by the network. The input layer of the convolutional neural network consists of 32 feature maps with 3 x 3 filters, and the activation function is a rectified linear unit (ReLU). The maximum pooling layer size is 2 x 2 and the dropout rate is set to 50%. The layer is then flattened, and the last layer of the network is a fully connected output layer consisting of ten units with a Softmax activation function. Finally, he compiled the model using categorical cross-entropy as the loss function and Adam as the optimizer.
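For illustration, the architecture summarised in [5] can be written down as a short Keras model definition. This is only a sketch assembled from the figures quoted above (32 filters of size 3 x 3, ReLU activation, 2 x 2 max pooling, a 50% dropout rate, a ten-unit Softmax output, categorical cross-entropy loss and the Adam optimizer); the 64 x 64 grayscale input resolution is an assumption made purely for the example.

# Sketch of the 2D CNN described in [5]; the input shape is assumed.
import tensorflow as tf

model = tf.keras.Sequential([
    # 32 feature maps produced by 3 x 3 filters with ReLU activation
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 1)),
    # 2 x 2 max pooling reduces the activation map of the previous layer
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # 50% dropout helps reduce overfitting to the training data
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    # fully connected output layer of ten units with Softmax activation
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()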
Chengji Liu et al. proposed a generalized object detection network that was developed by applying complex degradation processes to the training sets, such as noise, blurring, rotation, and cropping of images [6]. The model was trained with the degraded training sets, which resulted in better generalization ability and higher robustness. The experiments showed that the model trained with the standard sets does not have good generalization ability for the degraded images and has poor robustness. The model was then trained using degraded images, which resulted in improved average precision. It was shown that the average precision for degraded images was better with the general degradation model than with the standard model.

Horn et al. proposed an optical flow technique that detects moving objects even when the camera itself is in motion [7]. It analyses the pattern of light in the image for detection and can deal with a sequence of images that can be classified as a set rather than as unshaped regions in spatial arrangements. It is insensitive to noise and brightness levels.

Singh and Khare proposed a redundant wavelet transform (RWT, or R-DWT) as an image fusion method for multimodal medical images [8]. In their method, they found that the shift-invariance of the R-DWT produces quality image fusions. They experimented with several multimodal MRI, CT, and PET medical images, and the results were evaluated using mutual information and strength metrics. The method was compared with spatial and wavelet fusions such as principal component analysis (PCA), the discrete wavelet transform (DWT), the lifting wavelet transform (LWT), and the discrete cosine transform (DCT), which proved that the R-DWT method was far better than the other methods for medical image fusion.

Bhanusree et al. concentrated on the second-generation wavelet transform for image fusion and investigated the quality of the coefficients in different frequency regions [9]. Low-frequency coefficients are generally used in a neighbourhood to select the estimation criteria, while high-frequency coefficients are used for the window property and for observing the qualities of nearby pixels in the picture. The rationale of this work is to align the images using a multi-focus image fusion scheme. The framework, written in C, uses a pixel-level fusion algorithm to evaluate the result on colour images based on the Xilinx Spartan 3 Embedded Development Kit (EDK) field programmable gate array (FPGA) platform.

METHODOLOGY

The project utilizes a comprehensive methodology encompassing Gesture Training, Gesture Recognition and Translation, Prediction Output, Retraining and Copying Sentences, and an intuitive User Interface to enable real-time translation of sign language gestures with high accuracy and user-friendliness, thereby enhancing communication accessibility for sign language users. Through personalized training, advanced gesture recognition algorithms, and continuous user feedback, the system ensures efficient and adaptable communication, representing an innovative solution to bridging the gap in communication for individuals reliant on sign language.

Gesture Training: This is the initial phase where users train the system to recognize their unique sign language gestures. Each user can add as many words as they want and associate each word with a specific gesture. This personalized training allows the system to accurately interpret the sign language of individual users, taking into account the variations in how different people make the same gestures.
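As an illustration of this training step, the short Python sketch below collects one labelled key-point example per captured frame. The paper follows a key-point detection-based approach but does not name a particular detector here, so the use of MediaPipe Hands, and the helper names keypoint_features and add_training_example, are assumptions made only for the example.

# Sketch of gesture training: extract hand key-points and store them with a word label.
import cv2
import mediapipe as mp
import numpy as np

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def keypoint_features(bgr_frame):
    """Return a flat 63-dimensional landmark vector, or None if no hand is detected."""
    result = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    landmarks = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in landmarks]).flatten()

training_features = []   # one feature vector per recorded gesture
training_labels = []     # the word associated with each gesture

def add_training_example(frame, word):
    features = keypoint_features(frame)
    if features is not None:
        training_features.append(features)
        training_labels.append(word)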
Gesture Recognition and Translation: Once the gestures are trained, the system is ready to translate them in real-time. This is achieved using Google TensorFlow's implementation of the K-Nearest Neighbors (KNN) algorithm. The KNN algorithm works by classifying a query based on the labels of the K points (gestures in this case) that are closest to it in the feature space. The 'closeness' is determined by a distance metric, such as Euclidean distance. The predicted words corresponding to the gestures are then passed on to the prediction output class.
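To make this classification step concrete, the following plain-NumPy sketch shows the underlying idea: compute the Euclidean distance from the query gesture's feature vector to every trained example and take a majority vote over the K closest labels. It only illustrates the algorithm and is not the TensorFlow implementation the system uses; knn_predict and the default k=5 are illustrative choices.

# Sketch of the KNN step: nearest neighbours by Euclidean distance, majority vote.
from collections import Counter
import numpy as np

def knn_predict(query, train_features, train_labels, k=5):
    train = np.asarray(train_features)
    distances = np.linalg.norm(train - query, axis=1)   # Euclidean distance to every gesture
    nearest = np.argsort(distances)[:k]                 # indices of the K closest gestures
    votes = Counter(train_labels[i] for i in nearest)
    word, count = votes.most_common(1)[0]
    return word, count / k                              # predicted word and its vote share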
Prediction Output: In the prediction output class, the system reads the sentence formed by the sequence of predicted words and displays the confidence level for each predicted word. This confidence level is essentially a measure of how closely the user's gesture matches the trained gestures for the predicted word. This feedback mechanism allows users to adjust their gestures if necessary, thereby improving the accuracy of the translation over time.
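One possible realisation of this output step, building on the two sketches above, is shown below. Treating the confidence as the share of the K nearest neighbours that voted for the predicted word, and dropping words below a threshold, are assumptions made for illustration; the paper only states that a confidence level is displayed for each word.

# Sketch of the prediction output: report per-word confidence and build the sentence.
def build_sentence(frames, train_features, train_labels, k=5, threshold=0.6):
    sentence = []
    for frame in frames:
        features = keypoint_features(frame)
        if features is None:
            continue
        word, confidence = knn_predict(features, train_features, train_labels, k)
        print(f"{word}: {confidence:.0%} confidence")   # feedback shown to the user
        if confidence >= threshold:                     # low-confidence signs can be retrained
            sentence.append(word)
    return " ".join(sentence)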
Retraining and Copying Sentences: The system gives users the flexibility to go back and retrain words. This means that if a user notices that a particular gesture is consistently being confused by the system, they can retrain that sign. Users can also copy sentences formed through their hand gestures. This feature can be particularly useful in situations where the user needs to convey similar sentences to different people.

User Interface: The sign language translator is equipped with an intuitive user interface, making it easy to use even for people who are not tech-savvy. The interface includes additional features that enhance the user experience, such as easy navigation, clear instructions, and visual feedback.

This comprehensive methodology ensures that the sign language translator is not only accurate and efficient but also user-friendly and adaptable to the unique needs of each user. It represents a creative and innovative solution to the challenge of translating sign language in real time. This project has the potential to significantly improve the quality of life for people who rely on sign language for communication, as shown in Fig. 1.

Furthermore, the system's real-time translation capabilities enable immediate communication between individuals using sign language and those who may not understand it. This functionality enhances communication accessibility and fosters inclusivity, empowering individuals with hearing impairments to engage in more effective and inclusive communication, as shown in Fig. 2, 3, 4, 5, 6 and 7.