American Sign Language Real-Time Detection Using TensorFlow and Keras in Python
Abstract—This paper presents a novel approach to enhance communication for individuals with hearing impairments. We propose a sign language detection program in Python that integrates image recognition techniques to synthesize American Sign Language (ASL). The program achieves accurate recognition by analysing and interpreting ASL gestures using a convolutional neural network. The detected gestures are mapped to textual representations and relayed to the display interface for seamless communication. Evaluation of a diverse dataset demonstrates the system's robustness and effectiveness. The developed program, implemented in Python using OpenCV and TensorFlow, offers a scalable solution to improve accessibility and inclusivity for the hearing-impaired community.

Keywords—ASL, Image Recognition, Keras, CNN, TensorFlow
I. INTRODUCTION
Sign language is a visual form of communication that uses hand gestures and motions to convey meaning. It plays a vital role in enabling effective communication for individuals who are deaf or have hearing impairments, allowing them to express their thoughts and emotions and to interact with others in their community.
Inclusivity of individuals who are deaf and mute is
crucial for building a society that values diversity and
ensures equal opportunities. By promoting accessible
communication methods, such as sign language
interpretation and inclusive technologies, we can foster a
more inclusive environment where the deaf and mute
community can actively participate and contribute to all
aspects of life.
Our approach to creating a sign language detection system consists of first creating a histogram (shown in Fig. 6), then reading the real-time video input frame by frame, and comparing each frame to the dataset. The prediction is then made using a convolutional neural network (CNN) model whose accuracy is about 97%. Once the model predicts the sign, it is displayed to the user through a chatbot that also reads the output aloud.

The primary contributions of the work are as follows:

• ASL Detection Methodology: Introduces an effective ASL gesture detection approach using threshold images and deep learning.
• Custom Deep Learning Model: Develops a Keras-based deep learning model, improving ASL gesture recognition accuracy.
• Real-Time Recognition: Enables real-time ASL recognition for immediate communication.
• Text-to-Speech Integration: Integrates Pyttsx3 for converting ASL signs to spoken language.
• End-to-End Development: Involves dataset creation, model development, and user-friendly GUI design, depicted in Fig. 1 and Fig. 2.

Fig. 1. GUI Interface
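The histogram step mentioned above (and shown in Fig. 6) is commonly implemented with OpenCV colour histograms and back-projection. The paper does not give code for it, so the following is a minimal sketch under that assumption; the ROI coordinates and bin counts are illustrative:

```python
import cv2

def build_hand_histogram(frame, roi=(100, 100, 200, 200)):
    """Build an HSV colour histogram from the region where the hand is placed."""
    x, y, w, h = roi  # illustrative hand region
    hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # 2-D histogram over the hue and saturation channels
    hist = cv2.calcHist([hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def segment_hand(frame, hist):
    """Highlight skin-coloured pixels in a new frame via back-projection."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    return cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], scale=1)
```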
Fig. 3. Architecture diagram
Fig. 5. Gestures stored in the database along with their names
The training data is then processed and used to train a model that will be utilized for prediction. The model is built using Keras, a popular deep learning library. Keras provides a user-friendly interface for constructing neural networks and efficiently training them on large datasets.

To predict a particular gesture, frames are read from an input video stream, and these frames are converted into threshold images. The threshold images are then compared with the dataset using the trained model. This comparison helps in identifying the corresponding ASL sign for the input gesture.
Fig. 6. How the histogram is set

Training the model involves feeding the processed training data into the model and optimizing its parameters to learn patterns and features that are indicative of each ASL sign. The training process aims to minimize the model's prediction error and improve its accuracy in recognizing ASL gestures.

In this specific project, once the final output gesture is predicted, it is spoken aloud using the Pyttsx3 Python library. Pyttsx3 provides a text-to-speech interface, allowing the system to convert the recognized ASL sign into spoken language, enhancing accessibility and understanding for individuals who may not be familiar with sign language. This is shown in Fig. 4.
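A minimal sketch of the Pyttsx3 text-to-speech step described above (the label string and speech rate are placeholders):

```python
import pyttsx3

engine = pyttsx3.init()          # initialise the TTS engine once
engine.setProperty("rate", 150)  # speaking rate in words per minute (assumption)

def speak_sign(label):
    """Read the recognized ASL sign aloud."""
    engine.say(label)
    engine.runAndWait()          # block until the utterance finishes

speak_sign("HELLO")              # e.g. after the model predicts 'HELLO'
```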
IV. VISUALISATION

The process begins by capturing hand gestures using a live camera to create the dataset. The camera records images of individuals performing various ASL gestures. To enhance the quality of the captured images, preprocessing techniques such as Gaussian and median blur are applied. These techniques help to reduce noise and improve the clarity of the hand gestures in the images.
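A minimal OpenCV sketch of the capture-and-blur preprocessing described above (the camera index, kernel sizes, and output filename are assumptions):

```python
import cv2

cap = cv2.VideoCapture(0)                         # default webcam
ret, frame = cap.read()
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress Gaussian noise
    smoothed = cv2.medianBlur(smoothed, 5)        # remove salt-and-pepper noise
    cv2.imwrite("gesture_raw.png", smoothed)      # saved for later thresholding
cap.release()
```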
After applying blurring, thresholding is performed on the pre-processed images. This step converts the grayscale images into binary images, where the hand regions are highlighted. The thresholded image is then saved into the system, forming a part of the dataset.

To increase the diversity of the dataset, the recorded images are flipped horizontally. Flipping the images creates mirrored versions of the gestures, effectively doubling the dataset's size. This augmentation technique helps to provide more varied examples of the hand gestures, allowing the model to learn from a wider range of perspectives.
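The thresholding and flipping steps can be sketched as follows; the paper does not say which thresholding variant is used, so Otsu's method is an assumption:

```python
import cv2

gray = cv2.imread("gesture_raw.png", cv2.IMREAD_GRAYSCALE)

# Binarise: hand pixels become white, background black (Otsu picks the threshold)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

mirrored = cv2.flip(binary, 1)  # flipCode=1 flips horizontally (mirrored gesture)

cv2.imwrite("gesture_thresh.png", binary)
cv2.imwrite("gesture_thresh_mirrored.png", mirrored)
```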
Once the dataset is prepared, it is loaded into the code environment and split into training and testing data. The training data is used to train a Keras model; Keras is a high-level neural networks API built on top of TensorFlow that provides an intuitive interface for designing and training deep learning models, making it suitable for tasks like hand gesture recognition.
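The paper does not specify the network architecture, so the following is a minimal sketch of a plausible small CNN with the train/test split described above; the image size, layer widths, number of classes, placeholder data, and filenames are all assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE, NUM_CLASSES = 64, 26  # assumptions: 64x64 inputs, one class per letter

# X: stack of threshold images, y: integer labels (random placeholder data here)
X = np.random.rand(200, IMG_SIZE, IMG_SIZE, 1).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(IMG_SIZE, IMG_SIZE, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
model.save("asl_model.h5")  # illustrative filename
```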
The Keras model, after being trained on the dataset, is imported into the final code for action recognition. The model's architecture and learned weights enable it to recognize and classify ASL gestures accurately. When a new input (e.g., a live video stream) is provided, the imported model can analyse the hand gestures in real time and predict the corresponding ASL action or sign.
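A minimal sketch of the real-time recognition loop described above, reusing the preprocessing steps and the saved model from the earlier sketches (the A-Z label mapping is an assumption):

```python
import cv2
import numpy as np
from tensorflow import keras

model = keras.models.load_model("asl_model.h5")  # saved by the training sketch
LABELS = [chr(ord("A") + i) for i in range(26)]  # assumption: classes A-Z

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Match the model's input: resize, scale, add batch and channel axes
    inp = cv2.resize(thresh, (64, 64)).astype("float32") / 255.0
    probs = model.predict(inp.reshape(1, 64, 64, 1), verbose=0)
    print("Predicted sign:", LABELS[int(np.argmax(probs))])
    cv2.imshow("threshold", thresh)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()
```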
Researchers and developers can utilize this data acquisition and modelling pipeline to build effective ASL gesture recognition systems. These systems hold promise in enhancing communication and accessibility for the deaf and hard-of-hearing community, bridging the gap between sign language and spoken language.

V. DATA ACQUISITION

This section focuses on the data acquisition process, which plays a crucial role in capturing American Sign Language (ASL) gestures from proficient individuals. The process involves setting up the data collection environment, ensuring optimal lighting conditions, appropriate camera placement, and adhering to specific hand positioning guidelines.

To begin, creating a suitable data collection environment is essential. This includes ensuring adequate lighting conditions to enable clear visibility of the hand gestures. Proper lighting helps capture accurate hand shapes and movements, minimizing potential errors in the dataset. Additionally, the camera placement is crucial for capturing the gestures effectively. The camera should be positioned to have a clear view of the hands, ensuring that all hand movements are captured accurately.
During the recording process, a diverse range of ASL gestures should be considered to ensure the dataset encompasses various hand shapes, orientations, and movements. This diversity is important, as ASL encompasses a rich vocabulary with numerous signs that vary in complexity and intricacy. By capturing a wide range of gestures, the resulting dataset becomes more comprehensive and representative of the full spectrum of ASL.

After acquiring the data, it can be used to create a framework for hand gesture recognition. This involves leveraging libraries such as TensorFlow and Keras, which provide powerful tools for training machine learning models. TensorFlow allows the creation of dataflow graphs, which describe how data moves through a computational graph. This capability is particularly useful for building models that process and recognize hand gestures based on the captured ASL data.

In addition to TensorFlow, scikit-learn is another valuable library that can be employed in the data analysis process. Scikit-learn offers a range of machine learning models for tasks such as regression, classification, and clustering. It also provides statistical tools to analyze these models, enabling researchers to gain insights into the performance and behavior of the trained models.

By utilizing these libraries and techniques, researchers can leverage the collected ASL dataset to train machine learning models that can accurately recognize and interpret ASL gestures. This advancement in technology holds the potential to bridge the communication gap between the deaf community and those who do not understand sign language, enhancing accessibility and inclusivity for all.
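As one concrete example of the scikit-learn analysis mentioned above, held-out predictions from the trained Keras model can be summarised with standard classification metrics; this sketch reuses model, X_test, and y_test from the training example:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test, verbose=0), axis=1)  # predicted class ids

print(confusion_matrix(y_test, y_pred))       # per-class error structure
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
```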
VI. ANALYSIS

In addition to the previous steps, the utilization of gesture analysis algorithms further enhances the system's capability to interpret and recognize ASL gestures accurately. These algorithms are specifically designed to analyze both the spatial and temporal characteristics of hand movements, extracting meaningful features and patterns that represent different ASL gestures.

By capturing information such as hand shape, orientation, movement trajectory, and timing, the gesture analysis algorithms enable the system to capture and comprehend the nuances of ASL gestures. These algorithms often leverage techniques such as image processing, computer vision, and machine learning to process the input data effectively.

During the training phase, the system is exposed to a comprehensive dataset of ASL gestures that covers a wide range of signs. This dataset serves as a reference, allowing the system to learn and identify the distinctive features associated with each gesture. Through this training process, the system becomes adept at recognizing and interpreting ASL gestures accurately. The accuracy achieved is shown in Fig. 7.
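A curve like the one in Fig. 7 can be produced directly from the History object that Keras returns from fit(); a minimal sketch, assuming the training call from the earlier example:

```python
import matplotlib.pyplot as plt

# 'history' is the object returned by model.fit() in the training sketch
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```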
Fig. 7. Training Accuracy Achieved

By extracting relevant features from the input data and comparing them with the learned patterns from the training dataset, the system can make predictions and determine the corresponding ASL sign for a given gesture. The use of advanced machine learning techniques, such as deep learning models, can further enhance the accuracy and robustness of the gesture recognition system.
VII. RESULTS

The real-time American Sign Language (ASL) recognition system utilizing gesture analysis demonstrates impressive performance in accurately interpreting and recognizing ASL hand gestures. Extensive testing and evaluation of the system have been conducted to assess its effectiveness and reliability in real-world scenarios. To evaluate the system's performance, a diverse dataset comprising a wide range of ASL gestures was collected from proficient sign language users. The dataset encompassed various hand shapes, orientations, and movements, ensuring the system's ability to handle different sign variations. The user-friendly graphical user interface (GUI) significantly enhanced the usability and accessibility of the system. It was found that the GUI is intuitive and easy to navigate, enabling seamless interaction and control. The GUI facilitated various actions, including setting up the system, calibrating hand gestures, and accessing recognition results, making it user-friendly for individuals with different levels of technical expertise.

VIII. CONCLUSION

In conclusion, the real-time ASL recognition system utilizing computer vision and gesture analysis represents a significant advancement in enabling effective communication between ASL users and non-sign language users. By leveraging computer vision techniques, sophisticated gesture analysis algorithms, and a user-friendly GUI, the system demonstrates remarkable potential for improving accessibility, fostering inclusivity, and bridging the communication gap between diverse communities. Continued research and development in this field will pave the way for even more accurate and efficient ASL recognition systems in the future.

However, it is important to acknowledge some limitations. The system's accuracy relies heavily on a contrasting background and hand colour, and when this contrast is lacking, the accuracy may decrease. To enhance accuracy in such scenarios, future developments could consider predicting sign language based on feature extraction or detection points from the hands rather than relying solely on threshold images.

Ultimately, the combination of gesture analysis algorithms and comprehensive training enables the system to facilitate seamless communication between ASL users and non-sign language users, empowering individuals proficient in ASL to express themselves naturally and enabling non-sign language users to understand and respond to ASL gestures effectively. This technological advancement contributes to fostering inclusivity and accessibility for the deaf and hard-of-hearing community, promoting equal opportunities for communication and understanding.