Project Synopsis
ON
Real Time Conversion of American Sign Language to Text
using Machine Learning
Submitted by
Project Supervisor:
Dr. Rachna Jain
Department of Computer Science and Engineering
JSS Academy of Technical Education, Noida
November 2023
INTRODUCTION
American Sign Language (ASL) is a predominant sign language. Since the only
disability Deaf and Dumb (hereby referred to as D&M) people have is
communication related, and since they cannot use spoken languages, the only way
for them to communicate is through sign language. Communication is the process
of exchanging thoughts and messages in various ways such as speech, signals,
behavior, and visuals. D&M people use hand gestures to express their ideas to
other people. Gestures are non-verbally exchanged messages, and they are
understood with vision. This non-verbal communication of D&M people is called
sign language. Sign language uses gestures instead of sound to convey meaning,
combining hand shapes, orientation and movement of the hands, arms or body,
facial expressions, and lip patterns. Contrary to popular belief, sign language is
not international; it varies from region to region.
Table 1.1 Sign language is a visual language and consists of 3 major components.
Figure 1.1 The gestures we aim to train are as given in the image below.
In recent years there has been tremendous research done on hand gesture
recognition.
With the help of a literature survey, we realized that the basic steps in hand
gesture recognition are:
● Data acquisition
● Data pre-processing
● Feature extraction
● Gesture classification
Data about the hand gesture can be acquired in the following ways:
● One approach for hand detection combines threshold-based color detection with
background subtraction. An AdaBoost face detector can be used to differentiate
between faces and hands, since both have a similar skin color.
● We can also extract the images to be trained by applying a filter called Gaussian
blur (also known as Gaussian smoothing). The filter can be applied easily using
the Open Source Computer Vision library (OpenCV); a minimal sketch of this
pre-processing step is given after this list.
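The snippet below is a rough sketch (not our final implementation) of how a
webcam frame could be pre-processed with OpenCV along these lines: a Gaussian
blur suppresses noise, the frame is converted to the HSV color space, and a
threshold-based skin mask is extracted. The HSV bounds are assumed example
values and would need tuning for the actual camera and lighting.

    # Minimal pre-processing sketch using OpenCV (illustrative values only).
    import cv2
    import numpy as np

    def preprocess_frame(frame):
        # Smooth the frame with a Gaussian blur to reduce sensor noise.
        blurred = cv2.GaussianBlur(frame, (5, 5), 0)
        # Convert to HSV, which separates color (hue) from intensity.
        hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
        # Threshold-based skin-color detection; these bounds are assumptions
        # and must be tuned for real lighting conditions.
        lower_skin = np.array([0, 40, 60], dtype=np.uint8)
        upper_skin = np.array([25, 255, 255], dtype=np.uint8)
        mask = cv2.inRange(hsv, lower_skin, upper_skin)
        # Clean up the mask with morphological opening and closing.
        kernel = np.ones((3, 3), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        return mask

    cap = cv2.VideoCapture(0)  # default webcam
    ok, frame = cap.read()
    if ok:
        hand_mask = preprocess_frame(frame)
    cap.release()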
The goal is to recognize two classes of gestures: deictic and symbolic. The
image is filtered using a fast look-up indexing table. After filtering, skin-color
pixels are gathered into blobs. Blobs are statistical objects based on the
location (x, y) and the colorimetry (Y, U, V) of the skin-color pixels, and they
are used to determine homogeneous areas. A Naïve Bayes classifier is used,
which is an effective and fast method for static hand gesture recognition. It
classifies the different gestures according to geometry-based invariants obtained
from the image data after segmentation.
Thus, unlike many other recognition methods, this method is not dependent
on skin color. The gestures are extracted from each frame of the video, with
a static background. The first step is to segment and label the objects of
interest and to extract geometric invariants from them. The next step is the
classification of gestures using a k-nearest-neighbor algorithm aided with a
distance weighting algorithm (KNNDW), which provides suitable data for a
locally weighted Naïve Bayes classifier.
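This classification stage can be sketched as follows. The snippet is a simplified,
assumed implementation: the geometric-invariant feature vectors are assumed to
be pre-computed as NumPy arrays, the k nearest training samples of a query are
found together with distance weights, and those weights are passed as sample
weights to a Gaussian Naïve Bayes model as a rough approximation of the
locally weighted classifier.

    # Sketch of KNN with distance weighting (KNNDW) feeding a locally
    # weighted Naive Bayes classifier (simplified; features pre-computed).
    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.naive_bayes import GaussianNB

    def classify_gesture(X_train, y_train, x_query, k=15):
        # Find the k nearest training samples of the query feature vector.
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        distances, indices = nn.kneighbors(x_query.reshape(1, -1))
        # Distance weighting: closer neighbors get larger weights.
        weights = 1.0 / (distances[0] + 1e-6)
        # Fit a Naive Bayes model on the local neighborhood only,
        # weighting each neighbor by its inverse distance.
        local_nb = GaussianNB()
        local_nb.fit(X_train[indices[0]], y_train[indices[0]],
                     sample_weight=weights)
        return local_nb.predict(x_query.reshape(1, -1))[0]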
Transformational Impact:
o Aims to revolutionize communication for the deaf and hard of
hearing community.
o Seeks to seamlessly translate sign language into written words.
Groundbreaking Initiative:
o Primary goal: Create a seamless system for translating sign language
gestures into written text.
Scalability:
o Designed with scalability in mind to accommodate emerging sign
language variants.
o Adaptable to advancements in technology for long-term relevance
and effectiveness.
LITERATURE SURVEY
Gesture Recognition
S. No. 1
Title: Gesture-Based Human-Computer Interaction
Authors: N. Meghana, K. Sri Lakshmi, M. Naga Lakshmi Tejasree, K. Srujana,
N. Ashok (October 2023)
Description: In 2018, a color-based method captured shape and position
information, evolving into the virtual mouse by 2022. This innovation, integrating
hand gesture recognition and webcam input, goes beyond cursor control, enabling
diverse functions, including paint application interaction. The system enhances
user-computer interaction, empowers digital creativity, and offers an intuitive
alternative to traditional mouse systems, marking a significant leap in
Human-Computer Interaction.

S. No. 2
Title: Virtual Mouse System Utilizing AI Technology
Authors: Burru Venkata Siddartha Yadav, Sagam Narsimham, Priyanka Kashysap,
and Nikita Kashyap (2022)
Description: This research presents an AI virtual mouse system leveraging
computer vision and hand gestures, eliminating the need for a physical mouse.
Implemented in Python with OpenCV, it tracks hand motions via a camera,
enabling cursor control and gestures for clicking and scrolling. The technology
enhances user experience and accessibility, with potential applications in diverse
fields.

S. No. 3
Title: Gesture-Control-Virtual-Mouse
Authors: Bharath Kumar Reddy Sandra, Katakam Harsha Vardhan, Ch. Uday,
V Sai Surya, Bala Raju, Dr. Vipin Kumar (April 2022)
Description: This paper introduces an innovative AI visual mouse system
leveraging computer vision to interpret hand gestures and fingertips, enabling
mouse, keyboard, and stylus functions without additional hardware. Developed in
Python with OpenCV, the system achieves high accuracy using a webcam, offering
practical applications such as mitigating COVID-19 spread without wearables.
Future improvements aim to enhance right-click accuracy and text selection
through advanced fingertip capture methods.

S. No. 4
Title: Implementing a Real Time Virtual Mouse System Using Computer Vision
Authors: Ranjith GC, Saritha Shetty (May 2023)
Description: This project introduces a Python-based AI virtual mouse system
using hand motions and fingertip detection through a computer's camera,
eliminating the need for a physical mouse. The model, developed with MediaPipe
and other packages, exhibits high precision in mouse operations. It addresses
real-world scenarios where space is limited or individuals face challenges using
traditional mice, offering a promising alternative with future applications in
human-computer interaction.
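For context, the hand-tracking step that several of the surveyed systems build on
can be illustrated with a minimal MediaPipe sketch. This is only an assumed,
simplified example of detecting hand landmarks from a webcam frame, not code
from any of the cited papers.

    # Minimal sketch of webcam hand-landmark detection with MediaPipe.
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(static_image_mode=False,
                                     max_num_hands=1,
                                     min_detection_confidence=0.5)
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # 21 normalized (x, y, z) landmarks per detected hand;
            # landmark 8 is the index fingertip often used for cursor control.
            tip = results.multi_hand_landmarks[0].landmark[8]
            print(tip.x, tip.y)
    cap.release()
    hands.close()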
Text to Speech
TIMELINE CHART
Figure 1.2 Timeline Chart
CONCLUSION
In this report, a functional real-time vision-based American Sign Language
recognition system for D&M people has been developed for the ASL alphabets.
We aim to achieve a final accuracy of 98.0% on our data set. We improve our
prediction by implementing two layers of algorithms, in which we verify and then
predict symbols that closely resemble each other.
This gives us the ability to detect almost all the symbols, provided they are shown
properly, there is no noise in the background, and the lighting is adequate.
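The two-layer prediction scheme mentioned above can be sketched roughly as
follows. This is an assumed illustration, not the finalized design: a base classifier
predicts over all 26 letters, and if the prediction falls into a group of easily
confused signs, a second classifier trained only on that group re-classifies the
frame. The confusion groups and model interfaces shown here are illustrative
assumptions.

    # Assumed sketch of the two-layer prediction scheme: a general classifier
    # followed by specialized classifiers for groups of similar-looking signs.
    # The groups and the model objects are illustrative assumptions.
    CONFUSION_GROUPS = [{"D", "R", "U"}, {"M", "N", "S"}, {"T", "K", "I"}]

    def predict_symbol(frame_features, base_model, group_models):
        # Layer 1: classify over all 26 ASL alphabet signs.
        letter = base_model.predict(frame_features)
        # Layer 2: if the letter belongs to a confusable group, re-classify
        # with a model trained only on that group's examples.
        for group in CONFUSION_GROUPS:
            if letter in group:
                return group_models[frozenset(group)].predict(frame_features)
        return letter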
Future Scope
We are planning to achieve higher accuracy even in the case of complex
backgrounds by trying out various background subtraction algorithms; a minimal
sketch of one candidate approach is given below.
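As an example of what we might try, the snippet below uses the MOG2
background subtractor available in OpenCV; this is an assumed sketch, and other
subtractors (such as the KNN-based one) could be swapped in and compared.

    # Sketch of one candidate background subtraction algorithm (MOG2).
    import cv2

    subtractor = cv2.createBackgroundSubtractorMOG2(history=200,
                                                    varThreshold=25,
                                                    detectShadows=False)
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Foreground mask: moving hand pixels appear white, the static
        # background appears black.
        fg_mask = subtractor.apply(frame)
        cv2.imshow("foreground", fg_mask)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()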
We are also thinking of improving the pre-processing to predict gestures in
low-light conditions with higher accuracy (one possible direction is sketched
below).
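One possible direction, which is our assumption rather than a finalized design, is
to apply contrast-limited adaptive histogram equalization (CLAHE) to the
luminance channel before segmentation:

    # Possible low-light pre-processing step: CLAHE on the luminance channel.
    import cv2

    def enhance_low_light(frame):
        # Work in LAB so that only lightness (L) is equalized and the
        # color channels are left intact.
        lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        l_eq = clahe.apply(l)
        return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)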
This project can be enhanced by building it as a web or mobile application so that
users can access it conveniently. Also, the existing project only works for ASL; it
can be extended to other native sign languages given a sufficient data set and
training. This project implements a finger-spelling translator; however, sign
languages are also used in a contextual manner, where each gesture can represent
an object or a verb. Identifying this kind of contextual signing would require a
higher degree of processing and natural language processing (NLP).
REFERENCES
[1] Sign Language Translator for Deaf and Dumb Using Machine Learning,
ISSN: 0970-2555 Volume: 52, Issue 6, June: 2023
[2] American Sign Language Recognition and its Conversion from Text to
Speech, Volume 11 Issue IX Sep 2023
[3] Sign Language Detection and Conversion to Text and Speech Conversion,
Volume: 07 Issue: 10 | October - 2023
[6] Sign Language to Text Conversion in Real Time using Transfer Learning,
December 2022
[10] Sign Language Recognition and Response via Virtual Reality, Volume 5,
Issue 2, March-April 2023
[13] Machine translation from text to sign language: a systematic review, 03 July
2021
Gesture Recognition
[14] Bharath Kumar Reddy Sandra, Katakam Harsha Vardhan, Ch. Uday, V Sai
Surya, Bala Raju, Dr. Vipin Kumar (2022), “Gesture-Control-Virtual-Mouse”,
International Research Journal of Modernization in Engineering Technology and
Science.
[16] Israth Jahan, Mohammad Likhan, Md. Omar Faruk Hasan, Shanta Islam,
Nurul Ahad Farhan (2023), “Artificial Intelligence Virtual Mouse”, ResearchGate.
[19] Tran, D.S., Ho, N.H., Yang, H.J., Kim, S.H. and Lee, G.S. “Real-time Virtual
Mouse System using RGB-D Images and Fingertip Detection”. In Proceedings of
the International Conference on Multimedia Tools and Applications, pp. 10473-10490,
2021.
[20] Reddy, Vantukala VishnuTeja, Thumma Dhyanchand, Galla Vamsi Krishna,
and Satish Maheshwaram. “Virtual Mouse Control Using Colored Finger Tips and
Hand Gesture Recognition”. In Proceeding of International Conference in
Hyderabad Section, IEEE, pp.1-5, 2020.
[22] Masurovsky, A., Chojecki, P., Runde, D., Lafci, M., Przewozny, D., Gaebler,
M., 2020. Controller-Free Hand Tracking for Grab-and-Place Tasks in Immersive
Virtual Reality: Design Elements and Their Empirical Study. Multimodal
Technologies and Interaction, 4, 91.
[23] Inside Facebook Reality Labs: Wrist-based interaction for the next computing
platform [WWW Document], 2021. Facebook Technology. URL:
https://tech.fb.com/inside-facebook-reality-labs-wrist-based-interaction-for-the-next-computing-platform/
(accessed 3.18.21).
[25] Prachi Agarwal, Abhay Varshney, Harsh Gupta, Garvit Bhola, Harsh Beer
Singh, Gesture Controlled Virtual Mouse, March 12, 2022.
Text to Speech
[27] Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, and
Xuanjing Huang. 2020. Extractive Summarization as Text Matching. arXiv
preprint arXiv:2004.08795 (2020).
[28] Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained
Encoders. Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP). 3721–3731.
[29] Tzu-En Liu, Shih-Hung Liu, and Berlin Chen. 2019. A hierarchical neural
summarization framework for spoken documents. In ICASSP 2019-2019 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP).
IEEE, 7185–7189.
[30] Kong J., Kim, J., & Bae J. (2020) HiFi-GAN: Generative Adversarial
networks for Efficient and High-Fidelity Speech Synthesis. Advances in Neural
Information Processing Systems, 33.
[31] Zhu, C., et al. (2021). Recent advances in text-to-speech synthesis: From
concatenative to parametric approaches. IEEE Signal Processing Magazine, 38(3),
51-66.
[32] Kim J. Kong J. & Son J., “Conditional variational autoencoder with
adversarial learning for end-to-end text-to-speech,” in International Conference on
Machine Learning. PMLR, 2021.
[33] Hayashi, T., Inaguma, H., Ozaki, H., Yamamoto, R., Takeda, K., & Aizawa,
A. (2021). ESPnet-TTS: Unified, Reproducible, and Integratable Open Source
End-to-End Text-to-Speech Toolkit. Proceedings of the 2021 IEEE Automatic
Speech Recognition and Understanding Workshop (ASRU 2021).
[34] Gulati, S., Vaswani, A., Ahuja, A., Gandhi, V., Chan, S., Zhang, Y., ... &
Wu, Y. (2021). Conformer: Convolution-augmented Transformer for Speech
Recognition. Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics (ACL 2020), 5806-5815.
[35] Gyan, B., et al. (2022). Enhancing speech synthesis for the Yoruba language.
Journal of Language Technology and Computational Linguistics, 36(2), 87-104.
[36] Donahue, J., Dieleman, S., Binkowski, M., Elsen, E., and Simonyan, K.
(2021). End-to-End Adversarial Text-to-Speech. In International Conference on
Learning Representations. URL: https://openreview.net/forum?id=rsf1z-JSj87.
[37] Tan, X., et al., A survey on neural speech synthesis. arXiv preprint
arXiv:2106.15561, 2021.
[38] Ren, Y., et al., FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
arXiv preprint arXiv:2006.04558, 2020.
[40] Li, N., et al. Neural speech synthesis with transformer network. In
Proceedings of the AAAI Conference on Artificial Intelligence. 2019.
[41] Biswas N, Uddin KM, Rikta ST, Dey SK. A comparative analysis of
machine learning classifiers for stroke prediction: A predictive analytics approach.
Healthcare Analytics. 2022 Nov 1;2:100116.
[42] Islam, M.R., Rahman, J., Talha, M.R. and Chowdhury, F., 2020, June. Query
Expansion for Bangla Search Engine Pipilika. In 2020 IEEE Region 10
Symposium (TENSYMP) (pp. 1367-1370). IEEE.
[43] Essa, E., Omar, K. and Alqahtani, A., 2023. Fake news detection based on a
hybrid BERT and LightGBM models. Complex & Intelligent Systems, pp.1-12.
[44] Lai, T.M., Zhang, Y., Bakhturina, E., Ginsburg, B. and Ji, H., 2021. A
Unified Transformer-based Framework for Duplex Text Normalization. arXiv
preprint arXiv:2108.09889.
[45] Tyagi, S., Bonafonte, A., Lorenzo-Trueba, J. and Latorre, J., 2021. Proteno:
Text normalization with limited data for fast deployment in text to speech systems.
arXiv preprint arXiv:2104.07777.
[46] Ro, J.H., Stahlberg, F., Wu, K. and Kumar, S., 2022. Transformer-based
Models of Text Normalization for Speech Applications. arXiv preprint
arXiv:2202.00153.
[49] Kiran Rakshana R, Chitra C (2019), “A Smart Navguide System for Visually
Impaired”, International Journal of Innovative Technology and Exploring
Engineering, ISSN: 2278-3075, Vol. 8, Issue 6S3.