Project Synopsis
ON
Real Time Conversion of American Sign Language to Text
using Machine Learning
Submitted by
Project Supervisor:
Dr. Rachna Jain
Department of Computer Science and Engineering
JSS Academy of Technical Education, Noida
November 2023
INTRODUCTION
American Sign Language (ASL) is a predominant sign language. Since the only
disability Deaf and Dumb (hereby referred to as D&M) people have is
communication related, and since they cannot use spoken languages, the only way
for them to communicate is through sign language. Communication is the process
of exchanging thoughts and messages in various ways such as speech, signals,
behavior, and visuals. D&M people use hand gestures to express their ideas to
other people. Gestures are non-verbally exchanged messages, and they are
understood with vision. This non-verbal communication of D&M people is called
sign language. Sign language uses gestures instead of sound to convey meaning,
combining hand shapes, orientation and movement of the hands, arms or body,
facial expressions, and lip patterns. Contrary to popular belief, sign language is
not international; it varies from region to region.
Table 1.1 Sign language is a visual language and consists of 3 major components.
Figure 1.1 The gestures we aim to train are as given in the image below.
In recent years there has been tremendous research done on hand gesture
recognition.
With the help of a literature survey, we realized that the basic steps in hand
gesture recognition are:
● Data acquisition
● Data pre-processing
● Feature extraction
● Gesture classification
Data about the hand gesture can be acquired in the following ways:
● One approach for hand detection combines threshold-based color detection with
background subtraction. An AdaBoost face detector can be used to differentiate
between faces and hands, since both have a similar skin color.
● We can also extract the images to be trained by applying a filter called Gaussian
blur (also known as Gaussian smoothing). The filter can be applied easily using
the Open Source Computer Vision library (OpenCV); a minimal sketch of this
pre-processing step is given after this list.
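The snippet below is a rough sketch (not our final implementation) of how a
webcam frame could be pre-processed with OpenCV along these lines: a Gaussian
blur suppresses noise, the frame is converted to the HSV color space, and a
threshold-based skin mask is extracted. The HSV bounds are assumed example
values and would need tuning for the actual camera and lighting.

    # Minimal pre-processing sketch using OpenCV (illustrative values only).
    import cv2
    import numpy as np

    def preprocess_frame(frame):
        # Smooth the frame with a Gaussian blur to reduce sensor noise.
        blurred = cv2.GaussianBlur(frame, (5, 5), 0)
        # Convert to HSV, which separates color (hue) from intensity.
        hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
        # Threshold-based skin-color detection; these bounds are assumptions
        # and must be tuned for real lighting conditions.
        lower_skin = np.array([0, 40, 60], dtype=np.uint8)
        upper_skin = np.array([25, 255, 255], dtype=np.uint8)
        mask = cv2.inRange(hsv, lower_skin, upper_skin)
        # Clean up the mask with morphological opening and closing.
        kernel = np.ones((3, 3), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        return mask

    cap = cv2.VideoCapture(0)  # default webcam
    ok, frame = cap.read()
    if ok:
        hand_mask = preprocess_frame(frame)
    cap.release()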
The goal is to recognize two classes of gestures: deictic and symbolic. The
image is filtered using a fast look-up indexing table. After filtering, skin-color
pixels are gathered into blobs. Blobs are statistical objects based on the
location (x, y) and the colorimetry (Y, U, V) of the skin-color pixels, and they
are used to determine homogeneous areas. A Naïve Bayes classifier is used,
which is an effective and fast method for static hand gesture recognition. It
classifies the different gestures according to geometry-based invariants obtained
from the image data after segmentation.
Thus, unlike many other recognition methods, this method is not dependent
on skin color. The gestures are extracted from each frame of the video, with
a static background. The first step is to segment and label the objects of
interest and to extract geometric invariants from them. The next step is the
classification of gestures using a k-nearest-neighbor algorithm aided with a
distance weighting algorithm (KNNDW), which provides suitable data for a
locally weighted Naïve Bayes classifier.
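This classification stage can be sketched as follows. The snippet is a simplified,
assumed implementation: the geometric-invariant feature vectors are assumed to
be pre-computed as NumPy arrays, the k nearest training samples of a query are
found together with distance weights, and those weights are passed as sample
weights to a Gaussian Naïve Bayes model as a rough approximation of the
locally weighted classifier.

    # Sketch of KNN with distance weighting (KNNDW) feeding a locally
    # weighted Naive Bayes classifier (simplified; features pre-computed).
    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.naive_bayes import GaussianNB

    def classify_gesture(X_train, y_train, x_query, k=15):
        # Find the k nearest training samples of the query feature vector.
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        distances, indices = nn.kneighbors(x_query.reshape(1, -1))
        # Distance weighting: closer neighbors get larger weights.
        weights = 1.0 / (distances[0] + 1e-6)
        # Fit a Naive Bayes model on the local neighborhood only,
        # weighting each neighbor by its inverse distance.
        local_nb = GaussianNB()
        local_nb.fit(X_train[indices[0]], y_train[indices[0]],
                     sample_weight=weights)
        return local_nb.predict(x_query.reshape(1, -1))[0]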
Transformational Impact:
o Aims to revolutionize communication for the deaf and hard of
hearing community.
o Seeks to seamlessly translate sign language into written words.
Groundbreaking Initiative:
o Primary goal: Create a seamless system for translating sign language
gestures into written text.
Scalability:
o Designed with scalability in mind to accommodate emerging sign
language variants.
o Adaptable to advancements in technology for long-term relevance
and effectiveness.
LITERATURE SURVEY
Gesture Recognition
S. No. 1
Title: Gesture-Based Human-Computer Interaction
Authors: N. Meghana, K. Sri Lakshmi, M. Naga Lakshmi Tejasree, K. Srujana,
N. Ashok (October 2023)
Description: In 2018, a color-based method captured shape and position
information, evolving into the virtual mouse by 2022. This innovation, integrating
hand gesture recognition and webcam input, goes beyond cursor control, enabling
diverse functions, including paint application interaction. The system enhances
user-computer interaction, empowers digital creativity, and offers an intuitive
alternative to traditional mouse systems, marking a significant leap in
Human-Computer Interaction.

S. No. 2
Title: Virtual Mouse System Utilizing AI Technology
Authors: Burru Venkata Siddartha Yadav, Sagam Narsimham, Priyanka Kashysap,
and Nikita Kashyap (2022)
Description: This research presents an AI virtual mouse system leveraging
computer vision and hand gestures, eliminating the need for a physical mouse.
Implemented in Python with OpenCV, it tracks hand motions via a camera,
enabling cursor control and gestures for clicking and scrolling. The technology
enhances user experience and accessibility, with potential applications in diverse
fields.

S. No. 3
Title: Gesture-Control-Virtual-Mouse
Authors: Bharath Kumar Reddy Sandra, Katakam Harsha Vardhan, Ch. Uday,
V Sai Surya, Bala Raju, Dr. Vipin Kumar (April 2022)
Description: This paper introduces an innovative AI visual mouse system
leveraging computer vision to interpret hand gestures and fingertips, enabling
mouse, keyboard, and stylus functions without additional hardware. Developed in
Python with OpenCV, the system achieves high accuracy using a webcam, offering
practical applications such as mitigating COVID-19 spread without wearables.
Future improvements aim to enhance right-click accuracy and text selection
through advanced fingertip capture methods.

S. No. 4
Title: Implementing a Real Time Virtual Mouse System Using Computer Vision
Authors: Ranjith GC, Saritha Shetty (May 2023)
Description: This project introduces a Python-based AI virtual mouse system
using hand motions and fingertip detection through a computer's camera,
eliminating the need for a physical mouse. The model, developed with MediaPipe
and other packages, exhibits high precision in mouse operations. It addresses
real-world scenarios where space is limited or individuals face challenges using
traditional mice, offering a promising alternative with future applications in
human-computer interaction.
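For context, the hand-tracking step that several of the surveyed systems build on
can be illustrated with a minimal MediaPipe sketch. This is only an assumed,
simplified example of detecting hand landmarks from a webcam frame, not code
from any of the cited papers.

    # Minimal sketch of webcam hand-landmark detection with MediaPipe.
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(static_image_mode=False,
                                     max_num_hands=1,
                                     min_detection_confidence=0.5)
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # 21 normalized (x, y, z) landmarks per detected hand;
            # landmark 8 is the index fingertip often used for cursor control.
            tip = results.multi_hand_landmarks[0].landmark[8]
            print(tip.x, tip.y)
    cap.release()
    hands.close()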
Text to Speech
TIMELINE CHART
Figure 1.2 Timeline Chart
CONCLUSION
In this report, a functional real-time vision-based American Sign Language
recognition system for D&M people has been developed for the ASL alphabets.
We aim to achieve a final accuracy of 98.0% on our data set. We improve our
prediction by implementing two layers of algorithms, in which we verify and then
predict symbols that closely resemble each other.
This gives us the ability to detect almost all the symbols, provided they are shown
properly, there is no noise in the background, and the lighting is adequate.
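The two-layer prediction scheme mentioned above can be sketched roughly as
follows. This is an assumed illustration, not the finalized design: a base classifier
predicts over all 26 letters, and if the prediction falls into a group of easily
confused signs, a second classifier trained only on that group re-classifies the
frame. The confusion groups and model interfaces shown here are illustrative
assumptions.

    # Assumed sketch of the two-layer prediction scheme: a general classifier
    # followed by specialized classifiers for groups of similar-looking signs.
    # The groups and the model objects are illustrative assumptions.
    CONFUSION_GROUPS = [{"D", "R", "U"}, {"M", "N", "S"}, {"T", "K", "I"}]

    def predict_symbol(frame_features, base_model, group_models):
        # Layer 1: classify over all 26 ASL alphabet signs.
        letter = base_model.predict(frame_features)
        # Layer 2: if the letter belongs to a confusable group, re-classify
        # with a model trained only on that group's examples.
        for group in CONFUSION_GROUPS:
            if letter in group:
                return group_models[frozenset(group)].predict(frame_features)
        return letter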
Future Scope
We are planning to achieve higher accuracy even in the case of complex
backgrounds by trying out various background subtraction algorithms; a minimal
sketch of one candidate approach is given below.
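As an example of what we might try, the snippet below uses the MOG2
background subtractor available in OpenCV; this is an assumed sketch, and other
subtractors (such as the KNN-based one) could be swapped in and compared.

    # Sketch of one candidate background subtraction algorithm (MOG2).
    import cv2

    subtractor = cv2.createBackgroundSubtractorMOG2(history=200,
                                                    varThreshold=25,
                                                    detectShadows=False)
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Foreground mask: moving hand pixels appear white, the static
        # background appears black.
        fg_mask = subtractor.apply(frame)
        cv2.imshow("foreground", fg_mask)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()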
We are also thinking of improving the pre-processing to predict gestures in
low-light conditions with higher accuracy (one possible direction is sketched
below).
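One possible direction, which is our assumption rather than a finalized design, is
to apply contrast-limited adaptive histogram equalization (CLAHE) to the
luminance channel before segmentation:

    # Possible low-light pre-processing step: CLAHE on the luminance channel.
    import cv2

    def enhance_low_light(frame):
        # Work in LAB so that only lightness (L) is equalized and the
        # color channels are left intact.
        lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        l_eq = clahe.apply(l)
        return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)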
This project can be enhanced by building it as a web or mobile application so that
users can access it conveniently. Also, the existing project only works for ASL; it
can be extended to other native sign languages given a sufficient data set and
training. This project implements a finger-spelling translator; however, sign
languages are also used in a contextual manner, where each gesture can represent
an object or a verb. Identifying this kind of contextual signing would require a
higher degree of processing and natural language processing (NLP).
REFERENCES
[1] Sign Language Translator for Deaf and Dumb Using Machine Learning,
ISSN: 0970-2555 Volume: 52, Issue 6, June: 2023
[2] American Sign Language Recognition and its Conversion from Text to
Speech, Volume 11 Issue IX Sep 2023
[3] Sign Language Detection and Conversion to Text and Speech Conversion,
Volume: 07 Issue: 10 | October - 2023
[6] Sign Language to Text Conversion in Real Time using Transfer Learning,
December 2022
[10] Sign Language Recognition and Response via Virtual Reality, Volume 5,
Issue 2, March-April 2023
[13] Machine translation from text to sign language: a systematic review, 03 July
2021
Gesture Recognition
[14] Bharath Kumar Reddy Sandra, Katakam Harsha Vardhan, Ch. Uday, V Sai
Surya, Bala Raju, Dr. Vipin Kumar (2022), “Gesture-Control-Virtual-Mouse”,
International Research Journal of Modernization in Engineering Technology and
Science.
[16] Israth Jahan, Mohammad Likhan, Md. Omar Faruk Hasan, Shanta Islam,
Nurul Ahad Farhan (2023), “Artificial Intelligence Virtual Mouse”, ResearchGate.
[19] Tran, D.S., Ho, N.H., Yang, H.J., Kim, S.H. and Lee, G.S. “Real-time Virtual
Mouse System using RGB-D Images and Fingertip Detection”. In Proceedings of
the International Conference on Multimedia Tools and Applications, pp. 10473-10490,
2021.
[20] Reddy, Vantukala VishnuTeja, Thumma Dhyanchand, Galla Vamsi Krishna,
and Satish Maheshwaram. “Virtual Mouse Control Using Colored Finger Tips and
Hand Gesture Recognition”. In Proceeding of International Conference in
Hyderabad Section, IEEE, pp.1-5, 2020.
[22] Masurovsky, A., Chojecki, P., Runde, D., Lafci, M., Przewozny, D., Gaebler,
M., 2020. Controller-Free Hand Tracking for Grab-and-Place Tasks in Immersive
Virtual Reality: Design Elements and Their Empirical Study. Multimodal
Technologies and Interaction, 4, 91.
[23] Inside Facebook Reality Labs: Wrist-based interaction for the next computing
platform [WWW Document], 2021. Facebook Technology. URL:
https://tech.fb.com/inside-facebook-reality-labs-wrist-based-interaction-for-the-next-computing-platform/
(accessed 3.18.21).
[25] Prachi Agarwal, Abhay Varshney, Harsh Gupta, Garvit Bhola, Harsh Beer
Singh, Gesture Controlled Virtual Mouse, March 12, 2022.
Text to Speech
[27] Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, and
Xuanjing Huang. 2020. Extractive Summarization as Text Matching. arXiv
preprint arXiv:2004.08795 (2020).
[28] Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained
Encoders. Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP). 3721–3731.
[29] Tzu-En Liu, Shih-Hung Liu, and Berlin Chen. 2019. A hierarchical neural
summarization framework for spoken documents. In ICASSP 2019-2019 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP).
IEEE, 7185–7189.
[30] Kong J., Kim, J., & Bae J. (2020) HiFi-GAN: Generative Adversarial
networks for Efficient and High-Fidelity Speech Synthesis. Advances in Neural
Information Processing Systems, 33.
[31] Zhu, C., et al. (2021). Recent advances in text-to-speech synthesis: From
concatenative to parametric approaches. IEEE Signal Processing Magazine, 38(3),
51-66.
[32] Kim J. Kong J. & Son J., “Conditional variational autoencoder with
adversarial learning for end-to-end text-to-speech,” in International Conference on
Machine Learning. PMLR, 2021.
[33] Hayashi, T., Inaguma, H., Ozaki, H., Yamamoto, R., Takeda, K., & Aizawa,
A. (2021). ESPnet-TTS: Unified, Reproducible, and Integratable Open Source
End-to-End Text-to-Speech Toolkit. Proceedings of the 2021 IEEE Automatic
Speech Recognition and Understanding Workshop (ASRU 2021).
[34] Gulati, S., Vaswani, A., Ahuja, A., Gandhi, V., Chan, S., Zhang, Y., ... &
Wu, Y. (2021). Conformer: Convolution-augmented Transformer for Speech
Recognition. Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics (ACL 2020), 5806-5815.
[35] Gyan, B., et al. (2022). Enhancing speech synthesis for the Yoruba language.
Journal of Language Technology and Computational Linguistics, 36(2), 87-104.
[36] Donahue, J., Dieleman, S., Binkowski, M., Elsen, E., and Simonyan, K.
(2021). End-to-End Adversarial Text-to-Speech. In International Conference on
Learning Representations. URL: https://openreview.net/forum?id=rsf1z-JSj87.
[37] Tan, X., et al., A survey on neural speech synthesis. arXiv preprint
arXiv:2106.15561, 2021.
[38] Ren, Y., et al., FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
arXiv preprint arXiv:2006.04558, 2020.
[40] Li, N., et al. Neural speech synthesis with transformer network. In
Proceedings of the AAAI Conference on Artificial Intelligence. 2019.
[41] Biswas N, Uddin KM, Rikta ST, Dey SK. A comparative analysis of
machine learning classifiers for stroke prediction: A predictive analytics approach.
Healthcare Analytics. 2022 Nov 1;2:100116.
[42] Islam, M.R., Rahman, J., Talha, M.R. and Chowdhury, F., 2020, June. Query
Expansion for Bangla Search Engine Pipilika. In 2020 IEEE Region 10
Symposium (TENSYMP) (pp. 1367-1370). IEEE.
[43] Essa, E., Omar, K. and Alqahtani, A., 2023. Fake news detection based on a
hybrid BERT and LightGBM models. Complex & Intelligent Systems, pp.1-12.
[44] Lai, T.M., Zhang, Y., Bakhturina, E., Ginsburg, B. and Ji, H., 2021. A
Unified Transformer-based Framework for Duplex Text Normalization. arXiv
preprint arXiv:2108.09889.
[45] Tyagi, S., Bonafonte, A., Lorenzo-Trueba, J. and Latorre, J., 2021. Proteno:
Text normalization with limited data for fast deployment in text to speech systems.
arXiv preprint arXiv:2104.07777.
[46] Ro, J.H., Stahlberg, F., Wu, K. and Kumar, S., 2022. Transformer-based
Models of Text Normalization for Speech Applications. arXiv preprint
arXiv:2202.00153.
[49] Kiran Rakshana R, Chitra C (2019), “A Smart Navguide System for Visually
Impaired”, International Journal of Innovative Technology and Exploring
Engineering, ISSN: 2278-3075, Vol. 8, Issue 6S3.