Universal Sign Language Recognition System Using Gesture Description Generation and Large Language Model

Podder, Kanchon Kanti; Zhang, Jian; Wang, Lingyan

doi:10.1007/978-3-031-71470-2_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14999)

Included in the following conference series:

International Conference on Wireless Artificial Intelligent Computing Systems and Applications

Abstract

Sign language is an invaluable means of communication that enables deaf and hard-of-hearing people to participate fully in society and interact with others. This study introduces a novel universal sign language recognition system that uses the Gesture-script to generate a detailed description of the gestures in a video, which involve continuous movement of the hands, arms, and head, as well as body language. We then pass this description to a Large Language Model (LLM), which interprets the sign language. We apply a few-shot prompting technique to the LLM, enabling it to accurately translate sign videos into the corresponding natural-language sentences. Furthermore, few-shot prompting enables our system to interpret multiple sign languages without pre-training or fine-tuning.
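
The few-shot prompting step described above can be illustrated with a minimal sketch. It only assembles a prompt from a handful of (gesture description, sentence) demonstrations followed by a new description for the LLM to complete. The abstract does not specify the Gesture-script output format, the prompt wording, or the LLM used, so the `Example` class, the `build_few_shot_prompt` function, and the sample descriptions below are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One few-shot demonstration (hypothetical structure, not from the paper)."""
    description: str  # text description of the gestures in a sign video
    sentence: str     # natural-language sentence the signing corresponds to


def build_few_shot_prompt(examples: List[Example], query_description: str) -> str:
    """Assemble a prompt that ends with an open 'Sentence:' slot for the LLM."""
    parts = ["Translate each gesture description into a natural-language sentence.", ""]
    for ex in examples:
        parts.append(f"Description: {ex.description}")
        parts.append(f"Sentence: {ex.sentence}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Description: {query_description}")
    parts.append("Sentence:")
    return "\n".join(parts)


# Invented demonstrations for illustration only; real descriptions would come
# from the gesture description generator.
EXAMPLES = [
    Example("Right hand, flat palm, touches the chin and moves forward.", "Thank you."),
    Example("Right hand, open palm, waves side to side near the head.", "Hello."),
]

prompt = build_few_shot_prompt(
    EXAMPLES, "Both fists stacked, rotating in a circle at chest height."
)
print(prompt)
```

Because the demonstrations are plain text, supporting another sign language under this assumption would mean swapping the entries in `EXAMPLES` rather than retraining a model, which is the property the abstract attributes to few-shot prompting.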

Acknowledgment

This work is supported in part by the NSF under Grants CCSS-2245607 and CCSS-2245608.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Kennesaw State University, Marietta, GA, 30060, USA
Kanchon Kanti Podder
Department of Information Technology, Kennesaw State University, Marietta, GA, 30060, USA
Jian Zhang
Department of Computer Science, Kennesaw State University, Marietta, GA, 30060, USA
Lingyan Wang

Corresponding author

Correspondence to Jian Zhang.

Editor information

Editors and Affiliations

Georgia State University, Atlanta, GA, USA
Zhipeng Cai
Old Dominion University, Norfolk, VA, USA
Daniel Takabi
Beijing University of Posts and Telecommunications, Beijing, China
Shaoyong Guo
Shandong University, Qingdao, China
Yifei Zou

About this paper

Cite this paper

Podder, K.K., Zhang, J., Wang, L. (2025). Universal Sign Language Recognition System Using Gesture Description Generation and Large Language Model. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14999. Springer, Cham. https://doi.org/10.1007/978-3-031-71470-2_23

DOI: https://doi.org/10.1007/978-3-031-71470-2_23
Published: 13 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71469-6
Online ISBN: 978-3-031-71470-2
eBook Packages: Computer Science, Computer Science (R0)
