Abstract
Sign language is a vital means of communication that enables deaf and hard-of-hearing people to participate fully in society and interact with others. This study introduces a novel universal sign language recognition system that uses Gesture-script to generate a detailed description of the gestures in a video, which involve continuous movement of the hands, arms, and head as well as body language. We then pass this description to a Large Language Model (LLM) to interpret the sign language. We apply a few-shot prompting technique to the LLM, enabling it to accurately translate sign videos into the corresponding natural-language sentences. Furthermore, few-shot prompting enables our system to interpret multiple sign languages without pre-training or fine-tuning.
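The few-shot step described above can be pictured with a minimal sketch. It only assembles a prompt from a handful of (gesture description, sentence) pairs followed by a new description; it does not call any LLM. The abstract does not specify the Gesture-script output format or the prompt wording, so the `Example` class, the `build_few_shot_prompt` function, the prompt template, and the sample descriptions are all hypothetical placeholders rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example for the prompt (hypothetical structure)."""
    description: str  # textual gesture description; the format is assumed
    sentence: str     # natural-language sentence the gestures express


def build_few_shot_prompt(examples, query_description, language="American Sign Language"):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    lines = [
        f"You interpret {language} from textual gesture descriptions.",
        "Give the sentence that the described gestures express.",
        "",
    ]
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}")
        lines.append(f"Gestures: {ex.description}")
        lines.append(f"Sentence: {ex.sentence}")
        lines.append("")
    # The query comes last and ends with an open slot for the model to fill.
    lines.append("Now interpret:")
    lines.append(f"Gestures: {query_description}")
    lines.append("Sentence:")
    return "\n".join(lines)


# Invented placeholder examples, not taken from the paper or any dataset.
demo_examples = [
    Example("Flat right hand touches the chin, then moves forward.", "Thank you."),
    Example("Right fist nods up and down at the wrist.", "Yes."),
]

if __name__ == "__main__":
    print(build_few_shot_prompt(demo_examples, "Open right hand waves side to side."))
```

In this sketch, switching to another sign language would only mean swapping the `language` argument and the example pairs, which mirrors the abstract's point that no pre-training or fine-tuning is needed.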
Acknowledgment
This work is supported in part by the NSF under Grants CCSS-2245607 and CCSS-2245608.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Podder, K.K., Zhang, J., Wang, L. (2025). Universal Sign Language Recognition System Using Gesture Description Generation and Large Language Model. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14999. Springer, Cham. https://doi.org/10.1007/978-3-031-71470-2_23
DOI: https://doi.org/10.1007/978-3-031-71470-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71469-6
Online ISBN: 978-3-031-71470-2
eBook Packages: Computer Science, Computer Science (R0)