Sign Language Sentence Recognition Using Hybrid Graph Embedding and Adaptive Convolutional Networks
Abstract
1. Background and Significance
1.1. Advanced Feature Extraction and Representation
1.2. Performance Enhancement and Robustness
1.3. Accessibility and Technological Innovation
1.4. Significance
2. Literature Survey
3. Proposed Hybrid Graph Embedding and Adaptive Convolutional Networks
3.1. Mathematical Foundation
3.2. Feature Extraction
3.3. Spatiotemporal Graph Convolutional Networks (ST-GCN)
3.4. Hybrid Graph Embedding Module
3.4.1. Spatial Graph Convolution
3.4.2. Temporal Attention Mechanism
3.5. Adaptive Convolutional Network
3.5.1. Dynamic Kernel Generation
3.5.2. Multi-Scale Sensor Data Fusion
Algorithm 1: HGE-ACN Method
Input: Sensor data frames {F1, F2, …, Fn}, keypoint detector D
Output: Predicted sentence S
Step 1: Begin
    G ← ∅  # Initialize graph
    K ← ∅  # Initialize keypoints
Step 2: for Fi in {F1, F2, …, Fn} do
    K[i] ← D(Fi)  # Extract keypoints
    V ← K[i]; E ← SpatialEdges(V) ∪ TemporalEdges(V)
Step 3: for vi, vj ∈ V do
    compute Equation (7)  # Gaussian edge weight
Step 4: for vi ∈ V do
    compute Equations (8) and (9)  # ST-GCN embedding
Step 5: for t ∈ {1, 2, …, n} do
    compute Equations (10) and (11)  # Adaptive kernel convolution
Z ← Concatenate(G_emb, Ft_fused)
Y ← Dense(Z); S ← Softmax(Y)
Output: S
End
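For illustration, the listing below sketches how the main steps of Algorithm 1 could be wired together in PyTorch. It is a minimal sketch, not the authors' implementation: the tensor shapes, the names gaussian_edge_weights, SpatialGraphConv, and AdaptiveConv, the single spatial graph-convolution layer standing in for the full ST-GCN of Equations (8) and (9), and the depthwise dynamic-kernel convolution standing in for Equations (10) and (11) are all assumptions made for brevity.

```python
# Minimal, illustrative sketch of the HGE-ACN pipeline (assumed shapes and names).
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_edge_weights(keypoints: torch.Tensor, sigma: float = 0.5) -> torch.Tensor:
    """Adjacency from pairwise keypoint distances: w_ij = exp(-||v_i - v_j||^2 / (2*sigma^2))."""
    dist2 = torch.cdist(keypoints, keypoints).pow(2)            # (V, V) squared distances
    return torch.exp(-dist2 / (2.0 * sigma ** 2))


class SpatialGraphConv(nn.Module):
    """One graph-convolution layer: normalized neighbour aggregation, then a linear map."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (V, in_dim), adj: (V, V); row-normalize so each node averages its neighbours.
        adj_norm = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return F.relu(self.linear(adj_norm @ x))


class AdaptiveConv(nn.Module):
    """Dynamic kernel generation: a small network predicts per-sample depthwise 1D kernels."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        self.kernel_gen = nn.Linear(channels, channels * kernel_size)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (channels, T) sensor sequence for one sample.
        c, _ = seq.shape
        kernels = self.kernel_gen(seq.mean(dim=1)).view(c, 1, self.kernel_size)
        return F.conv1d(seq.unsqueeze(0), kernels,
                        padding=self.kernel_size // 2, groups=c).squeeze(0)


if __name__ == "__main__":
    V, D, C, T, n_classes = 21, 3, 16, 30, 50                   # assumed sizes
    keypoints = torch.randn(V, D)                                # one frame of keypoints
    adj = gaussian_edge_weights(keypoints)                       # Equation (7), sketched
    g_emb = SpatialGraphConv(D, 64)(keypoints, adj).mean(dim=0)  # pooled graph embedding
    f_fused = AdaptiveConv(C)(torch.randn(C, T)).mean(dim=1)     # pooled sensor features
    logits = nn.Linear(64 + C, n_classes)(torch.cat([g_emb, f_fused]))
    probs = F.softmax(logits, dim=-1)                            # sentence-class probabilities
    print(probs.shape)                                           # torch.Size([50])
```

In a full pipeline, the graph embedding and the fused sensor features would be produced for every frame and pooled over time before the dense and softmax layers, as in Step 5 of the algorithm.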
3.5.3. Training Procedure for the Model
3.6. Classification
4. Results and Discussion
4.1. Accuracy
4.2. Inference Time
4.3. Error Rate
4.4. Levenshtein Distance
4.5. Model Experimentation
Ablation Study: Validating Module Contributions
4.6. Model Comparison
4.6.1. Comparison with Recent Methods
4.6.2. Parameter Settings for Benchmark Models
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Shahin, N.; Ismail, L. From rule-based models to deep learning transformers architectures for natural language processing and sign language translation systems: Survey, taxonomy and performance evaluation. Artif. Intell. Rev. 2024, 57, 271.
- Nasabeh, S.S.; Meliá, S. Enhancing quality of life for the hearing-impaired: A holistic approach through the MoSIoT framework. Univers. Access Inf. Soc. 2024, 24, 1–23.
- Buttar, A.M.; Ahmad, U.; Gumaei, A.H.; Assiri, A.; Akbar, M.A.; Alkhamees, B.F. Deep Learning in Sign Language Recognition: A Hybrid Approach for the Recognition of Static and Dynamic Signs. Mathematics 2023, 11, 3729.
- Al-Qurishi, M.; Khalid, T.; Souissi, R. Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues. IEEE Access 2021, 9, 126917–126951.
- Papatsimouli, M.; Sarigiannidis, P.; Fragulis, G.F. A Survey of Advancements in Real-Time Sign Language Translators: Integration with IoT Technology. Technologies 2023, 11, 83.
- Jiang, X.; Zhang, Y.; Lei, J.; Zhang, Y. A Survey on Chinese Sign Language Recognition: From Traditional Methods to Artificial Intelligence. CMES Comput. Model. Eng. Sci. 2024, 140, 1–40.
- Tao, T.; Zhao, Y.; Liu, T.; Zhu, J. Sign Language Recognition: A Comprehensive Review of Traditional and Deep Learning Approaches, Datasets, and Challenges. IEEE Access 2024, 12, 75034–75060.
- Shakeel, P.M.; Burhanuddin, M.A.; Desa, M.I. Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier. Neural Comput. Appl. 2020, 34, 9579–9592.
- Murali, R.S.L.; Ramayya, L.D.; Santosh, V.A. Sign language recognition system using convolutional neural network and computer sensor-based. Int. J. Eng. Innov. Adv. Technol. 2022, 4, 138–141.
- Kaur, B.; Chaudhary, A.; Bano, S.; Yashmita, S.R.N.; Reddy, S.; Anand, R. Fostering inclusivity through effective communication: Real-time sign language to speech conversion system for the deaf and hard-of-hearing community. Multimed. Tools Appl. 2023, 83, 45859–45880.
- Zhu, W. Quiet Interaction: Designing an Accessible Home Environment for Deaf and Hard of Hearing (DHH) Individuals Through AR, AI, and IoT Technologies. Doctoral Dissertation, OCAD University, Toronto, ON, Canada, 2024.
- Miah, A.S.M.; Hasan, A.M.; Jang, S.-W.; Lee, H.-S.; Shin, J. Multi-Stream General and Graph-Based Deep Neural Networks for Skeleton-Based Sign Language Recognition. Electronics 2023, 12, 2841.
- Naz, N.; Sajid, H.; Ali, S.; Hasan, O.; Ehsan, M.K. MIPA-ResGCN: A multi-input part attention enhanced residual graph convolutional framework for sign language recognition. Comput. Electr. Eng. 2023, 112, 109009.
- Liang, Y.; Jettanasen, C.; Chiradeja, P. Progression Learning Convolution Neural Model-Based Sign Language Recognition Using Wearable Glove Devices. Computation 2024, 12, 72.
- Liang, Y.; Jettanasen, C. Development of Sensor Data Fusion and Optimized Elman Neural Model-based Sign Language Recognition System. J. Internet Technol. 2024, 25, 671–681.
- Venugopalan, A.; Reghunadhan, R. Applying Hybrid Deep Neural Network for the Recognition of Sign Language Words Used by the Deaf COVID-19 Patients. Arab. J. Sci. Eng. 2022, 48, 1349–1362.
- Tang, S.; Guo, D.; Hong, R.; Wang, M. Graph-Based Multimodal Sequential Embedding for Sign Language Translation. IEEE Trans. Multimed. 2021, 24, 4433–4445.
- Muthusamy, P.; Murugan, G.P. Recognition of Indian Continuous Sign Language Using Spatio-Temporal Hybrid Cue Network. Int. J. Intell. Eng. Syst. 2023, 16, 874.
- Kan, J.; Hu, K.; Hagenbuchner, M.; Tsoi, A.C.; Bennamoun, M.; Wang, Z. Sign language translation with hierarchical spatio-temporal graph neural network. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 3367–3376.
- Xu, X.; Meng, K.; Chen, C.; Lu, L. Isolated Word Sign Language Recognition Based on Improved SKResNet-TCN Network. J. Sens. 2023, 2023, 9503961.
- Gupta, A.; Sawan, A.; Singh, S.; Kumari, S. Dynamic Sign Language Recognition with Hybrid CNN-LSTM and 1D Convolutional Layers. In Proceedings of the 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 14–15 March 2024; pp. 1–6.
- Noor, T.H.; Noor, A.; Alharbi, A.F.; Faisal, A.; Alrashidi, R.; Alsaedi, A.S.; Alharbi, G.; Alsanoosy, T.; Alsaeedi, A. Real-Time Arabic Sign Language Recognition Using a Hybrid Deep Learning Model. Sensors 2024, 24, 3683.
- Huang, S.; Ye, Z. Boundary-Adaptive Encoder With Attention Method for Chinese Sign Language Recognition. IEEE Access 2021, 9, 70948–70960.
- Kumar, E.K.; Kishore, P.; Kumar, D.A.; Kumar, M.T.K. Early estimation model for 3D-discrete indian sign language recognition using graph matching. J. King Saud Univ. Comput. Inf. Sci. 2021, 33, 852–864.
- Rajalakshmi, E.; Elakkiya, R.; Subramaniyaswamy, V.; Alexey, L.P.; Mikhail, G.; Bakaev, M.; Kotecha, K.; Gabralla, L.A.; Abraham, A. Multi-Semantic Discriminative Feature Learning for Sign Gesture Recognition Using Hybrid Deep Neural Architecture. IEEE Access 2023, 11, 2226–2238.
| Hyperparameter | Notation | Value(s) | Description |
|---|---|---|---|
| Graph Construction Threshold | τ | 0.5–0.7 | Defines connectivity in the spatial–temporal graph. |
| Graph Embedding Dimensionality | d | 64, 128 | Determines feature embedding size in the latent space. |
| Adaptive Convolution Kernel Size | k | 3 × 3, 5 × 5 | Controls the receptive field in feature extraction. |
| Normalization Factor in Edge Weight Calculation | σ | 0.1–1.0 | Regulates the sensitivity of graph edge weights. |
| Attention Weight Factor | α | 0.1–0.9 | Adjusts attention-based feature importance. |
| Dropout Rate | p | 0.3–0.5 | Prevents overfitting in graph and convolution layers. |
| Learning Rate | η | 0.001–0.005 | Determines the optimization step size for convergence. |
| Batch Size | B | 32, 64 | Number of samples processed per training step. |
| Number of GCN Layers | L | 2, 3 | Defines the depth of the graph convolutional layers. |
| Number of Attention Heads | h | 4, 8 | Controls multi-head self-attention operations. |
| Weight Decay (L2 Regularization) | λ | 1 × 10⁻⁴, 1 × 10⁻⁵ | Reduces overfitting by penalizing large weights. |
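The snippet below shows one possible way to carry the tabulated hyperparameters into a training script. It is a hedged sketch assuming PyTorch: the dictionary keys, the placeholder model, and the specific values picked from each range are illustrative assumptions, not the exact configuration used in the reported experiments.

```python
# Illustrative wiring of the tabulated hyperparameters into a training setup.
import torch
import torch.nn as nn

config = {
    "graph_threshold": 0.6,   # τ, within 0.5–0.7
    "embed_dim": 128,         # d
    "kernel_size": 3,         # k (3 × 3)
    "sigma": 0.5,             # σ, edge-weight normalization factor
    "alpha": 0.5,             # α, attention weight factor
    "dropout": 0.4,           # p
    "lr": 0.001,              # η
    "batch_size": 64,         # B
    "gcn_layers": 3,          # L
    "attention_heads": 8,     # h
    "weight_decay": 1e-4,     # λ
}

# A placeholder model stands in for HGE-ACN; AdamW applies λ as decoupled weight decay.
model = nn.Linear(config["embed_dim"], 10)
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=config["lr"],
                              weight_decay=config["weight_decay"])
```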
| Component | Operation | Computational Complexity |
|---|---|---|
| Graph Convolution | Node feature aggregation and transformation | |
| Adaptive Convolution | Feature extraction over spatial dimensions | |
| Total Complexity | Combined graph and convolutional computations | |
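For orientation, standard asymptotic forms for these operations are shown below under assumed notation (|V| nodes, |E| edges, embedding dimension d, kernel size k, C_in and C_out channels, and an H × W spatial grid); these are textbook estimates, not the expressions reported for HGE-ACN.

```latex
% Standard asymptotic costs under assumed notation (illustrative only)
\begin{align}
  \text{Graph convolution:} \quad & \mathcal{O}\!\left(|E|\,d + |V|\,d^{2}\right) \\
  \text{Adaptive convolution:} \quad & \mathcal{O}\!\left(H \cdot W \cdot k^{2} \cdot C_{\text{in}} \cdot C_{\text{out}}\right) \\
  \text{Combined:} \quad & \mathcal{O}\!\left(|E|\,d + |V|\,d^{2} + H \cdot W \cdot k^{2} \cdot C_{\text{in}} \cdot C_{\text{out}}\right)
\end{align}
```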
| Experiment | Description | Accuracy (%) | Inference Time (ms) |
|---|---|---|---|
| Full HGE-ACN Model | Baseline model with all modules enabled. | 94.8 | 45 |
| Without Graph Embedding (HGE removed) | Only adaptive convolution is used for feature extraction. | 88.3 | 39 |
| Without Adaptive Convolution (ACN removed) | Only graph embeddings are used, without adaptive convolution. | 85.7 | 42 |
| Without Multi-Scale Sensor Fusion | Raw sensor inputs are used without fusion techniques. | 81.5 | 38 |
| Without Attention Mechanism | Temporal attention is disabled in the graph network. | 86.1 | 44 |
| Model | Methodology | Accuracy (%) | Inference Time (ms) | Number of Parameters (Million) |
|---|---|---|---|---|
| HGE-ACN (Proposed) | Hybrid Graph Embedding + Adaptive Convolution | 94.8 | 45 | 12.5 |
| DDSTGCNN [1] | Dynamic Dense Spatiotemporal Graph CNN | 89.2 | 50 | 14.3 |
| MSeqGraph [2] | Multimodal Sequential Graph Embedding | 91.1 | 56 | 15.2 |
| BAE [3] | Boundary-Adaptive Encoder for Sign Language | 87.4 | 62 | 11.8 |
| SKResNet-TCN [4] | Hybrid ResNet + Temporal Convolution | 92.3 | 48 | 13.1 |
| Model | Learning Rate | Batch Size | Number of Layers | Dropout Rate | Optimizer |
|---|---|---|---|---|---|
| HGE-ACN (Proposed) | 0.001 | 64 | 3 GCN + 2 ACN | 0.4 | AdamW |
| DDSTGCNN | 0.0015 | 64 | 4 GCN | 0.5 | Adam |
| MSeqGraph | 0.002 | 32 | 2 GCN + 1 LSTM | 0.3 | RMSprop |
| BAE | 0.001 | 32 | 2 LSTM | 0.3 | Adam |
| SKResNet-TCN | 0.0008 | 64 | 3 ResNet + 2 TCN | 0.4 | Adam |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).