Multi-Class Confidence Detection Using Deep Learning Approach
Abstract
1. Introduction
- One objective of the proposed research is to design and build a system that analyzes the profile of a person who communicates through gestures; we focus mainly on video sequences of moving subjects.
- The proposed architecture is trained and tested on a dataset of video clips collected and extracted from web resources such as YouTube.
- Another critical objective of our approach is to design a model that integrates different machine learning techniques, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, and optimizes the performance of hand gesture recognition.
- First, the dataset is locally collected from freely available open-source resources.
- Second, the chosen domain of confidence determination in context is unique and supports effective understanding in social settings, academic interviews, and crime investigations.
- Third, the approach combines two high-performing models: a customized CNN (based on GoogLeNet) and an LSTM. The customized CNN consists of four major blocks, each containing a Conv2D layer followed by a pooling layer for feature extraction.
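The four-block Conv2D-plus-pooling design determines how far the spatial resolution of the feature maps is reduced before they reach the LSTM. A minimal sketch of that size arithmetic, assuming 'same'-padded convolutions and 2×2 pooling (the 224×224 input resolution is an illustrative assumption, not a value stated here):

```python
def feature_map_size(input_size: int, n_blocks: int = 4, pool: int = 2) -> int:
    """Spatial size after n_blocks of 'same'-padded Conv2D + pooling.

    A 'same'-padded convolution preserves spatial size, so only the
    pooling layers shrink it (integer division, as in most frameworks).
    """
    size = input_size
    for _ in range(n_blocks):
        size //= pool
    return size

# e.g., a (hypothetical) 224x224 input frame after four conv+pool blocks:
print(feature_map_size(224))  # -> 14
```

Each block halves both spatial dimensions, so four blocks reduce a 224×224 frame to a 14×14 feature map before flattening for the sequence model.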
2. Related Work
3. Collected Data-Set
4. Hand Gesture Classification
5. Proposed Architecture
- Visual data pre-processing
- Extraction of frames containing the human object
- Frame resizing according to the region of the human hand
- Feature extraction and deep-model learning
- Gesture classification through a classifier
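The frame-extraction step above can be sketched as a simple uniform sampling stage that reduces a variable-length clip to a fixed number of frames for the network. The default of 16 frames per clip is an illustrative assumption, not a value taken from the paper:

```python
def uniform_frame_indices(n_frames: int, n_samples: int = 16) -> list:
    """Pick n_samples evenly spaced frame indices from a clip of n_frames."""
    if n_frames <= 0:
        return []
    if n_samples >= n_frames:
        # short clip: keep every frame
        return list(range(n_frames))
    stride = n_frames / n_samples
    return [int(i * stride) for i in range(n_samples)]

# A 120-frame clip reduced to 16 evenly spaced frames:
indices = uniform_frame_indices(120, 16)
```

In practice these indices would be used to select frames from a decoded video (e.g., via OpenCV) before resizing each frame to the hand region.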
Architecture of Customized Neural Network
6. Experiments & Results
7. Conclusions
- Detect gestures against complex backgrounds containing multiple objects and varied colors.
- Increase the number of recognized human gestures for effective interaction, visualization, and estimation of particular contextual activities.
- The system can be extended with further video and image sequence processing.
- Applications can be diverse across scenarios such as educational activities and office work, for automatic recognition.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- What Is Computer Vision? Available online: https://www.ibm.com/topics/computer-vision (accessed on 25 October 2022).
- Gadekallu, T.R.; Alazab, M.; Kaluri, R.; Maddikunta, P.K.R.; Bhattacharya, S.; Lakshmanna, K.; Parimala, M. Hand gesture classification using a novel CNN-crow search algorithm. Complex Intell. Syst. 2021, 7, 1855–1868. [Google Scholar] [CrossRef]
- Tan, Y.S.; Lim, K.M.; Lee, C.P. Hand gesture recognition via enhanced densely connected convolutional neural network. Expert Syst. Appl. 2021, 175, 114797. [Google Scholar] [CrossRef]
- Zhang, T. Application of AI-based real-time gesture recognition and embedded system in the design of English major teaching. In Wireless Networks; Springer: Cham, Switzerland, 2021; pp. 1–13. [Google Scholar]
- Zhu, M.; Sun, Z.; Zhang, Z.; Shi, Q.; He, T.; Liu, H.; Chen, T.; Lee, C. Haptic-feedback smart glove as a creative human-machine interface (HMI) for virtual/augmented reality applications. Sci. Adv. 2020, 6, eaaz8693. [Google Scholar] [CrossRef]
- Kendon, A. Gesture: Visible Action as Utterance; University of Pennsylvania: Pennsylvania, PA, USA, 2004. [Google Scholar] [CrossRef]
- Nivash, S.; Ganesh, E.; Manikandan, T.; Dhaka, A.; Nandal, A.; Hoang, V.T.; Kumar, A.; Belay, A. Implementation and Analysis of AI-Based Gesticulation Control for Impaired People. Wirel. Commun. Mob. Comput. 2022, 2022, 4656939. [Google Scholar] [CrossRef]
- Xing, Y.; Zhu, J. Deep Learning-Based Action Recognition with 3D Skeleton: A Survey; Wiley: Hoboken, NJ, USA, 2021. [Google Scholar]
- Shanmuganathan, V.; Yesudhas, H.R.; Khan, M.S.; Khari, M.; Gandomi, A.H. R-CNN and wavelet feature extraction for hand gesture recognition with EMG signals. Neural Comput. Appl. 2020, 32, 16723–16736. [Google Scholar] [CrossRef]
- Islam, M.R.; Mitu, U.K.; Bhuiyan, R.A.; Shin, J. Hand gesture feature extraction using deep convolutional neural network for recognizing American sign language. In Proceedings of the IEEE 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), Poitiers, France, 24–27 September 2018; pp. 115–119. [Google Scholar]
- Buhrmester, V.; Münch, D.; Arens, M. Analysis of explainers of black box deep neural networks for computer vision: A survey. Mach. Learn. Knowl. Extr. 2021, 3, 966–989. [Google Scholar] [CrossRef]
- Wang, C.; Yang, H.; Meinel, C. Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2018, 14, 1–20. [Google Scholar] [CrossRef]
- Bhardwaj, S.; Srinivasan, M.; Khapra, M.M. Efficient video classification using fewer frames. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 354–363. [Google Scholar]
- Li, Y.; Miao, Q.; Qi, X.; Ma, Z.; Ouyang, W. A spatiotemporal attention-based ResC3D model for large-scale gesture recognition. Mach. Vis. Appl. 2019, 30, 875–888. [Google Scholar] [CrossRef]
- Ahuja, M.K.; Singh, A. Static vision based Hand Gesture recognition using principal component analysis. In Proceedings of the 2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE), Amritsar, India, 1–2 October 2015; pp. 402–406. [Google Scholar]
- Oudah, M.; Al-Naji, A.; Chahl, J. Hand gesture recognition based on computer vision: A review of techniques. J. Imaging 2020, 6, 73. [Google Scholar] [CrossRef] [PubMed]
- Bernard, J.; Dobermann, E.; Vögele, A.; Krüger, B.; Kohlhammer, J.; Fellner, D. Visual-interactive semi-supervised labeling of human motion capture data. Electron. Imaging 2017, 2017, 34–45. [Google Scholar] [CrossRef]
- Zhan, F. Hand gesture recognition with convolution neural networks. In Proceedings of the 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), Los Angeles, CA, USA, 30 July–1 August 2019; pp. 295–298. [Google Scholar]
- Ryoo, M.S.; Aggarwal, J.K. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 1593–1600. [Google Scholar]
- Aggarwal, J.K.; Ryoo, M.S. Human activity analysis: A review. ACM Comput. Surv. (CSUR) 2011, 43, 1–43. [Google Scholar] [CrossRef]
- Scovanner, P.; Ali, S.; Shah, M. A 3-dimensional sift descriptor and its application to action recognition. In Proceedings of the 15th ACM International Conference on Multimedia, Augsburg, Germany, 24–29 September 2007; pp. 357–360. [Google Scholar]
- Ehatisham-Ul-Haq, M.; Javed, A.; Azam, M.A.; Malik, H.M.; Irtaza, A.; Lee, I.H.; Mahmood, M.T. Robust human activity recognition using multimodal feature-level fusion. IEEE Access 2019, 7, 60736–60751. [Google Scholar] [CrossRef]
- Benitez-Garcia, G.; Prudente-Tixteco, L.; Castro-Madrid, L.C.; Toscano-Medina, R.; Olivares-Mercado, J.; Sanchez-Perez, G.; Villalba, L.J.G. Improving real-time hand gesture recognition with semantic segmentation. Sensors 2021, 21, 356. [Google Scholar] [CrossRef] [PubMed]
- Hendy, N.; Fayek, H.M.; Al-Hourani, A. Deep Learning Approaches for Air-writing Using Single UWB Radar. IEEE Sens. J. 2022, 22, 11989–12001. [Google Scholar] [CrossRef]
- Wei, Y.; Xia, W.; Lin, M.; Huang, J.; Ni, B.; Dong, J.; Zhao, Y.; Yan, S. HCP: A Flexible CNN Framework for Multi-Label Image Classification. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1901–1907. [Google Scholar] [CrossRef] [Green Version]
- Gong, T.; Liu, B.; Chu, Q.; Yu, N. Using multi-label classification to improve object detection. Neurocomputing 2019, 370, 174–185. [Google Scholar] [CrossRef]
- Li, H.; Gong, M.; Zhang, M.; Wu, Y. Spatially Self-Paced Convolutional Networks for Change Detection in Heterogeneous Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4966–4979. [Google Scholar] [CrossRef]
- Parvathy, P.; Subramaniam, K.; Prasanna Venkatesan, G.; Karthikaikumar, P.; Varghese, J.; Jayasankar, T. Development of hand gesture recognition system using machine learning. J. Ambient Intell. Humaniz. Comput. 2021, 12, 6793–6800. [Google Scholar] [CrossRef]
- He, J.; Zhang, H. A real time face detection method in human-machine interaction. In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18 May 2008; pp. 1975–1978. [Google Scholar]
- Yang, J.; Lu, W.; Waibel, A. Skin-color modeling and adaptation. In Proceedings of the Asian Conference on Computer Vision; Springer: Cham, Switzerland, 1998; pp. 687–694. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-first AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- You, J.; Korhonen, J. Attention boosted deep networks for video classification. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Online, 25–28 October 2020; pp. 1761–1765. [Google Scholar]
- Muhammad, K.; Ullah, A.; Imran, A.S.; Sajjad, M.; Kiran, M.S.; Sannino, G.; de Albuquerque, V.H.C. Human action recognition using attention based LSTM network with dilated CNN features. Future Gener. Comput. Syst. 2021, 125, 820–830. [Google Scholar] [CrossRef]
- Khan, S.; Khan, M.A.; Alhaisoni, M.; Tariq, U.; Yong, H.S.; Armghan, A.; Alenezi, F. Human action recognition: A paradigm of best deep learning features selection and serial based extended fusion. Sensors 2021, 21, 7941. [Google Scholar] [CrossRef]
- Tsironi, E.; Barros, P.; Weber, C.; Wermter, S. An analysis of convolutional long short-term memory recurrent neural networks for gesture recognition. Neurocomputing 2017, 268, 76–86. [Google Scholar] [CrossRef]
- Xi, C.; Chen, J.; Zhao, C.; Pei, Q.; Liu, L. Real-time hand tracking using kinect. In Proceedings of the 2nd International Conference on Digital Signal Processing, Tokyo, Japan, 25–27 February 2018; pp. 37–42. [Google Scholar]
- Konstantinidis, D.; Dimitropoulos, K.; Daras, P. Sign language recognition based on hand and body skeletal data. In Proceedings of the IEEE 2018-3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), Helsinki, Finland, 3–5 June 2018; pp. 1–4. [Google Scholar]
- De Smedt, Q.; Wannous, H.; Vandeborre, J.P.; Guerry, J.; Saux, B.L.; Filliat, D. 3d hand gesture recognition using a depth and skeletal dataset: Shrec’17 track. In Proceedings of the Workshop on 3D Object Retrieval, Graz, Austria, 23–24 April 2017; pp. 33–38. [Google Scholar]
- Ren, Z.; Meng, J.; Yuan, J. Depth camera based hand gesture recognition and its applications in human-computer-interaction. In Proceedings of the IEEE 2011 8th International Conference on Information, Communications & Signal Processing, Lyon, France, 23–24 April 2011; pp. 1–5. [Google Scholar]
- Sahoo, J.P.; Ari, S.; Patra, S.K. Hand gesture recognition using PCA based deep CNN reduced features and SVM classifier. In Proceedings of the 2019 IEEE International Symposium on Smart Electronic Systems (iSES)(Formerly iNiS), Singapore, 13–16 December 2019; pp. 221–224. [Google Scholar]
- Ma, X.; Peng, J. Kinect sensor-based long-distance hand gesture recognition and fingertip detection with depth information. J. Sensors 2018, 2018, 21692932. [Google Scholar] [CrossRef] [Green Version]
- Desai, S. Segmentation and recognition of fingers using Microsoft Kinect. In Proceedings of the International Conference on Communication and Networks; Springer: Cham, Switzerland, 2017; pp. 45–53. [Google Scholar]
- Bakar, M.Z.A.; Samad, R.; Pebrianti, D.; Aan, N.L.Y. Real-time rotation invariant hand tracking using 3D data. In Proceedings of the 2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014), Penang, Malaysia, 28–30 November 2014; pp. 490–495. [Google Scholar]
- Bamwenda, J.; Özerdem, M. Recognition of static hand gesture with using ANN and SVM. Dicle Univ. J. Eng. 2019. [Google Scholar] [CrossRef] [Green Version]
- Desai, S.; Desai, A. Human Computer Interaction through hand gestures for home automation using Microsoft Kinect. In Proceedings of the International Conference on Communication and Networks; Springer: Cham, Switzerland, 2017; pp. 19–29. [Google Scholar]
- Tekin, B.; Bogo, F.; Pollefeys, M. H+ o: Unified egocentric recognition of 3d hand-object poses and interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 4511–4520. [Google Scholar]
- Wan, C.; Probst, T.; Gool, L.V.; Yao, A. Self-supervised 3d hand pose estimation through training by fitting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 10853–10862. [Google Scholar]
- Ge, L.; Ren, Z.; Li, Y.; Xue, Z.; Wang, Y.; Cai, J.; Yuan, J. 3d hand shape and pose estimation from a single rgb image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 10833–10842. [Google Scholar]
- Han, S.; Liu, B.; Cabezas, R.; Twigg, C.D.; Zhang, P.; Petkau, J.; Yu, T.H.; Tai, C.J.; Akbay, M.; Wang, Z.; et al. MEgATrack: Monochrome egocentric articulated hand-tracking for virtual reality. ACM Trans. Graph. 2020, 39, 87:1–87:13. [Google Scholar] [CrossRef]
- Wu, X.; Finnegan, D.; O’Neill, E.; Yang, Y.L. Handmap: Robust hand pose estimation via intermediate dense guidance map supervision. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 237–253. [Google Scholar]
- Alnaim, N.; Abbod, M.; Albar, A. Hand gesture recognition using convolutional neural network for people who have experienced a stroke. In Proceedings of the 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 11–13 October 2019; pp. 1–6. [Google Scholar]
- Chung, H.Y.; Chung, Y.L.; Tsai, W.F. An efficient hand gesture recognition system based on deep CNN. In Proceedings of the 2019 IEEE International Conference on Industrial Technology (ICIT), Melbourne, VIC, Australia, 13–15 February 2019; pp. 853–858. [Google Scholar]
- Bao, P.; Maqueda, A.I.; del Blanco, C.R.; García, N. Tiny hand gesture recognition without localization via a deep convolutional network. IEEE Trans. Consum. Electron. 2017, 63, 251–257. [Google Scholar] [CrossRef] [Green Version]
- Li, G.; Tang, H.; Sun, Y.; Kong, J.; Jiang, G.; Jiang, D.; Tao, B.; Xu, S.; Liu, H. Hand gesture recognition based on convolution neural network. Clust. Comput. 2019, 22, 2719–2729. [Google Scholar] [CrossRef]
Category | Ref. | Type of Camera | Methods | Algorithm | Results | Application Area
---|---|---|---|---|---|---
Skeleton Extraction | [37] | Kinect camera | Euclidean distance and geodesic distance | Extract skeleton pixels | — | Real-time hand tracking
 | [38] | RGB video sequence | Laplacian-based contraction | Skeleton classifier | 80% | Gesture recognition and sign language
 | [39] | RealSense depth camera | Analysis of depth and skeleton extraction | SVM with linear kernel | 88.24% and 81.90% | Hand gesture application
 | [40] | Kinect V2 camera | Analysis of depth metadata | SVM | 95.42% | Hand gesture recognition
 | [41] | Digital camera | YUV and CAMShift algorithm for skin and movement feature extraction | Naive Bayes algorithm | 97.64% | Human–machine interaction framework
Depth Analysis | [40] | Kinect camera | Threshold and near-convex shape | Finger Earth Mover's Distance (FEMD) | 93.9% | Human–computer interaction (HCI)
 | [42] | Kinect videos | Analysis of depth and infrared images | Convex hull detection algorithm | 96% | Human–robot interaction with natural influence
 | [43] | Kinect videos | Otsu's global threshold extraction | KNN classifier and Euclidean distance | 90% | Human–computer interaction (HCI)
 | [44] | Kinect camera | Threshold ranging analysis | Distance from device and shape-based matching | — | Hand rehabilitation system
 | [16] | Kinect videos | Integration and processing of RGB and depth information | SURF and forward recursion | 90% | Virtual environment
 | [45] | Kinect videos | Skeletal data processing and segmentation | SVM and Artificial Neural Network (ANN) | SVM 93.4% and ANN 98.2% | American Sign Language
 | [46] | Video streams | Range of detected hands and their depth analysis | KNN and Euclidean distance | 88% | Control of electronic home appliances
3D CNN Model | [47] | RGB camera | Detect and predict hand position and joints | Single-shot neural network | 94% | Understanding human behavior and object interaction
 | [48] | Depth sensor camera | 3D hand pose estimation | Pose estimation neural network | 83% | Hand pose estimation using a self-supervision method
 | [49] | RGB-D camera | Direct feed to a single RGB-D camera | Training with full supervision | 86.53% | Hand shape and movement analysis
 | [50] | Kinect V2 camera | Segmentation mask and body tracker | Customized machine learning method | 76% | Interaction with machines or the augmented world
 | [51] | Depth images | Predict heat maps of hand joints in a detection-based model | Dense feature maps through intermediate supervision in a regression-based framework | — | HCI and human interaction with machines
Deep Learning | [52] | Mobile camera (HD and 4K) | Feature extraction by CNN | Adapted Deep Convolutional Neural Network (ADCNN) | Testing accuracy: 99% | HCI for people who have suffered injuries
 | [53] | Webcam | Skin color detection, background subtraction, and feature extraction | Deep convolutional neural network | 95.61% | Smart home appliances
 | [54] | RGB camera | Processing of images and direct feed to a pre-defined CNN | Deep convolutional neural network | Simple background: 97.1%; complex background: 85.3% | Smart electronic devices
 | [55] | Kinect | Skin color modeling combined with CNN image features | Convolutional neural network and support vector machine | 98.52% | Human–computer interaction and behavior analysis
Classes of Human Gesture | Training | Testing | Total Instances per Class |
---|---|---|---|
Confidence | 814 | 368 | 1182 |
Co-operation | 841 | 352 | 1193 |
Uncomfortable | 567 | 248 | 815 |
Un-confident | 595 | 240 | 835 |
Total | 2817 | 1208 | 4025 |
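The per-class counts above correspond to roughly a 70/30 train–test split. A quick sanity check of the table's totals and split fractions:

```python
# Per-class (train, test) instance counts from the dataset table
splits = {
    "Confidence": (814, 368),
    "Co-operation": (841, 352),
    "Uncomfortable": (567, 248),
    "Un-confident": (595, 240),
}

total_train = sum(tr for tr, _ in splits.values())  # 2817
total_test = sum(te for _, te in splits.values())   # 1208
grand_total = total_train + total_test              # 4025

# Each class keeps roughly 69-71% of its instances for training
train_fractions = {name: tr / (tr + te) for name, (tr, te) in splits.items()}
```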
Per-class results on the training set (support = training instances):
Gesture Class | Precision | Recall | F-Score | Support
---|---|---|---|---|
Confidence | 96% | 92% | 94% | 814 |
Cooperation | 95% | 94% | 95% | 841 |
Uncomfortable | 88% | 90% | 89% | 567 |
Unconfident | 93% | 97% | 95% | 595 |
Per-class results on the testing set (support = testing instances):
Gesture Class | Precision | Recall | F-Score | Support
---|---|---|---|---|
Confidence | 92% | 93% | 93% | 368 |
Cooperation | 91% | 90% | 91% | 352 |
Uncomfortable | 87% | 85% | 86% | 248 |
Unconfident | 90% | 93% | 92% | 240 |
Machine Learning Method | Accuracy | Precision | Recall | F-Score |
---|---|---|---|---|
KNN | 79% | 81.5% | 81.75% | 81.25% |
Naive Bayes | 43% | 37.5% | 50.5% | 35.75%
SVM | 67% | 50.5% | 69.3% | 70% |
VGG-19 | 86% | 89.3% | 87.75% | 88.5% |
Proposed Method (CNN + LSTM) | 90.5% | 90.46% | 90.48% | 90.46%
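The F-scores in these tables are the harmonic mean of precision and recall; a small helper reproduces them (values are in percent, and the tables round to whole numbers):

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Training-set 'Confidence' row: P=96, R=92 -> F ~= 94
print(round(f_score(96, 92)))  # -> 94
```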
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mujahid, A.; Aslam, M.; Khan, M.U.G.; Martinez-Enriquez, A.M.; Haq, N.U. Multi-Class Confidence Detection Using Deep Learning Approach. Appl. Sci. 2023, 13, 5567. https://doi.org/10.3390/app13095567