A Remote-Vision-Based Safety Helmet and Harness Monitoring System Based on Attribute Knowledge Modeling
Abstract
1. Introduction
- We propose an automatic safety helmet and harness monitoring system based on attribute knowledge modeling to recognize the wearing states of safety helmets and harnesses. In contrast to previous studies that apply object detection to locate and identify safety helmets and harnesses, we recast the task as semantic attribute recognition, which is more reliable and more robust to variation in workers’ appearances and poses.
- We present a novel attribute knowledge modeling network based on the transformer architecture, in which self-attention is used to fully exploit the relationships between attributes and image features for attribute recognition (see the sketch after this list).
- We develop an open-site monitoring dataset for construction scenes containing three subdatasets: object detection, multiobject tracking, and attribute recognition. This benchmark is crucial for evaluating safety helmet and harness monitoring systems in unconstrained environments.
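To make the attribute knowledge modeling idea concrete, here is a minimal PyTorch sketch of a transformer head in which learnable attribute embeddings attend over CNN feature tokens. It is illustrative only: the ResNet-50 backbone, the layer sizes, and the example attribute names are assumptions made for the sketch, not the authors’ exact architecture (their design is detailed in Section 3.3).

```python
import torch
import torch.nn as nn
import torchvision

class AttributeHead(nn.Module):
    """Hypothetical transformer-based safety attribute recognizer."""

    def __init__(self, num_attributes=3, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        # CNN backbone (ResNet-50; pretrained weights omitted for brevity).
        # Drop avgpool and fc to keep the final spatial feature map.
        backbone = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        # One learnable query per attribute (e.g., helmet worn, harness worn).
        self.attr_queries = nn.Parameter(torch.randn(num_attributes, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, 1)  # one binary logit per attribute

    def forward(self, images):                      # images: (B, 3, H, W)
        feats = self.proj(self.backbone(images))    # (B, d_model, h, w)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, h*w, d_model)
        queries = self.attr_queries.unsqueeze(0).expand(images.size(0), -1, -1)
        # Cross-attention relates each attribute query to image features;
        # self-attention among queries models attribute-attribute relations.
        out = self.decoder(queries, tokens)         # (B, num_attributes, d_model)
        return self.classifier(out).squeeze(-1)     # per-attribute logits

# Usage: cropped worker images in, one wearing-state logit per attribute out.
logits = AttributeHead()(torch.randn(2, 3, 256, 128))
loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([[1., 0., 1.], [0., 1., 0.]]))
```

Because every attribute is predicted from the same attended representation, the decoder jointly exploits attribute–feature and attribute–attribute relations, which mirrors the relation-exploitation step described in Section 3.3.2.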
2. Related Works
2.1. The Safety Helmet and Harness Monitoring System
2.2. Pedestrian Attribute Recognition
2.3. Transformer in Computer Vision
3. Methodology
3.1. Overall System
3.2. Worker Detection and Tracking
3.3. Safety Attribute Recognition
3.3.1. Feature and Attribute Embeddings
3.3.2. Relation Exploitation Based on Transformer
3.3.3. Attribute Recognition Classifier
3.3.4. The Loss Function
3.4. Fusion Processing Module
4. Dataset and Experimental Details
4.1. Open-Site Monitoring Dataset
- (1) Object Detection Subdataset
- (2) Multiobject Tracking Subdataset
- (3) Safety Attribute Recognition Subdataset
4.2. Implementation Details
5. Experimental Results
5.1. Results and Analysis of Object Detection
5.2. Results and Analysis of Object Tracking
5.3. Results and Analysis of Safety Attribute Recognition
5.4. System Visual Interface Design and Display
6. Discussion
6.1. The Superiority of Attribute Recognition for Safety Helmet and Harness Monitoring
6.2. The Effectiveness of Transformer on Attribute Knowledge Modeling
6.3. The Significance of the Open-Site Monitoring Dataset
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CNN | Convolutional neural network |
| SSD | Single-shot multibox detector |
| MTL | Multitask learning |
| GCN | Graph convolutional networks |
| LSTM | Long short-term memory |
| HPSS | Hierarchical positive sample selection |
| MDA | Multidirectional attention |
| OSHA | Occupational Safety and Health Administration |
| PPE | Personal protective equipment |
References
| Network Model | Class | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|
| YOLOv5s | All | 0.957 | 0.875 | 0.921 | 0.652 |
| | Head | 0.959 | 0.897 | 0.948 | 0.684 |
| | Safety helmet | 0.969 | 0.933 | 0.976 | 0.742 |
| | Safety harness | 0.942 | 0.794 | 0.838 | 0.530 |
| YOLOv5x | All | 0.981 | 0.922 | 0.940 | 0.791 |
| | Head | 0.983 | 0.974 | 0.990 | 0.832 |
| | Safety helmet | 0.969 | 0.988 | 0.989 | 0.850 |
| | Safety harness | 0.993 | 0.806 | 0.841 | 0.693 |
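As a reading aid for the mAP columns: mAP@0.5 counts a detection as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, while mAP@0.5:0.95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (the COCO convention). Below is a minimal IoU helper in Python, illustrative only and not the paper’s evaluation code:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted box covering half of the ground-truth box scores IoU = 0.5,
# exactly on the mAP@0.5 acceptance threshold.
assert iou((0, 0, 10, 10), (0, 0, 10, 5)) == 0.5
```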
| Detector | Tracker | MOTA↑ | IDF1↑ | MT↑ | ML↓ | ID Sw↓ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|
| YOLOv5s | Deep SORT | 97.6% | 98.9% | 75.2% | 12.2% | 12 | 192 | 44 |
| YOLOv5m | Deep SORT | 96.5% | 98.3% | 73.1% | 13.9% | 17 | 304 | 47 |
| YOLOv5l | Deep SORT | 92.2% | 96.5% | 70.2% | 16.1% | 25 | 723 | 72 |
| YOLOv5x | Deep SORT | 93.9% | 97.3% | 71.7% | 15.4% | 22 | 556 | 60 |
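For context, MOTA combines the three error counts reported in the table (false negatives, false positives, and identity switches) under the standard CLEAR-MOT definition, which is general to multiobject tracking rather than specific to this paper:

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_t \mathrm{GT}_t}
```

where GT_t is the number of ground-truth objects in frame t. IDF1 complements MOTA by measuring identity preservation (the F1 score of correctly identified detections), and MT/ML give the fractions of ground-truth trajectories that are mostly tracked or mostly lost.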