Abstract
Instance segmentation of faces and mouth-opening degrees is an important technology for food delivery safety in meal-assisting robotics. However, because faces vary widely in shape, color, and posture, and because the mouth has a small contour area, deforms easily, and is often occluded, real-time and accurate instance segmentation is challenging. In this paper, we propose a novel method for instance segmentation of faces and mouth-opening degrees. Specifically, in the backbone network, deformable convolution is introduced to enhance the ability to capture finer-grained spatial information, and the CloFormer module is introduced to improve the ability to capture high-frequency local and low-frequency global information. In the neck network, classical convolution and C2f modules are replaced by GSConv and VoV-GSCSP aggregation modules, respectively, to reduce model complexity and floating-point operations. Finally, in the localization loss, CIoU loss is replaced by WIoU loss to reduce the competitiveness of high-quality anchor boxes and mask the influence of low-quality samples, which in turn improves localization accuracy and generalization ability. The resulting model is abbreviated as DCGW-YOLOv8n-seg. It was compared with the baseline YOLOv8n-seg model and several state-of-the-art instance segmentation models on our datasets, and the results show that DCGW-YOLOv8n-seg is characterized by high accuracy, speed, robustness, and generalization ability. Ablation experiments verified the contribution of each improvement to model performance. Finally, the DCGW-YOLOv8n-seg model was applied to instance segmentation experiments on a meal-assisting robot, where it achieved good segmentation of faces and mouth-opening degrees. The proposed method can provide a theoretical basis for food delivery safety in meal-assisting robotics and a reference for computer vision and image instance segmentation.
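To make the loss substitution concrete, the following is a minimal PyTorch sketch of the Wise-IoU (WIoU) v3 bounding-box loss (Tong et al., arXiv:2301.10051) adopted here in place of CIoU. The function name, the (x1, y1, x2, y2) box layout, and the exponential running-mean update are assumptions of this sketch rather than the authors' released code; the defaults alpha = 1.9 and delta = 3 follow the WIoU paper.

```python
# Minimal sketch of the WIoU v3 localization loss (Tong et al., arXiv:2301.10051).
# Box layout, function name, and running-mean update are assumptions of this
# sketch, not the authors' exact implementation.
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha=1.9, delta=3.0, momentum=0.01):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); iou_mean: running mean of the IoU loss."""
    # Intersection and union areas for the plain IoU loss L_IoU = 1 - IoU.
    xi1 = torch.max(pred[:, 0], target[:, 0])
    yi1 = torch.max(pred[:, 1], target[:, 1])
    xi2 = torch.min(pred[:, 2], target[:, 2])
    yi2 = torch.min(pred[:, 3], target[:, 3])
    inter = (xi2 - xi1).clamp(min=0) * (yi2 - yi1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1.0 - inter / (area_p + area_t - inter + 1e-7)

    # R_WIoU: squared center distance normalized by the squared diagonal of the
    # smallest enclosing box; the denominator is detached, as in the WIoU paper.
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2)
                       / (wg ** 2 + hg ** 2 + 1e-7).detach())

    # Dynamic non-monotonic focusing: beta is the "outlier degree" of a sample.
    # The gain r = beta / (delta * alpha**(beta - delta)) is non-monotonic: small
    # for very easy anchors (small beta), largest for ordinary-quality anchors,
    # and decaying again for low-quality outliers (large beta).
    beta = l_iou.detach() / iou_mean
    gain = beta / (delta * alpha ** (beta - delta))

    # Update the running mean of L_IoU used to normalize beta across training.
    iou_mean = (1.0 - momentum) * iou_mean + momentum * l_iou.detach().mean()
    return (gain * r_wiou * l_iou).mean(), iou_mean
```

In training, `iou_mean` would be carried across batches (e.g., initialized to 1.0) so that `beta` compares each anchor's IoU loss against the current average; this is what lets the loss reduce the competitiveness of high-quality anchor boxes while masking the gradients of low-quality samples.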
(Figures 1–13 appear in the published article.)
Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author. No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2020YFC2007700 and the Fundamental Research Funds for the Central Universities of China under Grant 3072022CF0703.
Author information
Contributions
Yuhe Fan: analysis, experiments, interpretation of results, and drafting and revising of the manuscript. Lixun Zhang: funding, methods, and review and revision of the manuscript. Canxing Zheng: interpretation of results, image collection, and dataset preparation. Xingyuan Wang: interpretation of results and provision of theory. Jinghui Zhu: image collection and dataset preparation. Lan Wang: review and revision of the manuscript. All authors agree to be accountable for all aspects of the work.
Ethics declarations
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, Y., Zhang, L., Zheng, C. et al. Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method. Multimedia Systems 30, 269 (2024). https://doi.org/10.1007/s00530-024-01472-z