Removal and Recovery of the Human Invisible Region
Abstract
1. Introduction
- We propose a two-stage framework that removes occlusions of the human body: it first completes the amodal mask of the occluded person and then recovers the content of the occluded region. Humans, with their highly variable postures, make this a challenging problem.
- The amodal mask prediction is refined by fusing multiscale features in an hourglass network and by incorporating a large amount of prior information.
- A new visible guided attention (VGA) module is designed to guide low-level features in recovering the occluded content by computing an attention map between the inside and outside of the occlusion region on the high-level feature map (an illustrative sketch follows this list).
- We use natural occlusions to build a human occlusion dataset that better matches human visual perception. On this dataset, our model is shown to outperform current methods. In addition, it addresses the problem of occluded joints that are otherwise unpredictable in human pose estimation.
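The VGA module is only described at a high level in the contribution list above. As a rough illustration of the idea, the PyTorch sketch below computes an attention map between occluded and visible positions of a high-level feature map and uses it to fill in the occluded part of a low-level feature map. The layer layout, tensor shapes, and the mask-based fusion rule are assumptions made for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisibleGuidedAttention(nn.Module):
    """Hypothetical sketch of a visible guided attention (VGA) block.

    It computes attention between occluded-region queries and visible-region
    keys on a high-level feature map, then uses that attention to pull
    visible-region information into the low-level features at occluded pixels.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, high_feat, low_feat, occ_mask):
        # high_feat, low_feat: (B, C, H, W); occ_mask: (B, 1, H, W), 1 = occluded.
        b, c, h, w = high_feat.shape
        q = self.query(high_feat).flatten(2).transpose(1, 2)  # (B, HW, C/2)
        k = self.key(high_feat).flatten(2)                    # (B, C/2, HW)
        v = self.value(low_feat).flatten(2).transpose(1, 2)   # (B, HW, C)

        # Similarity of every query position to every key position, with
        # occluded keys masked out so attention only attends to the visible part.
        attn = torch.bmm(q, k) / (c // 2) ** 0.5              # (B, HW, HW)
        visible = (1.0 - occ_mask).flatten(2)                  # (B, 1, HW)
        attn = attn.masked_fill(visible < 0.5, float("-inf"))
        attn = F.softmax(attn, dim=-1)

        gathered = torch.bmm(attn, v).transpose(1, 2).reshape(b, c, h, w)

        # Keep low-level features in the visible region; replace them with
        # attention-gathered features inside the occluded region.
        return low_feat * (1.0 - occ_mask) + gathered * occ_mask
```

In this sketch the gathered features are fused with the low-level features by masked element-wise combination; concatenation followed by convolution would be the alternative fusion choice, which appears to correspond to the Concat/Mul comparison in the ablation tables later in the paper.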
2. Related Work
3. Method
3.1. Overview
3.2. Amodal Completion Network
3.3. Content Recovery Network
4. Human Occlusion Dataset
4.1. Data Collection and Filtering
4.2. Data Production
5. Experiments
5.1. Implementation Details
5.2. Comparison with Existing Methods
5.3. Ablation Study
5.4. Human Pose Estimation
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- Cheng, B.; Xiao, B.; Wang, J.; Shi, H.; Huang, T.S.; Zhang, L. HigherHRNet: Scale-aware representation learning for bottom-up human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–20 June 2020; pp. 5386–5395.
- Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 483–499.
- Liu, Z.; Chen, H.; Feng, R.; Wu, S.; Ji, S.; Yang, B.; Wang, X. Deep Dual Consecutive Network for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 525–534.
- Artacho, B.; Savakis, A. UniPose: Unified human pose estimation in single images and videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–20 June 2020; pp. 7035–7044.
- Kuznetsova, A.; Maleva, T.; Soloviev, V. Using YOLOv3 algorithm with pre- and post-processing for apple detection in fruit-harvesting robot. Agronomy 2020, 10, 1016.
- Kamyshova, G.; Osipov, A.; Gataullin, S.; Korchagin, S.; Ignar, S.; Gataullin, T.; Terekhova, N.; Suvorov, S. Artificial Neural Networks and Computer Vision's-Based Phytoindication Systems for Variable Rate Irrigation Improving. IEEE Access 2022, 10, 8577–8589.
- Korchagin, S.A.; Gataullin, S.T.; Osipov, A.V.; Smirnov, M.V.; Suvorov, S.V.; Serdechnyi, D.V.; Bublikov, K.V. Development of an Optimal Algorithm for Detecting Damaged and Diseased Potato Tubers Moving along a Conveyor Belt Using Computer Vision Systems. Agronomy 2021, 11, 1980.
- Andriyanov, N.; Khasanshin, I.; Utkin, D.; Gataullin, T.; Ignar, S.; Shumaev, V.; Soloviev, V. Intelligent System for Estimation of the Spatial Position of Apples Based on YOLOv3 and Real Sense Depth Camera D415. Symmetry 2022, 14, 148.
- Chu, X.; Zheng, A.; Zhang, X.; Sun, J. Detection in crowded scenes: One proposal, multiple predictions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–20 June 2020; pp. 12214–12223.
- Dai, H.; Zhou, L.; Zhang, F.; Zhang, Z.; Hu, H.; Zhu, X.; Ye, M. Joint COCO and Mapillary Workshop at ICCV 2019 Keypoint Detection Challenge Track Technical Report: Distribution-Aware Coordinate Representation for Human Pose Estimation. arXiv 2020, arXiv:2003.07232.
- Barnes, C.; Shechtman, E.; Finkelstein, A.; Goldman, D.B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 2009, 28, 24.
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
- Ehsani, K.; Roozbeh, M.; Ali, F. SeGAN: Segmenting and generating the invisible. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Zhan, X.; Pan, X.; Dai, B.; Liu, Z.; Lin, D.; Loy, C.C. Self-supervised scene de-occlusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–20 June 2020; pp. 3784–3792.
- Yan, X.; Wang, F.; Liu, W.; Yu, Y.; He, S.; Pan, J. Visualizing the invisible: Occluded vehicle segmentation and recovery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 7618–7627.
- Li, K.; Malik, J. Amodal instance segmentation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 677–693.
- Xiao, Y.; Xu, Y.; Zhong, Z.; Luo, W.; Li, J.; Gao, S. Amodal Segmentation Based on Visible Region Segmentation and Shape Prior. arXiv 2020, arXiv:2012.05598.
- Yang, C.; Lu, X.; Lin, Z.; Shechtman, E.; Wang, O.; Li, H. High-resolution image inpainting using multi-scale neural patch synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6721–6729.
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. 2017, 36, 1–14.
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100.
- Liu, H.; Wan, Z.; Huang, W.; Song, Y.; Han, X.; Liao, J. PD-GAN: Probabilistic Diverse GAN for Image Inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9371–9381.
- Nazeri, K.; Ng, E.; Joseph, T.; Qureshi, F.Z.; Ebrahimi, M. EdgeConnect: Generative image inpainting with adversarial edge learning. arXiv 2019, arXiv:1901.00212.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Zeng, Y.; Fu, J.; Chao, H.; Guo, B. Learning pyramid-context encoder network for high-quality image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1486–1494.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Yamaguchi, K.; Kiapour, M.H.; Ortiz, L.E.; Berg, T.L. Parsing clothing in fashion photographs. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3570–3577.
- Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 9157–9166.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lerer, A.; Lin, Z.; Desmaison, A.; Antiga, L. Automatic differentiation in PyTorch. In Proceedings of the NIPS 2017 Autodiff Workshop, Long Beach, CA, USA, 9 December 2017.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30.
- Fang, H.S.; Lu, G.; Fang, X.; Xie, J.; Tai, Y.W.; Lu, C. Weakly and semi supervised human body part parsing via pose-guided knowledge transfer. arXiv 2018, arXiv:1805.04310.
| Model | mIoU ↑ (Synthetic) | ℓ1 ↓ (Synthetic) | mIoU ↑ (Authentic) | ℓ1 ↓ (Authentic) |
|---|---|---|---|---|
| SeGAN [19] | 0.722 | 0.0836 | 0.732 | 0.0821 |
| PCNets [20] | 0.783 | 0.0718 | 0.773 | 0.0729 |
| OVSR [21] | 0.826 | 0.0653 | 0.836 | 0.0645 |
| Ours | 0.802 | 0.0677 | 0.820 | 0.0638 |
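For context, the mask and reconstruction metrics reported above can be computed as in the minimal NumPy sketch below. Averaging per-image values over the test split is an assumption here, since the exact evaluation protocol is not restated in this section.

```python
import numpy as np


def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between two binary (H, W) masks with values in {0, 1}."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: count as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)


def l1_error(pred_img: np.ndarray, gt_img: np.ndarray) -> float:
    """Mean absolute error between two images scaled to [0, 1]."""
    return float(np.mean(np.abs(pred_img.astype(np.float64) - gt_img.astype(np.float64))))


# mIoU and the reported l1 are obtained by averaging per-image values over
# the test set (the averaging protocol is an assumption in this sketch).
```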
| Model | ℓ1 ↓ (Synthetic) | ℓ2 ↓ (Synthetic) | FID [26] ↓ (Synthetic) | ℓ1 ↓ (Authentic) | ℓ2 ↓ (Authentic) | FID [26] ↓ (Authentic) |
|---|---|---|---|---|---|---|
| SeGAN [19] | 0.0420 | 0.0403 | 28.96 | 0.0418 | 0.0398 | 38.36 |
| PCNets [20] | 0.0368 | 0.0346 | 26.18 | 0.0366 | 0.0349 | 37.57 |
| OVSR [21] | 0.0352 | 0.0338 | 22.61 | 0.0348 | 0.0326 | 35.40 |
| Ours | 0.0343 | 0.0330 | 20.83 | 0.0344 | 0.0324 | 33.28 |
| | Discriminator | Prior | Perceptual | IoU |
|---|---|---|---|---|
| 1 | | | | 0.686 |
| 2 | √ | | | 0.720 |
| 3 | √ | √ | | 0.783 |
| 4 | √ | | √ | 0.746 |
| 5 | √ | √ | √ | 0.802 |
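The "Perceptual" column in the ablation above refers to a perceptual loss term. A common formulation compares deep features of the prediction and the ground truth extracted by a frozen VGG-16; the layer choice and equal weighting in the sketch below are assumptions rather than the paper's exact settings.

```python
import torch.nn as nn
from torchvision import models


class PerceptualLoss(nn.Module):
    """Illustrative perceptual loss: L1 distance between VGG-16 features.

    The chosen layers (relu1_2, relu2_2, relu3_3 -> indices 3, 8, 15) and the
    equal weighting are assumptions; inputs are expected to be ImageNet-
    normalized RGB tensors of shape (B, 3, H, W).
    """

    def __init__(self, layer_ids=(3, 8, 15)):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)  # the loss network stays frozen
        self.vgg = vgg
        self.layer_ids = set(layer_ids)
        self.last_layer = max(layer_ids)
        self.criterion = nn.L1Loss()

    def forward(self, pred, target):
        loss, x, y = 0.0, pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + self.criterion(x, y)
            if i == self.last_layer:
                break
        return loss
```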
| | Concat | Mul | Dilated Conv | FID |
|---|---|---|---|---|
| 1 | √ | | | 24.40 |
| 2 | √ | √ | | 24.16 |
| 3 | √ | | √ | 22.83 |
| 4 | | √ | √ | 22.61 |
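The "Dilated Conv" column in the ablation above denotes enlarging the receptive field with dilated convolutions. A generic residual block such as the sketch below is one common realization; the dilation rates and block layout are assumptions, not the authors' exact design.

```python
import torch.nn as nn


class DilatedBlock(nn.Module):
    """Illustrative residual block built from dilated 3x3 convolutions.

    The dilation rates (2 and 4) are assumptions chosen only to show how the
    receptive field grows without downsampling.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection; spatial size is unchanged
```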