Human Image Generation: A Comprehensive Survey

Authors:

Tieniu Tan

ACM Computing Surveys, Volume 56, Issue 11

Article No.: 279, Pages 1 - 39

https://doi.org/10.1145/3665869

Published: 28 June 2024

Abstract

Image and video synthesis has become a thriving topic in the computer vision and machine learning communities, driven by advances in deep generative models and by its academic and practical value. Humans are among the most commonly seen object categories in daily life, and many researchers have worked on synthesizing high-fidelity human images, producing a large body of studies that span various models, task settings, and applications. A comprehensive overview of these varied methods for human image generation is therefore needed. In this article, we divide human image generation techniques into three paradigms: data-driven methods, knowledge-guided methods, and hybrid methods. For each paradigm, we present the most representative models and their variants, and summarize the advantages and characteristics of the different methods in terms of model architecture. We also summarize the main public human image datasets and evaluation metrics used in the literature. Because of their wide application potential, we further cover typical downstream uses of synthesized human images. Finally, we discuss the challenges and potential opportunities of human image generation to shed light on future research.

References

[1]

Badour AlBahar and Jia-Bin Huang. 2019. Guided image-to-image translation with bi-directional feature transformation. In Proceedings of the IEEE International Conference on Computer Vision. 9016–9025.

[2]

Badour Albahar, Jingwan Lu, Jimei Yang, Zhixin Shu, Eli Shechtman, and Jia-Bin Huang. 2021. Pose with style: Detail-preserving pose-guided image synthesis with conditional stylegan. ACM Transactions on Graphics (TOG) 40, 6 (2021), 1–11.

Digital Library

[3]

Thiemo Alldieck, Marcus Magnor, Bharat Lal Bhatnagar, Christian Theobalt, and Gerard Pons-Moll. 2019. Learning to reconstruct people in clothing from a single RGB camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1175–1186.

[4]

Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. 2018. Video based reconstruction of 3D people models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8387–8397.

[5]

Thiemo Alldieck, Gerard Pons-Moll, Christian Theobalt, and Marcus Magnor. 2019. Tex2shape: Detailed full human body geometry from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2293–2303.

[6]

Thiemo Alldieck, Hongyi Xu, and Cristian Sminchisescu. 2021. imghum: Implicit generative models of 3d human shape and articulated pose. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5461–5470.

[7]

Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. 2005. SCAPE: Shape completion and animation of people. In ACM SIGGRAPH 2005 Papers. 408–416.

Digital Library

[8]

Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, and Hongxia Yang. 2022. Single stage virtual try-on via deformable attention flows. In European Conference on Computer Vision. Springer, 409–425.

Digital Library

[9]

Slawomir Bak, Peter Carr, and Jean-Francois Lalonde. 2018. Domain adaptation through synthesis for unsupervised person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV). 189–205.

[10]

Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, and John Guttag. 2018. Synthesizing images of humans in unseen poses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8340–8348.

[11]

Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, and Rita Cucchiara. 2023. Multimodal garment designer: Human-centric latent diffusion models for fashion image editing. arXiv preprint arXiv:2304.02051 (2023).

[12]

Igor Barros Barbosa, Marco Cristani, Barbara Caputo, Aleksander Rognhaugen, and Theoharis Theoharis. 2018. Looking beyond appearances: Synthetic training data for deep CNNs in re-identification. Computer Vision and Image Understanding 167 (2018), 50–62.

Digital Library

[13]

Hugo Bertiche, Niloy J. Mitra, Kuldeep Kulkarni, Chun-Hao P. Huang, Tuanfeng Y. Wang, Meysam Madadi, Sergio Escalera, and Duygu Ceylan. 2023. Blowing in the wind: CycleNet for human cinemagraphs from still images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 459–468.

[14]

Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, and Gerard Pons-Moll. 2019. Multi-garment net: Learning to dress 3D people from images. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5420–5430.

[15]

Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, and Fahad Shahbaz Khan. 2023. Person image synthesis via denoising diffusion model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5968–5976.

[16]

Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J. Black. 2016. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In European Conference on Computer Vision. Springer, 561–578.

[17]

Sam Bond-Taylor, Adam Leach, Yang Long, and Chris G. Willcocks. 2021. Deep generative modelling: A comparative review of vaes, gans, normalizing flows, energy-based and autoregressive models. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 11 (2021), 7327–7347.

[18]

Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations.

[19]

Bindita Chaudhuri, Nikolaos Sarafianos, Linda Shapiro, and Tony Tung. 2021. Semi-supervised synthesis of high-resolution editable textures for 3D humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7991–8000.

[20]

Chieh-Yun Chen, Yi-Chung Chen, Hong-Han Shuai, and Wen-Huang Cheng. 2023. Size does matter: Size-aware virtual try-on via clothing-oriented transformation try-on network. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7513–7522.

[21]

Chieh-Yun Chen, Ling Lo, Pin-Jui Huang, Hong-Han Shuai, and Wen-Huang Cheng. 2021. FashionMirror: Co-attention feature-remapping virtual try-on with sequential template poses. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13809–13818.

[22]

Wenzheng Chen, Huan Wang, Yangyan Li, Hao Su, Zhenhua Wang, Changhe Tu, Dani Lischinski, Daniel Cohen-Or, and Baoquan Chen. 2016. Synthesizing training images for boosting human 3d pose estimation. In 2016 4th International Conference on 3D Vision (3DV). IEEE, 479–488.

[23]

Xinya Chen, Jiaxin Huang, Yanrui Bin, Lu Yu, and Yiyi Liao. 2023. VeRi3D: Generative vertex-based radiance fields for 3D controllable human image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 8986–8997.

[24]

Xin Chen, Anqi Pang, Wei Yang, Peihao Wang, Lan Xu, and Jingyi Yu. 2021. TightCap: 3D human shape capture with clothing tightness field. ACM Transactions on Graphics (TOG) 41, 1 (2021), 1–17.

[25]

Wei Cheng, Ruixiang Chen, Siming Fan, Wanqi Yin, Keyu Chen, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu Lin, Daxuan Ren, Lei Yang, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Bo Dai, and Kwan-Yee Lin. 2023. Dna-rendering: A diverse neural actor repository for high-fidelity human-centric rendering. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 19982–19993.

[26]

Wen-Huang Cheng, Sijie Song, Chieh-Yun Chen, Shintami Chusnul Hidayati, and Jiaying Liu. 2021. Fashion meets computer vision: A survey. ACM Computing Surveys (CSUR) 54, 4 (2021), 1–41.

Digital Library

[27]

Soon Yau Cheong, Armin Mustafa, and Andrew Gilbert. 2023. UPGPT: Universal diffusion model for person image generation, editing and pose transfer. arXiv preprint arXiv:2304.08870 (2023).

[28]

Seunghwan Choi, Sunghyun Park, Minsoo Lee, and Jaegul Choo. 2021. VITON-HD: High-resolution virtual try-on via misalignment-aware normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14131–14140.

[29]

Ayush Chopra, Rishabh Jain, Mayur Hemani, and Balaji Krishnamurthy. 2021. ZFlow: Gated appearance flow-based virtual try-on with 3D priors. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5433–5442.

[30]

Enric Corona, Albert Pumarola, Guillem Alenya, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2021. SMPLicit: Topology-aware generative model for clothed people. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11875–11885.

[31]

Aiyu Cui, Daniel McKee, and Svetlana Lazebnik. 2021. Dressing in order: Recurrent person image generation for pose transfer, virtual try-on and outfit editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3940–3945.

[32]

Lu Dai, Liqian Ma, Shenhan Qian, Hao Liu, Ziwei Liu, and Hui Xiong. 2023. Cloth2Body: Generating 3D human body mesh from 2D clothing. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 15007–15017.

[33]

Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, and Jian Yin. 2018. Soft-gated warping-GAN for pose-guided person image synthesis. In Advances in Neural Information Processing Systems. 474–484.

[34]

Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, and Jian Yin. 2019. Towards multi-pose guided virtual try-on network. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9026–9035.

[35]

Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-Cheng Chen, and Jian Yin. 2019. FW-GAN: Flow-navigated warping GAN for video virtual try-on. In Proceedings of the IEEE International Conference on Computer Vision. 1161–1170.

[36]

Junting Dong, Qi Fang, Tianshuo Yang, Qing Shuai, Chengyu Qiao, and Sida Peng. 2023. iVS-Net: Learning human view synthesis from internet videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 22942–22951.

[37]

Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, and Jianchao Yang. 2022. Dressing in the wild by watching dance videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3480–3489.

[38]

Zijian Dong, Xu Chen, Jinlong Yang, Michael J. Black, Otmar Hilliges, and Andreas Geiger. 2023. AG3D: Learning to generate 3D avatars from 2D image collections. arXiv preprint arXiv:2305.02312 (2023).

[39]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations.

[40]

Nicolas Dufour, David Picard, and Vicky Kalogeiton. 2022. Scam! transferring humans between images with semantic cross attention modulation. In European Conference on Computer Vision. Springer, 713–729.

Digital Library

[41]

Patrick Esser, Robin Rombach, and Bjorn Ommer. 2021. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12873–12883.

[42]

Patrick Esser, Ekaterina Sutter, and Björn Ommer. 2018. A variational U-Net for conditional appearance and shape generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8857–8866.

[43]

Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Chen Qian, Chen Change Loy, Wayne Wu, and Ziwei Liu. 2022. StyleGAN-human: A data-centric odyssey of human generation. In European Conference on Computer Vision. Springer, 1–19.

Digital Library

[44]

Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Wayne Wu, and Ziwei Liu. 2023. UnitedHuman: Harnessing multi-source data for high-resolution human generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7301–7311.

[45]

Kyle Gao, Yina Gao, Hongjie He, Dening Lu, Linlin Xu, and Jonathan Li. 2022. Nerf: Neural radiance field in 3D vision, a comprehensive review. arXiv preprint arXiv:2210.00379 (2022).

[46]

Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2414–2423.

[47]

Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, and Ping Luo. 2021. Disentangled cycle consistency for highly-realistic virtual try-on. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16928–16937.

[48]

Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, and Ping Luo. 2021. Parser-free virtual try-on via distilling appearance flows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8485–8493.

[49]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.

Digital Library

[50]

Artur Grigorev, Karim Iskakov, Anastasia Ianina, Renat Bashirov, Ilya Zakharkin, Alexander Vakhitov, and Victor Lempitsky. 2021. Stylepeople: A generative model of fullbody human avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5151–5160.

[51]

Artur Grigorev, Artem Sevastopolsky, Alexander Vakhitov, and Victor Lempitsky. 2019. Coordinate-based texture inpainting for pose-guided human image generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 12135–12144.

[52]

Jiatao Gu, Lingjie Liu, Peng Wang, and Christian Theobalt. 2021. Stylenerf: A style-based 3D-aware generator for high-resolution image synthesis. arXiv preprint arXiv:2110.08985 (2021).

[53]

Rıza Alp Güler, Natalia Neverova, and Iasonas Kokkinos. 2018. Densepose: Dense human pose estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7297–7306.

[54]

Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, and Dacheng Tao. 2022. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 1 (2022), 87–110.

[55]

Xintong Han, Xiaojun Hu, Weilin Huang, and Matthew R. Scott. 2019. Clothflow: A flow-based model for clothed person generation. In Proceedings of the IEEE International Conference on Computer Vision. 10471–10480.

[56]

Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. 2018. Viton: An image-based virtual try-on network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7543–7552.

[57]

Xiao Han, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, and Tao Xiang. 2023. Controllable person image synthesis with pose-constrained latent diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 22768–22777.

[58]

Nils Hasler, Carsten Stoll, Martin Sunkel, Bodo Rosenhahn, and H-P. Seidel. 2009. A statistical model of human pose and body shape. In Computer Graphics Forum, Vol. 28. Wiley Online Library, 337–346.

[59]

Sen He, Yi-Zhe Song, and Tao Xiang. 2022. Style-based global appearance flow for virtual try-on. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3470–3479.

[60]

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017).

[61]

Hsuan-I. Ho, Lixin Xue, Jie Song, and Otmar Hilliges. 2023. Learning locally editable virtual humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21024–21035.

[62]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840–6851.

[63]

Xun Huang and Serge Belongie. 2017. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision. 1501–1510.

[64]

Yangyi Huang, Hongwei Yi, Weiyang Liu, Haofan Wang, Boxi Wu, Wenxiao Wang, Binbin Lin, Debing Zhang, and Deng Cai. 2023. One-shot implicit animatable avatars with model-based priors. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 8974–8985.

[65]

Zhichao Huang, Xintong Han, Jia Xu, and Tong Zhang. 2021. Few-shot human motion transfer by personalized geometry and texture modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2297–2306.

[66]

Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, and Tony Tung. 2020. Arch: Animatable reconstruction of clothed humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3093–3102.

[67]

Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. 2013. Human3.6m: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 7 (2013), 1325–1339.

Digital Library

[68]

Umar Iqbal, Akin Caliskan, Koki Nagano, Sameh Khamis, Pavlo Molchanov, and Jan Kautz. 2023. Rana: Relightable articulated neural avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 23142–23153.

[69]

Thibaut Issenhuth, Jérémie Mary, and Clément Calauzenes. 2020. Do not mask what you do not need to mask: A parser-free virtual try-on. In Proceedings of the 16th European Conference on Computer Vision (ECCV). Springer, 619–635.

Digital Library

[70]

Yasamin Jafarian, Tuanfeng Y. Wang, Duygu Ceylan, Jimei Yang, Nathan Carr, Yi Zhou, and Hyun Soo Park. 2023. Normal-guided Garment UV prediction for human re-texturing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4627–4636.

[71]

Arjun Jain, Thorsten Thormählen, Hans-Peter Seidel, and Christian Theobalt. 2010. Moviereshape: Tracking and reshaping of humans in videos. ACM Transactions on Graphics (TOG) 29, 6 (2010), 1–10.

Digital Library

[72]

Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, and Larry S. Davis. 2023. FlexNeRF: Photorealistic free-viewpoint rendering of moving humans from sparse views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21118–21127.

[73]

Jianbin Jiang, Tan Wang, He Yan, and Junhui Liu. 2022. ClothFormer: Taming video virtual try-on in all module. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10799–10808.

[74]

Suyi Jiang, Haoran Jiang, Ziyu Wang, Haimin Luo, Wenzheng Chen, and Lan Xu. 2023. Humangen: Generating human radiance fields with explicit priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12543–12554.

[75]

Yuming Jiang, Shuai Yang, Tong Liang Koh, Wayne Wu, Chen Change Loy, and Ziwei Liu. 2023. Text2Performer: Text-driven human video generation. arXiv preprint arXiv:2304.08483 (2023).

[76]

Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision. Springer, 694–711.

[77]

Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4401–4410.

[78]

Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, and Mubarak Shah. 2022. Transformers in vision: A survey. ACM Computing Surveys (CSUR) 54, 10s (2022), 1–41.

Digital Library

[79]

Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).

[80]

Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, and Krishna Kumar Singh. 2023. Putting people in their place: Affordance-aware human insertion into scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17089–17099.

[81]

Christoph Lassner, Gerard Pons-Moll, and Peter V. Gehler. 2017. A generative model of people in clothing. In Proceedings of the IEEE International Conference on Computer Vision. 853–862.

[82]

Verica Lazova, Eldar Insafutdinov, and Gerard Pons-Moll. 2019. 360-degree textures of people in clothing from a single image. In 2019 International Conference on 3D Vision (3DV’19). IEEE, 643–653.

[83]

Sangyun Lee, Gyojung Gu, Sunghyun Park, Seunghwan Choi, and Jaegul Choo. 2022. High-resolution virtual try-on with misalignment and occlusion-handled conditions. In European Conference on Computer Vision. Springer, 204–219.

Digital Library

[84]

Kathleen M. Lewis, Srivatsan Varadharajan, and Ira Kemelmacher-Shlizerman. 2021. TryonGAN: Body-aware try-on via layered interpolation. ACM Transactions on Graphics (TOG) 40, 4 (2021), 1–10.

Digital Library

[85]

Kun Li, Jinsong Zhang, Yebin Liu, Yu-Kun Lai, and Qionghai Dai. 2020. PoNA: Pose-guided non-local attention for human pose transfer. IEEE Transactions on Image Processing 29 (2020), 9584–9599.

[86]

Nannan Li, Kevin J. Shih, and Bryan A. Plummer. 2023. Collecting the puzzle pieces: Disentangled self-driven human pose transfer by permuting textures. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7126–7137.

[87]

Tianjiao Li, Wei Zhang, Ran Song, Zhiheng Li, Jun Liu, Xiaolei Li, and Shijian Lu. 2021. PoT-GAN: Pose transform GAN for person image synthesis. IEEE Transactions on Image Processing 30 (2021), 7677–7688.

[88]

Yining Li, Chen Huang, and Chen Change Loy. 2019. Dense intrinsic appearance flow for human pose transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3693–3702.

[89]

Zhi Li, Pengfei Wei, Xiang Yin, Zejun Ma, and Alex C. Kot. 2023. Virtual try-on with pose-garment keypoints guided inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 22788–22797.

[90]

Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, and Shuicheng Yan. 2015. Human parsing with contextualized convolutional neural network. In Proceedings of the IEEE International Conference on Computer Vision. 1386–1394.

Digital Library

[91]

Fangjian Liao, Xingxing Zou, and Waikeung Wong. 2024. Appearance and pose-guided human generation: A survey. Computing Surveys 56, 5 (2024), 1–35.

Digital Library

[92]

Tingting Liao, Xiaomei Zhang, Yuliang Xiu, Hongwei Yi, Xudong Liu, Guo-Jun Qi, Yong Zhang, Xuan Wang, Xiangyu Zhu, and Zhen Lei. 2023. High-fidelity clothed avatar reconstruction from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8662–8672.

[93]

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. 2980–2988.

[94]

Chong Liu, Xiaojun Chang, and Yi-Dong Shen. 2020. Unity style transfer for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6887–6896.

[95]

Jinxian Liu, Bingbing Ni, Yichao Yan, Peng Zhou, Shuo Cheng, and Jianguo Hu. 2018. Pose transferrable person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4099–4108.

[96]

Kuan-Hsien Liu, Ting-Yen Chen, and Chu-Song Chen. 2016. MVC: A dataset for view-invariant clothing retrieval and attribute prediction. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. 313–316.

Digital Library

[97]

Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, and Christian Theobalt. 2021. Neural actor: Neural free-view synthesis of human actors with pose control. ACM Transactions on Graphics (TOG) 40, 6 (2021), 1–16.

Digital Library

[98]

Lingjie Liu, Weipeng Xu, Michael Zollhoefer, Hyeongwoo Kim, Florian Bernard, Marc Habermann, Wenping Wang, and Christian Theobalt. 2019. Neural rendering and reenactment of human actor videos. ACM Transactions on Graphics (TOG) 38, 5 (2019), 1–14.

Digital Library

[99]

Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, and Arun Mallya. 2021. Generative adversarial networks for image and video synthesis: Algorithms and applications. Proceedings of the IEEE 109, 5 (2021), 839–862.

[100]

Ting Liu, Jianfeng Zhang, Xuecheng Nie, Yunchao Wei, Shikui Wei, Yao Zhao, and Jiashi Feng. 2021. Spatial-aware texture transformer for high-fidelity garment transfer. IEEE Transactions on Image Processing 30 (2021), 7499–7510.

[101]

Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, and Shenghua Gao. 2019. Liquid warping GAN: A unified framework for human motion imitation, appearance transfer and novel view synthesis. In Proceedings of the IEEE International Conference on Computer Vision. 5904–5913.

[102]

Wen Liu, Zhixin Piao, Zhi Tu, Wenhan Luo, Lin Ma, and Shenghua Gao. 2021. Liquid warping GAN with attention: A unified framework for human image synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2021), 5114–5132.

[103]

Yu Liu, Wei Chen, Li Liu, and Michael S. Lew. 2019. Swapgan: A multistage generative approach for person-to-person fashion style transfer. IEEE Transactions on Multimedia 21, 9 (2019), 2209–2222.

[104]

Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. 2016. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1096–1104.

[105]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG) 34, 6 (2015), 1–16.

Digital Library

[106]

Zhengyao Lv, Xiaoming Li, Xin Li, Fu Li, Tianwei Lin, Dongliang He, and Wangmeng Zuo. 2021. Learning semantic person image generation by region-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10806–10815.

[107]

Liyuan Ma, Tingwei Gao, Haitian Jiang, Haibin Shen, and Kejie Huang. 2023. WaveIPT: Joint attention and flow alignment in the wavelet domain for pose transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7215–7225.

[108]

Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, and Luc Van Gool. 2017. Pose guided person image generation. In Advances in Neural Information Processing Systems. 406–416.

[109]

Liqian Ma, Qianru Sun, Stamatios Georgoulis, Luc Van Gool, Bernt Schiele, and Mario Fritz. 2018. Disentangled person image generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 99–108.

[110]

Tianxiang Ma, Bo Peng, Wei Wang, and Jing Dong. 2021. MUST-GAN: Multi-level statistics transfer for self-driven person image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13622–13631.

[111]

Roey Mechrez, Itamar Talmi, and Lihi Zelnik-Manor. 2018. The contextual loss for image transformation with non-aligned data. In Proceedings of the European Conference on Computer Vision (ECCV). 768–783.

Digital Library

[112]

Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, and Christian Theobalt. 2017. Vnect: Real-time 3D human pose estimation with a single RGB camera. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1–14.

Digital Library

[113]

Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, and Zhouhui Lian. 2020. Controllable person image synthesis with attribute-decomposed GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5084–5093.

[114]

B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. 2020. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision.

Digital Library

[115]

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65, 1 (2021), 99–106.

Digital Library

[116]

Aymen Mir, Thiemo Alldieck, and Gerard Pons-Moll. 2020. Learning to transfer texture from clothing images to 3D humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7023–7034.

[117]

Gyeongsik Moon, Hyeongjin Nam, Takaaki Shiratori, and Kyoung Mu Lee. 2022. 3D clothed human reconstruction in the wild. In European Conference on Computer Vision. 184–200.

Digital Library

[118]

Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, and Rita Cucchiara. 2023. LaDI-VTON: Latent diffusion textual-inversion enhanced virtual try-on. arXiv preprint arXiv:2305.13501 (2023).

[119]

Davide Morelli, Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, and Rita Cucchiara. 2022. Dress code: High-resolution multi-category virtual try-on. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2231–2235.

[120]

Jiteng Mu, Shen Sang, Nuno Vasconcelos, and Xiaolong Wang. 2023. ActorsNeRF: Animatable few-shot human rendering with generalizable NeRFs. arXiv preprint arXiv:2304.14401 (2023).

[121]

Assaf Neuberger, Eran Borenstein, Bar Hilleli, Eduard Oks, and Sharon Alpert. 2020. Image based virtual try-on network from unpaired data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5184–5193.

[122]

Natalia Neverova, Riza Alp Guler, and Iasonas Kokkinos. 2018. Dense pose transfer. In Proceedings of the European Conference on Computer Vision (ECCV). 123–138.

Digital Library

[123]

Haomiao Ni, Changhao Shi, Kai Li, Sharon X. Huang, and Martin Renqiang Min. 2023. Conditional image-to-video generation with latent flow diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18444–18455.

[124]

Ahmed A. A. Osman, Timo Bolkart, and Michael J. Black. 2020. Star: Sparse trained articulated human body regressor. In Proceedings of the 16th European Conference on Computer Vision (ECCV). Springer, 598–613.

Digital Library

[125]

Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, and Pan Zhou. 2018. Pedestrian-synthesis-GAN: Generating pedestrian data in real scene and beyond. arXiv preprint arXiv:1804.02047 (2018).

[126]

Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. 2019. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10975–10985.

[127]

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. 2021. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 14314–14323.

[128]

Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9054–9063.

[129]

Leonid Pishchulin, Arjun Jain, Mykhaylo Andriluka, Thorsten Thormählen, and Bernt Schiele. 2012. Articulated people detection and pose estimation: Reshaping the future. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3178–3185.

[130]

Leonid Pishchulin, Arjun Jain, Christian Wojek, Mykhaylo Andriluka, Thorsten Thormählen, and Bernt Schiele. 2011. Learning people detection models from few training samples. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. 1473–1480.

[131]

Gerard Pons-Moll, Sergi Pujades, Sonny Hu, and Michael J. Black. 2017. ClothCap: Seamless 4D clothing capture and retargeting. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1–15.

[132]

Albert Pumarola, Antonio Agudo, Alberto Sanfeliu, and Francesc Moreno-Noguer. 2018. Unsupervised person image synthesis in arbitrary poses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8620–8628.

[133]

Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, and Xiangyang Xue. 2018. Pose-normalized image generation for person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV). 650–667.

[134]

Amit Raj, Patsorn Sangkloy, Huiwen Chang, James Hays, Duygu Ceylan, and Jingwan Lu. 2018. SwapNet: Image based garment transfer. In European Conference on Computer Vision. Springer, 679–695.

[135]

Yurui Ren, Xiaoqing Fan, Ge Li, Shan Liu, and Thomas H. Li. 2022. Neural texture extraction and distribution for controllable person image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13535–13544.

[136]

Yurui Ren, Ge Li, Shan Liu, and Thomas H. Li. 2020. Deep spatial transformation for pose-guided person image generation and animation. IEEE Transactions on Image Processing 29 (2020), 8622–8635.

[137]

Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, and Ge Li. 2020. Deep image spatial transformation for person image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7690–7699.

[138]

Grégory Rogez and Cordelia Schmid. 2016. Mocap-guided data augmentation for 3D pose estimation in the wild. Advances in Neural Information Processing Systems 29 (2016), 3108–3116.

[139]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695.

[140]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.

[141]

Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. 2019. PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2304–2314.

[142]

Shunsuke Saito, Tomas Simon, Jason Saragih, and Hanbyul Joo. 2020. PIFuHD: Multi-level pixel-aligned implicit function for high-resolution 3D human digitization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 84–93.

[143]

Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498 (2016).

[144]

Kripasindhu Sarkar, Lingjie Liu, Vladislav Golyanik, and Christian Theobalt. 2021. HumanGAN: A generative model of human images. In 2021 International Conference on 3D Vision (3DV’21). IEEE, 258–267.

[145]

Kripasindhu Sarkar, Dushyant Mehta, Weipeng Xu, Vladislav Golyanik, and Christian Theobalt. 2020. Neural re-rendering of humans from a single image. In European Conference on Computer Vision. Springer, 596–613.

[146]

Scott Schaefer, Travis McPhail, and Joe Warren. 2006. Image deformation using moving least squares. In ACM SIGGRAPH 2006 Papers. 533–540.

[147]

Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. GRAF: Generative radiance fields for 3D-aware image synthesis. Advances in Neural Information Processing Systems 33 (2020), 20154–20166.

[148]

Tong Sha, Wei Zhang, Tong Shen, Zhoujun Li, and Tao Mei. 2023. Deep person generation: A survey from the perspective of face, pose, and cloth synthesis. ACM Computing Surveys 55, 12 (2023), 1–37.

[149]

Ariel Shamir and Olga Sorkine. 2009. Visual media retargeting. In ACM SIGGRAPH ASIA 2009 Courses. 1–13.

[150]

Fei Shen, Hu Ye, Jun Zhang, Cong Wang, Xiao Han, and Wei Yang. 2023. Advancing pose-guided image synthesis with progressive conditional diffusion models. arXiv preprint arXiv:2310.06313 (2023).

[151]

Aliaksandra Shysheya, Egor Zakharov, Kara-Ali Aliev, Renat Bashirov, Egor Burkov, Karim Iskakov, Aleksei Ivakhnenko, Yury Malkov, Igor Pasechnik, Dmitry Ulyanov, Alexander Vakhitov, and Victor Lempitsky. 2019. Textured neural avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2387–2397.

[152]

Chenyang Si, Wei Wang, Liang Wang, and Tieniu Tan. 2018. Multistage adversarial losses for pose-based human image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 118–126.

[153]

Aliaksandr Siarohin, Stéphane Lathuilière, Enver Sangineto, and Nicu Sebe. 2019. Appearance and pose-conditioned human image generation using deformable GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 4 (2019), 1156–1171.

[154]

Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019. First order motion model for image animation. Advances in Neural Information Processing Systems 32 (2019).

[155]

Aliaksandr Siarohin, Enver Sangineto, Stéphane Lathuilière, and Nicu Sebe. 2018. Deformable GANs for pose-based human image generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3408–3416.

[156]

Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems 28 (2015), 3483–3491.

[157]

Sijie Song, Wei Zhang, Jiaying Liu, Zongming Guo, and Tao Mei. 2020. Unpaired person image generation with semantic parsing transformation. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 11 (2020), 4161–4176.

[158]

Sijie Song, Wei Zhang, Jiaying Liu, and Tao Mei. 2019. Unsupervised person image generation with semantic parsing transformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2357–2366.

[159]

Xiaoxiao Sun and Liang Zheng. 2019. Dissecting person re-identification from the viewpoint of viewpoint. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 608–617.

[160]

David Svitov, Dmitrii Gudkov, Renat Bashirov, and Victor Lempitsky. 2023. DINAR: Diffusion inpainting of neural textures for one-shot human avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7062–7072.

[161]

Hao Tang, Song Bai, Philip H. S. Torr, and Nicu Sebe. 2020. Bipartite graph reasoning GANs for person image generation. arXiv preprint arXiv:2008.04381 (2020).

[162]

Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, and Nicu Sebe. 2020. XingGAN for person image generation. arXiv preprint arXiv:2007.09278 (2020).

[163]

Hao Tang, Dan Xu, Gaowen Liu, Wei Wang, Nicu Sebe, and Yan Yan. 2019. Cycle in cycle generative adversarial networks for keypoint-guided image generation. In Proceedings of the 27th ACM International Conference on Multimedia. 2052–2060.

[164]

Lucas Theis, Aäron van den Oord, and Matthias Bethge. 2015. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844 (2015).

[165]

Gul Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, and Cordelia Schmid. 2017. Learning from synthetic humans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 109–117.

[166]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).

[167]

Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, and Meng Yang. 2018. Toward characteristic-preserving image-based virtual try-on network. In Proceedings of the European Conference on Computer Vision (ECCV). 589–604.

[168]

Junying Wang, Jae Shin Yoon, Tuanfeng Y. Wang, Krishna Kumar Singh, and Ulrich Neumann. 2023. Complete 3D human reconstruction from a single incomplete image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8748–8758.

[169]

Jian Wang, Yunshan Zhong, Yachun Li, Chi Zhang, and Yichen Wei. 2019. Re-identification supervised texture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11846–11856.

[170]

Lei Wang, Wei Chen, Wenjia Yang, Fangming Bi, and Fei Richard Yu. 2020. A state-of-the-art review on image synthesis with generative adversarial networks. IEEE Access 8 (2020), 63514–63537.

[171]

Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Bryan Catanzaro, and Jan Kautz. 2019. Few-shot video-to-video synthesis. Advances in Neural Information Processing Systems 32 (2019).

[172]

Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7794–7803.

[173]

Yanan Wang, Shengcai Liao, and Ling Shao. 2020. Surpassing real-world source training data: Random 3D characters for generalizable person re-identification. In Proceedings of the 28th ACM International Conference on Multimedia. 3422–3430.

[174]

Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.

[175]

Zijian Wang, Xingqun Qi, Kun Yuan, and Muyi Sun. 2022. Self-supervised correlation mining network for person image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7703–7712.

[176]

Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. 2022. HumanNeRF: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16210–16220.

[177]

Chung-Yi Weng, Pratul P. Srinivasan, Brian Curless, and Ira Kemelmacher-Shlizerman. 2023. PersonNeRF: Personalized reconstruction from photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 524–533.

[178]

Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, and Liang Lin. 2021. Image comes dancing with collaborative parsing-flow video synthesis. IEEE Transactions on Image Processing 30 (2021), 9259–9269.

[179]

Xian Wu, Chen Li, Shi-Min Hu, and Yu-Wing Tai. 2021. Hierarchical generation of human pose with part-based layer representation. IEEE Transactions on Image Processing 30 (2021), 7856–7866.

[180]

Zhenyu Xie, Zaiyu Huang, Xin Dong, Fuwei Zhao, Haoye Dong, Xijin Zhang, Feida Zhu, and Xiaodan Liang. 2023. GP-VTON: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 23550–23559.

[181]

Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, and Xiaodan Liang. 2021. Towards scalable unpaired virtual try-on via patch-routed spatially-adaptive GAN. Advances in Neural Information Processing Systems 34 (2021), 2598–2610.

[182]

Zhangyang Xiong, Di Kang, Derong Jin, Weikai Chen, Linchao Bao, Shuguang Cui, and Xiaoguang Han. 2023. Get3DHuman: Lifting StyleGAN-human into a 3D generative model using pixel-aligned reconstruction priors. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9287–9297.

[183]

Yuliang Xiu, Jinlong Yang, Xu Cao, Dimitrios Tzionas, and Michael J. Black. 2023. ECON: Explicit clothed humans optimized via normal integration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 512–523.

[184]

Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, and Michael J. Black. 2022. ICON: Implicit clothed humans obtained from normals. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 13286–13296.

[185]

Chengming Xu, Yanwei Fu, Chao Wen, Ye Pan, Yu-Gang Jiang, and Xiangyang Xue. 2020. Pose-guided person image synthesis in the non-iconic views. IEEE Transactions on Image Processing 29 (2020), 9060–9072.

[186]

Feng Xu, Yebin Liu, Carsten Stoll, James Tompkin, Gaurav Bharaj, Qionghai Dai, Hans-Peter Seidel, Jan Kautz, and Christian Theobalt. 2011. Video-based characters: Creating new human performances from a multi-view video database. In ACM SIGGRAPH 2011 Papers. 1–10.

[187]

Munan Xu, Yuanqi Chen, Shan Liu, Thomas H. Li, and Ge Li. 2021. Structure-transformed texture-enhanced network for person image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13859–13868.

[188]

Xiaogang Xu, Ying-Cong Chen, Xin Tao, and Jiaya Jia. 2021. Text-guided human image manipulation via image-text shared space. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 10 (2021), 6486–6500.

[189]

Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, and Humphrey Shi. 2023. Prompt-free diffusion: Taking “text” out of text-to-image diffusion models. arXiv preprint arXiv:2305.16223 (2023).

[190]

Xiangyu Xu and Chen Change Loy. 2021. 3D human texture estimation from a single image with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13849–13858.

[191]

Haonan Yan, Jiaqi Chen, Xujie Zhang, Shengkai Zhang, Nianhong Jiao, Xiaodan Liang, and Tianxiang Zheng. 2021. UltraPose: Synthesizing dense pose with 1 billion points by human-body decoupling 3D model. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 10891–10900.

[192]

Keyu Yan, Tingwei Gao, Hui Zhang, and Chengjun Xie. 2023. Linking garment with person via semantically associated landmarks for virtual try-on. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17194–17204.

[193]

Chaojie Yang, Hanhui Li, Shengjie Wu, Shengkai Zhang, Haonan Yan, Nianhong Jiao, Jie Tang, Runnan Zhou, Xiaodan Liang, and Tianxiang Zheng. 2022. BodyGAN: General-purpose controllable neural human body generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7733–7742.

[194]

Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, and Dahua Lin. 2018. Pose guided human video generation. In Proceedings of the European Conference on Computer Vision (ECCV). 201–216.

[195]

Fan Yang and Guosheng Lin. 2021. CT-Net: Complementary transfering network for garment transfer with arbitrary geometric changes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9899–9908.

[196]

Han Yang, Xinrui Yu, and Ziwei Liu. 2022. Full-range virtual try-on with recurrent tri-level transform. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3460–3469.

[197]

Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, Wangmeng Zuo, and Ping Luo. 2020. Towards photo-realistic virtual try-on by adaptively generating-preserving image content. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7850–7859.

[198]

Lingbo Yang, Pan Wang, Chang Liu, Zhanning Gao, Peiran Ren, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Xiansheng Hua, and Wen Gao. 2021. Towards fine-grained human pose transfer with detail replenishing network. IEEE Transactions on Image Processing 30 (2021), 2422–2435.

[199]

Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. 2023. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys 56, 4 (2023), 1–39.

[200]

Shan Yang, Tanya Ambert, Zherong Pan, Ke Wang, Licheng Yu, Tamara Berg, and Ming C. Lin. 2016. Detailed garment recovery from a single-view image. arXiv preprint arXiv:1608.01250 (2016).

[201]

Zhuoqian Yang, Shikai Li, Wayne Wu, and Bo Dai. 2023. 3DHumanGAN: 3D-aware human image generation with 3D pose mapping. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 23008–23019.

[202]

Jae Shin Yoon, Lingjie Liu, Vladislav Golyanik, Kripasindhu Sarkar, Hyun Soo Park, and Christian Theobalt. 2021. Pose-guided human animation from a single image in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15039–15048.

[203]

Ruiyun Yu, Xiaoqi Wang, and Xiaohui Xie. 2019. VTNFP: An image-based virtual try-on network with body and clothing feature preservation. In Proceedings of the IEEE International Conference on Computer Vision. 10511–10520.

[204]

Wing-Yin Yu, Lai-Man Po, Ray C. C. Cheung, Yuzhi Zhao, Yu Xue, and Kun Li. 2023. Bidirectionally deformable motion modulation for video-based human pose transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7502–7512.

[205]

Wing-Yin Yu, Lai-Man Po, Jingjing Xiong, Yuzhi Zhao, and Pengfei Xian. 2022. ShaTure: Shape and texture deformation for human pose and attribute transfer. IEEE Transactions on Image Processing 31 (2022), 2541–2556.

[206]

Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin. 2023. MonoHuman: Animatable human neural field from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16943–16953.

[207]

Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, and Leonid Sigal. 2019. DwNet: Dense warp-based network for pose-guided human video generation. arXiv preprint arXiv:1910.09139 (2019).

[208]

Mihai Zanfir, Alin-Ionut Popa, Andrei Zanfir, and Cristian Sminchisescu. 2018. Human appearance transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5391–5399.

[209]

Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, and Eric Xing. 2023. Multimodal image synthesis and editing: The generative AI era. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 12 (2023), 15098–15119.

[210]

Jinsong Zhang, Kun Li, Yu-Kun Lai, and Jingyu Yang. 2021. PISE: Person image synthesis and editing with decoupled GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7982–7990.

[211]

Jichao Zhang, Enver Sangineto, Hao Tang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe, and Wei Wang. 2022. 3D-aware semantic-guided generative model for human synthesis. In European Conference on Computer Vision. Springer, 339–356.

[212]

Kaiduo Zhang, Muyi Sun, Jianxin Sun, Binghao Zhao, Kunbo Zhang, Zhenan Sun, and Tieniu Tan. 2022. HumanDiffusion: A coarse-to-fine alignment diffusion framework for controllable text-driven person image generation. arXiv preprint arXiv:2211.06235 (2022).

[213]

Pengze Zhang, Lingxiao Yang, Jian-Huang Lai, and Xiaohua Xie. 2022. Exploring dual-task correlation for pose guided person image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7713–7722.

[214]

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 586–595.

[215]

Tianyu Zhang, Lingxi Xie, Longhui Wei, Zijie Zhuang, Yongfei Zhang, Bo Li, and Qi Tian. 2021. UnrealPerson: An adaptive pipeline towards costless person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11506–11515.

[216]

Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, and Jiashi Feng. 2023. GETAvatar: Generative textured meshes for animatable human avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2273–2282.

[217]

Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, and Jiashi Feng. 2018. Multi-view image generation from a single-view. In Proceedings of the 26th ACM International Conference on Multimedia. 383–391.

[218]

Fang Zhao, Shengcai Liao, Kaihao Zhang, and Ling Shao. 2020. Human parsing based texture transfer from single image to 3D human via cross-view consistency. Advances in Neural Information Processing Systems 33 (2020), 14326–14337.

[219]

Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, and Xiaodan Liang. 2021. M3D-VTON: A monocular-to-3D virtual try-on network. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13239–13249.

[220]

Haitian Zheng, Lele Chen, Chenliang Xu, and Jiebo Luo. 2020. Pose flow learning from person images for pose guided synthesis. IEEE Transactions on Image Processing 30 (2020), 1898–1909.

[221]

Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian. 2015. Scalable person re-identification: A benchmark. In Proceedings of the IEEE International Conference on Computer Vision. 1116–1124.

[222]

Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, and Jan Kautz. 2019. Joint discriminative and generative learning for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2138–2147.

[223]

Zerong Zheng, Tao Yu, Yixuan Wei, Qionghai Dai, and Yebin Liu. 2019. DeepHuman: 3D human reconstruction from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7739–7749.

[224]

Shizhe Zhou, Hongbo Fu, Ligang Liu, Daniel Cohen-Or, and Xiaoguang Han. 2010. Parametric reshaping of human bodies in images. ACM Transactions on Graphics (TOG) 29, 4 (2010), 1–10.

[225]

Xingran Zhou, Siyu Huang, Bin Li, Yingming Li, Jiachen Li, and Zhongfei Zhang. 2019. Text guided person image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3663–3672.

[226]

Xinyue Zhou, Mingyu Yin, Xinyuan Chen, Li Sun, Changxin Gao, and Qingli Li. 2022. Cross attention based style distribution for controllable person image synthesis. In European Conference on Computer Vision. Springer, 161–178.

[227]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2223–2232.

[228]

Shizhan Zhu, Raquel Urtasun, Sanja Fidler, Dahua Lin, and Chen Change Loy. 2017. Be your own Prada: Fashion synthesis with structural coherence. In Proceedings of the IEEE International Conference on Computer Vision. 1680–1688.

[229]

Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, and Xiang Bai. 2019. Progressive pose attention transfer for person image generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2347–2356.

[230]

Zhen Zhu, Tengteng Huang, Mengde Xu, Baoguang Shi, Wenqing Cheng, and Xiang Bai. 2021. Progressive and aligned pose attention transfer for person image generation. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 8 (2021), 4306–4320.

Index Terms

Human Image Generation: A Comprehensive Survey
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics
    1. Image manipulation
      1. Image processing
      2. Image-based rendering

Published In

ACM Computing Surveys Volume 56, Issue 11

November 2024

977 pages

EISSN: 1557-7341

DOI: 10.1145/3613686

Editors: David Atienza (Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland) and Michela Milano (University of Bologna, Italy)

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2024

Online AM: 22 May 2024

Accepted: 08 May 2024

Revised: 08 March 2024

Received: 09 December 2022

Qualifiers

Survey

Funding Sources

National Science and Technology Major Project
National Natural Science Foundation of China
China Postdoctoral Science Foundation
