A novel Multi-scale architecture driven by decoupled semantic attention transfer for person image generation

Published: 01 April 2023

Abstract

Person image generation is a challenging task that aims to transfer the person in a source image from a source pose to a target pose while preserving their appearance style. In this paper, we propose a Generative Adversarial Network based on Decoupled Semantic Attention Transfer (DSAT-GAN), which addresses the problem that local semantic representations of different image styles and contents cannot be accurately decoupled and transferred. The architecture employs a novel Multi-scale Semantic Mapping Generation Network (Ms-SMGN), driven by two network modules with different semantic attention mechanisms, to accurately align and transfer local semantic representations at different spatial scales. In addition, a channel-separated convolution is applied in the encoding networks in place of the traditional channel fully-connected operation, which reduces computational complexity while decoupling channel semantics. Moreover, a Gram matrix-based global style loss is introduced to further enhance the consistency of high-level semantics between generated and target images. Experiments on the Market-1501 and DeepFashion datasets show that DSAT-GAN outperforms other recent baselines. The architecture can also be extended to data augmentation scenarios, where it significantly improves the accuracy of person re-identification.
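
A Gram matrix-based global style loss of the kind described above is typically computed over feature maps of the generated and target images extracted by a fixed pretrained network, matching channel-to-channel correlations layer by layer. The sketch below is a minimal PyTorch version under that assumption; the exact feature layers and loss weighting used by DSAT-GAN are not specified here, and the names `gen_feats`/`tgt_feats` are illustrative.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (B, C, H, W) feature map, e.g. from a pretrained VGG layer
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    # Channel-to-channel correlations, normalized by the number of elements
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def global_style_loss(gen_feats, tgt_feats):
    # gen_feats / tgt_feats: lists of feature maps taken from the same layers
    # of a fixed feature extractor applied to generated and target images
    loss = 0.0
    for fg, ft in zip(gen_feats, tgt_feats):
        loss = loss + F.mse_loss(gram_matrix(fg), gram_matrix(ft))
    return loss
```

Because the Gram matrix discards spatial layout and keeps only channel correlations, this term constrains the overall style (texture and color statistics) of the generated image rather than pixel-wise detail.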

Highlights

Our model alleviates the loss of semantic features in regions that are hard to align.
Encoders learn feature vectors with channel-separated convolution and attention (see the sketch after this list).
Decoders fuse and decode the feature vectors to generate coherent person details.
A global style loss constrains the high-level semantics of generated person images.
Experimental results on both datasets demonstrate the superiority of our model.
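
The channel-separated convolution mentioned in the abstract and highlights is commonly realized as a depthwise (grouped) convolution, in which each channel is filtered independently instead of being mixed with all others as in a standard, channel fully-connected convolution. The block below is a minimal sketch under that assumption; the actual layer configuration of the Ms-SMGN encoders (kernel sizes, normalization, activations) is not given here, so those choices are illustrative.

```python
import torch.nn as nn

class ChannelSeparatedConv(nn.Module):
    """Depthwise 3x3 convolution: each input channel is filtered by its own
    kernel (groups == channels), so no cross-channel mixing occurs and the
    parameter count drops from C_in * C_out * k * k to C * k * k."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)
        self.norm = nn.InstanceNorm2d(channels)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))
```

If cross-channel fusion is needed later in the network, a 1x1 pointwise convolution can be stacked after this block, which is the usual way to recombine the separately filtered channels.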

Published In

Computers and Graphics, Volume 111, Issue C, April 2023, 230 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Person image generation
  2. Pose transfer
  3. Semantics attention
  4. Semantics mapping
  5. GAN

Qualifiers

  • Research-article
