Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13141))

Included in the following conference series:

Abstract

Image-based virtual try-on with arbitrary poses has attracted many attentions recently. The purpose of this study is to synthesize a reference person image wearing a target clothes with a target pose. However, it is still a challenge for the existing methods to preserve the clothing details and person identity while generating fine-grained try-on images. To resolve the issues, we present a new detail-oriented virtual try-on network with arbitrary poses (DO-VTON). Specifically, our DO-VTON consists of three major modules: First, a semantic prediction module adopts a two-stage strategy to gradually predict a semantic map of the reference person. Second, a spatial alignment module warps the target clothes and non-target details to align with the target pose. Third, a try-on synthesis module generates final try-on images. Moreover, to generate high-quality images, we introduce a new multi-scale dilated convolution U-Net to enlarge the receptive field and capture context information. Extensive experiments on two famous benchmark datasets demonstrate our system achieves the state-of-the-art virtual try-on performance both qualitatively and quantitatively.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Brouet, R., Sheffer, A., Boissieux, L., Cani, M.P.: Design preserving garment transfer. ACM Trans. Graph. 31(4), Article-No (2012)

    Google Scholar 

  2. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017)

    Google Scholar 

  3. Chang, Y., et al.: Dp-vton: toward detail-preserving image-based virtual try-on network. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2295–2299. IEEE (2021)

    Google Scholar 

  4. Chen, W., et al.: Synthesizing training images for boosting human 3D pose estimation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 479–488. IEEE (2016)

    Google Scholar 

  5. Dong, H., et al.: Towards multi-pose guided virtual try-on network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9026–9035 (2019)

    Google Scholar 

  6. Ge, Y., Song, Y., Zhang, R., Ge, C., Liu, W., Luo, P.: Parser-free virtual try-on via distilling appearance flows. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8485–8493 (2021)

    Google Scholar 

  7. Gong, K., Liang, X., Zhang, D., Shen, X., Lin, L.: Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 932–940 (2017)

    Google Scholar 

  8. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

    Google Scholar 

  9. Guan, P., Reiss, L., Hirshberg, D.A., Weiss, A., Black, M.J.: Drape: dressing any person. ACM Trans. Graph. (TOG) 31(4), 1–10 (2012)

    Article  Google Scholar 

  10. Han, X., Wu, Z., Wu, Z., Yu, R., Davis, L.S.: Viton: an image-based virtual try-on network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7543–7552 (2018)

    Google Scholar 

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

    Google Scholar 

  12. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in Neural Information Processing Systems, pp. 6626–6637 (2017)

    Google Scholar 

  13. Hsieh, C.W., Chen, C.Y., Chou, C.L., Shuai, H.H., Cheng, W.H.: Fit-me: image-based virtual try-on with arbitrary poses. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 4694–4698. IEEE (2019)

    Google Scholar 

  14. Hsieh, C.W., Chen, C.Y., Chou, C.L., Shuai, H.H., Liu, J., Cheng, W.H.: Fashionon: semantic-guided image-based virtual try-on with detailed human and clothing information. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 275–283 (2019)

    Google Scholar 

  15. Jetchev, N., Bergmann, U.: The conditional analogy gan: swapping fashion articles on people images. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 2287–2292 (2017)

    Google Scholar 

  16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  17. Lee, H.J., Lee, R., Kang, M., Cho, M., Park, G.: La-viton: a network for looking-attractive virtual try-on. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3129–3132. IEEE (2019)

    Google Scholar 

  18. Lewis, K.M., Varadharajan, S., Kemelmacher-Shlizerman, I.: Vogue: try-on by stylegan interpolation optimization. arXiv preprint arXiv:2101.02285 (2021)

  19. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. arXiv preprint arXiv:1705.09368 (2017)

  20. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)

  21. Neuberger, A., Borenstein, E., Hilleli, B., Oks, E., Alpert, S.: Image based virtual try-on network from unpaired data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5184–5193 (2020)

    Google Scholar 

  22. Pons-Moll, G., Pujades, S., Hu, S., Black, M.J.: Clothcap: seamless 4D clothing capture and retargeting. ACM Trans. Graph. (TOG) 36(4), 1–15 (2017)

    Article  Google Scholar 

  23. Raj, A., Sangkloy, P., Chang, H., Hays, J., Ceylan, D., Lu, J.: SwapNet: image based garment transfer. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 679–695. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_41

    Chapter  Google Scholar 

  24. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

    Chapter  Google Scholar 

  25. Siarohin, A., Sangineto, E., Lathuiliere, S., Sebe, N.: Deformable gans for pose-based human image generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3408–3416 (2018)

    Google Scholar 

  26. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  27. Song, S., Zhang, W., Liu, J., Mei, T.: Unsupervised person image generation with semantic parsing transformation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2357–2366 (2019)

    Google Scholar 

  28. Wang, B., Zheng, H., Liang, X., Chen, Y., Lin, L., Yang, M.: Toward characteristic-preserving image-based virtual try-on network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 589–604 (2018)

    Google Scholar 

  29. Wang, J., Sha, T., Zhang, W., Li, Z., Mei, T.: Down to the last detail: virtual try-on with fine-grained details. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 466–474 (2020)

    Google Scholar 

  30. Wang, P., et al.: Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460. IEEE (2018)

    Google Scholar 

  31. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018)

    Google Scholar 

  32. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

    Article  Google Scholar 

  33. Wu, T., Tang, S., Zhang, R., Cao, J., Li, J.: Tree-structured kronecker convolutional network for semantic segmentation. In: 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 940–945. IEEE (2019)

    Google Scholar 

  34. Yang, H., Zhang, R., Guo, X., Liu, W., Zuo, W., Luo, P.: Towards photo-realistic virtual try-on by adaptively generating-preserving image content. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7850–7859 (2020)

    Google Scholar 

  35. Yu, R., Wang, X., Xie, X.: VTNFP: an image-based virtual try-on network with body and clothing feature preservation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10511–10520 (2019)

    Google Scholar 

  36. Zheng, N., Song, X., Chen, Z., Hu, L., Cao, D., Nie, L.: Virtually trying on new clothing with arbitrary poses. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 266–274 (2019)

    Google Scholar 

  37. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)

    Google Scholar 

Download references

Acknowledgement

This work is supported in part by the Science Foundation of Hubei under Grant No. 2014CFB764 and Department of Education of the Hubei Province of China under Grant No. Q20131608, and Engineering Research Center of Hubei Province for Clothing Information.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tao Peng .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chang, Y. et al. (2022). Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-98358-1_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98357-4

  • Online ISBN: 978-3-030-98358-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics