Abstract
Image-based virtual try-on with arbitrary poses has attracted much attention recently. The purpose of this study is to synthesize an image of a reference person wearing target clothes in a target pose. However, existing methods still struggle to preserve clothing details and person identity while generating fine-grained try-on images. To resolve these issues, we present a new detail-oriented virtual try-on network with arbitrary poses (DO-VTON). Specifically, DO-VTON consists of three major modules. First, a semantic prediction module adopts a two-stage strategy to gradually predict a semantic map of the reference person. Second, a spatial alignment module warps the target clothes and non-target details to align with the target pose. Third, a try-on synthesis module generates the final try-on images. Moreover, to generate high-quality images, we introduce a new multi-scale dilated convolution U-Net to enlarge the receptive field and capture context information. Extensive experiments on two well-known benchmark datasets demonstrate that our system achieves state-of-the-art virtual try-on performance both qualitatively and quantitatively.
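To make the receptive-field claim concrete, the following is a minimal sketch of the standard arithmetic for stacked stride-1 dilated convolutions. It is not the DO-VTON architecture: the kernel size of 3 and the dilation rates (1, 2, 4) are assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    rf = 1
    for d in dilations:
        # Each stride-1 layer widens the field by (effective kernel - 1).
        rf += effective_kernel(kernel_size, d) - 1
    return rf


# Assumed, illustrative settings: three 3x3 layers.
plain = stacked_receptive_field(3, [1, 1, 1])    # no dilation
dilated = stacked_receptive_field(3, [1, 2, 4])  # growing dilation rates

print(plain, dilated)  # 7 15
```

Under these assumed settings, three dilated layers cover a 15-pixel span where three ordinary layers cover 7, with the same number of weights per layer. This is the general reason dilation is used to capture wider context.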
Acknowledgement
This work was supported in part by the Science Foundation of Hubei under Grant No. 2014CFB764, the Department of Education of the Hubei Province of China under Grant No. Q20131608, and the Engineering Research Center of Hubei Province for Clothing Information.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Chang, Y. et al. (2022). Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_7
DOI: https://doi.org/10.1007/978-3-030-98358-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science (R0)