$\textrm{D}^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion Based Virtual Try-On

Yang, Zhaotong; Jiang, Zicheng; Li, Xinzhe; Zhou, Huiyu; Dong, Junyu; Zhang, Huaidong; Du, Yong

doi:10.1007/978-3-031-72952-2_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15104)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In this paper, we introduce $\textrm{D}^4$-VTON, an innovative solution for image-based virtual try-on. We address challenges from previous studies, such as semantic inconsistencies before and after garment warping, and reliance on static, annotation-driven clothing parsers. We also tackle the difficulty that diffusion-based VTON models face when they must handle inpainting and denoising simultaneously. Our approach relies on two key technologies. First, Dynamic Semantics Disentangling Modules (DSDMs) extract abstract semantic information from garments to create distinct local flows, making garment warping more precise in a self-discovered manner. Second, by integrating a Differential Information Tracking Path (DITP), we establish a novel diffusion-based VTON paradigm. This path captures differential information between incomplete try-on inputs and their complete versions, enabling the network to handle multiple degradations independently, thereby minimizing learning ambiguities and achieving realistic results with minimal overhead. Extensive experiments show that $\textrm{D}^4$-VTON significantly outperforms existing methods in both quantitative metrics and qualitative evaluations, demonstrating its capability to generate realistic images and ensure semantic consistency. Code is available at https://github.com/Jerome-Young/D4-VTON.
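To make the phrase "differential information between incomplete try-on inputs and their complete versions" concrete, the toy sketch below works on a tiny single-channel image in pixel space. It is an illustration only and not the DITP described in the paper. The helper names (`mask_out`, `differential`, `restore`), the binary mask, and the use of plain subtraction are all assumptions introduced here.

```python
from typing import List

Image = List[List[float]]  # toy single-channel image (assumption, not the paper's latent space)


def mask_out(complete: Image, mask: List[List[int]]) -> Image:
    """Zero every pixel where mask == 1, giving a toy 'incomplete' input."""
    return [[0.0 if m else p for p, m in zip(prow, mrow)]
            for prow, mrow in zip(complete, mask)]


def differential(complete: Image, incomplete: Image) -> Image:
    """Element-wise difference: what the incomplete input lacks."""
    return [[c - i for c, i in zip(crow, irow)]
            for crow, irow in zip(complete, incomplete)]


def restore(incomplete: Image, diff: Image) -> Image:
    """Adding the differential back recovers the complete image."""
    return [[i + d for i, d in zip(irow, drow)]
            for irow, drow in zip(incomplete, diff)]


# Hypothetical 2x2 example: the right column is missing from the input.
complete = [[0.2, 0.4], [0.6, 0.8]]
mask = [[0, 1], [0, 1]]

incomplete = mask_out(complete, mask)
diff = differential(complete, incomplete)
restored = restore(incomplete, diff)
```

In this toy setting the differential is non-zero only where content is missing, which is one way to picture why treating it as a separate signal could keep the inpainting part of the task apart from denoising. How the paper actually computes and uses this signal is not specified in the abstract.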

Notes

1. For simplicity, we illustrate the case with $N=3$ in Fig. 2.

Acknowledgement

This project is supported by the National Natural Science Foundation of China (62102381, 41927805); Shandong Natural Science Foundation (ZR2021QF035); the National Key R&D Program of China (2022ZD0117201); and the China Postdoctoral Science Foundation (2020M682240, 2021T140631).

Author information

Authors and Affiliations

Ocean University of China, Qingdao, China
Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Junyu Dong & Yong Du
University of Leicester, Leicester, UK
Huiyu Zhou
South China University of Technology, Guangzhou, China
Huaidong Zhang

Corresponding author

Correspondence to Yong Du.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

About this paper

Cite this paper

Yang, Z. et al. (2025). $\textrm{D}^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion Based Virtual Try-On. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_3

DOI: https://doi.org/10.1007/978-3-031-72952-2_3
Published: 01 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72951-5
Online ISBN: 978-3-031-72952-2
eBook Packages: Computer Science, Computer Science (R0)
