
LoopNet for fine-grained fashion attributes editing

Published: 01 January 2025

Abstract

Generative Adversarial Networks (GANs) have revolutionized image synthesis by transforming randomly sampled latent codes into high-fidelity images. However, current methods fall short when manipulating a wide range of fashion attributes, owing to semantic ambiguity and a lack of disentanglement. This work addresses fashion attribute editing with an encoder-based GAN inversion method named LoopNet. To enable high-fidelity image inversion and fine-grained attribute editing, LoopNet refines edited images through two encoder–decoder stages, using editing directions predefined by principal component analysis on latent codes and a Canny loss for detail enhancement. Experiments demonstrate LoopNet's effectiveness in attribute disentanglement and manipulation, outperforming seven state-of-the-art image inversion methods.
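
The pipeline sketched in the abstract (invert the image, shift its latent code along a predefined semantic direction, then re-encode the edited result to recover detail) can be illustrated in a few lines. The following is a hypothetical sketch, not the authors' implementation: the names E1, E2, G, pca_directions, edit_two_stage, and the step size alpha are all assumptions, and the direction-finding step follows the general GANSpace-style recipe of taking principal axes of sampled latent codes as candidate edit directions.

```python
# Hypothetical sketch of a LoopNet-style edit; names and interfaces are
# assumptions, not the paper's actual API.
import torch

def pca_directions(latents: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Estimate k candidate edit directions as the principal axes of a
    batch of sampled latent codes: (N, latent_dim) -> (k, latent_dim)."""
    # torch.pca_lowrank centers the data by default and returns (U, S, V);
    # the columns of V are the principal components.
    _, _, v = torch.pca_lowrank(latents, q=k)
    return v.T

def edit_two_stage(x, E1, E2, G, direction, alpha=2.0):
    """Two encoder-decoder passes: invert, edit, then re-encode the edited
    image so the second pass can restore fine detail lost in the first."""
    w = E1(x)                          # stage 1: invert the real image
    x_edit = G(w + alpha * direction)  # shift along a semantic direction
    w_ref = E2(x_edit)                 # stage 2: re-encode the edited image
    return G(w_ref)                    # refined, edited output
```

How the second stage is trained so that re-encoding sharpens the edit rather than undoing it is precisely where a detail-preserving term such as the paper's Canny loss would enter; the abstract does not spell out these training details.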

Highlights

A novel image inversion framework, LoopNet, designed for fashion attribute editing.
The use of semantic attribute directions to enhance model fidelity and editability.
The introduction of a Canny loss function to refine the details of inverted images (see the sketch after this list).
Comprehensive experiments demonstrate LoopNet's ability in fashion attribute editing.
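
A note on the Canny loss mentioned above: classical Canny edge detection (Gaussian smoothing, gradient magnitude, non-maximum suppression, hysteresis thresholding) is not differentiable end to end, so a training loss built on it presumably relies on a differentiable surrogate or applies Canny only to fixed targets. The sketch below is one plausible reading under that assumption, substituting a Sobel gradient magnitude for the full Canny pipeline; it is an illustration, not the paper's exact formulation.

```python
# Hedged sketch of an edge-preservation loss approximating a "Canny loss".
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 3, H, W) in [0, 1]. Returns a (B, 1, H, W) map of Sobel
    gradient magnitudes, a differentiable stand-in for Canny edges."""
    gray = img.mean(dim=1, keepdim=True)  # crude RGB -> grayscale
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)               # Sobel kernel for the y direction
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # eps keeps sqrt stable

def edge_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance between edge maps of the inverted and the real image,
    pushing the inversion to preserve fine structural detail."""
    return F.l1_loss(sobel_edges(pred), sobel_edges(target))
```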


Published In

Expert Systems with Applications: An International Journal, Volume 259, Issue C
Jan 2025
1577 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Fashion editing
  2. Attribute disentanglement
  3. GAN inversion

Qualifiers

  • Research-article
