DOI: 10.1007/978-981-97-8505-6_28

GAN-Diffusion Relay Model: Advancing Semantic Image Synthesis

Published: 07 November 2024

Abstract

Semantic image synthesis, which transforms semantic layouts into realistic images, aims to comprehend and exploit the given semantic information. Despite impressive recent advances, challenges remain in fidelity, semantic alignment, and training stability. To improve generation quality and semantic alignment, we redesign the noise mapping and the semantic-space embedding and propose a novel semantic image synthesis model, the GAN-Diffusion Relay Model (GDRM), built on a GAN and a relay diffusion model. Extensive experiments on benchmark datasets validate the effectiveness of the proposed approach, which achieves state-of-the-art performance in fidelity (FID) and diversity (LPIPS).
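The abstract reports fidelity as FID. For reference, FID is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images: FID = ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^{1/2}). The NumPy sketch below computes this distance from precomputed feature arrays. It assumes the features have already been extracted (standard FID uses Inception-v3 pooling features, which this sketch does not compute), and the function names are illustrative only; this is not the authors' evaluation code.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clamp tiny negative eigenvalues caused by floating-point round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Frechet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    sigma1 = np.atleast_2d(np.asarray(sigma1, dtype=float))
    sigma2 = np.atleast_2d(np.asarray(sigma2, dtype=float))
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) == Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the right-hand
    # matrix is symmetric PSD, so an eigendecomposition is enough.
    root1 = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(root1 @ sigma2 @ root1)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(cross))


def fid_from_features(feats_real, feats_gen) -> float:
    """FID from (num_samples, feature_dim) arrays of precomputed features."""
    feats_real = np.asarray(feats_real, dtype=float)
    feats_gen = np.asarray(feats_gen, dtype=float)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    # rowvar=False: rows are samples, columns are feature dimensions.
    # atleast_2d keeps the 1-feature case as a 1x1 matrix.
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

As a worked example, for the 1-D Gaussians N(0, 1) and N(3, 4), `frechet_distance([0.0], [[1.0]], [3.0], [[4.0]])` evaluates 9 + 1 + 4 - 2·2 = 10. Lower values are better, and identical feature sets give a distance near zero. In practice FID is estimated from thousands of samples, because the covariance estimate is noisy when few samples are available.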



Published In

Pattern Recognition and Computer Vision: 7th Chinese Conference, PRCV 2024, Urumqi, China, October 18–20, 2024, Proceedings, Part IV
Oct 2024, 523 pages
ISBN: 978-981-97-8504-9
DOI: 10.1007/978-981-97-8505-6

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. Semantic image synthesis
  2. GAN
  3. Diffusion model
