CA-GAN: Conditional Adaptive Generative Adversarial Network for Text-to-Image Synthesis

Published: 29 January 2024
    Abstract

    Text-to-image synthesis has become a popular multimodal task in recent years, and it faces two major challenges: maintaining semantic consistency and avoiding fine-grained information loss. Existing methods mostly adopt either a multi-stage stacked architecture or a single-stream model with several affine transformations as the fusion block. The former requires additional networks to ensure semantic consistency between text and image, which is complex and results in poor generation quality. The latter simply derives its affine transformations from Conditional Batch Normalization (CBN), which cannot match text features well. To address these issues, we propose an effective Conditional Adaptive Generative Adversarial Network (CA-GAN). CA-GAN adopts a single-stream architecture consisting of a single generator/discriminator pair. Specifically, we propose (1) a conditional adaptive instance normalization residual block that helps the generator synthesize high-quality images carrying semantic information, and (2) an attention block that focuses on image-related channels and pixels. Extensive experiments on the CUB and COCO datasets show the superiority of CA-GAN over previous text-to-image synthesis methods.
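
    The abstract names two building blocks but gives no implementation details, so the PyTorch sketch below is only an illustration under assumptions: the conditional adaptive instance normalization is read as AdaIN (Huang and Belongie) whose per-channel scale and shift are predicted from a sentence embedding, and the attention block is approximated by a squeeze-and-excitation channel gate followed by a 1x1-convolution pixel gate. Every class name and dimension here (ConditionalAdaIN, CAResBlock, ChannelPixelAttention, the 256-d text vector) is hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the two blocks named in the abstract; the actual
# CA-GAN layer sizes, names, and wiring may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalAdaIN(nn.Module):
    """Instance-normalize image features, then re-scale/shift them with
    affine parameters predicted from the sentence embedding (the
    'conditional adaptive' part)."""
    def __init__(self, num_channels: int, text_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # One linear layer predicts per-channel gamma and beta from the text.
        self.affine = nn.Linear(text_dim, num_channels * 2)

    def forward(self, x: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.affine(text).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(x) + beta

class CAResBlock(nn.Module):
    """Residual block: conditional AdaIN -> ReLU -> conv, twice, plus skip."""
    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.adain1 = ConditionalAdaIN(channels, text_dim)
        self.adain2 = ConditionalAdaIN(channels, text_dim)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, text):
        h = self.conv1(F.relu(self.adain1(x, text)))
        h = self.conv2(F.relu(self.adain2(h, text)))
        return x + h

class ChannelPixelAttention(nn.Module):
    """Squeeze-and-excitation style channel gate followed by a 1x1-conv
    pixel gate, approximating 'attention over image-related channels and
    pixels'."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.pixel_gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_gate(x)   # weight informative channels
        return x * self.pixel_gate(x)  # weight informative spatial positions

# Usage on dummy data: 256-channel 16x16 feature map, 256-d sentence vector.
feat = torch.randn(4, 256, 16, 16)
sent = torch.randn(4, 256)
out = ChannelPixelAttention(256)(CAResBlock(256, 256)(feat, sent))
print(out.shape)  # torch.Size([4, 256, 16, 16])
```

    Stacking several such residual blocks with progressive upsampling would give a single-stream generator of the kind the abstract describes; this is a reading of the abstract, not the paper's confirmed design.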



    Published In

    MultiMedia Modeling: 30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 – February 2, 2024, Proceedings, Part III
    Jan 2024
    551 pages
    ISBN: 978-3-031-53310-5
    DOI: 10.1007/978-3-031-53311-2
    Editors: Stevan Rudinac, Alan Hanjalic, Cynthia Liem, Marcel Worring, Björn Þór Jónsson, Bei Liu, Yoko Yamakata

    Publisher

    Springer-Verlag, Berlin, Heidelberg


    Author Tags

    1. Multi-modal
    2. Text-to-Image
    3. GAN

    Qualifiers

    • Article
