DOI: 10.1145/3584376.3584526

Text-to-Image Algorithm Based on Fusion Mechanism

Published: 19 April 2023

Abstract

A multi-step text-to-image generation algorithm based on feature fusion is proposed to address image blurring and missing detail in text-generated images. To strengthen the ability of single-channel text features to guide multi-channel image features, the text features are transferred into the image features through a feature fusion module that enhances the text features while refining the detail of the generated images. The fusion module is executed alternately with the upsampling operation in the generator, increasing how frequently the text features are used. Arranging the generator and discriminator in three pairs allows the model to first generate a coarse image from the text and then progressively refine that coarse image into a clear, higher-resolution one. Experiments on the CUB dataset show improvements in both the Inception Score and the Fréchet Inception Distance, and comparative analysis of the generated images shows that images produced by this method have rich detail, texture, and sharpness.
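The alternating fusion-and-upsampling loop described in the abstract can be sketched as follows. This is a minimal NumPy illustration only, not the paper's implementation: the feature dimensions, the number of stages, and the affine (scale-and-shift) form of the fusion module are all assumptions, chosen in the spirit of DF-GAN-style deep fusion.

```python
import numpy as np

def fuse(image_feat, text_feat, W_gamma, W_beta):
    # Affine fusion (assumed form): the text embedding predicts a
    # per-channel scale (gamma) and shift (beta) for the image features.
    gamma = W_gamma @ text_feat        # shape (C,)
    beta = W_beta @ text_feat          # shape (C,)
    return image_feat * (1 + gamma)[:, None, None] + beta[:, None, None]

def upsample2x(feat):
    # Nearest-neighbour upsampling: repeat each pixel 2x along H and W.
    return feat.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
text_feat = rng.standard_normal(256)       # sentence embedding (dim assumed)
feat = rng.standard_normal((64, 4, 4))     # initial 4x4 image feature map

# Execute fusion alternately with upsampling, as the abstract describes,
# so the text features condition the image features at every resolution.
for _ in range(3):
    W_gamma = rng.standard_normal((64, 256)) * 0.01
    W_beta = rng.standard_normal((64, 256)) * 0.01
    feat = fuse(feat, text_feat, W_gamma, W_beta)
    feat = upsample2x(feat)

print(feat.shape)  # (64, 32, 32): three 2x stages take 4x4 up to 32x32
```

Interleaving the fusion with every upsampling step, rather than injecting the text once at the input, is what the abstract means by "enhancing the frequency of text feature usage": the text conditions the image features at each spatial resolution.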


Published In

RICAI '22: Proceedings of the 2022 4th International Conference on Robotics, Intelligent Control and Artificial Intelligence
December 2022
1396 pages
ISBN:9781450398343
DOI:10.1145/3584376

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

RICAI 2022

Acceptance Rates

Overall Acceptance Rate 140 of 294 submissions, 48%
