DOI: 10.1145/3569966.3569990
research-article
Open access

CTI-GAN: Cross-Text-Image Generative Adversarial Network for Bidirectional Cross-modal Generation

Published: 20 December 2022

Abstract

Cross-modal tasks between text and images are an increasingly active research topic. This paper proposes a cross-text-image generative adversarial network (CTI-GAN) that performs bidirectional cross-modal generation between images and text by connecting text modeling and image modeling in a single framework. Hierarchical LSTM encoding improves the extraction of text features, and feature pyramid fusion makes full use of the features of each layer to improve the image feature representation. Experiments verify the effectiveness of these improvements for image-text generation: the improved algorithm completes the cross-modal generation task efficiently and improves the accuracy of the generated samples. On the text-to-image generation task, the inception score of CTI-GAN is about 2% higher than that of StackGAN++, HDGAN, GAN-INT-CLS, and other models on the same dataset under the same conditions.
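
The abstract reports its text-to-image result as an inception score. As a minimal sketch of the standard definition of that metric, IS = exp(E_x[KL(p(y|x) || p(y))]), the following pure-Python function computes the score from per-image class probabilities. It is not the authors' evaluation code, which this page does not include. The function name `inception_score` and the toy inputs are assumptions made for illustration; in practice the rows would be softmax outputs of a pretrained image classifier applied to generated images.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of rows; each row is a distribution p(y|x) over classes
    for one generated image (hypothetical input format for this sketch).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Sum of KL(p(y|x) || p(y)) over images; terms with p == 0 contribute 0.
    kl_sum = 0.0
    for row in probs:
        for p, m in zip(row, marginal):
            if p > 0:
                kl_sum += p * (math.log(p) - math.log(m))
    # Exponentiate the mean KL divergence.
    return math.exp(kl_sum / n)


# Identical, uninformative predictions give the minimum score of 1.
flat = inception_score([[0.5, 0.5], [0.5, 0.5]])
# Confident predictions spread evenly over 2 classes approach the maximum, 2.
sharp = inception_score([[1.0, 0.0], [0.0, 1.0]])
```

The score rises when each image is classified confidently and the set of images covers many classes, which is why it is used as a proxy for both quality and diversity of generated samples.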

Cited By

  • (2023) Strong and Weak Supervision Combined with CLIP for Water Surface Garbage Detection. Water 15:17 (3156). DOI: 10.3390/w15173156. Online publication date: 4-Sep-2023

Published In

ACM Other conferences
CSSE '22: Proceedings of the 5th International Conference on Computer Science and Software Engineering
October 2022
753 pages
ISBN:9781450397780
DOI:10.1145/3569966
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cross-modality
  2. Cycle consistency
  3. Generative adversarial networks
  4. Image-Text generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Natural Science Foundations of China

Conference

CSSE 2022

Acceptance Rates

Overall Acceptance Rate 33 of 74 submissions, 45%

Article Metrics

  • Downloads (Last 12 months): 495
  • Downloads (Last 6 weeks): 54
Reflects downloads up to 23 Dec 2024
