DOI: 10.1145/3581783.3612589

A Multitask Framework for Graffiti-to-Image Translation

Published: 27 October 2023

Abstract

Image-to-image translation models have recently achieved great success in content consistency and visual fidelity. However, in most of these tasks, the inaccuracy of sketches and the high cost of acquiring fine semantic masks limit the large-scale use of image translation models. We therefore propose to use graffiti, which combines the advantages of sketches and semantic masks, as model input. Graffiti conveys the general content of an image through lines and color distinctions, while leaving some regions unlabeled. Because of these large unknown areas, however, the generated results may be blurred, yielding poor visual quality. To address these challenges, this paper proposes a multitask framework that predicts the unknown regions by learning semantic masks from graffiti, thereby improving the quality of the generated real-scene images. Furthermore, an edge activation module exploits semantic and edge information to refine the object boundaries of the generated images, improving their details. Experiments on the Cityscapes dataset demonstrate that our multitask framework achieves competitive performance on the graffiti-based image generation task.
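The abstract describes training one model on several objectives at once: image generation, semantic-mask prediction for the unlabeled graffiti regions, and edge refinement. The paper's actual loss is not given here, so the following is only a generic illustration of the multitask idea: a per-pixel cross-entropy for mask prediction that ignores unlabeled graffiti pixels, combined with hypothetical generation and edge terms via illustrative weights (`w_mask`, `w_edge` are assumptions, not values from the paper).

```python
import numpy as np

def masked_cross_entropy(pred_probs, labels, ignore_index=-1):
    """Cross-entropy over labeled pixels only.

    pred_probs: (H, W, C) softmax outputs of a mask-prediction head.
    labels:     (H, W) integer class labels; ignore_index marks the
                unlabeled graffiti regions, which contribute no loss.
    """
    valid = labels != ignore_index
    rows, cols = np.where(valid)
    # Probability assigned to the true class at each labeled pixel.
    picked = pred_probs[rows, cols, labels[valid]]
    return -np.log(np.clip(picked, 1e-8, None)).mean()

def total_loss(gen_loss, mask_ce, edge_loss, w_mask=1.0, w_edge=0.5):
    """Hypothetical weighted sum of generation, mask, and edge terms."""
    return gen_loss + w_mask * mask_ce + w_edge * edge_loss
```

In a real training loop, `gen_loss` would come from the image-synthesis objective (e.g. an adversarial or reconstruction term) and `edge_loss` from comparing predicted and reference boundaries; the key point is that the mask term only supervises pixels the graffiti actually labels.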


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. graffiti
    2. image processing
    3. image-to-image translation
    4. multi-modal learning

    Qualifiers

    • Research-article


    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
