DOI: 10.1145/3581783.3612182
FlexIcon: Flexible Icon Colorization via Guided Images and Palettes

Published: 27 October 2023

Abstract

Automatic icon colorization systems show great potential value as they can serve as a source of inspiration for designers. Despite yielding promising results, previous reference-guided approaches overlook how to effectively fuse icon structure and style, leading to unpleasant color effects. They also cannot take free-style palettes as input, which is less user-friendly. To this end, we present FlexIcon, a Flexible Icon colorization model based on guided images and palettes. To promote visual quality, our model first leverages a Hybrid Multi-expert Module to aggregate better structural features, then dynamically integrates the global style with each individual pixel of the structure map via the Pixel-Style Aggregation Layer. We also introduce an efficient learning scheme for free-style palette-based colorization, editing, interpolation, and diverse generation. Extensive experiments demonstrate the superiority of our framework over state-of-the-art approaches. In addition, we contribute a Mandala dataset to the multimedia community and further validate the application value of the proposed model.
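The page does not include the paper's code, but the pixel-wise style fusion described above can be illustrated with a rough sketch. Everything here is an assumption for illustration: the function name `pixel_style_aggregation` is hypothetical, and scaled dot-product attention between each structural pixel feature and a set of palette/style vectors is one plausible reading of "dynamically integrating the global style with each individual pixel", not the paper's actual layer.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pixel_style_aggregation(pixel_feats, style_vecs):
    """Hypothetical sketch: each pixel's structural feature attends over
    the style (palette) vectors, and the per-pixel output is the
    attention-weighted mixture of those style vectors."""
    dim = len(style_vecs[0])
    out = []
    for p in pixel_feats:
        # Scaled dot-product attention scores against every style vector.
        scores = [dot(p, s) / math.sqrt(len(p)) for s in style_vecs]
        w = softmax(scores)
        # Convex combination of style vectors, one mixture per pixel.
        mixed = [sum(wi * s[d] for wi, s in zip(w, style_vecs))
                 for d in range(dim)]
        out.append(mixed)
    return out
```

Because the weights are a softmax, each output pixel is a convex combination of the palette vectors; a pixel whose structural feature aligns with one style vector receives mostly that style.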



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. attention
      2. guided images
      3. icon colorization
      4. style palettes

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023, Ottawa ON, Canada
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
