DOI: 10.1145/3581783.3612182
FlexIcon: Flexible Icon Colorization via Guided Images and Palettes

Published: 27 October 2023

Abstract

Automatic icon colorization systems show great potential value as they can serve as a source of inspiration for designers. Despite yielding promising results, previous reference-guided approaches overlook how to effectively fuse icon structure and style, leading to unpleasant color effects. They also cannot take free-style palettes as input, which is less user-friendly. To this end, we present FlexIcon, a Flexible Icon colorization model based on guided images and palettes. To promote visual quality, our model first leverages a Hybrid Multi-expert Module to aggregate better structural features, then dynamically integrates the global style with each individual pixel of the structure map via the Pixel-Style Aggregation Layer. We also introduce an efficient learning scheme for free-style palette-based colorization, editing, interpolation, and diverse generation. Extensive experiments demonstrate the superiority of our framework over state-of-the-art approaches. In addition, we contribute a Mandala dataset to the multimedia community and further validate the application value of the proposed model.
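The page does not include the paper's code, but the pixel-wise style fusion described above can be illustrated with a rough sketch. Everything here is an assumption for illustration: the function name `pixel_style_aggregation` is hypothetical, and scaled dot-product attention between each structural pixel feature and a set of palette/style vectors is one plausible reading of "dynamically integrating the global style with each individual pixel", not the paper's actual layer.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pixel_style_aggregation(pixel_feats, style_vecs):
    """Hypothetical sketch: each pixel's structural feature attends over
    the style (palette) vectors, and the per-pixel output is the
    attention-weighted mixture of those style vectors."""
    dim = len(style_vecs[0])
    out = []
    for p in pixel_feats:
        # Scaled dot-product attention scores against every style vector.
        scores = [dot(p, s) / math.sqrt(len(p)) for s in style_vecs]
        w = softmax(scores)
        # Convex combination of style vectors, one mixture per pixel.
        mixed = [sum(wi * s[d] for wi, s in zip(w, style_vecs))
                 for d in range(dim)]
        out.append(mixed)
    return out
```

Because the weights are a softmax, each output pixel is a convex combination of the palette vectors; a pixel whose structural feature aligns with one style vector receives mostly that style.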



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. attention
      2. guided images
      3. icon colorization
      4. style palettes

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023, Ottawa ON, Canada
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
