
Visual Correspondence Learning and Spatially Attentive Synthesis via Transformer for Exemplar-Based Anime Line Art Colorization

Published: 30 January 2024

Abstract

Exemplar-based anime line art colorization of the same character is a challenging problem in digital art production because line images are sparse representations and differ greatly in appearance from color images. Finding semantic correspondence between the two kinds of images is therefore a fundamental problem. In this paper, we propose a correspondence learning Transformer network for exemplar-based line art colorization, called ArtFormer, which uses a Transformer-based architecture to learn both spatial and visual relationships between line art and color images. ArtFormer consists of two main parts: correspondence learning and high-quality image generation. In particular, the correspondence learning module is composed of several Transformer blocks, each of which formulates deep line image features and color image features as queries and keys and learns a dense correspondence between the two image domains. The network then synthesizes high-quality images with a newly proposed Spatial Attention Adaptive Normalization (SAAN), which uses warped deep exemplar features to modulate shallow features and generate better adaptive normalization parameters. Both qualitative and quantitative experiments show that our method achieves the best performance on exemplar-based line art colorization compared with state-of-the-art methods and other baselines.
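
The paper itself does not include code, so the following PyTorch-style sketch is only a rough illustration of the two components named in the abstract: (a) cross-attention in which line-art features act as queries over exemplar color features to produce warped exemplar features, and (b) a SAAN-like layer that predicts spatially varying scale and shift parameters from those warped features to modulate normalized shallow generator features. All module names, layer sizes, and shapes here are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of correspondence attention and SAAN-style modulation.
# Names, shapes, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn


class CorrespondenceBlock(nn.Module):
    """Cross-attention: line-art tokens query exemplar color tokens."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, line_feat, color_feat):
        # line_feat, color_feat: (B, H*W, C) token sequences
        warped, _ = self.attn(query=line_feat, key=color_feat, value=color_feat)
        return self.norm(line_feat + warped)  # warped exemplar features


class SAAN(nn.Module):
    """Spatial-attention adaptive normalization (illustrative only).

    Warped deep exemplar features predict per-pixel scale/shift that
    modulate the normalized shallow generator features."""
    def __init__(self, shallow_ch: int, warped_ch: int, hidden: int = 128):
        super().__init__()
        self.param_free_norm = nn.InstanceNorm2d(shallow_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(warped_ch, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, shallow_ch, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, shallow_ch, kernel_size=3, padding=1)
        self.spatial_attn = nn.Conv2d(hidden, 1, kernel_size=1)

    def forward(self, shallow, warped):
        # shallow: (B, C, H, W) generator features; warped: (B, C', H, W)
        h = self.shared(warped)
        a = torch.sigmoid(self.spatial_attn(h))   # where to trust the exemplar
        gamma, beta = a * self.gamma(h), a * self.beta(h)
        return self.param_free_norm(shallow) * (1 + gamma) + beta


# Toy usage with assumed feature sizes.
B, C, H, W = 1, 256, 32, 32
line = torch.randn(B, H * W, C)
color = torch.randn(B, H * W, C)
warped_tokens = CorrespondenceBlock(C)(line, color)
warped_map = warped_tokens.transpose(1, 2).reshape(B, C, H, W)
out = SAAN(shallow_ch=64, warped_ch=C)(torch.randn(B, 64, H, W), warped_map)
```

In a full network of this kind, correspondence blocks would typically be stacked and the warped features fed to SAAN layers at several decoder scales; those design details are not specified in the abstract.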

Published In

IEEE Transactions on Multimedia, Volume 26, 2024, 10405 pages

Publisher

IEEE Press

Publication History

Published: 30 January 2024

Qualifiers

  • Research-article
