
Visual Correspondence Learning and Spatially Attentive Synthesis via Transformer for Exemplar-Based Anime Line Art Colorization

Published: 30 January 2024

Abstract

Exemplar-based anime line art colorization of the same character is a challenging problem in digital art production because line images are sparse representations and differ greatly in appearance from color images. Finding semantic correspondence between the two kinds of images is therefore a fundamental problem. In this paper, we propose a correspondence learning Transformer network for exemplar-based line art colorization, called ArtFormer, which uses a Transformer-based architecture to learn both spatial and visual relationships between line art and color images. ArtFormer consists of two main parts: correspondence learning and high-quality image generation. In particular, the correspondence learning module is composed of several Transformer blocks, each of which formulates deep line image features and color image features as queries and keys and learns a dense correspondence between the two image domains. The network then synthesizes high-quality images with a newly proposed Spatial Attention Adaptive Normalization (SAAN), which uses warped deep exemplar features to modulate shallow features and generate better adaptive normalization parameters. Both qualitative and quantitative experiments show that our method achieves the best performance on exemplar-based line art colorization compared with state-of-the-art methods and other baselines.
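
The paper itself does not include code, so the following PyTorch-style sketch is only a rough illustration of the two components named in the abstract: (a) cross-attention in which line-art features act as queries over exemplar color features to produce warped exemplar features, and (b) a SAAN-like layer that predicts spatially varying scale and shift parameters from those warped features to modulate normalized shallow generator features. All module names, layer sizes, and shapes here are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of correspondence attention and SAAN-style modulation.
# Names, shapes, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn


class CorrespondenceBlock(nn.Module):
    """Cross-attention: line-art tokens query exemplar color tokens."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, line_feat, color_feat):
        # line_feat, color_feat: (B, H*W, C) token sequences
        warped, _ = self.attn(query=line_feat, key=color_feat, value=color_feat)
        return self.norm(line_feat + warped)  # warped exemplar features


class SAAN(nn.Module):
    """Spatial-attention adaptive normalization (illustrative only).

    Warped deep exemplar features predict per-pixel scale/shift that
    modulate the normalized shallow generator features."""
    def __init__(self, shallow_ch: int, warped_ch: int, hidden: int = 128):
        super().__init__()
        self.param_free_norm = nn.InstanceNorm2d(shallow_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(warped_ch, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, shallow_ch, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, shallow_ch, kernel_size=3, padding=1)
        self.spatial_attn = nn.Conv2d(hidden, 1, kernel_size=1)

    def forward(self, shallow, warped):
        # shallow: (B, C, H, W) generator features; warped: (B, C', H, W)
        h = self.shared(warped)
        a = torch.sigmoid(self.spatial_attn(h))   # where to trust the exemplar
        gamma, beta = a * self.gamma(h), a * self.beta(h)
        return self.param_free_norm(shallow) * (1 + gamma) + beta


# Toy usage with assumed feature sizes.
B, C, H, W = 1, 256, 32, 32
line = torch.randn(B, H * W, C)
color = torch.randn(B, H * W, C)
warped_tokens = CorrespondenceBlock(C)(line, color)
warped_map = warped_tokens.transpose(1, 2).reshape(B, C, H, W)
out = SAAN(shallow_ch=64, warped_ch=C)(torch.randn(B, 64, H, W), warped_map)
```

In a full network of this kind, correspondence blocks would typically be stacked and the warped features fed to SAAN layers at several decoder scales; those design details are not specified in the abstract.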

Published In

IEEE Transactions on Multimedia, Volume 26, 2024, 10405 pages

Publisher

IEEE Press

Publication History

Published: 30 January 2024

Qualifiers

  • Research-article
