Abstract
Although style transfer achieves promising results, it requires style images to be prepared in advance, which limits creativity and accessibility. Following human instructions, in contrast, is the most natural way to perform artistic style transfer and can significantly improve controllability for visual-effect applications. We introduce a new task, language-driven artistic style transfer (LDAST), which manipulates the style of a content image guided by a text instruction. We propose the Contrastive Language Visual Artist (CLVA), which learns to extract visual semantics from style instructions and accomplishes LDAST with a patch-wise style discriminator. The discriminator considers the correlation between the language and patches of style images or transferred results to jointly embed style instructions. CLVA further compares contrastive pairs of content images and style instructions to improve their mutual relevance: results produced from the same content image preserve consistent content structures, while results produced from style instructions with similar visual semantics present analogous style patterns. Experiments show that CLVA is effective and achieves superb transferred results on LDAST.
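To make the patch-wise contrastive idea in the abstract more concrete, below is a minimal PyTorch sketch of one plausible form of such an objective. The module names (PatchStyleDiscriminator), embedding dimensions, the hinge-style loss, and the way patches are aggregated are all illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of a patch-wise language-style contrastive objective.
# All module names, dimensions, and the hinge formulation are illustrative
# assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchStyleDiscriminator(nn.Module):
    """Scores how well image patches match a style-instruction embedding."""

    def __init__(self, text_dim=512, patch_dim=512, joint_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)
        self.patch_proj = nn.Linear(patch_dim, joint_dim)

    def forward(self, text_emb, patch_emb):
        # text_emb: (B, text_dim), patch_emb: (B, P, patch_dim)
        t = F.normalize(self.text_proj(text_emb), dim=-1)    # (B, joint_dim)
        p = F.normalize(self.patch_proj(patch_emb), dim=-1)  # (B, P, joint_dim)
        # Per-patch cosine similarity to the instruction, averaged over patches.
        return (p @ t.unsqueeze(-1)).squeeze(-1).mean(dim=1)  # (B,)


def contrastive_pair_loss(score_matched, score_mismatched, margin=0.5):
    """Matched (instruction, result) pairs should score higher than
    mismatched pairs by at least `margin`."""
    return F.relu(margin - score_matched + score_mismatched).mean()


if __name__ == "__main__":
    disc = PatchStyleDiscriminator()
    text = torch.randn(4, 512)             # embeddings of style instructions
    patches_pos = torch.randn(4, 16, 512)   # patches from matching stylized results
    patches_neg = torch.randn(4, 16, 512)   # patches from mismatched results
    loss = contrastive_pair_loss(disc(text, patches_pos), disc(text, patches_neg))
    print(loss.item())
```

In this reading, the discriminator pulls patch embeddings of transferred results toward the embedding of their own style instruction and pushes them away from mismatched instructions, which is one way to realize the contrastive pairing described above.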
Notes
1. WikiArt: https://www.wikiart.org.
2. WallpapersCraft: https://wallpaperscraft.com/.
3. Amazon Mechanical Turk: https://www.mturk.com.
Acknowledgments
Research was sponsored by the U.S. Army Research Office and was accomplished under Contract Number W911NF-19-D-0001 for the Institute for Collaborative Biotechnologies. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.