DOI: 10.1145/3565516.3565525
research-article

U-Attention to Textures: Hierarchical Hourglass Vision Transformer for Universal Texture Synthesis

Published: 01 December 2022

Abstract

We present a novel U-Attention vision Transformer for universal texture synthesis. We exploit the long-range dependencies that the attention mechanism naturally captures, allowing our approach to synthesize diverse textures while preserving their structures in a single inference pass. We propose a hierarchical hourglass backbone that attends to the global structure and performs patch mapping at varying scales in a coarse-to-fine-to-coarse stream. Complemented by skip connections and convolution designs that propagate and fuse information across scales, our hierarchical U-Attention architecture unifies attention to features ranging from macro structures to micro details, and progressively refines the synthesis result at successive stages. Our method achieves stronger 2× synthesis results than previous work on both stochastic and structured textures, while generalizing to unseen textures without fine-tuning. Ablation studies demonstrate the effectiveness of each component of our architecture.
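The abstract describes attention applied to image patches at several scales, arranged coarse-to-fine-to-coarse and tied together by skip connections. The pure-Python sketch below illustrates that general idea only; it is not the authors' U-Attention implementation. It assumes a single-channel image stored as a list of lists, identity query/key/value projections (no learned weights), a hypothetical symmetric patch-size schedule `(8, 4, 2, 4, 8)` in which larger patches count as "coarser", and fusion of mirrored stages by simple averaging as a stand-in for the paper's skip-connection and convolution designs. The names `patchify`, `unpatchify`, `self_attention` and `hourglass` are illustrative.

```python
import math


def patchify(img, p):
    """Split an H x W single-channel image (list of lists) into non-overlapping
    p x p patches, each flattened to a vector, in row-major patch order."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    return [
        [img[by + y][bx + x] for y in range(p) for x in range(p)]
        for by in range(0, h, p)
        for bx in range(0, w, p)
    ]


def unpatchify(patches, h, w, p):
    """Inverse of patchify: place flattened p x p patches back into an H x W image."""
    img = [[0.0] * w for _ in range(h)]
    per_row = w // p
    for i, vec in enumerate(patches):
        by, bx = (i // per_row) * p, (i % per_row) * p
        for k, v in enumerate(vec):
            img[by + k // p][bx + k % p] = v
    return img


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections
    (assumption: no learned weights). Each output token is a softmax-weighted
    mix of all tokens, so every patch can draw on every other patch."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(wt * v[j] for wt, v in zip(weights, tokens)) for j in range(d)])
    return out


def hourglass(img, patch_sizes=(8, 4, 2, 4, 8)):
    """Toy coarse-to-fine-to-coarse pass over a hypothetical patch-size schedule.
    Each stage on the way back up is fused with its mirrored stage on the way
    down (assumption: fusion by averaging, standing in for learned skip/conv fusion)."""
    sizes = list(patch_sizes)
    assert sizes == sizes[::-1], "schedule must be symmetric so mirrored stages match"
    h, w = len(img), len(img[0])
    n = len(sizes)
    x, outputs = img, []
    for i, p in enumerate(sizes):
        y = unpatchify(self_attention(patchify(x, p)), h, w, p)
        mirror = n - 1 - i
        if mirror < i:  # decoder-side stage: average with its encoder-side twin
            skip = outputs[mirror]
            y = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(y, skip)]
        outputs.append(y)
        x = y
    return x


# Usage: a 16 x 16 synthetic pattern passed through the toy hourglass.
img = [[float((3 * y + 5 * x) % 7) for x in range(16)] for y in range(16)]
out = hourglass(img)
print(len(out), len(out[0]))  # → 16 16
```

Because every attention output is a softmax-weighted average of existing patch values, each token sees all other tokens at every stage (the global receptive field the abstract relies on), and the output stays within the value range of the input. A trained model would add learned projections, multiple channels and feature-space processing on top of this skeleton.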

Supplementary Material

Supplemental material (cvmp22-9-supp.pdf)


Cited By

  • (2024) Content-aware Tile Generation using Exterior Boundary Inpainting. ACM Transactions on Graphics 43:6 (1–12). DOI: 10.1145/3687981. Online publication date: 19-Dec-2024
  • (2024) TexTile: A Differentiable Metric for Texture Tileability. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (4439–4449). DOI: 10.1109/CVPR52733.2024.00425. Online publication date: 16-Jun-2024
  • (2023) Controllable Garment Image Synthesis Integrated with Frequency Domain Features. Computer Graphics Forum 42:7. DOI: 10.1111/cgf.14938. Online publication date: 5-Nov-2023
  • (2023) A Geometrically Aware Auto-Encoder for Multi-texture Synthesis. Scale Space and Variational Methods in Computer Vision (263–275). DOI: 10.1007/978-3-031-31975-4_20. Online publication date: 10-May-2023

Information

Published In

CVMP '22: Proceedings of the 19th ACM SIGGRAPH European Conference on Visual Media Production
December 2022
97 pages
ISBN:9781450399395
DOI:10.1145/3565516
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hierarchical network architecture
  2. multi-scale attention
  3. texture synthesis
  4. vision Transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CVMP '22
CVMP '22: European Conference on Visual Media Production
December 1 - 2, 2022
London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 40 of 67 submissions, 60%

Article Metrics

  • Downloads (last 12 months): 34
  • Downloads (last 6 weeks): 2
Reflects downloads up to 13 Jan 2025
