Abstract
Reference-based image super-resolution (RefSR) aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images. Recently, RefSR has attracted great attention because it offers an alternative way to surpass single-image SR. However, RefSR poses two critical challenges: (i) it is difficult to match correspondences between LR and Ref images when they differ significantly; (ii) it is challenging to transfer the relevant textures from Ref images to compensate for the missing details in LR images. To address these issues, this paper proposes a deformable attention Transformer, namely DATSR, with multiple scales, each of which consists of a texture feature encoder (TFE) module, a reference-based deformable attention (RDA) module, and a residual feature aggregation (RFA) module. Specifically, TFE first extracts features for LR and Ref images that are insensitive to image transformations (e.g., brightness); RDA then exploits multiple relevant textures to compensate LR features with more information; and RFA lastly aggregates LR features and relevant textures to produce a more visually pleasing result. Extensive experiments demonstrate that DATSR achieves state-of-the-art performance on benchmark datasets both quantitatively and qualitatively.
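The data flow through the three modules named in the abstract can be illustrated with a toy, single-scale NumPy sketch. This is not the DATSR implementation: every function name below is hypothetical, the learned encoder is replaced by per-image normalization, the deformable attention is replaced by plain top-k cosine matching with softmax weights (no learned offsets), and the learned fusion is replaced by a simple residual addition. It only shows how an encode, match-and-transfer, and aggregate pipeline fits together under those assumptions.

```python
import numpy as np

def texture_features(img):
    """Stand-in for TFE (assumption): zero-mean, unit-variance normalization
    removes global brightness and contrast changes."""
    x = img.astype(np.float64)
    return (x - x.mean()) / (x.std() + 1e-8)

def extract_patches(feat, size=3):
    """Flatten every size x size patch (stride 1) into one row."""
    h, w = feat.shape
    rows = [feat[i:i + size, j:j + size].ravel()
            for i in range(h - size + 1)
            for j in range(w - size + 1)]
    return np.stack(rows)

def match_and_attend(lr_feat, ref_feat, k=2, size=3):
    """Stand-in for RDA (assumption): each LR patch attends to its top-k
    most similar Ref patches, weighted by a softmax over cosine similarity."""
    q = extract_patches(lr_feat, size)
    kv = extract_patches(ref_feat, size)
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    kn = kv / (np.linalg.norm(kv, axis=1, keepdims=True) + 1e-8)
    sim = qn @ kn.T                                   # (num_lr, num_ref)
    top = np.argsort(-sim, axis=1)[:, :k]             # indices of best matches
    top_sim = np.take_along_axis(sim, top, axis=1)
    w = np.exp(top_sim - top_sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    transferred = (w[:, :, None] * kv[top]).sum(axis=1)  # (num_lr, size*size)
    return transferred, top

def aggregate(lr_feat, transferred, size=3):
    """Stand-in for RFA (assumption): add the centre value of each
    transferred patch to the LR features as a residual."""
    h, w = lr_feat.shape
    r = size // 2
    out = lr_feat.copy()
    centre = transferred[:, (size * size) // 2]
    out[r:h - r, r:w - r] += centre.reshape(h - size + 1, w - size + 1)
    return out

def refsr_sketch(lr, ref, k=2, size=3):
    """Chain the three stand-in stages on 2-D grayscale arrays."""
    lr_feat = texture_features(lr)
    ref_feat = texture_features(ref)
    transferred, top = match_and_attend(lr_feat, ref_feat, k, size)
    return aggregate(lr_feat, transferred, size), top
```

In this sketch the inputs are assumed to be 2-D grayscale arrays and `size` is assumed to be odd. Calling `refsr_sketch(lr, ref)` returns the aggregated feature map together with the indices of the matched Ref patches, which makes the correspondence step easy to inspect.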
Acknowledgements
This work was partly supported by Huawei Fund and the ETH Zürich Fund (OK).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, J. et al. (2022). Reference-Based Image Super-Resolution with Deformable Attention Transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_19
DOI: https://doi.org/10.1007/978-3-031-19797-0_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19796-3
Online ISBN: 978-3-031-19797-0
eBook Packages: Computer Science, Computer Science (R0)