DOI: 10.1007/978-3-031-19790-1_39
Article

Efficient Long-Range Attention Network for Image Super-Resolution

Published: 23 October 2022

Abstract

Recently, transformer-based methods have demonstrated impressive results in various vision tasks, including image super-resolution (SR), by exploiting self-attention (SA) for feature extraction. However, the computation of SA in most existing transformer-based models is very expensive, and some of the employed operations may be redundant for the SR task. This expense limits the range over which SA can be computed and consequently limits SR performance. In this work, we propose an efficient long-range attention network (ELAN) for image SR. Specifically, we first employ shift convolution (shift-conv) to effectively extract local structural information from the image while keeping the same level of complexity as a 1×1 convolution, and then propose a group-wise multi-scale self-attention (GMSA) module, which calculates SA on non-overlapping groups of features using different window sizes to exploit long-range image dependencies. A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-convs with a GMSA module, and is further accelerated by a shared attention mechanism. Without bells and whistles, our ELAN follows a fairly simple design that sequentially cascades ELABs. Extensive experiments demonstrate that ELAN obtains even better results than transformer-based SR models, but with significantly lower complexity. The source code of ELAN can be found at https://github.com/xindongzhang/ELAN.
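To make the design described in the abstract more concrete, below is a minimal PyTorch-style sketch of an ELAB-like block, assuming a shift-conv implemented as four directional channel-group shifts followed by a 1×1 convolution, and a GMSA that splits channels into groups attended within different window sizes. It is only an illustration of the idea: the module names, group split, window sizes, and the omission of the shared-attention acceleration are our assumptions, not the authors' implementation (which is available at the linked repository).

# Illustrative sketch of an ELAB-style block (shift-conv + group-wise
# multi-scale window self-attention). NOT the official ELAN code; see
# https://github.com/xindongzhang/ELAN for the authors' implementation.
import torch
import torch.nn as nn


class ShiftConv(nn.Module):
    """1x1 convolution preceded by zero-cost spatial shifts of channel groups."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        g = c // 5                                                       # five channel groups
        out = x.clone()
        out[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], shifts=1,  dims=2)   # shift down
        out[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], shifts=-1, dims=2)   # shift up
        out[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], shifts=1,  dims=3)   # shift right
        out[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], shifts=-1, dims=3)   # shift left
        # remaining channels stay in place; the 1x1 conv mixes the shifted groups
        return self.conv1x1(out)


def window_attention(x, win):
    """Plain self-attention inside non-overlapping win x win windows."""
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // win, win, w // win, win)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, c)            # (b*windows, win*win, c)
    attn = torch.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)
    x = attn @ x
    x = x.reshape(b, h // win, w // win, win, win, c)
    return x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)


class GMSA(nn.Module):
    """Group-wise multi-scale self-attention: each channel group uses its own window size."""
    def __init__(self, channels, window_sizes=(4, 8, 16)):
        super().__init__()
        self.window_sizes = window_sizes
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        chunks = torch.chunk(x, len(self.window_sizes), dim=1)
        out = [window_attention(c, w) for c, w in zip(chunks, self.window_sizes)]
        return self.proj(torch.cat(out, dim=1))


class ELAB(nn.Module):
    """Efficient long-range attention block: two shift-convs around a GMSA module."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Sequential(ShiftConv(channels, channels),
                                   nn.ReLU(inplace=True),
                                   ShiftConv(channels, channels))
        self.gmsa = GMSA(channels)

    def forward(self, x):
        x = x + self.local(x)        # local structural information
        x = x + self.gmsa(x)         # long-range dependency via windowed SA
        return x


if __name__ == "__main__":
    feat = torch.randn(1, 48, 64, 64)     # H and W divisible by the largest window
    print(ELAB(48)(feat).shape)           # torch.Size([1, 48, 64, 64])

In this sketch, only the largest-window group pays for the widest attention range, which is how the multi-scale grouping keeps the cost well below that of a single global or large-window attention over all channels.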


Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVII
Oct 2022
799 pages
ISBN:978-3-031-19789-5
DOI:10.1007/978-3-031-19790-1

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 23 October 2022

Author Tags

  1. Super-resolution
  2. Long-range attention
  3. Transformer

Qualifiers

  • Article

Cited By

  • Lightweight image super-resolution via flexible meta pruning. Proceedings of the 41st International Conference on Machine Learning, pp. 60305–60314 (2024). DOI: 10.5555/3692070.3694565
  • Proceedings of the 41st International Conference on Machine Learning, pp. 58158–58173 (2024). DOI: 10.5555/3692070.3694470
  • Helmet Detection Algorithm Based on Improved YOLOv7. Automatic Control and Computer Sciences 58(6), 642–655 (2024). DOI: 10.3103/S0146411624701116
  • FreqFormer. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pp. 731–739 (2024). DOI: 10.24963/ijcai.2024/81
  • GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 9378–9386 (2024). DOI: 10.1145/3664647.3681554
  • SSL: A Self-similarity Loss for Improving Generative Image Super-resolution. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 3189–3198 (2024). DOI: 10.1145/3664647.3680874
  • Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 1302–1310 (2024). DOI: 10.1145/3664647.3680744
  • Towards real-time practical image compression with lightweight attention. Expert Systems with Applications 252(PA) (2024). DOI: 10.1016/j.eswa.2024.124142
  • SCW-YOLO: An Improved Algorithm for Fall Detection Based on Deep Learning. Advanced Intelligent Computing Technology and Applications, pp. 408–418 (2024). DOI: 10.1007/978-981-97-5612-4_35
  • HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution. Computer Vision – ECCV 2024, pp. 483–500 (2024). DOI: 10.1007/978-3-031-73661-2_27
