Dual Projection Fusion for Reference-Based Image Super-Resolution
Abstract
1. Introduction
- We propose a lightweight backbone, the deep channel attention connection network (DCACN), which extracts valuable high-frequency components from the LR space for image reconstruction. With the help of DCACN, the proposed DPFSR gains stronger feature representation capability;
- We also propose a novel fusion module, the dual projection fusion module (DPFM), which enables the network to focus on the differences between feature sources through inter-residual projection operations, generating more discriminative fused features and further improving model performance;
- We evaluate the proposed DPFSR on three publicly available datasets; quantitative and qualitative comparisons show that our method outperforms state-of-the-art SISR and RefSR methods. Furthermore, we conduct an ablation study to explore the effect of reference images with different similarity levels on model performance. Experimental results demonstrate that the proposed approach is highly robust.
2. Related Work
2.1. Single Image Super-Resolution
2.2. Reference-Based Image Super-Resolution
3. Methods
3.1. Deep Channel Attention Connection Network
3.2. Improved Texture Transformer
3.2.1. Texture Feature Encoder
3.2.2. Similarity Embedding Module
3.2.3. Texture Feature Selector
3.3. Dual Projection Fusion Module
3.4. Image Reconstruction
3.5. Loss Function
- Reconstruction loss: the mean absolute error (MAE, i.e., L1) loss between the reconstructed image and the ground truth:
- Perceptual loss: aims to improve the visual quality of the recovered image. In this paper, we use the conventional perceptual loss [12]:
- Adversarial loss: encourages the synthesized images to exhibit clear and natural details. Here, we adopt WGAN-GP [42]:
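The three terms above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the weights `w_per` and `w_adv` are hypothetical placeholders (the paper's exact values are not restated here), the perceptual term is shown as an MSE over pre-extracted (e.g., VGG [12]) feature vectors, and the adversarial term is the WGAN-GP generator objective [42], i.e., the negative mean critic score on generated images.

```python
def mae(x, y):
    # Reconstruction loss: mean absolute error (L1) over pixel values.
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def mse(x, y):
    # Perceptual loss operates on feature vectors (e.g., from VGG) rather than raw pixels.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def total_loss(sr, hr, sr_feat, hr_feat, critic_scores, w_per=1e-2, w_adv=1e-3):
    # w_per and w_adv are illustrative weights, not the paper's actual settings.
    rec = mae(sr, hr)                                    # reconstruction term
    per = mse(sr_feat, hr_feat)                          # perceptual term [12]
    adv = -sum(critic_scores) / len(critic_scores)       # WGAN-GP generator term [42]
    return rec + w_per * per + w_adv * adv
```

In practice each term is computed on tensors by the training framework; the sketch only makes the weighting structure of the full objective explicit.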
4. Experiments and Results
4.1. Datasets and Evaluation Metrics
4.2. Implementation Details
4.3. Ablation Study
4.3.1. Effect of DPFM and DCACN
4.3.2. Effect of ITT
4.3.3. Effect of Different Reference Similarity Levels
4.4. Comparisons with State-of-the-Art Methods
4.4.1. Quantitative Evaluation
4.4.2. Qualitative Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yang, C.Y.; Ma, C.; Yang, M.H. Single-image super-resolution: A benchmark. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 372–386.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 184–199.
- Wang, Z.; Liu, D.; Yang, J.; Han, W.; Huang, T. Deep networks for image super-resolution with sparse prior. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 370–378.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
- Han, W.; Chang, S.; Liu, D.; Yu, M.; Witbrock, M.; Huang, T.S. Image super-resolution via dual-state recurrent networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1654–1663.
- Oktay, O.; Bai, W.; Lee, M.; Guerrero, R.; Kamnitsas, K.; Caballero, J.; de Marvao, A.; Cook, S.; O’Regan, D.; Rueckert, D. Multi-input cardiac image super-resolution using convolutional neural networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Athens, Greece, 17–21 October 2016; pp. 246–254.
- Huang, Y.; Shao, L.; Frangi, A.F. Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6070–6079.
- Zhang, L.; Zhang, H.; Shen, H.; Li, P. A super-resolution reconstruction algorithm for surveillance images. Signal Process. 2010, 90, 848–859.
- Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking classification and localization for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10186–10195.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; Volume 27.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711.
- Timofte, R.; De Smet, V.; Van Gool, L. Anchored neighborhood regression for fast example-based super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 2–8 December 2013; pp. 1920–1927.
- Zheng, H.; Ji, M.; Han, L.; Xu, Z.; Wang, H.; Liu, Y.; Fang, L. Learning cross-scale correspondence and patch-based synthesis for reference-based super-resolution. In Proceedings of the BMVC, London, UK, 4–7 September 2017; Volume 1, p. 2.
- Zheng, H.; Ji, M.; Wang, H.; Liu, Y.; Fang, L. CrossNet: An end-to-end reference-based super resolution network using cross-scale warping. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 88–104.
- Zhang, Z.; Wang, Z.; Lin, Z.; Qi, H. Image super-resolution by neural texture transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 7982–7991.
- Xie, Y.; Xiao, J.; Sun, M.; Yao, C.; Huang, K. Feature representation matters: End-to-end learning for reference-based image super-resolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 230–245.
- Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 19–24 June 2020; pp. 5791–5800.
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407.
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1646–1654.
- Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1637–1645.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883.
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481.
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Zhang, W.; Liu, Y.; Dong, C.; Qiao, Y. RankSRGAN: Generative adversarial networks with ranker for image super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 3096–3105.
- Ma, C.; Rao, Y.; Cheng, Y.; Chen, C.; Lu, J.; Zhou, J. Structure-preserving super resolution with gradient guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7769–7778.
- Yue, H.; Sun, X.; Yang, J.; Wu, F. Landmark image super-resolution by retrieving web images. IEEE Trans. Image Process. 2013, 22, 4865–4878.
- Shim, G.; Park, J.; Kweon, I.S. Robust reference-based super-resolution with similarity-aware deformable convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8425–8434.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Ma, X.; Guo, J.; Tang, S.; Qiao, Z.; Chen, Q.; Yang, Q.; Fu, S. DCANet: Learning connected attentions for convolutional neural networks. arXiv 2020, arXiv:2007.05099.
- Lin, R.; Xiao, N. Residual channel attention connection network for reference-based image super-resolution. In Proceedings of the 2021 8th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), Beijing, China, 10–12 December 2021; pp. 307–313.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010.
- Duan, C.; Xiao, N. Parallax-based spatial and channel attention for stereo image super-resolution. IEEE Access 2019, 7, 183672–183679.
- Mount, J. The Equivalence of Logistic Regression and Maximum Entropy Models. 2011. Available online: http://www.winvector.com/dfiles/LogisticRegressionMaxEnt.pdf (accessed on 20 April 2022).
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Rahutomo, F.; Kitasuka, T.; Aritsugi, M. Semantic cosine similarity. In Proceedings of the 7th International Student Conference on Advanced Science and Technology ICAST, Seoul, Korea, 29–30 October 2012; Volume 4, p. 1.
- Mei, Y.; Fan, Y.; Zhou, Y.; Huang, L.; Huang, T.S.; Shi, H. Image super-resolution with cross-scale non-local attention and exhaustive self-exemplars mining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5690–5699.
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777.
- Sun, L.; Hays, J. Super-resolution from internet-scale scene matching. In Proceedings of the 2012 IEEE International Conference on Computational Photography (ICCP), Seattle, WA, USA, 28–29 April 2012; pp. 1–12.
- Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in pytorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Sajjadi, M.S.M.; Schölkopf, B.; Hirsch, M. EnhanceNet: Single image super-resolution through automated texture synthesis. arXiv 2016, arXiv:1612.07919.
| Experimental Configuration | Options |
|---|---|
| Linux version | Ubuntu 20.04 |
| Deep-learning framework | PyTorch 1.10 |
| CUDA version | 11.2 |
| Input patch size | 160 × 160 |
| Reference patch size | 160 × 160 |
| Scale factor | 4× |
| Model | PSNR/SSIM |
|---|---|
| Base | 27.18/0.806 |
| Base + LR branch of DPFM | 27.18/0.807 |
| Base + Ref branch of DPFM | 27.16/0.805 |
| Base + DPFM | 27.23/0.807 |
| DPFSR (Ours) | 27.25/0.808 |
| Model | PSNR/SSIM |
|---|---|
| DPFSR (replace ITT with TT) | 27.18/0.806 |
| DPFSR (use ITT without DPFM) | 27.20/0.807 |
| Level | CrossNet (PSNR/SSIM) | SRNTT-rec (PSNR/SSIM) | TTSR-rec (PSNR/SSIM) | DPFSR-rec (PSNR/SSIM) |
|---|---|---|---|---|
| L1 | 25.48/0.764 | 26.15/0.781 | 26.99/0.800 | 27.15/0.805 |
| L2 | 25.48/0.764 | 26.04/0.776 | 26.74/0.791 | 26.86/0.794 |
| L3 | 25.47/0.763 | 25.98/0.775 | 26.64/0.788 | 26.73/0.790 |
| L4 | 25.46/0.763 | 25.95/0.774 | 26.58/0.787 | 26.68/0.789 |
| LR | 25.46/0.763 | 25.91/0.776 | 26.43/0.782 | 26.63/0.786 |
| Method | CUFED5 (PSNR/SSIM) | Sun80 (PSNR/SSIM) | Urban100 (PSNR/SSIM) |
|---|---|---|---|
| Bicubic | 24.22/0.684 | 28.65/0.766 | 23.13/0.659 |
| SRCNN [2] | 25.33/0.745 | 28.26/0.781 | 24.41/0.738 |
| MDSR [5] | 25.93/0.777 | 28.52/0.792 | 25.51/0.783 |
| RDN [24] | 26.17/0.771 | 29.97/0.812 | 25.59/0.768 |
| RCAN [25] | 26.19/0.771 | 30.02/0.813 | 25.65/0.771 |
| SRGAN [26] | 24.40/0.702 | 26.76/0.725 | 24.07/0.729 |
| ENet [48] | 24.24/0.695 | 26.24/0.702 | 23.63/0.711 |
| ESRGAN [27] | 21.90/0.633 | 24.18/0.651 | 20.91/0.620 |
| RSRGAN [28] | 22.31/0.635 | 25.60/0.667 | 21.47/0.624 |
| SPSR [29] | 24.39/0.714 | 27.94/0.744 | 24.29/0.729 |
| CrossNet [15] | 25.48/0.764 | 28.52/0.793 | 25.11/0.764 |
| SRNTT-rec [16] | 26.24/0.784 | 28.54/0.793 | 25.50/0.783 |
| SRNTT [16] | 25.61/0.764 | 27.59/0.756 | 25.09/0.774 |
| TTSR-rec [18] | 27.09/0.804 | 30.02/0.814 | 25.87/0.784 |
| TTSR [18] | 25.53/0.765 | 28.59/0.774 | 24.62/0.747 |
| DPFSR-rec | 27.25/0.808 | 30.10/0.815 | 26.03/0.787 |
| DPFSR | 25.23/0.749 | 28.42/0.762 | 24.35/0.734 |
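The PSNR values in the tables above (higher is better, in dB) follow the standard definition, 10·log10(peak²/MSE). A minimal sketch, assuming 8-bit images with peak value 255 and flat pixel sequences (SSIM [45] is more involved and omitted here):

```python
import math

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

For example, a pure white pixel compared against pure black gives MSE = 255², hence 0 dB; identical images give infinite PSNR. Published results are typically computed on the Y (luminance) channel, which this sketch does not handle.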
Lin, R.; Xiao, N. Dual Projection Fusion for Reference-Based Image Super-Resolution. Sensors 2022, 22, 4119. https://doi.org/10.3390/s22114119