
AGG: attention-based gated convolutional GAN with prior guidance for image inpainting

  • Original Article
  • Published in Neural Computing and Applications

A Correction to this article was published on 29 May 2024; the article has been updated accordingly.

Abstract

Image inpainting has advanced considerably in recent years, but generating semantically consistent content remains difficult when large regions are missing from complex scenes. To address the semantic and structural misalignment of existing inpainting methods, this paper proposes AGG, an end-to-end attention-based gated convolutional GAN with prior guidance, which designs spatial and channel attention mechanisms to fully extract semantic and structural features. Moreover, AGG constructs an attention-based upsampling module built on channel attention, which refines the feature map and recovers detail from the smaller, higher-level feature maps. AGG uses the image contour as a prior, so that the gated convolution and attention mechanism can fill the image efficiently by focusing on contour information. The attention-based gated convolution effectively captures global features and compensates for the restricted receptive field of naive convolution. Compared with other models, AGG generates images with finer outline features and avoids common artifacts such as watermarking and blur, showing the best overall performance on the Paris StreetView, CelebA-HQ, and Places2 datasets. The best FID, LPIPS, PSNR, and SSIM values achieved by AGG are 2.18, 0.046, 30.82, and 0.951 on the CelebA-HQ dataset, improving on state-of-the-art methods by at least 3.21% in FID and 6.52% in LPIPS. The source code will be available at https://github.com/Shawn-Yu-1/AGGNet.
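The AGG implementation is not reproduced on this page, but the two building blocks the abstract names, gated convolution and channel attention, are well-established techniques. The following PyTorch sketch illustrates them under stated assumptions: GatedConv2d follows the gated-convolution formulation of Yu et al. (2019), while ChannelAttention is a squeeze-and-excitation-style stand-in for the channel-attention step of AGG's attention-based upsampling, whose exact design the abstract does not specify. It is an illustration of the general mechanisms, not the authors' code (which they state will be released on GitHub).

```python
# Illustrative sketch only, not the authors' AGG implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    """Gated convolution (Yu et al. 2019): a parallel convolution learns a
    soft [0, 1] gate that damps responses coming from masked regions."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, dilation)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, dilation)

    def forward(self, x):
        return F.elu(self.feature(x)) * torch.sigmoid(self.gate(x))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention, assumed here as a
    stand-in for the channel-attention step of AGG's upsampling module."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.weight = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # B x C x 1 x 1 channel summary
            nn.Conv2d(ch, ch // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),  # excite
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.weight(x)

# One decoder step in the spirit of the abstract: upsample coarse features,
# reweight channels, then refine with a gated convolution.
x = torch.randn(1, 64, 64, 64)
x = F.interpolate(x, scale_factor=2, mode="nearest")
x = ChannelAttention(64)(x)
x = GatedConv2d(64, 32)(x)
print(x.shape)  # torch.Size([1, 32, 128, 128])
```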




Data availability

The data and code that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62272164 and 62306113) and the Science and Technology on Space Intelligent Control Laboratory (Grant No. HTKJ2022KL502010).

Author information

Contributions

Conceptualization: Xiankang Yu, Zhihua Chen, Lei Dai; Methodology: Xiankang Yu, Lei Dai; Software: Xiankang Yu; Validation: Xiankang Yu; Formal analysis and investigation: Xiankang Yu, Lei Dai; Writing – original draft preparation: Xiankang Yu, Lei Dai; Writing – review and editing: Zhihua Chen, Bin Sheng; Funding acquisition: Zhihua Chen, Bin Sheng; Supervision: Zhihua Chen. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Lei Dai or Zhihua Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised to correct Figures 5, 6, 7 and 8.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yu, X., Dai, L., Chen, Z. et al. AGG: attention-based gated convolutional GAN with prior guidance for image inpainting. Neural Comput & Applic 36, 12589–12604 (2024). https://doi.org/10.1007/s00521-024-09785-w
