Abstract
Image inpainting has advanced considerably in recent years, but generating a semantically consistent image remains difficult when large regions are missing from complex scenes. To improve the semantic and structural alignment of existing inpainting methods, this paper proposes AGG, an end-to-end attention-based gated convolutional GAN with prior guidance. AGG designs spatial and channel attention mechanisms to fully extract semantic and structural features, and it constructs a channel-attention-based upsampling module that refines the feature map and recovers more detail from low-resolution, higher-level features. AGG uses the image contour as a prior, so that the gated convolution and attention mechanisms can fill the image efficiently by focusing on contour information. The attention-based gated convolution effectively captures global features and compensates for the restricted receptive field of naive convolution. Compared with other models, AGG generates images with finer contour features and avoids common artifacts such as watermarking and blur, achieving the best overall performance on the Paris StreetView, CelebA-HQ, and Places2 datasets. On CelebA-HQ, AGG achieves FID, LPIPS, PSNR, and SSIM values of 2.18, 0.046, 30.82, and 0.951, improving on state-of-the-art methods by at least 3.21% in FID and 6.52% in LPIPS. The source code will be available at https://github.com/Shawn-Yu-1/AGGNet.
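To make the abstract's two key building blocks concrete, the sketch below shows a gated convolution (in the spirit of Yu et al.'s free-form inpainting) followed by squeeze-and-excitation style channel attention. This is a minimal illustrative sketch, not the authors' released implementation: the module names, channel sizes, the ELU activation, the reduction ratio, and the 5-channel input layout (RGB + hole mask + contour prior) are all assumptions made for illustration.

```python
# Minimal PyTorch sketch of a gated convolution plus channel attention,
# the generic components the abstract builds on. Hyperparameters and
# input layout are illustrative assumptions, not AGG's actual design.
import torch
import torch.nn as nn


class GatedConv2d(nn.Module):
    """Gated convolution: a learned soft mask gates each feature map."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.act = nn.ELU()

    def forward(self, x):
        # Known pixels pass through; hole regions are attenuated by the
        # sigmoid gate learned from the same input.
        return self.act(self.feature(x)) * torch.sigmoid(self.gate(x))


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel reweighting."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Global average pooling -> bottleneck MLP -> per-channel weights.
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # emphasize informative channels


if __name__ == "__main__":
    # Assumed 5-channel input: RGB image + binary hole mask + contour
    # prior, mirroring the paper's use of the contour as guidance.
    x = torch.randn(1, 5, 256, 256)
    feats = GatedConv2d(5, 64)(x)
    out = ChannelAttention(64)(feats)
    print(out.shape)  # torch.Size([1, 64, 256, 256])
```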
Data availability
The data and codes that support the findings of this study are available from the corresponding author upon reasonable request.
Change history
29 May 2024
A Correction to this paper has been published: https://doi.org/10.1007/s00521-024-10016-5
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62272164 and 62306113) and the Science and Technology on Space Intelligent Control Laboratory (Grant No. HTKJ2022KL502010).
Author information
Contributions
Conceptualization: Xiankang Yu, Zhihua Chen, Lei Dai; Methodology: Xiankang Yu, Lei Dai; Software: Xiankang Yu; Validation: Xiankang Yu; Formal analysis and investigation: Xiankang Yu, Lei Dai; Writing - original draft preparation: Xiankang Yu, Lei Dai; Writing - review and editing: Zhihua Chen, Bin Sheng; Funding acquisition: Zhihua Chen, Bin Sheng; Supervision: Zhihua Chen. All authors have read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised to correct Figures 5, 6, 7, and 8.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, X., Dai, L., Chen, Z. et al. AGG: attention-based gated convolutional GAN with prior guidance for image inpainting. Neural Comput & Applic 36, 12589–12604 (2024). https://doi.org/10.1007/s00521-024-09785-w
DOI: https://doi.org/10.1007/s00521-024-09785-w