Article
DOI: 10.1007/978-3-030-58517-4_31

Learning Joint Spatial-Temporal Transformations for Video Inpainting

Published: 23 August 2020

Abstract

High-quality video inpainting that completes missing regions in video frames is a promising yet challenging task. State-of-the-art approaches adopt attention models to complete a frame by searching for missing content in reference frames, and further complete whole videos frame by frame. However, these approaches can suffer from inconsistent attention results along spatial and temporal dimensions, which often leads to blurriness and temporal artifacts in videos. In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. Specifically, we simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss. To show the superiority of the proposed model, we conduct both quantitative and qualitative evaluations by using standard stationary masks and more realistic moving object masks. Demo videos are available at https://github.com/researchmm/STTN.
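The joint spatial-temporal attention summarized above can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch module, not the authors' implementation: the module name, the `channels=64` and `patch_size=8` hyperparameters, and the tensor shapes are all assumptions for illustration. It demonstrates the idea the abstract names: each frame's feature map is split into patches, and every patch attends to every patch from all frames at once, so a missing region can borrow coherent content from any spatial-temporal location. For the authors' actual model, see https://github.com/researchmm/STTN.

```python
# A minimal sketch of joint spatial-temporal self-attention over patches
# from all frames of a video, in the spirit of STTN. All names, shapes,
# and hyperparameters are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTemporalAttention(nn.Module):
    """Every patch of every frame attends to all patches of all frames."""

    def __init__(self, channels=64, patch_size=8):
        super().__init__()
        self.patch_size = patch_size
        # 1x1 convolutions project features to queries, keys, and values.
        self.to_q = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_k = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_v = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats):
        # feats: (B, T, C, H, W) feature maps for T frames of one video.
        b, t, c, h, w = feats.shape
        x = feats.view(b * t, c, h, w)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        p = self.patch_size

        def to_patches(y):
            # (B*T, C, H, W) -> (B, T*L, C*p*p), with L patches per frame.
            y = F.unfold(y, kernel_size=p, stride=p)      # (B*T, C*p*p, L)
            y = y.view(b, t, c * p * p, -1).permute(0, 1, 3, 2)
            return y.reshape(b, -1, c * p * p)

        q, k, v = to_patches(q), to_patches(k), to_patches(v)

        # Joint attention across all spatial-temporal patches: a hole in
        # one frame can copy consistent content from any other location.
        d = q.shape[-1]
        attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        out = attn @ v                                    # (B, T*L, C*p*p)

        # Fold the attended patches back into frame-shaped feature maps.
        out = out.view(b, t, -1, c * p * p).permute(0, 1, 3, 2)
        out = out.reshape(b * t, c * p * p, -1)
        out = F.fold(out, output_size=(h, w), kernel_size=p, stride=p)
        return out.view(b, t, c, h, w)

# Hypothetical usage: features for 5 frames at 64x64 resolution.
feats = torch.randn(1, 5, 64, 64, 64)
print(SpatialTemporalAttention()(feats).shape)  # torch.Size([1, 5, 64, 64, 64])
```

Because all frames are completed in one joint pass rather than frame by frame, the attention results stay consistent across time; in the full model, such a block would be trained together with the spatial-temporal adversarial loss mentioned above.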



Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI
Aug 2020
845 pages
ISBN: 978-3-030-58516-7
DOI: 10.1007/978-3-030-58517-4

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 23 August 2020

Author Tags

  1. Video inpainting
  2. Generative adversarial networks

Qualifiers

  • Article

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Jan 2025

Cited By

  • (2024) BONES: Near-Optimal Neural-Enhanced Video Streaming. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 8(2), 1–28. DOI: 10.1145/3656014. Online publication date: 29-May-2024.
  • (2023) Bitstream-corrupted video recovery. Proceedings of the 37th International Conference on Neural Information Processing Systems, 68420–68433. DOI: 10.5555/3666122.3669113. Online publication date: 10-Dec-2023.
  • (2023) Look ma, no hands! Agent-environment factorization of egocentric videos. Proceedings of the 37th International Conference on Neural Information Processing Systems, 21466–21486. DOI: 10.5555/3666122.3667060. Online publication date: 10-Dec-2023.
  • (2023) UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization. Proceedings of the 31st ACM International Conference on Multimedia, 8749–8759. DOI: 10.1145/3581783.3613767. Online publication date: 26-Oct-2023.
  • (2023) Mask-Guided Progressive Network for Joint Raindrop and Rain Streak Removal in Videos. Proceedings of the 31st ACM International Conference on Multimedia, 7216–7225. DOI: 10.1145/3581783.3612001. Online publication date: 26-Oct-2023.
  • (2023) Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene. Pattern Recognition and Computer Vision, 224–236. DOI: 10.1007/978-981-99-8552-4_18. Online publication date: 13-Oct-2023.
  • (2023) VIFST: Video Inpainting Localization Using Multi-view Spatial-Frequency Traces. PRICAI 2023: Trends in Artificial Intelligence, 434–446. DOI: 10.1007/978-981-99-7025-4_37. Online publication date: 15-Nov-2023.
  • (2023) Generalizable Deep Video Inpainting Detection Based on Constrained Convolutional Neural Networks. Digital Forensics and Watermarking, 125–138. DOI: 10.1007/978-981-97-2585-4_9. Online publication date: 25-Nov-2023.
  • (2023) CycleSTTN: A Learning-Based Temporal Model for Specular Augmentation in Endoscopy. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, 570–580. DOI: 10.1007/978-3-031-43999-5_54. Online publication date: 8-Oct-2023.
  • (2022) Class-aware adversarial transformers for medical image segmentation. Proceedings of the 36th International Conference on Neural Information Processing Systems, 29582–29596. DOI: 10.5555/3600270.3602415. Online publication date: 28-Nov-2022.