Abstract
To obtain clear street views and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that removes traffic agents from videos and synthesizes the missing regions with the guidance of depth/point cloud. By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated via this common 3D map. To fill a target inpainting area in a frame, pixels from other frames can be transformed into the current one with correct occlusion. Furthermore, we are able to fuse multiple videos through 3D point cloud registration, making it possible to inpaint a target video with multiple source videos. This addresses the long-time occlusion problem, where an occluded area is never visible in the entire video. To our knowledge, we are the first to fuse multiple videos for video inpainting. To verify the effectiveness of our approach, we build a large inpainting dataset in a real urban road environment with synchronized images and LiDAR data, including many challenging scenes, e.g., long-time occlusion. The experimental results show that the proposed approach outperforms state-of-the-art approaches on all criteria; in particular, the RMSE (root mean squared error) is reduced by about \(\mathbf{13}\%\).
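The core geometric step described in the abstract — transforming pixels from a source frame into the target frame via depth and camera poses, while respecting occlusion — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of 4×4 camera-to-world poses, and the per-pixel z-buffer loop are assumptions made for clarity.

```python
import numpy as np

def warp_frame(src_img, src_depth, K, T_src, T_tgt, tgt_shape):
    """Project source-frame pixels into a target view using depth and
    known camera poses (depth-guided warping with z-buffer occlusion).

    K:     3x3 camera intrinsics (assumed shared by both frames).
    T_src, T_tgt: 4x4 camera-to-world transforms for each frame.
    """
    h, w = src_depth.shape
    # Build a homogeneous pixel grid (u, v, 1) for the source frame.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    z = src_depth.reshape(-1)
    valid = z > 0
    # Back-project to source camera space, then lift to world coordinates.
    cam = (np.linalg.inv(K) @ pix.T).T * z[:, None]
    cam_h = np.concatenate([cam, np.ones((cam.shape[0], 1))], axis=1)
    world = (T_src @ cam_h.T).T
    # Transform into the target camera and project with the intrinsics.
    tgt_cam = (np.linalg.inv(T_tgt) @ world.T).T[:, :3]
    zt = tgt_cam[:, 2]
    valid &= zt > 1e-6  # discard points behind the target camera
    proj = (K @ tgt_cam.T).T
    pu = np.round(proj[:, 0] / zt).astype(int)
    pv = np.round(proj[:, 1] / zt).astype(int)
    th, tw = tgt_shape
    valid &= (pu >= 0) & (pu < tw) & (pv >= 0) & (pv < th)
    # Z-buffer: when several source pixels land on the same target pixel,
    # keep the nearest surface so occlusion is handled correctly.
    out = np.zeros((th, tw, src_img.shape[2]), dtype=src_img.dtype)
    zbuf = np.full((th, tw), np.inf)
    colors = src_img.reshape(-1, src_img.shape[2])
    for i in np.flatnonzero(valid):
        if zt[i] < zbuf[pv[i], pu[i]]:
            zbuf[pv[i], pu[i]] = zt[i]
            out[pv[i], pu[i]] = colors[i]
    return out
```

In the multi-video setting the abstract mentions, the same machinery applies once the source videos' point clouds are registered into a common frame, so `T_src` and `T_tgt` simply come from different sequences.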
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liao, M., Lu, F., Zhou, D., Zhang, S., Li, W., Yang, R. (2020). DVI: Depth Guided Video Inpainting for Autonomous Driving. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_1
DOI: https://doi.org/10.1007/978-3-030-58589-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1