
DVI: Depth Guided Video Inpainting for Autonomous Driving

  • Conference paper

Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12366)

Abstract

To obtain clear street views and photo-realistic simulation for autonomous driving, we present an automatic video inpainting algorithm that removes traffic agents from videos and synthesizes the missing regions under the guidance of depth/point cloud data. By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated through this common 3D map. To fill a target inpainting area in a frame, pixels from other frames can be transformed directly into the current one with correct occlusion handling. Furthermore, we are able to fuse multiple videos through 3D point cloud registration, making it possible to inpaint a target video with multiple source videos. The motivation is to solve the long-time occlusion problem, where an occluded area is never visible in the entire video. To our knowledge, we are the first to fuse multiple videos for video inpainting. To verify the effectiveness of our approach, we build a large inpainting dataset in a real urban road environment with synchronized images and LiDAR data, including many challenging scenes, e.g., long-time occlusion. Experimental results show that the proposed approach outperforms state-of-the-art approaches on all criteria; in particular, the RMSE (Root Mean Squared Error) is reduced by about 13%.
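The core geometric step the abstract describes — transforming pixels from a source frame into the target frame with the help of depth and a shared 3D coordinate frame — can be sketched as a standard back-project/transform/re-project pipeline. The following is a minimal illustration, not the paper's implementation; the intrinsics matrix `K`, the function name, and the relative pose `T_src_to_tgt` are assumed placeholders.

```python
import numpy as np

def reproject_pixel(uv, depth, K, T_src_to_tgt):
    """Warp a source-frame pixel into the target frame using its depth
    and the relative camera pose (a 4x4 rigid transform)."""
    u, v = uv
    # Back-project the pixel to a 3D point in the source camera frame.
    p_src = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Move the point into the target camera frame (via the common 3D map).
    p_tgt = T_src_to_tgt[:3, :3] @ p_src + T_src_to_tgt[:3, 3]
    # Project back onto the target image plane.
    uvw = K @ p_tgt
    return uvw[:2] / uvw[2], p_tgt[2]  # target pixel coords, target depth

# Sanity check with an identity pose: the pixel maps onto itself.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
uv_t, d_t = reproject_pixel((100.0, 80.0), 5.0, K, np.eye(4))
```

The target-frame depth returned alongside the coordinates is what enables the occlusion test the abstract mentions: a candidate source pixel is used only if its reprojected depth is consistent with the dense map's depth at that target location.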



Author information

Correspondence to Feixiang Lu.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Liao, M., Lu, F., Zhou, D., Zhang, S., Li, W., Yang, R. (2020). DVI: Depth Guided Video Inpainting for Autonomous Driving. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_1


  • DOI: https://doi.org/10.1007/978-3-030-58589-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58588-4

  • Online ISBN: 978-3-030-58589-1

  • eBook Packages: Computer Science (R0)
