Abstract
Consider a video sequence captured by a single camera observing a complex dynamic scene containing an unknown mixture of multiple moving and possibly deforming objects. In this paper we propose an unsupervised approach to the challenging problem of simultaneously segmenting the scene into its constituent objects and reconstructing a 3D model of the scene. The strength of our approach lies in its ability to deal with real-world dynamic scenes and to seamlessly handle different types of motion: rigid, articulated and non-rigid. We formulate the problem as a hierarchical graph-cut-based segmentation in which we decompose the whole scene into background and foreground objects, and model the complex motion of non-rigid or articulated objects as a set of overlapping rigid parts. We evaluate the motion segmentation functionality of our approach on the Berkeley Motion Segmentation Dataset. In addition, to validate the ability of our approach to deal with real-world scenes, we provide 3D reconstructions of some challenging videos from the YouTube-Objects dataset.
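As a generic illustration of the graph-cut machinery the abstract refers to, the sketch below labels the sites of a toy 1D chain as background (0) or foreground (1) by minimizing a unary-plus-Potts energy with an s-t minimum cut, computed here with Edmonds-Karp max flow. It is not the authors' method: their formulation is hierarchical and multi-label, and its costs come from motion models fitted to each part. The function name `min_cut_labels`, the unary costs and `potts_weight` are hypothetical and chosen only for illustration.

```python
from collections import deque


def min_cut_labels(unary, potts_weight):
    """Binary labelling of a 1D chain by an s-t minimum cut (illustrative sketch).

    unary[i] = (cost_if_background, cost_if_foreground) for site i (hypothetical costs).
    potts_weight: penalty paid when two neighbouring sites take different labels.
    Returns one label per site: 0 = background, 1 = foreground.
    """
    n = len(unary)
    source, sink = n, n + 1
    size = n + 2
    cap = [[0] * size for _ in range(size)]  # residual capacities

    for i, (cost_bg, cost_fg) in enumerate(unary):
        # Sites left on the source side are foreground: cutting i->sink pays cost_fg.
        cap[i][sink] += cost_fg
        # Sites on the sink side are background: cutting source->i pays cost_bg.
        cap[source][i] += cost_bg
    for i in range(n - 1):
        # Undirected Potts term: a label change between neighbours costs potts_weight.
        cap[i][i + 1] += potts_weight
        cap[i + 1][i] += potts_weight

    def bfs_parents():
        # Breadth-first search over edges with positive residual capacity.
        parent = [-1] * size
        parent[source] = source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(size):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: repeatedly augment along shortest residual source-sink paths.
    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            break
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Sites still reachable from the source in the residual graph are foreground.
    parent = bfs_parents()
    return [1 if parent[i] != -1 else 0 for i in range(n)]


labels = min_cut_labels([(0, 5), (0, 5), (5, 0), (5, 0)], potts_weight=1)
print(labels)  # [0, 0, 1, 1]
```

The Potts weight trades fidelity to the per-site costs for spatially coherent labels: with unary costs `[(0, 4), (3, 2), (0, 4)]`, the middle site is labelled foreground when `potts_weight=0`, while with `potts_weight=2` the whole chain becomes background.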
This research was funded by the European Research Council under the ERC Starting Grant agreement 204871-HUMANIS.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Russell, C., Yu, R., Agapito, L. (2014). Video Pop-up: Monocular 3D Reconstruction of Dynamic Scenes. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8695. Springer, Cham. https://doi.org/10.1007/978-3-319-10584-0_38
DOI: https://doi.org/10.1007/978-3-319-10584-0_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10583-3
Online ISBN: 978-3-319-10584-0
eBook Packages: Computer Science, Computer Science (R0)