research-article

Unstructured video-based rendering: interactive exploration of casually captured videos

Published: 26 July 2010

Abstract

We present an algorithm designed for navigating around a performance that was filmed as a "casual" multi-view video collection: real-world footage captured on handheld cameras by a few audience members. The objective is to let the user navigate easily in 3D, generating a video-based rendering (VBR) of a performance filmed with widely separated cameras. Casually filmed events are especially challenging because they yield footage with complicated backgrounds and camera motion. Such challenging conditions preclude the use of most algorithms that depend on correlation-based stereo or 3D shape-from-silhouettes.
Our algorithm builds on the concepts developed for the exploration of photo collections of empty scenes. Interactive performer-specific view interpolation is now possible through innovations in interactive rendering and offline matting, specifically: i) modeling the foreground subject as video sprites on billboards, ii) modeling the background geometry with adaptive view-dependent textures, and iii) view interpolation that follows a performer. The billboards are embedded in a simple but realistic reconstruction of the environment. The reconstructed environment provides very effective visual cues for spatial navigation as the user transitions between viewpoints. The prototype is tested on footage from several challenging events, and demonstrates the editorial utility of the whole system and the particular value of our new inter-billboard optimization.
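
To make items i) and iii) more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a billboard that rotates about a vertical axis to face the virtual camera, and it blends source cameras with inverse-angle weights, a common heuristic in view-dependent rendering. The function names `billboard_axes` and `blend_weights`, the y-up convention, and the `eps` parameter are all hypothetical choices made for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / n for x in v)


def billboard_axes(center, camera, up=(0.0, 1.0, 0.0)):
    """Return (right, up, normal) for a billboard at `center` that turns
    about the `up` axis to face `camera` (assumed convention: y is up)."""
    d = _sub(camera, center)
    d_up = _dot(d, up)
    # Remove the vertical component so the billboard stays upright.
    flat = tuple(x - d_up * u for x, u in zip(d, up))
    normal = _normalize(flat)  # raises if the camera is directly overhead
    right = _cross(up, normal)
    return right, up, normal


def blend_weights(center, virtual_cam, source_cams, eps=1e-6):
    """Weight each source camera by the inverse of the angle between its
    viewing direction and the virtual camera's, then normalize to sum 1."""
    v = _normalize(_sub(virtual_cam, center))
    raw = []
    for cam in source_cams:
        s = _normalize(_sub(cam, center))
        cos_a = max(-1.0, min(1.0, _dot(v, s)))  # clamp for acos
        raw.append(1.0 / (math.acos(cos_a) + eps))
    total = sum(raw)
    return [w / total for w in raw]
```

With two source cameras placed symmetrically around the performer, a virtual camera halfway between them receives equal weights, and the weights shift toward a source camera as the virtual viewpoint approaches it.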

Supplementary Material

JPG File (tp124-10.jpg)
Supplemental material (087.zip): video.avi, the main video.
Additional material can be found at http://cvg.ethz.ch/research/unstructured-vbr/
MP4 File (tp124-10.mp4)

Published In

ACM Transactions on Graphics, Volume 29, Issue 4
July 2010
942 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1778765
Publisher

Association for Computing Machinery

New York, NY, United States
