
FactorMatte: Redefining Video Matting for Re-Composition Tasks

Published: 26 July 2023

Abstract

We propose Factor Matting, an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of a video into independent components, each representing a counterfactual version of the scene where the contents of other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that learns augmented patch-based appearance priors to produce useful decompositions even for video with complex cross-layer interactions like splashes, shadows, and reflections. Our method is trained per-video and does not require external training data or any knowledge about the 3D structure of the scene. Through extensive experiments, we show that it is able to produce useful decompositions of scenes with such complex interactions while performing competitively on classical matting tasks as well. We also demonstrate the benefits of our approach on a wide range of downstream video editing tasks. Our project website is at: https://factormatte.github.io/.
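The problem the abstract describes builds on the classical compositing equation I = αF + (1 − α)B, in which a frame is recovered by blending each layer's color with its alpha matte over the layers behind it. The sketch below illustrates that classical back-to-front "over" compositing only; function names are ours, and the paper's factor matting formulation goes beyond this linear model by accounting for conditional cross-layer interactions such as splashes, shadows, and reflections.

```python
def composite(background, layers):
    """Classical back-to-front 'over' compositing of scalar pixel values.

    background: pixel value of the rearmost layer.
    layers: list of (value, alpha) pairs, ordered back to front.
    Applies I = alpha * F + (1 - alpha) * B at each step.
    """
    out = background
    for value, alpha in layers:
        out = alpha * value + (1.0 - alpha) * out
    return out

# A half-transparent white layer over a black background yields mid-gray.
print(composite(0.0, [(1.0, 0.5)]))  # 0.5
```

Under this model each layer's contribution is independent of the others, which is exactly the assumption that breaks down for effects like reflections; that gap motivates the Bayesian framing described above.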

Supplementary Material

  • ZIP File (papers_604-supplemental.zip): supplemental material
  • MP4 File (papers_604_VOD.mp4): presentation


Cited By

  • (2024) Real-Time Multi-Person Video Synthesis with Controllable Prior-Guided Matting. Sensors 24, 9, 2795. DOI: 10.3390/s24092795. Online publication date: 27-Apr-2024.
  • (2024) Matting Algorithm with Improved Portrait Details for Images with Complex Backgrounds. Applied Sciences 14, 5, 1942. DOI: 10.3390/app14051942. Online publication date: 27-Feb-2024.
  • (2023) OmnimatteRF: Robust Omnimatte with 3D Background Modeling. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 23414-23423. DOI: 10.1109/ICCV51070.2023.02145. Online publication date: 1-Oct-2023.
  • (2023) Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 7709-7719. DOI: 10.1109/ICCV51070.2023.00712. Online publication date: 1-Oct-2023.


Published In

ACM Transactions on Graphics  Volume 42, Issue 4
August 2023
1912 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3609020
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 July 2023
Published in TOG Volume 42, Issue 4


Author Tags

  1. matting
  2. video matting
  3. compositing
  4. video layer decomposition

Qualifiers

  • Research-article

