Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3641519.3657409acmconferencesArticle/Chapter ViewAbstractPublication PagessiggraphConference Proceedingsconference-collections
research-article

Object-level Scene Deocclusion

Published: 13 July 2024 Publication History

Abstract

Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which produces a full-view feature map that simultaneously encodes multiple complete objects, and the visible-to-complete latent generator, which learns to implicitly predict the full-view feature map from partial-view feature map and text prompts extracted from the incomplete objects in the input image. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning, avoiding tedious annotations of the amodal masks and occluded regions. At inference, we devise a layer-wise deocclusion strategy to improve efficiency while maintaining the deocclusion quality. Extensive experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin. Our method can also be extended to cross-domain scenes and novel categories that are not covered by the training set. Further, we demonstrate the deocclusion applicability of PACO in single-view 3D scene reconstruction and object recomposition. Project page: https://liuzhengzhe.github.io/Deocclude-Any-Object.github.io/.

Supplemental Material

ZIP File
Supplementary material and Presentation video

References

[1]
Jasmin Breitenstein and Tim Fingscheidt. 2022. Amodal cityscapes: a new dataset, its generation, and an amodal semantic segmentation challenge baseline. In IV.
[2]
Christopher P Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, and Alexander Lerchner. 2019. Monet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390 (2019).
[3]
Helisa Dhamo, Nassir Navab, and Federico Tombari. 2019. Object-driven multi-layer scene decomposition from a single image. In ICCV.
[4]
Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. NeurIPS (2021).
[5]
Kiana Ehsani, Roozbeh Mottaghi, and Ali Farhadi. 2018. SeGAN: Segmenting and generating the invisible. In CVPR.
[6]
Martin Engelcke, Adam R Kosiorek, Oiwi Parker Jones, and Ingmar Posner. 2020. Genesis: Generative scene inference and sampling with object-centric latent representations. ICLR (2020).
[7]
Patrick Follmann, Rebecca König, Philipp Härtinger, Michael Klostermann, and Tobias Böttger. 2019. Learning to see the invisible: End-to-end trainable amodal instance segmentation. In WACV.
[8]
Locatello Francesco, Weissenborn Dirk, Unterthiner Thomas, Mahendran Aravindh, Heigold Georg, Uszkoreit Jakob, Dosovitskiy Alexey, and Kipf Thomas. 2020. Object-centric learning with slot attention. NeurIPS (2020).
[9]
Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Christopher Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, and Alexander Lerchner. 2019. Multi-object representation learning with iterative variational inference. In ICML.
[10]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NIPS (2017).
[11]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. NeurIPS (2020).
[12]
Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. NeurIPS Workshop (2022).
[13]
Yuan-Ting Hu, Hong-Shuo Chen, Kexin Hui, Jia-Bin Huang, and Alexander G Schwing. 2019. Sail-vos: Semantic amodal instance level video object segmentation-a synthetic dataset and baselines. In CVPR.
[14]
Justin Johnson, Bharath Hariharan, Laurens Van Der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. 2017. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR.
[15]
Abhishek Kar, Shubham Tulsiani, Joao Carreira, and Jitendra Malik. 2015. Amodal completion and size constancy in natural scenes. In ICCV.
[16]
Lei Ke, Yu-Wing Tai, and Chi-Keung Tang. 2021. Deep occlusion-aware instance segmentation with overlapping bilayers. In CVPR.
[17]
Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
[18]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, 2023. Segment anything. arXiv preprint arXiv:2304.02643 (2023).
[19]
Ke Li and Jitendra Malik. 2016. Amodal instance segmentation. In ECCV.
[20]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV.
[21]
Buyu Liu, Bingbing Zhuang, and Manmohan Chandraker. 2022. Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts. In CVPR.
[22]
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. Repaint: Inpainting using denoising diffusion probabilistic models. In CVPR.
[23]
Kaustubh Mani, Swapnil Daga, Shubhika Garg, Sai Shankar Narasimhan, Madhava Krishna, and Krishna Murthy Jatavallabhula. 2020. Monolayout: Amodal scene layout from a single image. In WACV.
[24]
Rohit Mohan and Abhinav Valada. 2022a. Amodal panoptic segmentation. In CVPR.
[25]
Rohit Mohan and Abhinav Valada. 2022b. Perceiving the invisible: Proposal-free amodal panoptic segmentation. RAL (2022).
[26]
Tom Monnier, Elliot Vincent, Jean Ponce, and Mathieu Aubry. 2021. Unsupervised layered image decomposition into object prototypes. In ICCV.
[27]
Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, and Amanpreet Singh. 2020. Seeing the un-scene: Learning amodal semantic maps for room navigation. In ECCV.
[28]
Khoi Nguyen and Sinisa Todorovic. 2021. A weakly supervised amodal segmenter with boundary uncertainty estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7396–7405.
[29]
OpenAI. 2023. GPT-4V(ision) System Card. (2023).
[30]
Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, and Carl Vondrick. 2024. pix2gestalt: Amodal Segmentation by Synthesizing Wholes. (2024).
[31]
Dim P Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, and Antonio Torralba. 2019. How to make a pizza: Learning a compositional layer-based GAN model. In CVPR.
[32]
Pulak Purkait, Christopher Zach, and Ian Reid. 2019. Seeing behind things: Extending semantic segmentation to occluded regions. In IROS.
[33]
Lu Qi, Li Jiang, Shu Liu, Xiaoyong Shen, and Jiaya Jia. 2019. Amodal instance segmentation with kins dataset. In CVPR.
[34]
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. 2020. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. TPAMI (2020).
[35]
Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In ICML.
[36]
Kabra Rishabh, Burgess Chris, Matthey Loic, Lopez Kaufman Raphael, Greff Klaus, Reynolds Malcolm, and Lerchner. Alexander. 2019. Multi-object datasets.
[37]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In CVPR.
[38]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In ICLM.
[39]
Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. 2023. Dreamcraft3D: Hierarchical 3D generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818 (2023).
[40]
Yihong Sun, Adam Kortylewski, and Alan Yuille. 2022. Amodal segmentation through out-of-task and out-of-distribution generalization with a Bayesian model. In CVPR.
[41]
Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. 2022. Resolution-robust large mask inpainting with fourier convolutions. In WACV.
[42]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
[43]
Angtian Wang, Yihong Sun, Adam Kortylewski, and Alan L Yuille. 2020. Robust object detection under occlusion with context-aware compositionalnets. In CVPR.
[44]
Yuting Xiao, Yanyu Xu, Ziming Zhong, Weixin Luo, Jiawei Li, and Shenghua Gao. 2021. Amodal segmentation based on visible region segmentation and shape prior. In AAAI.
[45]
Chaohao Xie, Shaohui Liu, Chao Li, Ming-Ming Cheng, Wangmeng Zuo, Xiao Liu, Shilei Wen, and Errui Ding. 2019. Image inpainting with learnable bidirectional attention maps. In ICCV.
[46]
Xiaosheng Yan, Feigege Wang, Wenxi Liu, Yuanlong Yu, Shengfeng He, and Jia Pan. 2019. Visualizing the invisible: Occluded vehicle segmentation and recovery. In ICCV.
[47]
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. 2019. Free-form image inpainting with gated convolution. In ICCV.
[48]
Xiaoding Yuan, Adam Kortylewski, Yihong Sun, and Alan Yuille. 2021. Robust instance segmentation through reasoning about multi-object occlusion. In CVPR.
[49]
Guanqi Zhan, Chuanxia Zheng, Weidi Xie, and Andrew Zisserman. 2023. Amodal Ground Truth and Completion in the Wild. arXiv preprint arXiv:2312.17247 (2023).
[50]
Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, and Chen Change Loy. 2020. Self-supervised scene de-occlusion. In CVPR.
[51]
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. 2023. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3836–3847.
[52]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR.
[53]
Ziheng Zhang, Anpei Chen, Ling Xie, Jingyi Yu, and Shenghua Gao. 2019. Learning semantics-aware distance map with semantics layering network for amodal instance segmentation. In ACM MM.
[54]
Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, and Jianfei Cai. 2021. Visiting the invisible: Layer-by-layer completed scene decomposition. IJCV (2021).
[55]
Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene parsing through ADE20k dataset. In CVPR.
[56]
Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, and Xinggang Wang. 2021. Human de-occlusion: Invisible perception and recovery for humans. In CVPR.
[57]
Yan Zhu, Yuandong Tian, Dimitris Metaxas, and Piotr Dollár. 2017. Semantic amodal segmentation. In CVPR.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024
1106 pages
ISBN:9798400705250
DOI:10.1145/3641519
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 July 2024

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. image recomposition
  2. object completion
  3. scene deocclusion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGGRAPH '24
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 159
    Total Downloads
  • Downloads (Last 12 months)159
  • Downloads (Last 6 weeks)49
Reflects downloads up to 15 Oct 2024

Other Metrics

Citations

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media