IntrinsicDiffusion: Joint Intrinsic Layers from Latent Diffusion Models

Authors: Anna Frühstück, Christian Richardt, Tuanfeng Wang

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

Article No.: 74, Pages 1 - 11

https://doi.org/10.1145/3641519.3657472

Published: 13 July 2024

Abstract

Reasoning about the intrinsic properties of an image, such as albedo, illumination, and surface geometry, is a long-standing problem with many applications in image editing and compositing. Existing solutions to this ill-posed problem either heavily rely on manually designed priors or learn priors from limited datasets that lack diversity. Hence, they fall short in generalizing to in-the-wild test scenarios. In this paper, we show that a large-scale text-to-image generation model trained on a massive amount of visual data can implicitly learn intrinsic image priors. In particular, we introduce a novel conditioning mechanism built on top of a pre-trained foundational image generation model to jointly predict multiple intrinsic modalities from an input image. We demonstrate that predicting different modalities in a collaborative manner improves the overall quality. This design also enables mixing datasets with annotations of only a subset of the modalities during training, contributing to the generalizability of our approach. Our method achieves state-of-the-art performance in intrinsic image decomposition, both qualitatively and quantitatively. We also demonstrate downstream image editing applications, such as relighting and retexturing.

Supplemental Material

MP4 File: presentation (406.65 MB)

PDF File: Appendix (15.42 MB)

Cited By

Jin Y, Li X, Wang J, Zhang Y, Zhang M. (2024). Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal. Computer Vision – ECCV 2024, pp. 1-17. DOI: 10.1007/978-3-031-72658-3_1. Online publication date: 29-Sep-2024
https://dl.acm.org/doi/10.1007/978-3-031-72658-3_1

Index Terms

IntrinsicDiffusion: Joint Intrinsic Layers from Latent Diffusion Models
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Scene understanding
  2. Computer graphics
    1. Image manipulation
      1. Image processing

Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

July 2024

1106 pages

ISBN:9798400705250

DOI:10.1145/3641519

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

EPSRC CAMERA 2.0
UKRI MyWorld Strength in Places Programme

Conference

SIGGRAPH '24

Sponsor:

SIGGRAPH

SIGGRAPH '24: Special Interest Group on Computer Graphics and Interactive Techniques Conference

July 27 - August 1, 2024

Denver, CO, USA

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

Bibliometrics

Total Citations: 1
Total Downloads: 826
Downloads (Last 12 months): 826
Downloads (Last 6 weeks): 172

Reflects downloads up to 23 Dec 2024
