DOI: 10.1145/3641519.3657472
research-article

IntrinsicDiffusion: Joint Intrinsic Layers from Latent Diffusion Models

Published: 13 July 2024

Abstract

Reasoning about the intrinsic properties of an image, such as albedo, illumination, and surface geometry, is a long-standing problem with many applications in image editing and compositing. Existing solutions to this ill-posed problem either heavily rely on manually designed priors or learn priors from limited datasets that lack diversity. Hence, they fall short in generalizing to in-the-wild test scenarios. In this paper, we show that a large-scale text-to-image generation model trained on a massive amount of visual data can implicitly learn intrinsic image priors. In particular, we introduce a novel conditioning mechanism built on top of a pre-trained foundational image generation model to jointly predict multiple intrinsic modalities from an input image. We demonstrate that predicting different modalities in a collaborative manner improves the overall quality. This design also enables mixing datasets with annotations of only a subset of the modalities during training, contributing to the generalizability of our approach. Our method achieves state-of-the-art performance in intrinsic image decomposition, both qualitatively and quantitatively. We also demonstrate downstream image editing applications, such as relighting and retexturing.
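
The abstract notes that the design allows mixing datasets that annotate only a subset of the modalities. One common way to realize this kind of training is to mask the loss so that a modality without ground truth contributes nothing for that sample. The sketch below illustrates that general idea only; the function name, the flattened-list image representation, and the use of mean squared error are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Optional

Image = List[float]  # flattened image, one float per pixel (illustrative only)


def masked_multimodal_loss(
    predictions: Dict[str, Image],
    targets: Dict[str, Optional[Image]],
) -> float:
    """Mean squared error averaged over the modalities that have a target.

    A modality whose target is None (not annotated in this dataset) is
    skipped, so it contributes no training signal for this sample.
    """
    per_modality = []
    for name, pred in predictions.items():
        target = targets.get(name)
        if target is None:
            continue  # no annotation for this modality in this dataset
        if len(pred) != len(target):
            raise ValueError(f"size mismatch for modality {name!r}")
        mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
        per_modality.append(mse)
    if not per_modality:
        return 0.0  # nothing annotated: no loss for this sample
    return sum(per_modality) / len(per_modality)


# Hypothetical sample from a dataset that annotates albedo and normals only.
predictions = {
    "albedo": [0.5, 0.5],
    "shading": [0.9, 0.1],
    "normal": [0.0, 1.0],
}
targets = {
    "albedo": [0.5, 0.0],  # MSE = (0 + 0.25) / 2 = 0.125
    "shading": None,       # missing annotation, skipped
    "normal": [0.0, 1.0],  # MSE = 0
}
loss = masked_multimodal_loss(predictions, targets)
print(loss)  # prints 0.0625, the mean of 0.125 and 0.0
```

Averaging over only the annotated modalities keeps the loss scale comparable between fully and partially labeled samples, which is one reasonable choice among several when batches mix datasets.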

Supplemental Material

  • MP4 File: presentation
  • PDF File: Appendix

Cited By

  • (2024) Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal. Computer Vision – ECCV 2024, 1–17. https://doi.org/10.1007/978-3-031-72658-3_1. Online publication date: 29-Sep-2024

Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024
1106 pages
ISBN:9798400705250
DOI:10.1145/3641519
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. diffusion model
  2. intrinsic image decomposition
  3. multi-task learning
  4. surface normal estimation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • EPSRC CAMERA 2.0
  • UKRI MyWorld Strength in Places Programme

Conference

SIGGRAPH '24

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 826
  • Downloads (Last 6 weeks): 172
Reflects downloads up to 23 Dec 2024
