Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3588432.3591552acmconferencesArticle/Chapter ViewAbstractPublication PagessiggraphConference Proceedingsconference-collections
research-article

TextDeformer: Geometry Manipulation using Text Guidance

Published: 23 July 2023 Publication History

Abstract

We present a technique for automatically producing a deformation of an input triangle mesh, guided solely by a text prompt. Our framework is capable of deformations that produce both large, low-frequency shape changes, and small high-frequency details. Our framework relies on differentiable rendering to connect geometry to powerful pre-trained image encoders, such as CLIP and DINO. Notably, updating mesh geometry by taking gradient steps through differentiable rendering is notoriously challenging, commonly resulting in deformed meshes with significant artifacts. These difficulties are amplified by noisy and inconsistent gradients from CLIP. To overcome this limitation, we opt to represent our mesh deformation through Jacobians, which updates deformations in a global, smooth manner (rather than locally-sub-optimal steps). Our key observation is that Jacobians are a representation that favors smoother, large deformations, leading to a global relation between vertices and pixels, and avoiding localized noisy gradients. Additionally, to ensure the resulting shape is coherent from all 3D viewpoints, we encourage the deep features computed on the 2D encoding of the rendering to be consistent for a given vertex from all viewpoints. We demonstrate that our method is capable of smoothly-deforming a wide variety of source mesh and target text prompts, achieving both large modifications to, e.g., body proportions of animals, as well as adding fine semantic details, such as shoe laces on an army boot and fine details of a face.

Supplemental Material

MP4 File
presentation

References

[1]
Noam Aigerman, Kunal Gupta, Vladimir G. Kim, Siddhartha Chaudhuri, Jun Saito, and Thibault Groueix. 2022. Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes. ACM Trans. Graph. 41, 4, Article 109 (jul 2022), 17 pages. https://doi.org/10.1145/3528223.3530141
[2]
Shir Amir, Yossi Gandelsman, Shai Bagon, and Tali Dekel. 2022. Deep ViT Features as Dense Visual Descriptors. ECCVW What is Motion For? (2022).
[3]
Stephen W Bailey, Dalton Omens, Paul Dilorenzo, and James F O’Brien. 2020. Fast and deep facial deformations. ACM Transactions on Graphics (TOG) 39, 4 (2020), 94–1.
[4]
Stephen W. Bailey, Dave Otte, Paul Dilorenzo, and James F. O’Brien. 2018. Fast and Deep Deformation Approximations. ACM Transactions on Graphics 37, 4 (Aug. 2018), 119:1–12. https://doi.org/10.1145/3197517.3201300 Presented at SIGGRAPH 2018, Los Angeles.
[5]
Kevin Chen, Christopher B Choy, Manolis Savva, Angel X Chang, Thomas Funkhouser, and Silvio Savarese. 2018. Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings. arXiv preprint arXiv:1803.08495 (2018).
[6]
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. 2022. Objaverse: A Universe of Annotated 3D Objects. arXiv preprint arXiv:2212.08051 (2022).
[7]
Lawson Fulton, Vismay Modi, David Duvenaud, David I. W. Levin, and Alec Jacobson. 2019. Latent-space Dynamics for Reduced Deformable Simulation. Computer Graphics Forum (2019).
[8]
Ran Gal, Olga Sorkine, Niloy J. Mitra, and Daniel Cohen-Or. 2009. iWIRES: An Analyze-and-Edit Approach to Shape Manipulation. ACM Transactions on Graphics (Siggraph) 28, 3 (2009), #33, 1–10.
[9]
Lin Gao, Jie Yang, Yi-Ling Qiao, Yu-Kun Lai, Paul L Rosin, Weiwei Xu, and Shihong Xia. 2018. Automatic Unpaired Shape Deformation Transfer. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH Asia 2018) 37, 6 (2018), To appear.
[10]
Rana Hanocka, Noa Fish, Zhenhua Wang, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. 2018. ALIGNet: partial-shape agnostic alignment via unsupervised learning. ACM Transactions on Graphics (TOG) 38, 1 (2018), 1.
[11]
Daniel Holden, Jun Saito, and Taku Komura. 2015. Learning an Inverse Rig Mapping for Character Animation. In Proceedings of the 14th ACM SIGGRAPH / Eurographics Symposium on Computer Animation(SCA ’15). Association for Computing Machinery, New York, NY, USA, 165–173. https://doi.org/10.1145/2786784.2786788
[12]
Alec Jacobson. 2013. Algorithms and Interfaces for Real-Time Deformation of 2D and 3D Shapes. In PhD Thesis, ETH Zurich.
[13]
Alec Jacobson, Zhigang Deng, Ladislav Kavan, and JP Lewis. 2014. Skinning: Real-time Shape Deformation. In ACM SIGGRAPH 2014 Courses.
[14]
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, and Ben Poole. 2021. Zero-Shot Text-Guided Object Generation with Dream Fields. arXiv (December 2021).
[15]
Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, and Angjoo Kanazawa. 2020. KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16]
Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, and Jitendra Malik. 2018. Learning Category-Specific Mesh Reconstruction from Image Collections. In ECCV.
[17]
Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, and Popa Tiberiu. 2022. CLIP-Mesh: Generating textured meshes from text using pretrained image-text models. SIGGRAPH Asia 2022 Conference Papers (December 2022).
[18]
Gwanghyun Kim, Taesung Kwon, and Jong Chul Ye. 2022. DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2426–2435.
[19]
Vladislav Kraevoy and Alla Sheffer. 2004. Cross-Parameterization and Compatible Remeshing of 3D Models. In ACM Transactions on Graphics (Proc. SIGGRAPH).
[20]
Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, and Timo Aila. 2020. Modular Primitives for High-Performance Differentiable Rendering. ACM Transactions on Graphics 39, 6 (2020).
[21]
Peizhuo Li, Kfir Aberman, Rana Hanocka, Libin Liu, Olga Sorkine-Hornung, and Baoquan Chen. 2021. Learning Skeletal Articulations with Neural Blend Shapes. ACM Transactions on Graphics (TOG) 40, 4 (2021), 1.
[22]
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2022. Magic3D: High-Resolution Text-to-3D Content Creation. arXiv preprint arXiv:2211.10440 (2022).
[23]
Minghua Liu, Minhyuk Sung, Radomir Mech, and Hao Su. 2021. DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12–21.
[24]
Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, and Rana Hanocka. 2021. Text2Mesh: Text-Driven Neural Stylization for Meshes. arXiv preprint arXiv:2112.03221 (2021).
[25]
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.
[26]
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. CoRR abs/2112.10741 (2021). arXiv:2112.10741https://arxiv.org/abs/2112.10741
[27]
Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. 2022. Point-E: A System for Generating 3D Point Clouds from Complex Prompts. arXiv preprint arXiv:2212.08751 (2022).
[28]
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. 2021. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2085–2094.
[29]
Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. 2022. DreamFusion: Text-to-3D using 2D Diffusion. arXiv (2022).
[30]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.
[31]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. ArXiv abs/2204.06125 (2022).
[32]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis With Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695.
[33]
Cristian Romero, Dan Casas, Jesus Perez, and Miguel A. Otaduy. 2021. Learning Contact Corrections for Handle-Based Subspace Dynamics. ACM Trans. on Graphics (Proc. of ACM SIGGRAPH) 40, 4 (2021). http://gmrv.es/Publications/2021/RCPO21
[34]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv preprint arXiv:2205.11487 (2022).
[35]
Aditya Sanghi, Hang Chu, Joseph G Lambourne, Ye Wang, Chin-Yi Cheng, and Marco Fumero. 2021. Clip-forge: Towards zero-shot text-to-shape generation. arXiv preprint arXiv:2110.02624 (2021).
[36]
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. NeurIPS (2022).
[37]
Meitar Shechter, Rana Hanocka, Gal Metzer, Raja Giryes, and Daniel Cohen-Or. 2022. NeuralMLS: Geometry-Aware Control Point Deformation. (2022).
[38]
Siyuan Shen, Yin Yang, Tianjia Shao, He Wang, Chenfanfu Jiang, Lei Lan, and Kun Zhou. 2021. High-order differentiable autoencoder for nonlinear model reduction. ACM Transactions on Graphics.
[39]
Olga Sorkine and Marc Alexa. 2007. As-Rigid-As-Possible Surface Modeling. In Proceedings of EUROGRAPHICS/ACM SIGGRAPH Symposium on Geometry Processing. 109–116.
[40]
Olga Sorkine, Daniel Cohen-Or, Yaron Lipman, Marc Alexa, Christian Rössl, and Hans-Peter Seidel. 2004. Laplacian Surface Editing. In Proceedings of the EUROGRAPHICS/ACM SIGGRAPH Symposium on Geometry Processing. ACM Press, 179–188.
[41]
Bo Sun, Xiangru Huang, Qixing Huang, Zaiwei Zhang, Junfeng Jiang, and Chandrajit Bajaj. 2021. ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators. In ICCV.
[42]
Qingyang Tan, Lin Gao, Yu-Kun Lai, and Shihong Xia. 2018. Variational Autoencoders for Deforming 3D Mesh Models. In CVPR.
[43]
Jiaxiang Tang. 2022. Stable-dreamfusion: Text-to-3D with Stable-diffusion. https://github.com/ashawkey/stable-dreamfusion.
[44]
Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. 2022. Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation. arXiv preprint arXiv:2212.00774 (2022).
[45]
Zhan Xu, Yang Zhou, Evangelos Kalogerakis, Chris Landreth, and Karan Singh. 2020. RigNet: Neural Rigging for Articulated Characters. ACM Trans. on Graphics 39 (2020).
[46]
Zhan Xu, Yang Zhou, Evangelos Kalogerakis, and Karan Singh. 2019. Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets. In 2019 International Conference on 3D Vision (3DV).
[47]
Wang Yifan, Noam Aigerman, Vladimir G. Kim, Siddhartha Chaudhuri, and Olga Sorkine-Hornung. 2020. Neural Cages for Detail-Preserving 3D Deformations. In CVPR.
[48]
Kangxue Yin, Jun Gao, Maria Shugrina, Sameh Khamis, and Sanja Fidler. 2021. 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations. In Proceedings of International Conference on Computer Vision (ICCV).
[49]
Mehmet Ersin Yumer, Siddhartha Chaudhuri, Jessica K Hodgins, and Levent Burak Kara. 2015. Semantic shape editing using deformation handles. ACM Transactions on Graphics (TOG) 34, 4 (2015), 1–12.
[50]
Mianlun Zheng, Yi Zhou, Duygu Ceylan, and Jernej Barbic. 2021. A Deep Emulator for Secondary Motion of 3D Characters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5932–5940.

Cited By

View all
  • (2024)Real-time Large-scale Deformation of Gaussian SplattingACM Transactions on Graphics10.1145/368775643:6(1-17)Online publication date: 19-Dec-2024
  • (2024)Text-guided Controllable Mesh Refinement for Interactive 3D ModelingSIGGRAPH Asia 2024 Conference Papers10.1145/3680528.3687630(1-11)Online publication date: 3-Dec-2024
  • (2024)Controllable Neural Style Transfer for Dynamic MeshesACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657474(1-11)Online publication date: 13-Jul-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings
July 2023
911 pages
ISBN:9798400701597
DOI:10.1145/3588432
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 July 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. CLIP
  2. deformation
  3. mesh
  4. text guidance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGGRAPH '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)317
  • Downloads (Last 6 weeks)36
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Real-time Large-scale Deformation of Gaussian SplattingACM Transactions on Graphics10.1145/368775643:6(1-17)Online publication date: 19-Dec-2024
  • (2024)Text-guided Controllable Mesh Refinement for Interactive 3D ModelingSIGGRAPH Asia 2024 Conference Papers10.1145/3680528.3687630(1-11)Online publication date: 3-Dec-2024
  • (2024)Controllable Neural Style Transfer for Dynamic MeshesACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657474(1-11)Online publication date: 13-Jul-2024
  • (2024)Spice·E: Structural Priors in 3D Diffusion using Cross-Entity AttentionACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657461(1-11)Online publication date: 13-Jul-2024
  • (2024)iShapEditing: Intelligent Shape Editing with Diffusion ModelsComputer Graphics Forum10.1111/cgf.1525343:7Online publication date: 8-Nov-2024
  • (2024)FontCLIP: A Semantic Typography Visual‐Language Model for Multilingual Font ApplicationsComputer Graphics Forum10.1111/cgf.1504343:2Online publication date: 30-Apr-2024
  • (2024)GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.02029(21476-21485)Online publication date: 16-Jun-2024
  • (2024)4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00764(7996-8006)Online publication date: 16-Jun-2024
  • (2024)LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00431(4505-4514)Online publication date: 16-Jun-2024
  • (2024)3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00428(4473-4483)Online publication date: 16-Jun-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media