
FusionDeformer: text-guided mesh deformation using diffusion models

Published: 25 May 2024


Mesh deformation has a wide range of applications, including character creation, geometry modelling, deformation animation, and morphing. Recently, mesh deformation methods based on CLIP models have demonstrated the ability to perform automatic text-guided mesh deformation. However, deforming a 3D mesh under 2D guidance is an ill-posed problem and leads to distortion and non-smooth surfaces. CLIP-based methods cannot eliminate these artefacts because they focus on semantic-aware features and cannot identify them. To address this, we propose FusionDeformer, a novel automatic text-guided mesh deformation method that leverages diffusion models. The deformation is driven by Score Distillation Sampling, which minimizes the KL divergence between the distribution of renderings of the deformed mesh and the text-conditioned distribution. To alleviate the intrinsic ill-posedness, we incorporate two approaches into our framework. The first combines multiple orthogonal views into a single image, providing robust deformation without requiring additional memory. The second introduces a new regularization to address the non-smooth artefacts. Our experimental results show that the proposed method can generate high-quality, smoothly deformed meshes that align precisely with the input text description while preserving the topological relationships. Additionally, our method offers a text2morphing approach to animation design, enabling ordinary users to produce special-effects animation.
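The idea of combining multiple orthogonal views into a single image can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not state the number of views, the camera angles, or the layout, so the four azimuths spaced 90 degrees apart, the 2 x 2 grid, and the helper names `orthogonal_view_angles` and `tile_views` are all assumptions made for illustration. Constant-valued arrays stand in for renders that would, in practice, come from a differentiable renderer.

```python
import numpy as np


def orthogonal_view_angles(n_views=4, elevation_deg=0.0):
    """Return (azimuth, elevation) pairs spaced 360 / n_views degrees apart.

    Hypothetical helper: the camera placement is an assumption, not taken
    from the paper.
    """
    step = 360.0 / n_views
    return [(i * step, elevation_deg) for i in range(n_views)]


def tile_views(views, cols=2):
    """Pack equally sized H x W x C images into one grid image.

    The result has shape (rows * H, cols * W, C); unused cells stay zero.
    """
    views = [np.asarray(v) for v in views]
    if not views:
        raise ValueError("need at least one view")
    h, w, c = views[0].shape
    if any(v.shape != (h, w, c) for v in views):
        raise ValueError("all views must share the same shape")
    rows = -(-len(views) // cols)  # ceiling division
    canvas = np.zeros((rows * h, cols * w, c), dtype=views[0].dtype)
    for i, v in enumerate(views):
        r, col = divmod(i, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = v
    return canvas


# Stand-in "renders": one constant-valued 64 x 64 RGB image per camera.
angles = orthogonal_view_angles(4)
renders = [
    np.full((64, 64, 3), i / 4.0, dtype=np.float32)
    for i in range(len(angles))
]
grid = tile_views(renders, cols=2)
print(grid.shape)  # (128, 128, 3)
```

Under these assumptions, the guidance model would see one composite image per step instead of one image per view. If each view is rendered at reduced resolution so that the composite matches the size of a single full-resolution render, the per-step input size stays the same, which is one plausible reading of the abstract's memory claim.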





Information & Contributors


Published In

The Visual Computer: International Journal of Computer Graphics  Volume 40, Issue 7
Jul 2024
502 pages



Berlin, Heidelberg

Publication History

Published: 25 May 2024
Accepted: 04 May 2024

Author Tags

  1. Diffusion model
  2. Mesh deformation
  3. Score Distillation Sampling


  • Research-article
