FusionDeformer: text-guided mesh deformation using diffusion models

Published: 25 May 2024

Abstract

Mesh deformation has a wide range of applications, including character creation, geometry modelling, deformation animation, and morphing. Recently, mesh deformation methods based on CLIP models have demonstrated automatic text-guided mesh deformation. However, deforming a 3D mesh under 2D guidance is an ill-posed problem that leads to distortion and unsmoothness, and CLIP-based methods cannot eliminate these artefacts because they focus on semantic-aware features and cannot identify them. To this end, we propose FusionDeformer, a novel automatic text-guided mesh deformation method that leverages diffusion models. The deformation is driven by Score Distillation Sampling, which minimizes the KL-divergence between the distribution of renderings of the deformed mesh and the text-conditioned distribution. To alleviate the intrinsic ill-posedness, we incorporate two techniques into our framework. First, we combine multiple orthogonal views into a single image, which makes the deformation more robust without requiring additional memory. Second, we introduce a new regularization term that suppresses unsmooth artefacts. Our experimental results show that the proposed method generates high-quality, smoothly deformed meshes that align precisely with the input text description while preserving topological relationships. Additionally, our method offers a text2morphing approach to animation design, enabling non-expert users to produce special-effects animations.
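The abstract describes three ingredients: tiling orthogonal views into a single image, a Score Distillation Sampling (SDS) objective against a text-conditioned diffusion model, and a smoothness regularizer. The PyTorch-style sketch below is illustrative only and is not the authors' implementation: `render_view`, `denoiser`, `text_emb`, and `alphas_cumprod` are hypothetical placeholders, per-vertex offsets stand in for whatever deformation parameterization the paper actually uses, and the edge-based smoothness term is a generic stand-in for the paper's regularizer.

```python
# Minimal sketch (assumed interfaces, not the authors' code) of the three ideas
# named in the abstract: (1) tiling four orthogonal-view renders into one image,
# (2) an SDS-style gradient from a frozen, text-conditioned diffusion model, and
# (3) a simple smoothness regularizer on per-vertex displacements.
import torch


def tile_orthogonal_views(views):
    """Pack four (C, H, W) renders (e.g. front/back/left/right) into one
    (C, 2H, 2W) image so a single diffusion forward pass sees all views,
    instead of one pass (and its activations) per view."""
    top = torch.cat([views[0], views[1]], dim=-1)      # concatenate along width
    bottom = torch.cat([views[2], views[3]], dim=-1)
    return torch.cat([top, bottom], dim=-2)            # concatenate along height


def sds_loss(image, denoiser, text_emb, alphas_cumprod, t):
    """SDS surrogate loss: its gradient w.r.t. `image` is w(t) * (eps_pred - eps),
    the direction that pulls the rendering toward the text-conditioned
    distribution. `denoiser(x_t, t, emb)` is assumed to predict the added noise
    (in pixel or latent space, depending on the diffusion model used)."""
    a_t = alphas_cumprod[t]
    eps = torch.randn_like(image)
    x_t = a_t.sqrt() * image + (1.0 - a_t).sqrt() * eps   # forward diffusion to step t
    with torch.no_grad():                                 # diffusion model stays frozen
        eps_pred = denoiser(x_t, t, text_emb)
    w = 1.0 - a_t                                         # one common weighting choice
    grad = w * (eps_pred - eps)
    # Detached-gradient trick: d(loss)/d(image) == grad, no backprop through the U-Net.
    return (grad.detach() * image).sum()


def smoothness_reg(offsets, edges):
    """Penalize displacement differences across mesh edges (a generic smoothness
    term discouraging high-frequency, unsmooth artefacts). `offsets` is (V, 3),
    `edges` is a (E, 2) long tensor of vertex index pairs."""
    d = offsets[edges[:, 0]] - offsets[edges[:, 1]]
    return (d ** 2).sum(dim=-1).mean()


def step(verts, offsets, edges, render_view, denoiser, text_emb,
         alphas_cumprod, opt, lam=0.1):
    """One optimization step over learnable per-vertex offsets.
    `render_view(verts, view_id)` is a hypothetical differentiable renderer."""
    views = [render_view(verts + offsets, i) for i in range(4)]
    image = tile_orthogonal_views(views)
    t = torch.randint(20, len(alphas_cumprod) - 20, (1,)).item()  # random noise level
    loss = sds_loss(image, denoiser, text_emb, alphas_cumprod, t) \
        + lam * smoothness_reg(offsets, edges)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```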


Published In

The Visual Computer: International Journal of Computer Graphics
Volume 40, Issue 7
Jul 2024
502 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 25 May 2024
Accepted: 04 May 2024

Author Tags

  1. Diffusion model
  2. Mesh deformation
  3. Score Distillation Sampling

Qualifiers

  • Research-article
