FusionDeformer: text-guided mesh deformation using diffusion models

Published: 25 May 2024

Abstract

Mesh deformation has a wide range of applications, including character creation, geometry modelling, deformation animation, and morphing. Recently, mesh deformation methods based on CLIP models have demonstrated automatic text-guided mesh deformation. However, deforming a 3D mesh under 2D guidance is an ill-posed problem that leads to distortion and unsmoothness, and CLIP-based methods cannot eliminate these artefacts because they focus on semantic-aware features and cannot identify them. To this end, we propose FusionDeformer, a novel automatic text-guided mesh deformation method that leverages diffusion models. The deformation is driven by Score Distillation Sampling, which minimizes the KL-divergence between the distribution of renderings of the deformed mesh and the text-conditioned distribution. To alleviate the intrinsic ill-posedness, we incorporate two techniques into our framework. First, we combine multiple orthogonal views into a single image, which makes the deformation more robust without requiring additional memory. Second, we introduce a new regularization term that suppresses unsmooth artefacts. Our experimental results show that the proposed method generates high-quality, smoothly deformed meshes that align precisely with the input text description while preserving topological relationships. Additionally, our method offers a text2morphing approach to animation design, enabling non-expert users to produce special-effects animations.
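The abstract describes three ingredients: tiling orthogonal views into a single image, a Score Distillation Sampling (SDS) objective against a text-conditioned diffusion model, and a smoothness regularizer. The PyTorch-style sketch below is illustrative only and is not the authors' implementation: `render_view`, `denoiser`, `text_emb`, and `alphas_cumprod` are hypothetical placeholders, per-vertex offsets stand in for whatever deformation parameterization the paper actually uses, and the edge-based smoothness term is a generic stand-in for the paper's regularizer.

```python
# Minimal sketch (assumed interfaces, not the authors' code) of the three ideas
# named in the abstract: (1) tiling four orthogonal-view renders into one image,
# (2) an SDS-style gradient from a frozen, text-conditioned diffusion model, and
# (3) a simple smoothness regularizer on per-vertex displacements.
import torch


def tile_orthogonal_views(views):
    """Pack four (C, H, W) renders (e.g. front/back/left/right) into one
    (C, 2H, 2W) image so a single diffusion forward pass sees all views,
    instead of one pass (and its activations) per view."""
    top = torch.cat([views[0], views[1]], dim=-1)      # concatenate along width
    bottom = torch.cat([views[2], views[3]], dim=-1)
    return torch.cat([top, bottom], dim=-2)            # concatenate along height


def sds_loss(image, denoiser, text_emb, alphas_cumprod, t):
    """SDS surrogate loss: its gradient w.r.t. `image` is w(t) * (eps_pred - eps),
    the direction that pulls the rendering toward the text-conditioned
    distribution. `denoiser(x_t, t, emb)` is assumed to predict the added noise
    (in pixel or latent space, depending on the diffusion model used)."""
    a_t = alphas_cumprod[t]
    eps = torch.randn_like(image)
    x_t = a_t.sqrt() * image + (1.0 - a_t).sqrt() * eps   # forward diffusion to step t
    with torch.no_grad():                                 # diffusion model stays frozen
        eps_pred = denoiser(x_t, t, text_emb)
    w = 1.0 - a_t                                         # one common weighting choice
    grad = w * (eps_pred - eps)
    # Detached-gradient trick: d(loss)/d(image) == grad, no backprop through the U-Net.
    return (grad.detach() * image).sum()


def smoothness_reg(offsets, edges):
    """Penalize displacement differences across mesh edges (a generic smoothness
    term discouraging high-frequency, unsmooth artefacts). `offsets` is (V, 3),
    `edges` is a (E, 2) long tensor of vertex index pairs."""
    d = offsets[edges[:, 0]] - offsets[edges[:, 1]]
    return (d ** 2).sum(dim=-1).mean()


def step(verts, offsets, edges, render_view, denoiser, text_emb,
         alphas_cumprod, opt, lam=0.1):
    """One optimization step over learnable per-vertex offsets.
    `render_view(verts, view_id)` is a hypothetical differentiable renderer."""
    views = [render_view(verts + offsets, i) for i in range(4)]
    image = tile_orthogonal_views(views)
    t = torch.randint(20, len(alphas_cumprod) - 20, (1,)).item()  # random noise level
    loss = sds_loss(image, denoiser, text_emb, alphas_cumprod, t) \
        + lam * smoothness_reg(offsets, edges)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```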


Published In

The Visual Computer: International Journal of Computer Graphics
Volume 40, Issue 7
Jul 2024
502 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 25 May 2024
Accepted: 04 May 2024

Author Tags

  1. Diffusion model
  2. Mesh deformation
  3. Score Distillation Sampling

Qualifiers

  • Research-article
