SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Hu, Ziniu; Iscen, Ahmet; Jain, Aashi; Kipf, Thomas; Yue, Yisong; Ross, David A.; Schmid, Cordelia; Fathi, Alireza

arXiv:2403.01248 (cs)

[Submitted on 2 Mar 2024]

Abstract: This paper introduces SceneCraft, a Large Language Model (LLM) agent that converts text descriptions into Blender-executable Python scripts, which render complex scenes with up to a hundred 3D assets. This process requires complex spatial planning and arrangement. We tackle these challenges through a combination of advanced abstraction, strategic planning, and library learning. SceneCraft first models a scene graph as a blueprint that details the spatial relationships among the assets in the scene. It then writes Python scripts based on this graph, translating the relationships into numerical constraints for asset layout. Next, SceneCraft leverages the perceptual strengths of vision-language foundation models like GPT-V to analyze rendered images and iteratively refine the scene. On top of this process, SceneCraft features a library learning mechanism that compiles common script functions into a reusable library, facilitating continuous self-improvement without expensive LLM parameter tuning. Our evaluation demonstrates that SceneCraft surpasses existing LLM-based agents in rendering complex scenes, as shown by its adherence to constraints and favorable human assessments. We also showcase the broader application potential of SceneCraft by reconstructing detailed 3D scenes from the Sintel movie and by guiding a video generative model with generated scenes as an intermediary control signal.
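
The step of translating scene-graph relationships into numerical constraints can be illustrated with a minimal sketch. Everything named here (`Asset`, `proximity_score`, `left_of_score`, `RELATIONS`, `score_layout`, and the relation labels `near` and `left_of`) is hypothetical and chosen for illustration only; it is not taken from SceneCraft's code, and it works on plain 2D coordinates rather than on Blender objects.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    """A placed asset, reduced to a name and a 2D position (illustrative only)."""
    name: str
    x: float
    y: float


def proximity_score(a: Asset, b: Asset, max_dist: float = 2.0) -> float:
    """Return 1.0 when a and b are within max_dist, decaying toward 0 beyond it."""
    d = math.hypot(a.x - b.x, a.y - b.y)
    return 1.0 if d <= max_dist else max_dist / d


def left_of_score(a: Asset, b: Asset) -> float:
    """Return 1.0 when a lies at a smaller x coordinate than b, else 0.0."""
    return 1.0 if a.x < b.x else 0.0


# Assumed mapping from scene-graph edge labels to numerical scoring functions.
RELATIONS = {"near": proximity_score, "left_of": left_of_score}


def score_layout(assets, edges):
    """Average constraint satisfaction over all (source, relation, target) edges."""
    by_name = {a.name: a for a in assets}
    scores = [RELATIONS[rel](by_name[src], by_name[dst]) for src, rel, dst in edges]
    return sum(scores) / len(scores) if scores else 1.0


# A tiny scene graph: the lamp is near the desk, the chair is left of the desk.
edges = [("lamp", "near", "desk"), ("chair", "left_of", "desk")]
layout = [Asset("desk", 0.0, 0.0), Asset("lamp", 0.5, 0.5), Asset("chair", -1.0, 0.0)]
print(score_layout(layout, edges))
```

Under these assumptions, a score below 1.0 marks a layout that violates at least one relationship, which is the kind of signal an iterative refinement loop could act on.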

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2403.01248 [cs.CV]
	(or arXiv:2403.01248v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.01248

Submission history

From: Ziniu Hu
[v1] Sat, 2 Mar 2024 16:16:26 UTC (41,902 KB)
