DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Zhao, Guosheng; Wang, Xiaofeng; Zhu, Zheng; Chen, Xinze; Huang, Guan; Bao, Xiaoyi; Wang, Xingang

arXiv:2403.06845 (cs)

[Submitted on 11 Mar 2024 (v1), last revised 11 Apr 2024 (this version, v2)]

Abstract: World models have demonstrated superiority in autonomous driving, particularly in the generation of multi-view driving videos. However, generating customized driving videos remains a significant challenge. In this paper, we propose DriveDreamer-2, which builds upon the DriveDreamer framework and incorporates a Large Language Model (LLM) to generate user-defined driving videos. Specifically, an LLM interface first converts a user's query into agent trajectories. An HDMap that adheres to traffic regulations is then generated from those trajectories. Finally, we propose the Unified Multi-View Model to enhance temporal and spatial coherence in the generated driving videos. DriveDreamer-2 is the first world model to generate customized driving videos, and it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner. In addition, experimental results demonstrate that the generated videos enhance the training of driving perception methods (e.g., 3D detection and tracking). Furthermore, the video generation quality of DriveDreamer-2 surpasses other state-of-the-art methods, with FID and FVD scores of 11.2 and 55.7, representing relative improvements of 30% and 50%, respectively.
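The abstract reports relative improvements without naming the baseline scores. As a rough sketch, assuming "relative improvement" means the fractional reduction of a lower-is-better metric, (baseline - score) / baseline, the stated figures can be inverted to see what baselines they would imply. The function names below are illustrative and not taken from the paper, and because the percentages are rounded, the implied baselines are only approximate.

```python
def relative_improvement(baseline: float, score: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline.

    Assumed definition: (baseline - score) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - score) / baseline


def implied_baseline(score: float, improvement: float) -> float:
    """Invert relative_improvement: the baseline that would yield `score`."""
    if not 0 <= improvement < 1:
        raise ValueError("improvement must be in [0, 1)")
    return score / (1.0 - improvement)


# Scores and percentages as stated in the abstract (lower is better).
fid_baseline = implied_baseline(11.2, 0.30)  # roughly 16.0 under the assumption
fvd_baseline = implied_baseline(55.7, 0.50)  # roughly 111.4 under the assumption

print(f"implied FID baseline: {fid_baseline:.1f}")
print(f"implied FVD baseline: {fvd_baseline:.1f}")
```

These implied values are a consistency check on the stated numbers only, not results reported by the authors.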

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.06845 [cs.CV]
	(or arXiv:2403.06845v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.06845

Submission history

From: Guosheng Zhao [view email]
[v1] Mon, 11 Mar 2024 16:03:35 UTC (3,342 KB)
[v2] Thu, 11 Apr 2024 04:17:13 UTC (4,644 KB)
