Conditional Video Generation Using Action-Appearance Captions

Yamamoto, Shohei; Tejero-de-Pablos, Antonio; Ushiku, Yoshitaka; Harada, Tatsuya

Computer Science > Computer Vision and Pattern Recognition

arXiv:1812.01261v2 (cs)

[Submitted on 4 Dec 2018 (v1), last revised 5 Dec 2018 (this version, v2)]

Abstract: The field of automatic video generation has received a boost thanks to recent Generative Adversarial Networks (GANs). However, most existing methods cannot control the contents of the generated video using a text caption, which greatly limits their usefulness. This particularly affects human videos due to their great variety of actions and appearances. This paper presents Conditional Flow and Texture GAN (CFT-GAN), a GAN-based method for generating video from action-appearance captions. We propose a novel way of generating video by encoding a caption (e.g., "a man in blue jeans is playing golf") in a two-stage generation pipeline. Our CFT-GAN uses such a caption to generate an optical flow (action) and a texture (appearance) for each frame. As a result, the output video reflects the content specified in the caption in a plausible way. Moreover, to train our method, we constructed a new dataset for human video generation with captions. We evaluated the proposed method qualitatively and quantitatively via an ablation study and a user study. The results demonstrate that CFT-GAN is able to successfully generate videos containing the action and appearances indicated in the captions.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1812.01261 [cs.CV]
	(or arXiv:1812.01261v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1812.01261

Submission history

From: Antonio Tejero-de-Pablos
[v1] Tue, 4 Dec 2018 07:54:39 UTC (3,706 KB)
[v2] Wed, 5 Dec 2018 04:19:27 UTC (3,706 KB)
