SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

Athanasiou, Nikos; Petrovich, Mathis; Black, Michael J.; Varol, Gül

arXiv:2304.10417v3 (cs)

[Submitted on 20 Apr 2023 (v1), last revised 26 Mar 2024 (this version, v3)]

Abstract: Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions that seek to transition from one action to another, spatial composition requires understanding which body parts are involved in which action, to be able to move them simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions together and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by the combinatorics. Hence, we further create synthetic data with this approach, and use it to train a new state-of-the-art text-to-motion generation model, called SINC ("SImultaneous actioN Compositions for 3D human motions"). Our experiments show that training with such GPT-guided synthetic data improves spatial composition generation over baselines. Our code is publicly available at this https URL.
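
The composition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the body-part list, the hard-coded action-to-part mapping (standing in for GPT-3 answers), the frame representation, and the names `build_prompt`, `compose`, and `make_motion` are all assumptions made for illustration.

```python
# Hypothetical coarse body-part list; the paper's actual list may differ.
BODY_PARTS = ["left arm", "right arm", "left leg", "right leg", "torso", "head"]


def build_prompt(action, parts=BODY_PARTS):
    """Format a question in the style quoted in the abstract.

    The few-shot examples mentioned in the abstract are omitted here.
    """
    return (
        f"The body parts are: {', '.join(parts)}.\n"
        f"What are the body parts involved in the action {action}?"
    )


# Stand-in for language-model answers, hard-coded for illustration only.
ACTION_TO_PARTS = {
    "walking": {"left leg", "right leg", "torso"},
    "waving hand": {"right arm"},
}


def compose(motion_a, motion_b, parts_b):
    """Overlay the body parts in parts_b from motion_b onto motion_a.

    Each motion is a list of frames; each frame maps a body part to its
    pose features. The result is trimmed to the shorter of the two motions.
    """
    n_frames = min(len(motion_a), len(motion_b))
    composed = []
    for t in range(n_frames):
        frame = dict(motion_a[t])             # start from motion A
        for part in parts_b:
            frame[part] = motion_b[t][part]   # take B's parts from motion B
        composed.append(frame)
    return composed


def make_motion(tag, n_frames):
    """Build a toy motion whose pose features are just (tag, frame index)."""
    return [{part: (tag, t) for part in BODY_PARTS} for t in range(n_frames)]


walk = make_motion("walk", 4)
wave = make_motion("wave", 3)
result = compose(walk, wave, ACTION_TO_PARTS["waving hand"])
```

In this toy example the right arm follows the 'waving hand' motion while every other part follows 'walking', which mirrors the idea of assigning body parts to actions before combining two motions.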

Comments:	Teaser Fixed
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.10417 [cs.CV]
	(or arXiv:2304.10417v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.10417

Submission history

From: Nikos Athanasiou
[v1] Thu, 20 Apr 2023 16:01:55 UTC (2,642 KB)
[v2] Sat, 19 Aug 2023 20:34:13 UTC (3,136 KB)
[v3] Tue, 26 Mar 2024 13:16:02 UTC (7,848 KB)
