Learning to Act from Actionless Videos through Dense Correspondences

Ko, Po-Chen; Mao, Jiayuan; Du, Yilun; Sun, Shao-Hua; Tenenbaum, Joshua B.


arXiv:2310.08576 (cs)

[Submitted on 12 Oct 2023]



Abstract: In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from a few video demonstrations, without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that "hallucinate" a robot executing actions, and combining them with dense correspondences between frames, our approach can infer closed-form actions to execute in an environment without the need for any explicit action labels. This capability allows us to train the policy solely on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.
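
The abstract does not spell out how an action is recovered from correspondences. As a rough illustration of what a closed-form solution can look like, the sketch below fits a 2D rigid motion (rotation plus translation) to matched points in the least-squares sense. This is a generic textbook fit, not the authors' implementation; the function name `fit_rigid_2d`, the 2D setting, and the toy points are all assumptions made for illustration.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst.

    Hypothetical helper for illustration only; src and dst are equally long
    lists of (x, y) pairs, where src[i] corresponds to dst[i].
    """
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("need equally sized, non-empty point lists")
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate dot and cross products of the centered correspondences.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # The optimal rotation angle has a closed form; no iteration is needed.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)

# Toy correspondences: the source points rotated by 90 degrees, then shifted by (2, 3).
src_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst_points = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, (tx, ty) = fit_rigid_2d(src_points, dst_points)
```

With these toy points the fit should recover a rotation of about pi/2 and a translation of about (2, 3). In a setting like the one the abstract describes, the correspondences would presumably come from matching pixels between synthesized frames rather than from hand-written lists.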

Comments:	Project page: this https URL
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2310.08576 [cs.RO]
	(or arXiv:2310.08576v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2310.08576

Submission history

From: Jiayuan Mao
[v1] Thu, 12 Oct 2023 17:59:23 UTC (10,131 KB)
