Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–24 of 24 results for author: Pertsch, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.09246  [pdf, other

    cs.RO cs.LG

    OpenVLA: An Open-Source Vision-Language-Action Model

    Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

    Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has be… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Website: https://openvla.github.io/

  2. arXiv:2405.12213  [pdf, other

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen… ▽ More

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  3. arXiv:2405.05941  [pdf, other

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  5. arXiv:2403.12910  [pdf, other

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/

  6. arXiv:2312.08958  [pdf, other

    cs.LG cs.AI cs.RO

    LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

    Authors: Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch

    Abstract: We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by p… ▽ More

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 2nd Workshop on Agent Learning in Open-Endedness (ALOE) at NeurIPS 2023

  7. arXiv:2310.10021  [pdf, other

    cs.RO cs.AI cs.LG

    Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

    Authors: Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim

    Abstract: We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning require expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accompli… ▽ More

    Submitted 17 October, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: CoRL 2023 (Oral); 24 pages, 11 figures

  8. arXiv:2310.08864  [pdf, other

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  9. arXiv:2310.07899  [pdf, other

    cs.AI cs.RO

    RoboCLIP: One Demonstration is Enough to Learn Robot Policies

    Authors: Sumedh A Sontakke, Jesse Zhang, Sébastien M. R. Arnold, Karl Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti

    Abstract: Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations but typically require a large number of in-domain expert demonstrations. Inspired by advances in the field of Video-and-Language Models (VL… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

  10. arXiv:2309.10150  [pdf, other

    cs.RO cs.AI cs.LG

    Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

    Authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singht, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine

    Abstract: In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizi… ▽ More

    Submitted 17 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: See website at https://qtransformer.github.io

  11. arXiv:2307.15818  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal , et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.… ▽ More

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  12. arXiv:2306.11886  [pdf, other

    cs.RO cs.AI cs.LG

    SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

    Authors: Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim

    Abstract: Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effo… ▽ More

    Submitted 29 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 29 pages, 18 figures. Published at ICRA 2024

  13. arXiv:2212.07407  [pdf, other

    cs.LG cs.RO

    Cross-Domain Transfer via Semantic Skill Imitation

    Authors: Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai

    Abstract: We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "t… ▽ More

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Project website: https://kpertsch.github.io/star

    Journal ref: CoRL 2022

  14. arXiv:2212.06817  [pdf, other

    cs.RO cs.AI cs.CL cs.CV cs.LG

    RT-1: Robotics Transformer for Real-World Control at Scale

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath , et al. (26 additional authors not shown)

    Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, wher… ▽ More

    Submitted 11 August, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: See website at robotics-transformer1.github.io

  15. arXiv:2212.04708  [pdf, other

    cs.RO cs.AI cs.LG

    PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection

    Authors: Shivin Dass, Karl Pertsch, Hejia Zhang, Youngwoon Lee, Joseph J. Lim, Stefanos Nikolaidis

    Abstract: Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOpe… ▽ More

    Submitted 1 June, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Robotics: Science and Systems (RSS) 2023. Website: https://clvrai.com/pato

  16. arXiv:2204.11828  [pdf, other

    cs.LG cs.AI

    Skill-based Meta-Reinforcement Learning

    Authors: Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J Lim

    Abstract: While deep reinforcement learning methods have shown impressive results in robot learning, their sample inefficiency makes the learning of complex, long-horizon behaviors with real robot systems infeasible. To mitigate this issue, meta-reinforcement learning methods aim to enable fast learning on novel tasks by learning how to learn. Yet, the application has been limited to short-horizon tasks wit… ▽ More

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: ICLR 2022. Project page at https://namsan96.github.io/SiMPL

  17. arXiv:2204.11827  [pdf, other

    cs.LG cs.AI cs.RO

    Task-Induced Representation Learning

    Authors: Jun Yamada, Karl Pertsch, Anisha Gunjal, Joseph J. Lim

    Abstract: In this work, we evaluate the effectiveness of representation learning approaches for decision making in visually complex environments. Representation learning is essential for effective reinforcement learning (RL) from high-dimensional inputs. Unsupervised representation learning approaches based on reconstruction, prediction or contrastive learning have shown substantial learning efficiency gain… ▽ More

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: International Conference on Learning Representations (ICLR), 2022

  18. arXiv:2107.10253  [pdf, other

    cs.LG cs.AI cs.RO

    Demonstration-Guided Reinforcement Learning with Learned Skills

    Authors: Karl Pertsch, Youngwoon Lee, Yue Wu, Joseph J. Lim

    Abstract: Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen b… ▽ More

    Submitted 21 July, 2021; originally announced July 2021.

  19. arXiv:2010.11944  [pdf, other

    cs.LG cs.AI cs.RO

    Accelerating Reinforcement Learning with Learned Skill Priors

    Authors: Karl Pertsch, Youngwoon Lee, Joseph J. Lim

    Abstract: Intelligent agents rely heavily on prior experience when learning a new task, yet most modern reinforcement learning (RL) approaches learn every task from scratch. One approach for leveraging prior knowledge is to transfer skills learned on prior tasks to the new task. However, as the amount of prior experience increases, the number of transferable skills grows too, making it challenging to explor… ▽ More

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 4th Conference on Robot Learning (CoRL), 2020

  20. arXiv:2010.11940  [pdf, other

    cs.RO cs.AI cs.LG

    Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments

    Authors: Jun Yamada, Youngwoon Lee, Gautam Salhotra, Karl Pertsch, Max Pflueger, Gaurav S. Sukhatme, Joseph J. Lim, Peter Englert

    Abstract: Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in… ▽ More

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: Published at the Conference on Robot Learning (CoRL) 2020

  21. arXiv:2006.13205  [pdf, other

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors

    Authors: Karl Pertsch, Oleh Rybkin, Frederik Ebert, Chelsea Finn, Dinesh Jayaraman, Sergey Levine

    Abstract: The ability to predict and plan into the future is fundamental for agents acting in the world. To reach a faraway goal, we predict trajectories at multiple timescales, first devising a coarse plan towards the goal and then gradually filling in details. In contrast, current learning approaches for visual prediction and planning fail on long-horizon tasks as they generate predictions (1) without con… ▽ More

    Submitted 27 November, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: Project page: orybkin.github.io/video-gcp. KP and OR contributed equally

  22. arXiv:1904.05869  [pdf, other

    cs.LG cs.CV stat.ML

    Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning

    Authors: Karl Pertsch, Oleh Rybkin, Jingyun Yang, Shenghao Zhou, Konstantinos G. Derpanis, Kostas Daniilidis, Joseph Lim, Andrew Jaegle

    Abstract: Temporal observations such as videos contain essential information about the dynamics of the underlying scene, but they are often interleaved with inessential, predictable details. One way of dealing with this problem is by focusing on the most informative moments in a sequence. We propose a model that learns to discover these important events and the times when they occur and uses them to represe… ▽ More

    Submitted 7 May, 2020; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: Conference on Learning for Dynamics and Control, 2020. Website: https://sites.google.com/view/keyin/home

  23. arXiv:1806.09655  [pdf, other

    cs.LG cs.CV stat.ML

    Learning what you can do before doing anything

    Authors: Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle

    Abstract: Intelligent agents can learn to represent the action spaces of other agents simply by observing them act. Such representations help agents quickly learn to predict the effects of their own actions on the environment and to plan complex action sequences. In this work, we address the problem of learning an agent's action space purely from visual observation. We use stochastic video prediction to lea… ▽ More

    Submitted 12 February, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: Published at ICLR 2019. 10 pages + 15 pages of references and appendices

    Journal ref: International Conference on Learning Representations, 2019

  24. arXiv:1712.01924  [pdf, other

    cs.CV

    iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

    Authors: Omid Hosseini Jafari, Siva Karthik Mustikovela, Karl Pertsch, Eric Brachmann, Carsten Rother

    Abstract: We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded. Recent RGB-D-based methods are robust to moderate degrees of occlusion. For RGB inputs, no previous method works well for partly occluded objects. Our main contribution is to present the first deep learning-based system that estimates accurate poses for part… ▽ More

    Submitted 18 June, 2018; v1 submitted 5 December, 2017; originally announced December 2017.