Showing 1–50 of 72 results for author: Dragan, A D

Searching in archive cs.
  1. arXiv:2310.10610  [pdf, other]

    cs.AI cs.LG cs.RO

    Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

    Authors: Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan

    Abstract: Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to…

    Submitted 16 October, 2023; originally announced October 2023.

  2. arXiv:2310.04373  [pdf, other]

    cs.LG cs.AI

    Confronting Reward Model Overoptimization with Constrained RLHF

    Authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

    Abstract: Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriat…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  3. arXiv:2309.03839  [pdf, other]

    cs.RO cs.HC cs.LG

    Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning

    Authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine

    Abstract: Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practi…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

  4. arXiv:2307.10026  [pdf, other]

    cs.LG

    Contextual Reliability: When Different Features Matter in Different Contexts

    Authors: Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan

    Abstract: Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we canno…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: ICML 2023 Camera Ready Version

  5. arXiv:2305.16941  [pdf, other]

    cs.SI cs.CY

    Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media

    Authors: Smitha Milli, Micah Carroll, Yike Wang, Sashrika Pandey, Sebastian Zhao, Anca D. Dragan

    Abstract: In a pre-registered randomized experiment, we found that, relative to a reverse-chronological baseline, Twitter's engagement-based ranking algorithm amplifies emotionally charged, out-group hostile content that users say makes them feel worse about their political out-group. Furthermore, we find that users do not prefer the political tweets selected by the algorithm, suggesting that the engagement…

    Submitted 22 December, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  6. arXiv:2302.01928  [pdf, other]

    cs.RO cs.AI cs.LG

    Aligning Robot and Human Representations

    Authors: Andreea Bobu, Andi Peng, Pulkit Agrawal, Julie Shah, Anca D. Dragan

    Abstract: To act in the world, robots rely on a representation of salient task aspects: for example, to carry a coffee mug, a robot may consider movement efficiency or mug orientation in its behavior. However, if we want robots to act for and with people, their representations must not be just functional but also reflective of what humans care about, i.e. they must be aligned. We observe that current learni…

    Submitted 28 January, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: 14 pages, 3 figures, 1 table

  7. arXiv:2301.01392  [pdf, other]

    cs.LG cs.AI

    Benchmarks and Algorithms for Offline Preference-Based Reward Learning

    Authors: Daniel Shin, Anca D. Dragan, Daniel S. Brown

    Abstract: Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Transactions on Machine Learning Research. arXiv admin note: text overlap with arXiv:2107.09251

  8. arXiv:2301.00810  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    SIRL: Similarity-based Implicit Representation Learning

    Authors: Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan

    Abstract: When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task "features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that co…

    Submitted 17 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: 12 pages, 6 figures, HRI 2023

  9. arXiv:2212.03175  [pdf, other]

    cs.LG cs.AI cs.RO

    Learning Representations that Enable Generalization in Assistive Tasks

    Authors: Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan

    Abstract: Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Su…

    Submitted 5 December, 2022; originally announced December 2022.

  10. arXiv:2208.10687  [pdf, other]

    cs.LG cs.AI

    The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types

    Authors: Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan

    Abstract: When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, o…

    Submitted 9 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Published at AAAI 2023; 10 pages, 5 figures plus appendices

  11. arXiv:2205.12381  [pdf, other]

    cs.LG cs.HC cs.RO

    First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization

    Authors: Siddharth Reddy, Sergey Levine, Anca D. Dragan

    Abstract: How can we train an assistive human-machine interface (e.g., an electromyography-based limb prosthesis) to translate a user's raw command signals into the actions of a robot or computer when there is no prior mapping, we cannot ask the user for supervision in the form of action labels or reward feedback, and we do not have prior knowledge of the tasks the user is trying to accomplish? The key idea…

    Submitted 14 September, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted to Neural Information Processing Systems (NeurIPS) 2022

  12. arXiv:2204.06601  [pdf, other]

    cs.LG cs.RO

    Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

    Authors: Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

    Abstract: Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification w…

    Submitted 18 March, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: In the proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023). https://iclr.cc/virtual/2023/poster/10822

  13. arXiv:2203.02091  [pdf, other]

    cs.RO cs.AI

    Teaching Robots to Span the Space of Functional Expressive Motion

    Authors: Arjun Sripathy, Andreea Bobu, Zhongyu Li, Koushil Sreenath, Daniel S. Brown, Anca D. Dragan

    Abstract: Our goal is to enable robots to perform functional tasks in emotive ways, be it in response to their users' emotional states, or expressive of their confidence levels. Prior work has proposed learning independent cost functions from user feedback for each target emotion, so that the robot may optimize it alongside task and environment specific objectives for any situation it encounters. However, t…

    Submitted 2 August, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  14. arXiv:2203.02072  [pdf, other]

    cs.HC cs.LG

    X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback

    Authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Nicholas Hardy, Nikhilesh Natraj, Karunesh Ganguly, Anca D. Dragan, Sergey Levine

    Abstract: We aim to help users communicate their intent to machines using flexible, adaptive interfaces that translate arbitrary user input into desired actions. In this work, we focus on assistive typing applications in which a user cannot operate a keyboard, but can instead supply other inputs, such as webcam images that capture eye gaze or neural activity measured by a brain implant. Standard methods tra…

    Submitted 6 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to International Conference on Learning Representations (ICLR) 2021

  15. arXiv:2202.02465  [pdf, other]

    cs.RO cs.HC cs.LG

    ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning

    Authors: Sean Chen, Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine

    Abstract: Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural 'default' interface. Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem, and enab…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: Accepted to IEEE Conference on Robotics and Automation (ICRA) 2022

  16. arXiv:2201.07082  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Inducing Structure in Reward Learning by Learning Features

    Authors: Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan

    Abstract: Reward learning enables robots to learn adaptable behaviors from human input. Traditional methods model the reward as a linear function of hand-crafted features, but that requires specifying all the relevant features a priori, which is impossible for real-world tasks. To get around this issue, recent deep Inverse Reinforcement Learning (IRL) methods learn rewards directly from the raw state but th…

    Submitted 18 January, 2022; originally announced January 2022.

    Comments: 24 pages, 22 figures, accepted to the International Journal of Robotics Research. arXiv admin note: text overlap with arXiv:2006.13208

  17. arXiv:2111.09884  [pdf, other]

    cs.RO cs.AI cs.LG

    Assisted Robust Reward Design

    Authors: Jerry Zhi-Yang He, Anca D. Dragan

    Abstract: Real-world robotic tasks require complex reward functions. When we define the problem the robot needs to solve, we pretend that a designer specifies this complex reward exactly, and it is set in stone from then on. In practice, however, reward design is an iterative process: the designer chooses a reward, eventually encounters an "edge-case" environment where the reward incentivizes the wrong beha…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: 5th Conference on Robot Learning (CoRL 2021)

  18. arXiv:2109.14700  [pdf, other]

    cs.RO

    Safety Assurances for Human-Robot Interaction via Confidence-aware Game-theoretic Human Models

    Authors: Ran Tian, Liting Sun, Andrea Bajcsy, Masayoshi Tomizuka, Anca D. Dragan

    Abstract: An outstanding challenge with safety methods for human-robot interaction is reducing their conservatism while maintaining robustness to variations in human behavior. In this work, we propose that robots use confidence-aware game-theoretic models of human behavior when assessing the safety of a human-robot interaction. By treating the influence between the human and robot as well as the human's rat…

    Submitted 30 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

  19. arXiv:2108.04219  [pdf, other]

    cs.CV cs.HC cs.LG

    Pragmatic Image Compression for Human-in-the-Loop Decision-Making

    Authors: Siddharth Reddy, Anca D. Dragan, Sergey Levine

    Abstract: Standard lossy image compression algorithms aim to preserve an image's appearance, while minimizing the number of bits needed to transmit it. However, the amount of information actually needed by a user for downstream tasks -- e.g., deciding which product to click on in a shopping website -- is likely much lower. To achieve this lower bitrate, we would ideally only transmit the visual features tha…

    Submitted 7 July, 2021; originally announced August 2021.

  20. arXiv:2107.09251  [pdf, other]

    cs.LG

    Offline Preference-Based Apprenticeship Learning

    Authors: Daniel Shin, Daniel S. Brown, Anca D. Dragan

    Abstract: Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization…

    Submitted 16 February, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: ICML Workshop on Human-AI Collaboration in Sequential Decision-Making, 2021

  21. arXiv:2107.02349  [pdf, other]

    cs.RO cs.LG eess.SY

    Physical Interaction as Communication: Learning Robot Objectives Online from Human Corrections

    Authors: Dylan P. Losey, Andrea Bajcsy, Marcia K. O'Malley, Anca D. Dragan

    Abstract: When a robot performs a task next to a human, physical interaction is inevitable: the human might push, pull, twist, or guide the robot. The state-of-the-art treats these interactions as disturbances that the robot should reject or avoid. At best, these robots respond safely while the human interacts; but after the human lets go, these robots simply return to their original behavior. We recognize…

    Submitted 5 July, 2021; originally announced July 2021.

  22. arXiv:2106.06499  [pdf, other]

    cs.LG cs.AI

    Policy Gradient Bayesian Robust Optimization for Imitation Learning

    Authors: Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg

    Abstract: The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizin…

    Submitted 21 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: In proceedings of the International Conference on Machine Learning (ICML) 2021

  23. arXiv:2105.01850  [pdf, other]

    cs.LG stat.ML

    Preference learning along multiple criteria: A game-theoretic perspective

    Authors: Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright

    Abstract: The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well known that any Nash equilibrium of the zero sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, howe…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 47 pages; published as a conference paper at NeurIPS 2020

  24. arXiv:2104.11353  [pdf, other]

    cs.RO cs.LG eess.SY

    Optimal Cost Design for Model Predictive Control

    Authors: Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan

    Abstract: Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that cumulates over that full horizon. For instance, an autonomous car may have a…

    Submitted 9 June, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: In proceedings of 3rd Annual Learning for Dynamics & Control Conference (L4DC) 2021

  25. arXiv:2104.08482  [pdf, other]

    cs.LG stat.ML

    Agnostic learning with unknown utilities

    Authors: Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

    Abstract: Traditional learning approaches for classification implicitly assume that each mistake has the same cost. In many real-world problems though, the utility of a decision depends on the underlying context $x$ and decision $y$. However, directly incorporating these utilities into the learning objective is often infeasible since these can be quite complex and difficult for humans to specify. We forma…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 30 pages; published as a conference paper at ITCS 2021

  26. arXiv:2104.06556  [pdf, other]

    cs.RO cs.HC cs.LG

    Situational Confidence Assistance for Lifelong Shared Autonomy

    Authors: Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

    Abstract: Shared autonomy enables robots to infer user intent and assist in accomplishing it. But when the user wants to do a new task that the robot does not know about, shared autonomy will hinder their performance by attempting to assist them with something that is not their intent. Our key idea is that the robot can detect when its repertoire of intents is insufficient to explain the user's input, and g…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: In proceedings ICRA 2021

  27. arXiv:2103.07815  [pdf, other]

    cs.AI cs.RO

    Dynamically Switching Human Prediction Models for Efficient Planning

    Authors: Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

    Abstract: As environments involving both robots and humans become increasingly common, so does the need to account for people during planning. To plan effectively, robots must be able to respond to and sometimes influence what humans do. This requires a human model which predicts future human actions. A simple model may assume the human will continue what they did previously; a more complex one might predic…

    Submitted 13 March, 2021; originally announced March 2021.

    Comments: ICRA '21

  28. arXiv:2103.05746  [pdf, other]

    cs.RO cs.AI cs.HC eess.SY

    Analyzing Human Models that Adapt Online

    Authors: Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan

    Abstract: Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models such as what the model could learn online and how quickly could it learn it. For instance, when will the robot have a confident estimate in a nearby human's goal? Or, what parameter initializations guarantee that the robot c…

    Submitted 30 September, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: ICRA 2021

  29. arXiv:2103.05661  [pdf, other]

    cs.AI cs.RO

    On complementing end-to-end human behavior predictors with planning

    Authors: Liting Sun, Xiaogang Jia, Anca D. Dragan

    Abstract: High capacity end-to-end approaches for human motion (behavior) prediction have the ability to represent subtle nuances in human behavior, but struggle with robustness to out of distribution inputs and tail events. Planning-based prediction, on the other hand, can reliably output decent-but-not-great predictions: it is much more stable in the face of distribution shift (as we verify in this work),…

    Submitted 12 July, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Journal ref: Robotics: Science and Systems, 2021

  30. arXiv:2101.05507  [pdf, other]

    cs.LG cs.AI cs.HC cs.MA

    Evaluating the Robustness of Collaborative Agents

    Authors: Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

    Abstract: In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are \emph{robust}. Since the real world is very diverse, and human behavior often changes in response to agent deployment, the agent will likely encounter novel situations that have never been seen during training. This results in an evaluation challenge…

    Submitted 14 January, 2021; originally announced January 2021.

  31. arXiv:2012.01557  [pdf, other]

    cs.LG

    Value Alignment Verification

    Authors: Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

    Abstract: As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent's performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values. T…

    Submitted 11 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: In proceedings International Conference on Machine Learning (ICML) 2021

  32. arXiv:2008.02840  [pdf, other]

    cs.LG cs.HC cs.RO stat.ML

    Assisted Perception: Optimizing Observations to Communicate State

    Authors: Siddharth Reddy, Sergey Levine, Anca D. Dragan

    Abstract: We aim to help users estimate the state of the world in tasks like robotic teleoperation and navigation with visual impairments, where users may have systematic biases that lead to suboptimal behavior: they might struggle to process observations from multiple sensors simultaneously, receive delayed observations, or overestimate distances to obstacles. While we cannot directly change the user's int…

    Submitted 6 August, 2020; originally announced August 2020.

  33. arXiv:2006.13208  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG stat.ML

    Feature Expansive Reward Learning: Rethinking Human Input

    Authors: Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan

    Abstract: When a person is not satisfied with how a robot performs a task, they can intervene to correct it. Reward learning methods enable the robot to adapt its reward function online based on such human input, but they rely on handcrafted features. When the correction cannot be explained by these features, recent work in deep Inverse Reinforcement Learning (IRL) suggests that the robot could ask for task…

    Submitted 12 January, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 13 pages, 14 figures

  34. arXiv:2002.04833  [pdf, other]

    cs.LG cs.AI cs.HC cs.RO

    Reward-rational (implicit) choice: A unifying formalism for reward learning

    Authors: Hong Jun Jeon, Smitha Milli, Anca D. Dragan

    Abstract: It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback. The types of behavior interpreted as evidence of the reward function have expanded greatly in recent years. We've gone from demonstrations, to comparisons, to reading into the information leaked when the human is pushing…

    Submitted 11 December, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: Published at NeurIPS 2020

  35. arXiv:2002.00941  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG stat.ML

    Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections

    Authors: Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan

    Abstract: Human input has enabled autonomous systems to improve their capabilities and achieve complex behaviors that are otherwise challenging to generate automatically. Recent work focuses on how robots can use such input - like demonstrations or corrections - to learn intended objectives. These techniques assume that the human's desired objective already exists within the robot's hypothesis space. In rea…

    Submitted 28 February, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: 20 pages. 12 figures, 1 table. IEEE Transactions on Robotics, 2020

  36. arXiv:2001.04465  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG stat.ML

    LESS is More: Rethinking Probabilistic Models of Human Behavior

    Authors: Andreea Bobu, Dexter R. R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan

    Abstract: Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots li…

    Submitted 13 January, 2020; originally announced January 2020.

    Comments: 9 pages, 7 figures

  37. arXiv:1912.05652  [pdf, other]

    cs.CY cs.LG stat.ML

    Learning Human Objectives by Evaluating Hypothetical Behavior

    Authors: Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike

    Abstract: We seek to align agent behavior with a user's objectives in a reinforcement learning setting with unknown dynamics, an unknown reward function, and unknown unsafe states. The user knows the rewards and unsafe states, but querying the user is expensive. To address this challenge, we propose an algorithm that safely and interactively learns a model of the user's reward function. We start with a gene…

    Submitted 24 March, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Published at International Conference on Machine Learning (ICML) 2020

  38. arXiv:1911.02320  [pdf, other]

    cs.RO cs.HC cs.LG

    Nonverbal Robot Feedback for Human Teachers

    Authors: Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan

    Abstract: Robots can learn preferences from human demonstrations, but their success depends on how informative these demonstrations are. Being informative is unfortunately very challenging, because during teaching, people typically get no transparency into what the robot already knows or has learned so far. In contrast, human students naturally provide a wealth of nonverbal feedback that reveals their level…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: CoRL 2019

  39. arXiv:1910.13369  [pdf, other]

    cs.RO cs.LG eess.SY

    A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning

    Authors: Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin

    Abstract: Real-world autonomous systems often employ probabilistic predictive models of human behavior during planning to reason about their future motion. Since accurately modeling human behavior a priori is challenging, such models are often parameterized, enabling the robot to adapt predictions based on observations by maintaining a distribution over the model parameters. Although this enables data and p…

    Submitted 5 April, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

  40. arXiv:1910.02910  [pdf, other]

    cs.RO cs.LG stat.ML

    Scaled Autonomy: Enabling Human Operators to Control Robot Fleets

    Authors: Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan

    Abstract: Autonomous robots often encounter challenging situations where their control policies fail and an expert human operator must briefly intervene, e.g., through teleoperation. In settings where multiple robots act in separate environments, a single human operator can manage a fleet of robots by identifying and teleoperating one robot at any given time. The key challenge is that users have limited att…

    Submitted 8 March, 2020; v1 submitted 21 September, 2019; originally announced October 2019.

    Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2020

  41. arXiv:1909.04694  [pdf, other]

    eess.SY cs.RO

    Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games

    Authors: David Fridovich-Keil, Ellis Ratner, Lasse Peters, Anca D. Dragan, Claire J. Tomlin

    Abstract: Many problems in robotics involve multiple decision making agents. To operate efficiently in such settings, a robot must reason about the impact of its decisions on the behavior of other agents. Differential games offer an expressive theoretical framework for formulating these types of multi-agent problems. Unfortunately, most numerical solution techniques scale poorly with state dimension and are…

    Submitted 18 March, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: 8 pages, 4 figures, accepted to the IEEE International Conference on Robotics and Automation

  42. arXiv:1907.11826  [pdf, ps, other]

    stat.ML cs.LG stat.CO

    Bayesian Robustness: A Nonasymptotic Viewpoint

    Authors: Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers. We propose Rob-ULA, a robust variant of the Unadjusted Langevin Algorithm (ULA), and provide a finite-sample analysis of its sampling distribution. In particular, we show that after…

    Submitted 26 July, 2019; originally announced July 2019.

    Comments: 30 pages, 5 figures

  43. arXiv:1906.09624  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

    Authors: Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan

    Abstract: Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. Inverse Reinforcement Learning (IRL) enables us to infer reward functions from demonstrations, but it usually assumes that the expert is noisily optimal. Real people, on the other hand, often have systematic biases: risk-aversion, myopia, etc. One option is to try to characteri…

    Submitted 23 June, 2019; originally announced June 2019.

    Comments: Published at ICML 2019

  44. arXiv:1906.02641  [pdf, other]

    cs.LG cs.HC cs.RO stat.ML

    An Extensible Interactive Interface for Agent Design

    Authors: Matthew Rahtz, James Fang, Anca D. Dragan, Dylan Hadfield-Menell

    Abstract: In artificial intelligence, we often specify tasks through a reward function. While this works well in some settings, many tasks are hard to specify this way. In deep reinforcement learning, for example, directly specifying a reward as a function of a high-dimensional observation is challenging. Instead, we present an interface for specifying tasks interactively using demonstrations. Our approach…

    Submitted 8 August, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Presented at 2019 ICML Workshop on Human in the Loop Learning (HILL 2019), Long Beach, USA

  45. arXiv:1905.11108  [pdf, other]

    cs.LG stat.ML

    SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

    Authors: Siddharth Reddy, Anca D. Dragan, Sergey Levine

    Abstract: Learning to imitate expert behavior from demonstrations can be challenging, especially in environments with high-dimensional, continuous observations and unknown dynamics. Supervised learning methods based on behavioral cloning (BC) suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Recent…

    Submitted 25 September, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

  46. arXiv:1903.03877  [pdf, other]

    cs.AI

    Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning

    Authors: Smitha Milli, Anca D. Dragan

    Abstract: It is incredibly easy for a system designer to misspecify the objective for an autonomous system ("robot"), thus motivating the desire to have the robot learn the objective from human behavior instead. Recent work has suggested that people have an interest in the robot performing well, and will thus behave pedagogically, choosing actions that are informative to the robot. In turn, robots benefit…

    Submitted 28 June, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: Published at UAI 2019

  47. arXiv:1812.09376  [pdf, other]

    cs.AI

    Human-AI Learning Performance in Multi-Armed Bandits

    Authors: Ravi Pandya, Sandy H. Huang, Dylan Hadfield-Menell, Anca D. Dragan

    Abstract: People frequently face challenging decision-making problems in which outcomes are uncertain or unknown. Artificial intelligence (AI) algorithms exist that can outperform humans at learning such tasks. Thus, there is an opportunity for AI agents to assist people in learning these tasks more effectively. In this work, we use a multi-armed bandit as a controlled setting in which to explore this direc…

    Submitted 21 December, 2018; originally announced December 2018.

    Comments: Artificial Intelligence, Ethics and Society (AIES) 2019

  48. arXiv:1812.01225  [pdf, other]

    cs.RO

    Learning from Extrapolated Corrections

    Authors: Jason Y. Zhang, Anca D. Dragan

    Abstract: Our goal is to enable robots to learn cost functions from user guidance. Often it is difficult or impossible for users to provide full demonstrations, so corrections have emerged as an easier guidance channel. However, when robots learn cost functions from corrections rather than demonstrations, they have to extrapolate a small amount of information -- the change of a waypoint along the way -- to…

    Submitted 10 March, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

  49. arXiv:1811.05929  [pdf, other]

    cs.RO

    A Scalable Framework For Real-Time Multi-Robot, Multi-Human Collision Avoidance

    Authors: Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan, Claire J. Tomlin

    Abstract: Robust motion planning is a well-studied problem in the robotics literature, yet current algorithms struggle to operate scalably and safely in the presence of other moving agents, such as humans. This paper introduces a novel framework for robot navigation that accounts for high-order system dynamics and maintains safety in the presence of external disturbances, other robots, and non-deterministic…

    Submitted 14 November, 2018; originally announced November 2018.

  50. arXiv:1810.08174  [pdf, other]

    cs.RO

    Establishing Appropriate Trust via Critical States

    Authors: Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan

    Abstract: In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts. Learned neural network policies make that particularly challenging. We propose an approach for helping end-users build a mental model of such policies. Our key observation is that for most tasks, the essence of the policy is captured in a few critical states…

    Submitted 18 October, 2018; originally announced October 2018.

    Comments: IROS 2018