Search | arXiv e-print repository

Safety Filters for Black-Box Dynamical Systems by Learning Discriminating Hyperplanes

Authors: Will Lavanakul, Jason J. Choi, Koushil Sreenath, Claire J. Tomlin

Abstract: Learning-based approaches are emerging as an effective approach for safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that ultimately, enforcing the safety constraint as a control input constraint at… ▽ More Learning-based approaches are emerging as an effective approach for safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that ultimately, enforcing the safety constraint as a control input constraint at each state is what matters. By focusing on this constraint, we can eliminate dependence on any specific certificate function-based design. To achieve this, we define a discriminating hyperplane that shapes the half-space constraint on control input at each state, serving as a sufficient condition for safety. This concept not only generalizes over traditional safety methods but also simplifies safety filter design by eliminating dependence on specific certificate functions. We present two strategies to learn the discriminating hyperplane: (a) a supervised learning approach, using pre-verified control invariant sets for labeling, and (b) a reinforcement learning (RL) approach, which does not require such labels. The main advantage of our method, unlike conventional safe RL approaches, is the separation of performance and safety. This offers a reusable safety filter for learning new tasks, avoiding the need to retrain from scratch. As such, we believe that the new notion of the discriminating hyperplane offers a more generalizable direction towards designing safety filters, encompassing and extending existing certificate-function-based or safe RL methodologies. △ Less

Submitted 21 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: * Indicate co-first authors. This is an extended version of the paper presented at L4DC 2024

arXiv:2311.13824 [pdf, other]

Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems

Authors: Jason J. Choi, Fernando Castañeda, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant rel… ▽ More As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant reliance on high-quality training data. In response to these challenges, this study presents a scalable data-driven controller that efficiently identifies and infers from the most informative data points for implementing data-driven safety filters. Our approach is grounded in the integration of a model-based certificate function-based method and Gaussian Process (GP) regression, reinforced by a novel online data selection algorithm that reduces time complexity from quadratic to linear relative to dataset size. Empirical evidence, gathered from successful real-world cart-pole swing-up experiments and simulated locomotion of a five-link bipedal robot, demonstrates the efficacy of our approach. Our findings reveal that our efficient online data selection algorithm, which strategically selects key data points, enhances the practicality and efficiency of data-driven certifying filters in complex robotic systems, significantly mitigating scalability concerns inherent in nonparametric learning-based control methods. △ Less

Submitted 23 November, 2023; originally announced November 2023.

Comments: The first three authors contributed equally to the work. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.06655 [pdf, other]

Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models

Authors: Alonso Marco, Elias Morley, Claire J. Tomlin

Abstract: In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly di… ▽ More In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly distinguish between in- and out-of-training distribution observations hinges on the accuracy of these predictions, primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor, based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, which reliably classifies previously unseen terrains. △ Less

Submitted 15 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 7 pages, 4 figures

arXiv:2307.01917 [pdf, other]

Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers

Authors: Andreas Doering, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F. J. Lermusiaux, Claire J. Tomlin

Abstract: Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they mig… ▽ More Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they might end up in currents that inevitably push them into unsafe areas such as shallow areas, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for free-floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy closed-loop guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. We demonstrate the safety of our approach in such realistic situations empirically with large-scale simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning on new forecasts can ensure safety with high probability even under forecast errors that exceed the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: 6 pages, 3 figures, submitted to 2023 IEEE 62th Annual Conference on Decision and Control (CDC) Andreas Doering and Marius Wiggert contributed equally to this work

arXiv:2307.01916 [pdf, other]

Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents

Authors: Matthias Killer, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F. J. Lermusiaux, Claire J. Tomlin

Abstract: Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth r… ▽ More Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth regions. The complex dynamics and underactuation make this challenging even when the currents are known. This is even harder when only short-term imperfect forecasts with increasing uncertainty are available. We propose a dynamic programming-based method to efficiently solve for the optimal growth value function when true currents are known. We additionally present three extensions when as in reality only forecasts are known: (1) our methods resulting value function can be used as feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step hence mitigating forecast errors, (2) a feedback policy for long-term optimal growth beyond forecast horizons using seasonal average current data as terminal reward, and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing ocean current estimate uncertainty. We evaluate our approach through 30-day simulations of floating seaweed farms in realistic Pacific Ocean current scenarios. Our method demonstrates an achievement of 95.8% of the best possible growth using only 5-day forecasts. This confirms the feasibility of using low-power propulsion and optimal control for enhanced seaweed growth on floating farms under real-world conditions. △ Less

Submitted 29 August, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 8 pages, submitted to 2023 IEEE 62th Annual Conference on Decision and Control (CDC) Matthias Killer and Marius Wiggert contributed equally to this work

arXiv:2302.01999 [pdf, other]

Online and Offline Learning of Player Objectives from Partial Observations in Dynamic Games

Authors: Lasse Peters, Vicenç Rubies-Royo, Claire J. Tomlin, Laura Ferranti, Javier Alonso-Mora, Cyrill Stachniss, David Fridovich-Keil

Abstract: Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address… ▽ More Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address this issue by proposing a novel method for learning players' objectives in continuous dynamic games from noise-corrupted, partial state observations. Our approach learns objectives by coupling the estimation of unknown cost parameters of each player with inference of unobserved states and inputs through Nash equilibrium constraints. By coupling past state estimates with future state predictions, our approach is amenable to simultaneous online learning and prediction in receding horizon fashion. We demonstrate our method in several simulated traffic scenarios in which we recover players' preferences for, e.g., desired travel speed and collision-avoidance behavior. Results show that our method reliably estimates game-theoretic models from noise-corrupted data that closely matches ground-truth objectives, consistently outperforming state-of-the-art approaches. △ Less

Submitted 14 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2106.03611

arXiv:2210.05015 [pdf, other]

doi 10.1613/jair.1.14525

Optimality Guarantees for Particle Belief Approximation of POMDPs

Authors: Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, Zachary N. Sunberg

Abstract: Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood w… ▽ More Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers. △ Less

Submitted 19 October, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Journal ref: Journal of Artificial Intelligence Research, 77, 1591-1636 (2023)

arXiv:2208.10733 [pdf, other]

Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Authors: Fernando Castañeda, Jason J. Choi, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Learning-based control has recently shown great efficacy in performing complex tasks for various applications. However, to deploy it in real systems, it is of vital importance to guarantee the system will stay safe. Control Barrier Functions (CBFs) offer mathematical tools for designing safety-preserving controllers for systems with known dynamics. In this article, we first introduce a model-uncer… ▽ More Learning-based control has recently shown great efficacy in performing complex tasks for various applications. However, to deploy it in real systems, it is of vital importance to guarantee the system will stay safe. Control Barrier Functions (CBFs) offer mathematical tools for designing safety-preserving controllers for systems with known dynamics. In this article, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to close the gap between an approximate mathematical model and the real system, which results in a second-order cone program (SOCP)-based control design. We then present the pointwise feasibility conditions of the resulting safety controller, highlighting the level of richness that the available system information must meet to ensure safety. We use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety controller. Our method works by constantly reasoning about whether the current information is sufficient to ensure safety or if new measurements under active safe exploration are required to reduce the uncertainty. As a result, our proposed framework can guarantee the forward invariance of the safe set defined by the CBF with high probability, even if it contains a priori unexplored regions. We validate the proposed framework in two numerical simulation experiments. △ Less

Submitted 3 September, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Journal article. Includes the results of the 2021 CDC paper titled "Pointwise feasibility of gaussian process-based safety-critical control under model uncertainty" and proposes a recursively feasible safe online learning algorithm as new contribution

arXiv:2203.10142 [pdf, other]

Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

Authors: Jingqi Li, Donggun Lee, Somayeh Sojoudi, Claire J. Tomlin

Abstract: In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a con… ▽ More In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, where the super-zero level set, i.e., the set of states where the value function is evaluated to be non-negative, recovers the reach-avoid set. Building upon this, we prove that the proposed method can be adapted to compute the viability kernel, or the set of states which could be controlled to satisfy given constraints, and the backward reachable set, or the set of states that could be driven towards a given target set. Finally, we propose to alleviate the curse of dimensionality issue in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation to the reach-avoid set. Our theoretical and empirical results suggest that the proposed method could learn reliably the reach-avoid set and the optimal control policy even with neural network approximation. △ Less

Submitted 24 August, 2024; v1 submitted 18 March, 2022; originally announced March 2022.

arXiv:2201.08538 [pdf, other]

Computation of Regions of Attraction for Hybrid Limit Cycles Using Reachability: An Application to Walking Robots

Authors: Jason J. Choi, Ayush Agrawal, Koushil Sreenath, Claire J. Tomlin, Somil Bansal

Abstract: Contact-rich robotic systems, such as legged robots and manipulators, are often represented as hybrid systems. However, the stability analysis and region-of-attraction computation for these systems are often challenging because of the discontinuous state changes upon contact (also referred to as state resets). In this work, we cast the computation of region-ofattraction as a Hamilton-Jacobi (HJ) r… ▽ More Contact-rich robotic systems, such as legged robots and manipulators, are often represented as hybrid systems. However, the stability analysis and region-of-attraction computation for these systems are often challenging because of the discontinuous state changes upon contact (also referred to as state resets). In this work, we cast the computation of region-ofattraction as a Hamilton-Jacobi (HJ) reachability problem. This enables us to leverage HJ reachability tools that are compatible with general nonlinear system dynamics, and can formally deal with state and input constraints as well as bounded disturbances. Our main contribution is the generalization of HJ reachability framework to account for the discontinuous state changes originating from state resets, which has remained a challenge until now. We apply our approach for computing region-of-attractions for several underactuated walking robots and demonstrate that the proposed approach can (a) recover a bigger region-of-attraction than state-of-the-art approaches, (b) handle state resets, nonlinear dynamics, external disturbances, and input constraints, and (c) also provides a stabilizing controller for the system that can leverage the state resets for enhancing system stability. △ Less

Submitted 8 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to IEEE RA-L & ICRA, 2022

arXiv:2112.12288 [pdf, other]

doi 10.15607/RSS.2021.XVII.077

Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Authors: Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F. Fisac

Abstract: Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems… ▽ More Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/SafeRoboticsLab/safety_rl for code and supplementary material. △ Less

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted in Robotics: Science and Systems (RSS), 2021

arXiv:2112.09456 [pdf, other]

Compositional Learning-based Planning for Vision POMDPs

Authors: Sampada Deglurkar, Michael H. Lim, Johnathan Tucker, Zachary N. Sunberg, Aleksandra Faust, Claire J. Tomlin

Abstract: The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In thi… ▽ More The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of and predict future image observations in a Monte Carlo tree search planner. We show that VTS is robust to different types of image noises that were not present during training and can adapt to different reward structures without the need to re-train. This new approach significantly and stably outperforms several baseline state-of-the-art vision POMDP algorithms while using a fraction of the training time. △ Less

Submitted 2 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2109.10450 [pdf, other]

Towards cyber-physical systems robust to communication delays: A differential game approach

Authors: Shankar A. Deka, Donggun Lee, Claire J. Tomlin

Abstract: Collaboration between interconnected cyber-physical systems is becoming increasingly pervasive. Time-delays in communication channels between such systems are known to induce catastrophic failure modes, like high frequency oscillations in robotic manipulators in bilateral teleoperation or string instability in platoons of autonomous vehicles. This paper considers nonlinear time-delay systems repre… ▽ More Collaboration between interconnected cyber-physical systems is becoming increasingly pervasive. Time-delays in communication channels between such systems are known to induce catastrophic failure modes, like high frequency oscillations in robotic manipulators in bilateral teleoperation or string instability in platoons of autonomous vehicles. This paper considers nonlinear time-delay systems representing coupled robotic agents, and proposes controllers that are robust to time-varying communication delays. We introduce approximations that allow the delays to be considered as implicit control inputs themselves, and formulate the problem as a zero-sum differential game between the stabilizing controllers and the delays acting adversarially. The ensuing optimal control law is finally compared to known results from Lyapunov-Krasovskii based approaches via numerical experiments. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: 7 pages, 5 figures, Submitted to IEEE Control Systems Letters

MSC Class: 34K35; 49L12; 93D21

arXiv:2109.04874 [pdf, other]

Discretizing Dynamics for Maximum Likelihood Constraint Inference

Authors: Kaylene C. Stocking, David L. McPherson, Robert P. Matthew, Claire J. Tomlin

Abstract: Maximum likelihood constraint inference is a powerful technique for identifying unmodeled constraints that affect the behavior of a demonstrator acting under a known objective function. However, it was originally formulated only for discrete state-action spaces. Continuous dynamics are more useful for modeling many real-world systems of interest, including the movements of humans and robots. We pr… ▽ More Maximum likelihood constraint inference is a powerful technique for identifying unmodeled constraints that affect the behavior of a demonstrator acting under a known objective function. However, it was originally formulated only for discrete state-action spaces. Continuous dynamics are more useful for modeling many real-world systems of interest, including the movements of humans and robots. We present a method to generate a tabular state-action space that approximates continuous dynamics and can be used for constraint inference on demonstrations that obey the true system dynamics. We then demonstrate accurate constraint inference on nonlinear pendulum systems with 2- and 4-dimensional state spaces, and show that performance is robust to a range of hyperparameters. The demonstrations are not required to be fully optimal with respect to the objective, and the most likely constraints can be identified even when demonstrations cover only a small portion of the state space. For these reasons, the proposed approach may be especially useful for inferring constraints on human demonstrators, which has important applications in human-robot interaction and biomechanical medicine. △ Less

Submitted 10 September, 2021; originally announced September 2021.

Comments: 10 pages, 7 figures

arXiv:2106.07108 [pdf, other]

Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty

Authors: Fernando Castañeda, Jason J. Choi, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. They are commonly utilized to build constraints that can be incorporated in a min-norm quadratic program (CBF-CLF-QP) which solves for a safety-critical control input. However, since these constraints rely on a model of the system, when t… ▽ More Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. They are commonly utilized to build constraints that can be incorporated in a min-norm quadratic program (CBF-CLF-QP) which solves for a safety-critical control input. However, since these constraints rely on a model of the system, when this model is inaccurate the guarantees of safety and stability can be easily lost. In this paper, we present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs. The considered model uncertainty is affected by both state and control input. We derive probabilistic bounds on the effects that such model uncertainty has on the dynamics of the CBF and CLF. We then use these bounds to build safety and stability chance constraints that can be incorporated in a min-norm convex optimization-based controller, called GP-CBF-CLF-SOCP. As the main theoretical result of the paper, we present necessary and sufficient conditions for pointwise feasibility of the proposed optimization problem. We believe that these conditions could serve as a starting point towards understanding what are the minimal requirements on the distribution of data collected from the real system in order to guarantee safety. Finally, we validate the proposed framework with numerical simulations of an adaptive cruise controller for an automotive system. △ Less

Submitted 1 October, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: The first two authors contributed equally. Accepted for publication in IEEE 60th Conference on Decision and Control (CDC 2021)

arXiv:2106.03611 [pdf, other]

Inferring Objectives in Continuous Dynamic Games from Noise-Corrupted Partial State Observations

Authors: Lasse Peters, David Fridovich-Keil, Vicenç Rubies-Royo, Claire J. Tomlin, Cyrill Stachniss

Abstract: Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desi… ▽ More Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method is able to directly maximize the observation likelihood rather than other non-probabilistic surrogate criteria. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory. △ Less

Submitted 7 August, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Submitted to RSS2021

arXiv:2103.05746 [pdf, other]

Analyzing Human Models that Adapt Online

Authors: Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan

Abstract: Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models such as what the model could learn online and how quickly could it learn it. For instance, when will the robot have a confident estimate in a nearby human's goal? Or, what parameter initializations guarantee that the robot c… ▽ More Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models such as what the model could learn online and how quickly could it learn it. For instance, when will the robot have a confident estimate in a nearby human's goal? Or, what parameter initializations guarantee that the robot can learn the human's preferences in a finite number of observations? To answer such analysis questions, our key idea is to model the robot's learning algorithm as a dynamical system where the state is the current model parameter estimate and the control is the human data the robot observes. This enables us to leverage tools from reachability analysis and optimal control to compute the set of hypotheses the robot could learn in finite time, as well as the worst and best-case time it takes to learn them. We demonstrate the utility of our analysis tool in four human-robot domains, including autonomous driving and indoor navigation. △ Less

Submitted 30 September, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: ICRA 2021

arXiv:2102.07039 [pdf, other]

doi 10.1109/TAC.2021.3059838

FaSTrack: a Modular Framework for Real-Time Motion Planning and Guaranteed Safe Tracking

Authors: Mo Chen, Sylvia L. Herbert, Haimin Hu, Ye Pu, Jaime F. Fisac, Somil Bansal, SooJean Han, Claire J. Tomlin

Abstract: Real-time, guaranteed safe trajectory planning is vital for navigation in unknown environments. However, real-time navigation algorithms typically sacrifice robustness for computation speed. Alternatively, provably safe trajectory planning tends to be too computationally intensive for real-time replanning. We propose FaSTrack, Fast and Safe Tracking, a framework that achieves both real-time replan… ▽ More Real-time, guaranteed safe trajectory planning is vital for navigation in unknown environments. However, real-time navigation algorithms typically sacrifice robustness for computation speed. Alternatively, provably safe trajectory planning tends to be too computationally intensive for real-time replanning. We propose FaSTrack, Fast and Safe Tracking, a framework that achieves both real-time replanning and guaranteed safety. In this framework, real-time computation is achieved by allowing any trajectory planner to use a simplified \textit{planning model} of the system. The plan is tracked by the system, represented by a more realistic, higher-dimensional \textit{tracking model}. We precompute the tracking error bound (TEB) due to mismatch between the two models and due to external disturbances. We also obtain the corresponding tracking controller used to stay within the TEB. The precomputation does not require prior knowledge of the environment. We demonstrate FaSTrack using Hamilton-Jacobi reachability for precomputation and three different real-time trajectory planners with three different tracking-planning model pairs. △ Less

Submitted 13 March, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

Comments: Published in the IEEE Transactions on Automatic Control

arXiv:2101.05916 [pdf, other]

Scalable Learning of Safety Guarantees for Autonomous Systems using Hamilton-Jacobi Reachability

Authors: Sylvia Herbert, Jason J. Choi, Suvansh Sanjeev, Marsalis Gibson, Koushil Sreenath, Claire J. Tomlin

Abstract: Autonomous systems like aircraft and assistive robots often operate in scenarios where guaranteeing safety is critical. Methods like Hamilton-Jacobi reachability can provide guaranteed safe sets and controllers for such systems. However, often these same scenarios have unknown or uncertain environments, system dynamics, or predictions of other agents. As the system is operating, it may learn new k… ▽ More Autonomous systems like aircraft and assistive robots often operate in scenarios where guaranteeing safety is critical. Methods like Hamilton-Jacobi reachability can provide guaranteed safe sets and controllers for such systems. However, often these same scenarios have unknown or uncertain environments, system dynamics, or predictions of other agents. As the system is operating, it may learn new knowledge about these uncertainties and should therefore update its safety analysis accordingly. However, work to learn and update safety analysis is limited to small systems of about two dimensions due to the computational complexity of the analysis. In this paper we synthesize several techniques to speed up computation: decomposition, warm-starting, and adaptive grids. Using this new framework we can update safe sets by one or more orders of magnitude faster than prior work, making this technique practical for many realistic systems. We demonstrate our results on simulated 2D and 10D near-hover quadcopters operating in a windy environment. △ Less

Submitted 2 April, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: The first two authors are co-first authors. ICRA 2021

arXiv:2012.10140 [pdf, other]

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

Abstract: This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algori… ▽ More This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments. △ Less

Submitted 1 April, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

arXiv:2011.07183 [pdf, other]

Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics

Authors: Fernando Castañeda, Jason J. Choi, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: This paper presents a method to design a min-norm Control Lyapunov Function (CLF)-based stabilizing controller for a control-affine system with uncertain dynamics using Gaussian Process (GP) regression. In order to estimate both state and input-dependent model uncertainty, we propose a novel compound kernel that captures the control-affine nature of the problem. Furthermore, by the use of GP Upper… ▽ More This paper presents a method to design a min-norm Control Lyapunov Function (CLF)-based stabilizing controller for a control-affine system with uncertain dynamics using Gaussian Process (GP) regression. In order to estimate both state and input-dependent model uncertainty, we propose a novel compound kernel that captures the control-affine nature of the problem. Furthermore, by the use of GP Upper Confidence Bound analysis, we provide probabilistic bounds of the regression error, leading to the formulation of a CLF-based stability chance constraint which can be incorporated in a min-norm optimization problem. We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP). The data-collection process and the training of the GP regression model are carried out in an episodic learning fashion. We validate the proposed algorithm and controller in numerical simulations of an inverted pendulum and a kinematic bicycle model, resulting in stable trajectories which are very similar to the ones obtained if we actually knew the true plant dynamics. △ Less

Submitted 23 March, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

Comments: The first two authors contributed equally. To appear at the 2021 American Control Conference (ACC)

arXiv:2011.04815 [pdf, other]

Encoding Defensive Driving as a Dynamic Nash Game

Authors: Chih-Yuan Chiu, David Fridovich-Keil, Claire J. Tomlin

Abstract: Robots deployed in real-world environments should operate safely in a robust manner. In scenarios where an "ego" agent navigates in an environment with multiple other "non-ego" agents, two modes of safety are commonly proposed -- adversarial robustness and probabilistic constraint satisfaction. However, while the former is generally computationally intractable and leads to overconservative solutio… ▽ More Robots deployed in real-world environments should operate safely in a robust manner. In scenarios where an "ego" agent navigates in an environment with multiple other "non-ego" agents, two modes of safety are commonly proposed -- adversarial robustness and probabilistic constraint satisfaction. However, while the former is generally computationally intractable and leads to overconservative solutions, the latter typically relies on strong distributional assumptions and ignores strategic coupling between agents. To avoid these drawbacks, we present a novel formulation of robustness within the framework of general-sum dynamic game theory, modeled on defensive driving. More precisely, we prepend an adversarial phase to the ego agent's cost function. That is, we prepend a time interval during which other agents are assumed to be temporarily distracted, in order to render the ego agent's equilibrium trajectory robust against other agents' potentially dangerous behavior during this time. We demonstrate the effectiveness of our new formulation in encoding safety via multiple traffic scenarios. △ Less

Submitted 30 March, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted to ICRA 2021

arXiv:2011.00601 [pdf, other]

Approximate Solutions to a Class of Reachability Games

Authors: David Fridovich-Keil, Claire J. Tomlin

Abstract: In this paper, we present a method for finding approximate Nash equilibria in a broad class of reachability games. These games are often used to formulate both collision avoidance and goal satisfaction. Our method is computationally efficient, running in real-time for scenarios involving multiple players and more than ten state dimensions. The proposed approach forms a family of increasingly exact… ▽ More In this paper, we present a method for finding approximate Nash equilibria in a broad class of reachability games. These games are often used to formulate both collision avoidance and goal satisfaction. Our method is computationally efficient, running in real-time for scenarios involving multiple players and more than ten state dimensions. The proposed approach forms a family of increasingly exact approximations to the original game. Our results characterize the quality of these approximations and show operation in a receding horizon, minimally-invasive control context. Additionally, as a special case, our method reduces to local gradient-based optimization in the single-player (optimal control) setting, for which a wide variety of efficient algorithms exist. △ Less

Submitted 20 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: Conference paper at ICRA 2021

arXiv:2009.02874 [pdf]

Dynamically Computing Adversarial Perturbations for Recurrent Neural Networks

Authors: Shankar A. Deka, Dušan M. Stipanović, Claire J. Tomlin

Abstract: Convolutional and recurrent neural networks have been widely employed to achieve state-of-the-art performance on classification tasks. However, it has also been noted that these networks can be manipulated adversarially with relative ease, by carefully crafted additive perturbations to the input. Though several experimentally established prior works exist on crafting and defending against attacks,… ▽ More Convolutional and recurrent neural networks have been widely employed to achieve state-of-the-art performance on classification tasks. However, it has also been noted that these networks can be manipulated adversarially with relative ease, by carefully crafted additive perturbations to the input. Though several experimentally established prior works exist on crafting and defending against attacks, it is also desirable to have theoretical guarantees on the existence of adversarial examples and robustness margins of the network to such examples. We provide both in this paper. We focus specifically on recurrent architectures and draw inspiration from dynamical systems theory to naturally cast this as a control problem, allowing us to dynamically compute adversarial perturbations at each timestep of the input sequence, thus resembling a feedback controller. Illustrative examples are provided to supplement the theoretical discussions. △ Less

Submitted 6 September, 2020; originally announced September 2020.

Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems

MSC Class: 68T07; 93B52; 93C10; 49N90 ACM Class: I.2.8

arXiv:2004.07584 [pdf, other]

Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions

Authors: Jason Choi, Fernando Castañeda, Claire J. Tomlin, Koushil Sreenath

Abstract: In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-ouput linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework wh… ▽ More In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-ouput linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty. △ Less

Submitted 4 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: The first two authors contributed equally to this work

arXiv:2004.07276 [pdf, other]

Improving Input-Output Linearizing Controllers for Bipedal Robots via Reinforcement Learning

Authors: Fernando Castañeda, Mathias Wulfman, Ayush Agrawal, Tyler Westenbroek, Claire J. Tomlin, S. Shankar Sastry, Koushil Sreenath

Abstract: The main drawbacks of input-output linearizing controllers are the need for precise dynamics models and not being able to account for input constraints. Model uncertainty is common in almost every robotic application and input saturation is present in every real world system. In this paper, we address both challenges for the specific case of bipedal robot control by the use of reinforcement learni… ▽ More The main drawbacks of input-output linearizing controllers are the need for precise dynamics models and not being able to account for input constraints. Model uncertainty is common in almost every robotic application and input saturation is present in every real world system. In this paper, we address both challenges for the specific case of bipedal robot control by the use of reinforcement learning techniques. Taking the structure of a standard input-output linearizing controller, we use an additive learned term that compensates for model uncertainty. Moreover, by adding constraints to the learning problem we manage to boost the performance of the final controller when input limits are present. We demonstrate the effectiveness of the designed framework for different levels of uncertainty on the five-link planar walking robot RABBIT. △ Less

Submitted 2 May, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

Comments: Final version appearing in Learning for Dynamics and Control (L4DC) 2020 Conference

arXiv:2004.02766 [pdf, other]

Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning

Authors: Tyler Westenbroek, Eric Mazumdar, David Fridovich-Keil, Valmik Prabhu, Claire J. Tomlin, S. Shankar Sastry

Abstract: This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instances of time. This enab… ▽ More This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instances of time. This enables the use of general function approximators to approximate the linearizing controller for the system without having to worry about singularities. However, the discrete-time and stochastic nature of these algorithms precludes the direct application of standard machinery from the adaptive control literature to provide deterministic stability proofs for the system. Nevertheless, we leverage these techniques alongside tools from the stochastic approximation literature to demonstrate that with high probability the tracking and parameter errors concentrate near zero when a certain persistence of excitation condition is satisfied. A simulated example of a double pendulum demonstrates the utility of the proposed theory. 1 △ Less

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2002.04354 [pdf, other]

Inference-Based Strategy Alignment for General-Sum Differential Games

Authors: Lasse Peters, David Fridovich-Keil, Claire J. Tomlin, Zachary N. Sunberg

Abstract: In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others. These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium. Often,… ▽ More In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others. These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium. Often, problems admit multiple equilibria. From the perspective of a single agent in such a game, this multiplicity of solutions can introduce uncertainty about how other agents will behave. This paper proposes a general framework for resolving ambiguity between equilibria by reasoning about the equilibrium other agents are aiming for. We demonstrate this framework in simulations of a multi-player human-robot navigation problem that yields two main conclusions: First, by inferring which equilibrium humans are operating at, the robot is able to predict trajectories more accurately, and second, by discovering and aligning itself to this equilibrium the robot is able to reduce the cost for all players. △ Less

Submitted 6 May, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

arXiv:1910.13369 [pdf, other]

A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning

Authors: Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin

Abstract: Real-world autonomous systems often employ probabilistic predictive models of human behavior during planning to reason about their future motion. Since accurately modeling human behavior a priori is challenging, such models are often parameterized, enabling the robot to adapt predictions based on observations by maintaining a distribution over the model parameters. Although this enables data and p… ▽ More Real-world autonomous systems often employ probabilistic predictive models of human behavior during planning to reason about their future motion. Since accurately modeling human behavior a priori is challenging, such models are often parameterized, enabling the robot to adapt predictions based on observations by maintaining a distribution over the model parameters. Although this enables data and priors to improve the human model, observation models are difficult to specify and priors may be incorrect, leading to erroneous state predictions that can degrade the safety of the robot motion plan. In this work, we seek to design a predictor which is more robust to misspecified models and priors, but can still leverage human behavioral data online to reduce conservatism in a safe way. To do this, we cast human motion prediction as a Hamilton-Jacobi reachability problem in the joint state space of the human and the belief over the model parameters. We construct a new continuous-time dynamical system, where the inputs are the observations of human behavior, and the dynamics include how the belief over the model parameters change. The results of this reachability computation enable us to both analyze the effect of incorrect priors on future predictions in continuous state and time, as well as to make predictions of the human state in the future. We compare our approach to the worst-case forward reachable set and a stochastic predictor which uses Bayesian inference and produces full future state distributions. Our comparisons in simulation and in hardware demonstrate how our framework can enable robust planning while not being overly conservative, even when the human model is inaccurate. △ Less

Submitted 5 April, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

arXiv:1910.13272 [pdf, other]

Feedback Linearization for Unknown Systems via Reinforcement Learning

Authors: Tyler Westenbroek, David Fridovich-Keil, Eric Mazumdar, Shreyas Arora, Valmik Prabhu, S. Shankar Sastry, Claire J. Tomlin

Abstract: We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Onc… ▽ More We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Once a linearizing controller has been constructed, desired output trajectories for the nonlinear plant can be tracked using a variety of linear control techniques. However, the calculation of a linearizing controller requires a precise dynamics model for the system. As a result, model-based approaches for learning exact linearizing controllers generally require a simple, highly structured model of the system with easily identifiable parameters. In contrast, the model-free approach presented in this paper is able to approximate the linearizing controller for the plant using general function approximation architectures. Specifically, we formulate a continuous-time optimization problem over the parameters of a learned linearizing controller whose optima are the set of parameters which best linearize the plant. We derive conditions under which the learning problem is (strongly) convex and provide guarantees which ensure the true linearizing controller for the plant is recovered. We then discuss how model-free policy optimization algorithms can be used to solve a discrete-time approximation to the problem using data collected from the real-world plant. The utility of the framework is demonstrated in simulation and on a real-world robotic platform. △ Less

Submitted 21 April, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

arXiv:1910.04332 [pdf, other]

doi 10.24963/ijcai.2020/572

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

Abstract: Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However… ▽ More Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power. △ Less

Submitted 5 June, 2023; v1 submitted 9 October, 2019; originally announced October 2019.

arXiv:1910.00681 [pdf, other]

An Iterative Quadratic Method for General-Sum Differential Games with Feedback Linearizable Dynamics

Authors: David Fridovich-Keil, Vicenc Rubies-Royo, Claire J. Tomlin

Abstract: Iterative linear-quadratic (ILQ) methods are widely used in the nonlinear optimal control community. Recent work has applied similar methodology in the setting of multiplayer general-sum differential games. Here, ILQ methods are capable of finding local equilibria in interactive motion planning problems in real-time. As in most iterative procedures, however, this approach can be sensitive to initi… ▽ More Iterative linear-quadratic (ILQ) methods are widely used in the nonlinear optimal control community. Recent work has applied similar methodology in the setting of multiplayer general-sum differential games. Here, ILQ methods are capable of finding local equilibria in interactive motion planning problems in real-time. As in most iterative procedures, however, this approach can be sensitive to initial conditions and hyperparameter choices, which can result in poor computational performance or even unsafe trajectories. In this paper, we focus our attention on a broad class of dynamical systems which are feedback linearizable, and exploit this structure to improve both algorithmic reliability and runtime. We showcase our new algorithm in three distinct traffic scenarios, and observe that in practice our method converges significantly more often and more quickly than was possible without exploiting the feedback linearizable structure. △ Less

Submitted 19 March, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: 7 pages, 5 figures, accepted to IEEE International Conference on Robotics and Automation (2020)

arXiv:1909.05699 [pdf, other]

Closed-loop Model Selection for Kernel-based Models using Bayesian Optimization

Authors: Thomas Beckers, Somil Bansal, Claire J. Tomlin, Sandra Hirche

Abstract: Kernel-based nonparametric models have become very attractive for model-based control approaches for nonlinear systems. However, the selection of the kernel and its hyperparameters strongly influences the quality of the learned model. Classically, these hyperparameters are optimized to minimize the prediction error of the model but this process totally neglects its later usage in the control loop.… ▽ More Kernel-based nonparametric models have become very attractive for model-based control approaches for nonlinear systems. However, the selection of the kernel and its hyperparameters strongly influences the quality of the learned model. Classically, these hyperparameters are optimized to minimize the prediction error of the model but this process totally neglects its later usage in the control loop. In this work, we present a framework to optimize the kernel and hyperparameters of a kernel-based model directly with respect to the closed-loop performance of the model. Our framework uses Bayesian optimization to iteratively refine the kernel-based model using the observed performance on the actual system until a desired performance is achieved. We demonstrate the proposed approach in a simulation and on a 3-DoF robotic arm. △ Less

Submitted 12 September, 2019; originally announced September 2019.

Journal ref: IEEE Conference on Decision and Control 2019

arXiv:1909.04694 [pdf, other]

Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games

Authors: David Fridovich-Keil, Ellis Ratner, Lasse Peters, Anca D. Dragan, Claire J. Tomlin

Abstract: Many problems in robotics involve multiple decision making agents. To operate efficiently in such settings, a robot must reason about the impact of its decisions on the behavior of other agents. Differential games offer an expressive theoretical framework for formulating these types of multi-agent problems. Unfortunately, most numerical solution techniques scale poorly with state dimension and are… ▽ More Many problems in robotics involve multiple decision making agents. To operate efficiently in such settings, a robot must reason about the impact of its decisions on the behavior of other agents. Differential games offer an expressive theoretical framework for formulating these types of multi-agent problems. Unfortunately, most numerical solution techniques scale poorly with state dimension and are rarely used in real-time applications. For this reason, it is common to predict the future decisions of other agents and solve the resulting decoupled, i.e., single-agent, optimal control problem. This decoupling neglects the underlying interactive nature of the problem; however, efficient solution techniques do exist for broad classes of optimal control problems. We take inspiration from one such technique, the iterative linear-quadratic regulator (ILQR), which solves repeated approximations with linear dynamics and quadratic costs. Similarly, our proposed algorithm solves repeated linear-quadratic games. We experimentally benchmark our algorithm in several examples with a variety of initial conditions and show that the resulting strategies exhibit complex interactive behavior. Our results indicate that our algorithm converges reliably and runs in real-time. In a three-player, 14-state simulated intersection problem, our algorithm initially converges in < 0.25s. Receding horizon invocations converge in < 50 ms in a hardware collision-avoidance test. △ Less

Submitted 18 March, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

Comments: 8 pages, 4 figures, accepted to the IEEE International Conference on Robotics and Automation

arXiv:1905.00532 [pdf, other]

An Efficient Reachability-Based Framework for Provably Safe Autonomous Navigation in Unknown Environments

Authors: Andrea Bajcsy, Somil Bansal, Eli Bronstein, Varun Tolani, Claire J. Tomlin

Abstract: Real-world autonomous vehicles often operate in a priori unknown environments. Since most of these systems are safety-critical, it is important to ensure they operate safely in the face of environment uncertainty, such as unseen obstacles. Current safety analysis tools enable autonomous systems to reason about safety given full information about the state of the environment a priori. However, thes… ▽ More Real-world autonomous vehicles often operate in a priori unknown environments. Since most of these systems are safety-critical, it is important to ensure they operate safely in the face of environment uncertainty, such as unseen obstacles. Current safety analysis tools enable autonomous systems to reason about safety given full information about the state of the environment a priori. However, these tools do not scale well to scenarios where the environment is being sensed in real time, such as during navigation tasks. In this work, we propose a novel, real-time safety analysis method based on Hamilton-Jacobi reachability that provides strong safety guarantees despite environment uncertainty. Our safety method is planner-agnostic and provides guarantees for a variety of mapping sensors. We demonstrate our approach in simulation and in hardware to provide safety guarantees around a state-of-the-art vision-based, learning-based planner. △ Less

Submitted 1 May, 2019; originally announced May 2019.

arXiv:1902.10320 [pdf, other]

doi 10.1145/3302504.3311795

A New Simulation Metric to Determine Safe Environments and Controllers for Systems with Unknown Dynamics

Authors: Shromona Ghosh, Somil Bansal, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, Claire J. Tomlin

Abstract: We consider the problem of extracting safe environments and controllers for reach-avoid objectives for systems with known state and control spaces, but unknown dynamics. In a given environment, a common approach is to synthesize a controller from an abstraction or a model of the system (potentially learned from data). However, in many situations, the relationship between the dynamics of the model… ▽ More We consider the problem of extracting safe environments and controllers for reach-avoid objectives for systems with known state and control spaces, but unknown dynamics. In a given environment, a common approach is to synthesize a controller from an abstraction or a model of the system (potentially learned from data). However, in many situations, the relationship between the dynamics of the model and the \textit{actual system} is not known; and hence it is difficult to provide safety guarantees for the system. In such cases, the Standard Simulation Metric (SSM), defined as the worst-case norm distance between the model and the system output trajectories, can be used to modify a reach-avoid specification for the system into a more stringent specification for the abstraction. Nevertheless, the obtained distance, and hence the modified specification, can be quite conservative. This limits the set of environments for which a safe controller can be obtained. We propose SPEC, a specification-centric simulation metric, which overcomes these limitations by computing the distance using only the trajectories that violate the specification for the system. We show that modifying a reach-avoid specification with SPEC allows us to synthesize a safe controller for a larger set of environments compared to SSM. We also propose a probabilistic method to compute SPEC for a general class of systems. Case studies using simulators for quadrotors and autonomous cars illustrate the advantages of the proposed metric for determining safe environment sets and controllers. △ Less

Submitted 26 February, 2019; originally announced February 2019.

Comments: 22nd ACM International Conference on Hybrid Systems: Computation and Control (2019)

arXiv:1811.07834 [pdf, other]

Safely Probabilistically Complete Real-Time Planning and Exploration in Unknown Environments

Authors: David Fridovich-Keil, Jaime F. Fisac, Claire J. Tomlin

Abstract: We present a new framework for motion planning that wraps around existing kinodynamic planners and guarantees recursive feasibility when operating in a priori unknown, static environments. Our approach makes strong guarantees about overall safety and collision avoidance by utilizing a robust controller derived from reachability analysis. We ensure that motion plans never exit the safe backward rea… ▽ More We present a new framework for motion planning that wraps around existing kinodynamic planners and guarantees recursive feasibility when operating in a priori unknown, static environments. Our approach makes strong guarantees about overall safety and collision avoidance by utilizing a robust controller derived from reachability analysis. We ensure that motion plans never exit the safe backward reachable set of the initial state, while safely exploring the space. This preserves the safety of the initial state, and guarantees that that we will eventually find the goal if it is possible to do so while exploring safely. We implement our framework in the Robot Operating System (ROS) software environment and demonstrate it in a real-time simulation. △ Less

Submitted 6 March, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: 7 pages, accepted to ICRA 2019

arXiv:1811.05929 [pdf, other]

A Scalable Framework For Real-Time Multi-Robot, Multi-Human Collision Avoidance

Authors: Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan, Claire J. Tomlin

Abstract: Robust motion planning is a well-studied problem in the robotics literature, yet current algorithms struggle to operate scalably and safely in the presence of other moving agents, such as humans. This paper introduces a novel framework for robot navigation that accounts for high-order system dynamics and maintains safety in the presence of external disturbances, other robots, and non-deterministic… ▽ More Robust motion planning is a well-studied problem in the robotics literature, yet current algorithms struggle to operate scalably and safely in the presence of other moving agents, such as humans. This paper introduces a novel framework for robot navigation that accounts for high-order system dynamics and maintains safety in the presence of external disturbances, other robots, and non-deterministic intentional agents. Our approach precomputes a tracking error margin for each robot, generates confidence-aware human motion predictions, and coordinates multiple robots with a sequential priority ordering, effectively enabling scalable safe trajectory planning and execution. We demonstrate our approach in hardware with two robots and two humans. We also showcase our work's scalability in a larger simulation. △ Less

Submitted 14 November, 2018; originally announced November 2018.

arXiv:1809.00706 [pdf, other]

A Minimum Discounted Reward Hamilton-Jacobi Formulation for Computing Reachable Sets

Authors: Anayo K. Akametalu, Shromona Ghosh, Jaime F. Fisac, Claire J. Tomlin

Abstract: We propose a novel formulation for approximating reachable sets through a minimum discounted reward optimal control problem. The formulation yields a continuous solution that can be obtained by solving a Hamilton-Jacobi equation. Furthermore, the numerical approximation to this solution can be obtained as the unique fixed-point to a contraction mapping. This allows for more efficient solution meth… ▽ More We propose a novel formulation for approximating reachable sets through a minimum discounted reward optimal control problem. The formulation yields a continuous solution that can be obtained by solving a Hamilton-Jacobi equation. Furthermore, the numerical approximation to this solution can be obtained as the unique fixed-point to a contraction mapping. This allows for more efficient solution methods that could not be applied under traditional formulations for solving reachable sets. In addition, this formulation provides a link between reinforcement learning and learning reachable sets for systems with unknown dynamics, allowing algorithms from the former to be applied to the latter. We use two benchmark examples, double integrator, and pursuit-evasion games, to show the correctness of the formulation as well as its strengths in comparison to previous work. △ Less

Submitted 3 September, 2018; originally announced September 2018.

arXiv:1808.00649 [pdf, other]

Robust Tracking with Model Mismatch for Fast and Safe Planning: an SOS Optimization Approach

Authors: Sumeet Singh, Mo Chen, Sylvia L. Herbert, Claire J. Tomlin, Marco Pavone

Abstract: In the pursuit of real-time motion planning, a commonly adopted practice is to compute a trajectory by running a planning algorithm on a simplified, low-dimensional dynamical model, and then employ a feedback tracking controller that tracks such a trajectory by accounting for the full, high-dimensional system dynamics. While this strategy of planning with model mismatch generally yields fast compu… ▽ More In the pursuit of real-time motion planning, a commonly adopted practice is to compute a trajectory by running a planning algorithm on a simplified, low-dimensional dynamical model, and then employ a feedback tracking controller that tracks such a trajectory by accounting for the full, high-dimensional system dynamics. While this strategy of planning with model mismatch generally yields fast computation times, there are no guarantees of dynamic feasibility, which hampers application to safety-critical systems. Building upon recent work that addressed this problem through the lens of Hamilton-Jacobi (HJ) reachability, we devise an algorithmic framework whereby one computes, offline, for a pair of "planner" (i.e., low-dimensional) and "tracking" (i.e., high-dimensional) models, a feedback tracking controller and associated tracking bound. This bound is then used as a safety margin when generating motion plans via the low-dimensional model. Specifically, we harness the computational tool of sum-of-squares (SOS) programming to design a bilinear optimization algorithm for the computation of the feedback tracking controller and associated tracking bound. The algorithm is demonstrated via numerical experiments, with an emphasis on investigating the trade-off between the increased computational scalability afforded by SOS and its intrinsic conservativeness. Collectively, our results enable scaling the appealing strategy of planning with model mismatch to systems that are beyond the reach of HJ analysis, while maintaining safety guarantees. △ Less

Submitted 28 July, 2019; v1 submitted 1 August, 2018; originally announced August 2018.

Comments: Presented at WAFR 2018; final version v2 -- fixed typos

arXiv:1806.00109 [pdf, other]

Probabilistically Safe Robot Planning with Confidence-Based Human Predictions

Authors: Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, Anca D. Dragan

Abstract: In order to safely operate around humans, robots can employ predictive models of human motion. Unfortunately, these models cannot capture the full complexity of human behavior and necessarily introduce simplifying assumptions. As a result, predictions may degrade whenever the observed human behavior departs from the assumed structure, which can have negative implications for safety. In this paper,… ▽ More In order to safely operate around humans, robots can employ predictive models of human motion. Unfortunately, these models cannot capture the full complexity of human behavior and necessarily introduce simplifying assumptions. As a result, predictions may degrade whenever the observed human behavior departs from the assumed structure, which can have negative implications for safety. In this paper, we observe that how "rational" human actions appear under a particular model can be viewed as an indicator of that model's ability to describe the human's current motion. By reasoning about this model confidence in a real-time Bayesian framework, we show that the robot can very quickly modulate its predictions to become more uncertain when the model performs poorly. Building on recent work in provably-safe trajectory planning, we leverage these confidence-aware human motion predictions to generate assured autonomous robot motion. Our new analysis combines worst-case tracking error guarantees for the physical robot with probabilistic time-varying human predictions, yielding a quantitative, probabilistic safety certificate. We demonstrate our approach with a quadcopter navigating around a human. △ Less

Submitted 31 May, 2018; originally announced June 2018.

Comments: Robotics Science and Systems (RSS) 2018

arXiv:1803.03237 [pdf, other]

doi 10.1109/ICRA.2019.8793919

A Classification-based Approach for Approximate Reachability

Authors: Vicenc Rubies-Royo, David Fridovich-Keil, Sylvia Herbert, Claire J. Tomlin

Abstract: Hamilton-Jacobi (HJ) reachability analysis has been developed over the past decades into a widely-applicable tool for determining goal satisfaction and safety verification in nonlinear systems. While HJ reachability can be formulated very generally, computational complexity can be a serious impediment for many systems of practical interest. Much prior work has been devoted to computing approximate… ▽ More Hamilton-Jacobi (HJ) reachability analysis has been developed over the past decades into a widely-applicable tool for determining goal satisfaction and safety verification in nonlinear systems. While HJ reachability can be formulated very generally, computational complexity can be a serious impediment for many systems of practical interest. Much prior work has been devoted to computing approximate solutions to large reachability problems, yet many of these methods may only apply to very restrictive problem classes, do not generate controllers, and/or can be extremely conservative. In this paper, we present a new method for approximating the optimal controller of the HJ reachability problem for control-affine systems. While also a specific problem class, many dynamical systems of interest are, or can be well approximated, by control-affine models. We explicitly avoid storing a representation of the reachability value function, and instead learn a controller as a sequence of simple binary classifiers. We compare our approach to existing grid-based methodologies in HJ reachability and demonstrate its utility on several examples, including a physical quadrotor navigation task. △ Less

Submitted 20 May, 2019; v1 submitted 8 March, 2018; originally announced March 2018.

arXiv:1802.04929 [pdf, other]

Context-Specific Validation of Data-Driven Models

Authors: Somil Bansal, Shromona Ghosh, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, Claire J. Tomlin

Abstract: With an increasing use of data-driven models to control robotic systems, it has become important to develop a methodology for validating such models before they can be deployed to design a controller for the actual system. Specifically, it must be ensured that the controller designed for a learned model would perform as expected on the actual physical system. We propose a context-specific validati… ▽ More With an increasing use of data-driven models to control robotic systems, it has become important to develop a methodology for validating such models before they can be deployed to design a controller for the actual system. Specifically, it must be ensured that the controller designed for a learned model would perform as expected on the actual physical system. We propose a context-specific validation framework to quantify the quality of a learned model based on a distance measure between the closed-loop actual system and the learned model. We then propose an active sampling scheme to compute a probabilistic upper bound on this distance in a sample-efficient manner. The proposed framework validates the learned model against only those behaviors of the system that are relevant for the purpose for which we intend to use this model, and does not require any a priori knowledge of the system dynamics. Several simulations illustrate the practicality of the proposed framework for validating the models of real-world systems, and consequently, for controller synthesis. △ Less

Submitted 25 March, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

arXiv:1711.05928 [pdf, ps, other]

Budget-Constrained Multi-Armed Bandits with Multiple Plays

Authors: Datong P. Zhou, Claire J. Tomlin

Abstract: We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly $K$ out of $N$ possible arms have to be played (with $1\leq K \leq N$). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a-priori defined budget… ▽ More We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly $K$ out of $N$ possible arms have to be played (with $1\leq K \leq N$). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a-priori defined budget $B$. The game ends when the sum of current costs associated with the played arms exceeds the remaining budget. Firstly, we analyze this setting for the stochastic case, for which we assume each arm to have an underlying cost and reward distribution with support $[c_{\min}, 1]$ and $[0, 1]$, respectively. We derive an Upper Confidence Bound (UCB) algorithm which achieves $O(NK^4 \log B)$ regret. Secondly, for the adversarial case in which the entire sequence of rewards and costs is fixed in advance, we derive an upper bound on the regret of order $O(\sqrt{NB\log(N/K)})$ utilizing an extension of the well-known $\texttt{Exp3}$ algorithm. We also provide upper bounds that hold with high probability and a lower bound of order $Ω((1 - K/N)^2 \sqrt{NB/K})$. △ Less

Submitted 16 November, 2017; originally announced November 2017.

Comments: 20 pages

arXiv:1710.04731 [pdf, other]

Planning, Fast and Slow: A Framework for Adaptive Real-Time Safe Trajectory Planning

Authors: David Fridovich-Keil, Sylvia L. Herbert, Jaime F. Fisac, Sampada Deglurkar, Claire J. Tomlin

Abstract: Motion planning is an extremely well-studied problem in the robotics community, yet existing work largely falls into one of two categories: computationally efficient but with few if any safety guarantees, or able to give stronger guarantees but at high computational cost. This work builds on a recent development called FaSTrack in which a slow offline computation provides a modular safety guarante… ▽ More Motion planning is an extremely well-studied problem in the robotics community, yet existing work largely falls into one of two categories: computationally efficient but with few if any safety guarantees, or able to give stronger guarantees but at high computational cost. This work builds on a recent development called FaSTrack in which a slow offline computation provides a modular safety guarantee for a faster online planner. We introduce the notion of "meta-planning" in which a refined offline computation enables safe switching between different online planners. This provides autonomous systems with the ability to adapt motion plans to a priori unknown environments in real-time as sensor measurements detect new obstacles, and the flexibility to maneuver differently in the presence of obstacles than they would in free space, all while maintaining a strict safety guarantee. We demonstrate the meta-planning algorithm both in simulation and in hardware using a small Crazyflie 2.0 quadrotor. △ Less

Submitted 6 March, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

Comments: ICRA, International Conference on Robotics and Automation, ICRA 2018, 8 pages, 9 figures

arXiv:1705.01292 [pdf, other]

A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems

Authors: Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, Claire J. Tomlin

Abstract: The proven efficacy of learning-based control schemes strongly motivates their application to robotic systems operating in the physical world. However, guaranteeing correct operation during the learning process is currently an unresolved issue, which is of vital importance in safety-critical systems. We propose a general safety framework based on Hamilton-Jacobi reachability methods that can work… ▽ More The proven efficacy of learning-based control schemes strongly motivates their application to robotic systems operating in the physical world. However, guaranteeing correct operation during the learning process is currently an unresolved issue, which is of vital importance in safety-critical systems. We propose a general safety framework based on Hamilton-Jacobi reachability methods that can work in conjunction with an arbitrary learning algorithm. The method exploits approximate knowledge of the system dynamics to guarantee constraint satisfaction while minimally interfering with the learning process. We further introduce a Bayesian mechanism that refines the safety analysis as the system acquires new evidence, reducing initial conservativeness when appropriate while strengthening guarantees through real-time validation. The result is a least-restrictive, safety-preserving control law that intervenes only when (a) the computed safety guarantees require it, or (b) confidence in the computed guarantees decays in light of new observations. We prove theoretical safety guarantees combining probabilistic and worst-case analysis and demonstrate the proposed framework experimentally on a quadrotor vehicle. Even though safety analysis is based on a simple point-mass model, the quadrotor successfully arrives at a suitable controller by policy-gradient reinforcement learning without ever crashing, and safely retracts away from a strong external disturbance introduced during flight. △ Less

Submitted 14 February, 2018; v1 submitted 3 May, 2017; originally announced May 2017.

Comments: Accepted for publication in IEEE Transactions on Automatic Control. Video with experiments: https://youtu.be/WAAxyeSk2bw

ACM Class: I.2.9; I.2.8; I.2.6

arXiv:1703.09260 [pdf, other]

Goal-Driven Dynamics Learning via Bayesian Optimization

Authors: Somil Bansal, Roberto Calandra, Ted Xiao, Sergey Levine, Claire J. Tomlin

Abstract: Real-world robots are becoming increasingly complex and commonly act in poorly understood environments where it is extremely challenging to model or learn their true dynamics. Therefore, it might be desirable to take a task-specific approach, wherein the focus is on explicitly learning the dynamics model which achieves the best control performance for the task at hand, rather than learning the tru… ▽ More Real-world robots are becoming increasingly complex and commonly act in poorly understood environments where it is extremely challenging to model or learn their true dynamics. Therefore, it might be desirable to take a task-specific approach, wherein the focus is on explicitly learning the dynamics model which achieves the best control performance for the task at hand, rather than learning the true dynamics. In this work, we use Bayesian optimization in an active learning framework where a locally linear dynamics model is learned with the intent of maximizing the control performance, and used in conjunction with optimal control schemes to efficiently design a controller for a given task. This model is updated directly based on the performance observed in experiments on the physical system in an iterative manner until a desired performance is achieved. We demonstrate the efficacy of the proposed approach through simulations and real experiments on a quadrotor testbed. △ Less

Submitted 21 September, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

Comments: This is the extended version of the CDC'17 paper titled "Goal-Driven Dynamics Learning via Bayesian Optimization."

arXiv:1703.07373 [pdf, other]

doi 10.1109/CDC.2017.8263867

FaSTrack: a Modular Framework for Fast and Guaranteed Safe Motion Planning

Authors: Sylvia L. Herbert, Mo Chen, SooJean Han, Somil Bansal, Jaime F. Fisac, Claire J. Tomlin

Abstract: Fast and safe navigation of dynamical systems through a priori unknown cluttered environments is vital to many applications of autonomous systems. However, trajectory planning for autonomous systems is computationally intensive, often requiring simplified dynamics that sacrifice safety and dynamic feasibility in order to plan efficiently. Conversely, safe trajectories can be computed using more so… ▽ More Fast and safe navigation of dynamical systems through a priori unknown cluttered environments is vital to many applications of autonomous systems. However, trajectory planning for autonomous systems is computationally intensive, often requiring simplified dynamics that sacrifice safety and dynamic feasibility in order to plan efficiently. Conversely, safe trajectories can be computed using more sophisticated dynamic models, but this is typically too slow to be used for real-time planning. We propose a new algorithm FaSTrack: Fast and Safe Tracking for High Dimensional systems. A path or trajectory planner using simplified dynamics to plan quickly can be incorporated into the FaSTrack framework, which provides a safety controller for the vehicle along with a guaranteed tracking error bound. This bound captures all possible deviations due to high dimensional dynamics and external disturbances. Note that FaSTrack is modular and can be used with most current path or trajectory planners. We demonstrate this framework using a 10D nonlinear quadrotor model tracking a 3D path obtained from an RRT planner. △ Less

Submitted 13 February, 2021; v1 submitted 21 March, 2017; originally announced March 2017.

Comments: Published in the Proceedings of the IEEE Conference on Decision and Control, 2017

arXiv:1703.00980 [pdf, other]

How Peer Effects Influence Energy Consumption

Authors: Datong P. Zhou, Mardavij Roozbehani, Munther A. Dahleh, Claire J. Tomlin

Abstract: This paper analyzes the impact of peer effects on electricity consumption of a network of rational, utility-maximizing users. Users derive utility from consuming electricity as well as consuming less energy than their neighbors. However, a disutility is incurred for consuming more than their neighbors. To maximize the profit of the load-serving entity that provides electricity to such users, we de… ▽ More This paper analyzes the impact of peer effects on electricity consumption of a network of rational, utility-maximizing users. Users derive utility from consuming electricity as well as consuming less energy than their neighbors. However, a disutility is incurred for consuming more than their neighbors. To maximize the profit of the load-serving entity that provides electricity to such users, we develop a two-stage game-theoretic model, where the entity sets the prices in the first stage. In the second stage, consumers decide on their demand in response to the observed price set in the first stage so as to maximize their utility. To this end, we derive theoretical statements under which such peer effects reduce aggregate user consumption. Further, we obtain expressions for the resulting electricity consumption and profit of the load serving entity for the case of perfect price discrimination and a single price under complete information, and approximations under incomplete information. Simulations suggest that exposing only a selected subset of all users to peer effects maximizes the entity's profit. △ Less

Submitted 17 March, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: 9 pages, 4 figures

arXiv:1703.00972 [pdf, other]

Eliciting Private User Information for Residential Demand Response

Authors: Datong P. Zhou, Maximilian Balandat, Munther A. Dahleh, Claire J. Tomlin

Abstract: Residential Demand Response has emerged as a viable tool to alleviate supply and demand imbalances of electricity, particularly during times when the electric grid is strained due a shortage of supply. Demand Response providers bid reduction capacity into the wholesale electricity market by asking their customers under contract to temporarily reduce their consumption in exchange for a monetary inc… ▽ More Residential Demand Response has emerged as a viable tool to alleviate supply and demand imbalances of electricity, particularly during times when the electric grid is strained due a shortage of supply. Demand Response providers bid reduction capacity into the wholesale electricity market by asking their customers under contract to temporarily reduce their consumption in exchange for a monetary incentive. To contribute to the analysis of consumer behavior in response to such incentives, this paper formulates Demand Response as a Mechanism Design problem, where a Demand Response Provider elicits private information of its rational, profit-maximizing customers who derive positive expected utility by participating in reduction events. By designing an incentive compatible and individually rational mechanism to collect users' price elasticities of demand, the Demand Response provider can target the most susceptible users to incentives. We measure reductions by comparing the materialized consumption to the projected consumption, which we model as the "10-in-10"-baseline, the regulatory standard set by the California Independent System Operator. Due to the suboptimal performance of this baseline, we show, using consumption data of residential customers in California, that Demand Response Providers receive payments for "virtual reductions", which exist due to the inaccuracies of the baseline rather than actual reductions. Improving the accuracy of the baseline diminishes the contribution of these virtual reductions. △ Less

Submitted 3 September, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: 8 pages, 7 figures

Showing 1–50 of 59 results for author: Tomlin, C J