-
Heterogeneous Roles against Assignment Based Policies in Two vs Two Target Defense Game
Authors:
Goutam Das,
Violetta Rostobaya,
James Berneburg,
Zachary I. Bell,
Michael Dorothy,
Daigo Shishika
Abstract:
In this paper, we consider a target defense game in which the attacker team seeks to reach a high-value target while the defender team seeks to prevent that by capturing them away from the target. To address the curse of dimensionality, a popular approach to solve such team-vs-team game is to decompose it into a set of one-vs-one games. Such an approximation assumes independence between teammates assigned to different one-vs-one games, ignoring the possibility of a richer set of cooperative behaviors, ultimately leading to suboptimality. In this paper, we provide teammate-aware strategies for the attacker team and show that they can outperform the assignment-based strategy, if the defenders still employ an assignment-based strategy. More specifically, the attacker strategy involves heterogeneous roles where one attacker actively intercepts a defender to help its teammate reach the target. We provide sufficient conditions under which such a strategy benefits the attackers, and we validate the results using numerical simulations.
Submitted 14 September, 2024;
originally announced September 2024.
-
Scalar Field Mapping with Adaptive High-Intensity Region Avoidance
Authors:
Muzaffar Qureshi,
Tochukwu Elijah Ogri,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
This research is motivated by a scenario where a group of UAVs is assigned to map an unknown scalar field, with the imperative of maintaining a safe distance from the sources of the field to evade detection or damage. The location of the sources is unknown a priori, so the UAVs rely on measurements of the field intensity to gauge safety. The UAVs estimate the unknown scalar field using Gaussian process (GP) regression and use the estimate to generate a map of high-intensity regions using Hough transform (HT), updated online based on the field measurements. A convergence analysis shows the boundedness of the error between the actual scalar field and the learned scalar field. The effectiveness of the method is evaluated through simulations, showcasing its ability to accurately learn scalar fields with multiple high-intensity regions while reducing the number of measurements taken inside the high-intensity regions.
Submitted 18 July, 2024;
originally announced July 2024.
-
State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems
Authors:
Tochukwu Elijah Ogri,
Muzaffar Qureshi,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
In this paper, a novel online, output-feedback, critic-only, model-based reinforcement learning framework is developed for safety-critical control systems operating in complex environments. The developed framework ensures system stability and safety, regardless of the lack of full-state measurement, while learning and implementing an optimal controller. The approach leverages linear matrix inequality-based observer design method to efficiently search for observer gains for effective state estimation. Then, approximate dynamic programming is used to develop an approximate controller that uses simulated experiences to guarantee the safety and stability of the closed-loop system. Safety is enforced by adding a recentered robust Lyapunov-like barrier function to the cost function that effectively enforces safety constraints, even in the presence of uncertainty in the state. Lyapunov-based stability analysis is used to guarantee uniform ultimate boundedness of the trajectories of the closed-loop system and ensure safety. Simulation studies are performed to demonstrate the effectiveness of the developed method through two real-world safety-critical scenarios, ensuring that the state trajectories of a given system remain in a given set and obstacle avoidance.
Submitted 26 June, 2024;
originally announced June 2024.
-
Uncertainty-Aware Guidance for Target Tracking subject to Intermittent Measurements using Motion Model Learning
Authors:
Andres Pulido,
Kyle Volle,
Kristy Waters,
Zachary I. Bell,
Prashant Ganesh,
Jane Shin
Abstract:
This letter presents a novel guidance law for target tracking applications where the target motion model is unknown and sensor measurements are intermittent due to unknown environmental conditions and low measurement update rate. In this work, the target motion model is represented by a transformer-based neural network and trained by previous target position measurements. This neural network (NN)-based motion model serves as the prediction step in a particle filter for target state estimation and uncertainty quantification. Then this estimation uncertainty is utilized in the information-driven guidance law to compute a path for the mobile agent to travel to a position with maximum expected entropy reduction (EER). The computation of EER is performed in real-time by approximating the probability distribution of the state using the particle representation from particle filter. Simulation and hardware experiments are performed with a quadcopter agent and TurtleBot target to demonstrate that the presented guidance law outperforms two other baseline guidance methods.
Submitted 1 February, 2024;
originally announced February 2024.
-
An adaptive optimal control approach to monocular depth observability maximization
Authors:
Tochukwu Elijah Ogri,
Muzaffar Qureshi,
Zachary I. Bell,
Kristy Waters,
Rushikesh Kamalapurkar
Abstract:
This paper presents an integral concurrent learning (ICL)-based observer for a monocular camera to accurately estimate the Euclidean distance to features on a stationary object, under the restriction that state information is unavailable. Using distance estimates, an infinite horizon optimal regulation problem is solved, which aims to regulate the camera to a goal location while maximizing feature observability. Lyapunov-based stability analysis is used to guarantee exponential convergence of depth estimates and input-to-state stability of the goal location relative to the camera. The effectiveness of the proposed approach is verified in simulation, and a table illustrating improved observability is provided.
Submitted 6 June, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Distributed Asynchronous Discrete-Time Feedback Optimization
Authors:
Gabriel Behrendt,
Matthew Longmire,
Zachary I. Bell,
Matthew Hale
Abstract:
In this article, we present an algorithm that drives the outputs of a network of agents to jointly track the solutions of time-varying optimization problems in a way that is robust to asynchrony in the agents' operations. We consider three operations that can be asynchronous: (1) computations of control inputs, (2) measurements of network outputs, and (3) communications of agents' inputs and outputs. We first show that our algorithm converges to the solution of a time-invariant feedback optimization problem in linear time. Next, we show that our algorithm drives outputs to track the solution of time-varying feedback optimization problems within a bounded error dependent upon the movement of the minimizers and degree of asynchrony in a way that we make precise. These convergence results are extended to quantify agents' asymptotic behavior as the length of their time horizon approaches infinity. Then, to ensure satisfactory network performance, we specify the timing of agents' operations relative to changes in the objective function that ensure a desired error bound. Numerical experiments confirm these developments and show the success of our distributed feedback optimization algorithm under asynchrony.
Submitted 1 December, 2023;
originally announced December 2023.
-
Defending a Static Target Point with a Slow Defender
Authors:
Goutam Das,
Michael Dorothy,
Zachary I. Bell,
Daigo Shishika
Abstract:
This paper studies a target-defense game played between a slow defender and a fast attacker. The attacker wins the game if it reaches the target while avoiding the defender's capture disk. The defender wins the game by preventing the attacker from reaching the target, which includes reaching the target and containing it in the capture disk. Depending on the initial condition, the attacker must circumnavigate the defender's capture disk, resulting in a constrained trajectory. This condition produces three phases of the game, which we analyze to solve for the game of kind. We provide the barrier surface that divides the state space into attacker-win and defender win regions, and present the corresponding strategies that guarantee win for each region. Numerical experiments demonstrate the theoretical results as well as the efficacy of the proposed strategies.
Submitted 16 March, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Deep Nonlinear Adaptive Control for Unmanned Aerial Systems Operating under Dynamic Uncertainties
Authors:
Zachary Lamb,
Zachary I. Bell,
Matthew Longmire,
Jared Paquet,
Prashant Ganesh,
Ricardo Sanfelice
Abstract:
Recent literature in the field of machine learning (ML) control has shown promising theoretical results for a Deep Neural Network (DNN) based Nonlinear Adaptive Controller (DNAC) capable of achieving trajectory tracking for nonlinear systems. Expanding on this work, this paper applies DNAC to the Attitude Control System (ACS) of a quadrotor and shows improvement to attitude control performance under disturbed flying conditions where the model uncertainty is high. Moreover, these results are noteworthy for ML control because they were achieved with no prior training data and an arbitrary system dynamics initialization; simply put, the controller presented in this paper is practically modelless, yet yields the ability to force trajectory tracking for nonlinear systems while rejecting significant undesirable model disturbances learned through a DNN. The combination of ML techniques to learn a system's dynamics and the Lyapunov analysis required to provide stability guarantees leads to a controller with applications in safety-critical systems that may undergo uncertain model changes, as is the case for most aerial systems. Experimental findings are analyzed in the final section of this paper, and DNAC is shown to outperform the trajectory tracking capabilities of PID, MRAC, and the recently developed Deep Model Reference Adaptive Control (DMRAC) schemes.
Submitted 14 October, 2023;
originally announced October 2023.
-
Network Preference Dynamics using Lattice Theory
Authors:
Hans Riess,
Gregory Henselman-Petrusek,
Michael C. Munger,
Robert Ghrist,
Zachary I. Bell,
Michael M. Zavlanos
Abstract:
Preferences, fundamental in all forms of strategic behavior and collective decision-making, in their raw form, are an abstract ordering on a set of alternatives. Agents, we assume, revise their preferences as they gain more information about other agents. Exploiting the ordered algebraic structure of preferences, we introduce a message-passing algorithm for heterogeneous agents distributed over a network to update their preferences based on aggregations of the preferences of their neighbors in a graph. We demonstrate the existence of equilibrium points of the resulting global dynamical system of local preference updates and provide a sufficient condition for trajectories to converge to equilibria: stable preferences. Finally, we present numerical simulations demonstrating our preliminary results.
Submitted 10 July, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
State and Parameter Estimation for Affine Nonlinear Systems
Authors:
Tochukwu Elijah Ogri,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
Real-world control applications in complex and uncertain environments require adaptability to handle model uncertainties and robustness against disturbances. This paper presents an online, output-feedback, critic-only, model-based reinforcement learning architecture that simultaneously learns and implements an optimal controller while maintaining stability during the learning phase. Using multiplier matrices, a convenient way to search for observer gains is designed along with a controller that learns from simulated experience to ensure stability and convergence of trajectories of the closed-loop system to a neighborhood of the origin. Local uniform ultimate boundedness of the trajectories is established using a Lyapunov-based analysis and demonstrated through simulation results, under mild excitation conditions.
Submitted 21 April, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Output Feedback Adaptive Optimal Control of Affine Nonlinear systems with a Linear Measurement Model
Authors:
Tochukwu Elijah Ogri,
S. M. Nahid Mahmud,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
Real-world control applications in complex and uncertain environments require adaptability to handle model uncertainties and robustness against disturbances. This paper presents an online, output-feedback, critic-only, model-based reinforcement learning architecture that simultaneously learns and implements an optimal controller while maintaining stability during the learning phase. Using multiplier matrices, a convenient way to search for observer gains is designed along with a controller that learns from simulated experience to ensure stability and convergence of trajectories of the closed-loop system to a neighborhood of the origin. Local uniform ultimate boundedness of the trajectories is established using a Lyapunov-based analysis and demonstrated through simulation results, under mild excitation conditions.
Submitted 3 April, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Guarding a Non-Maneuverable Translating Line with an Attached Defender
Authors:
Goutam Das,
Michael Dorothy,
Zachary I. Bell,
Daigo Shishika
Abstract:
In this paper we consider a target-guarding differential game where the defender must protect a linearly translating line-segment by intercepting an attacker who tries to reach it. In contrast to common target-guarding problems, we assume that the defender is attached to the target and moves along with it. This assumption affects the defenders' maximum speed in inertial frame, which depends on the target's direction of motion. Zero-sum differential game of degree for both the attacker-win and defender-win scenarios are studied, where the payoff is defined to be the distance between the two agents at the time of game termination. We derive the equilibrium strategies and the Value function by leveraging the solution for the infinite-length target scenario. The zero-level set of this Value function provides the barrier surface that divides the state space into defender-win and attacker-win regions. We present simulation results to demonstrate the theoretical results.
Submitted 19 September, 2022;
originally announced September 2022.
-
A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
Authors:
Zifan Wang,
Yi Shen,
Zachary I. Bell,
Scott Nivison,
Michael M. Zavlanos,
Karl H. Johansson
Abstract:
We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
Submitted 6 September, 2022;
originally announced September 2022.
-
Safe Controller for Output Feedback Linear Systems using Model-Based Reinforcement Learning
Authors:
S M Nahid Mahmud,
Moad Abudia,
Scott A Nivison,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
The objective of this research is to enable safety-critical systems to simultaneously learn and execute optimal control policies in a safe manner to achieve complex autonomy. Learning optimal policies via trial and error, i.e., traditional reinforcement learning, is difficult to implement in safety-critical systems, particularly when task restarts are unavailable. Safe model-based reinforcement learning techniques based on a barrier transformation have recently been developed to address this problem. However, these methods rely on full state feedback, limiting their usability in a real-world environment. In this work, an output-feedback safe model-based reinforcement learning technique based on a novel barrier-aware dynamic state estimator has been designed to address this issue. The developed approach facilitates simultaneous learning and execution of safe control policies for safety-critical linear systems. Simulation results indicate that barrier transformation is an effective approach to achieve online reinforcement learning in safety-critical systems using output feedback.
Submitted 4 April, 2022;
originally announced April 2022.
-
Safety aware model-based reinforcement learning for optimal control of a class of output-feedback nonlinear systems
Authors:
S M Nahid Mahmud,
Moad Abudia,
Scott A Nivison,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
The ability to learn and execute optimal control policies safely is critical to realization of complex autonomy, especially where task restarts are not available and/or the systems are safety-critical. Safety requirements are often expressed in terms of state and/or control constraints. Methods such as barrier transformation and control barrier functions have been successfully used, in conjunction with model-based reinforcement learning, for safe learning in systems under state constraints, to learn the optimal control policy. However, existing barrier-based safe learning methods rely on full state feedback. In this paper, an output-feedback safe model-based reinforcement learning technique is developed that utilizes a novel dynamic state estimator to implement simultaneous learning and control for a class of safety-critical systems with partially observable state.
Submitted 1 October, 2021;
originally announced October 2021.
-
Asynchronous Zeroth-Order Distributed Optimization with Residual Feedback
Authors:
Yi Shen,
Yan Zhang,
Scott Nivison,
Zachary I. Bell,
Michael M. Zavlanos
Abstract:
We consider a zeroth-order distributed optimization problem, where the global objective function is a black-box function and, as such, its gradient information is inaccessible to the local agents. Instead, the local agents can only use the values of the objective function to estimate the gradient and update their local decision variables. In this paper, we also assume that these updates are done asynchronously. To solve this problem, we propose an asynchronous zeroth-order distributed optimization method that relies on a one-point residual feedback to estimate the unknown gradient. We show that this estimator is unbiased under asynchronous updating, and theoretically analyze the convergence of the proposed method. We also present numerical experiments that demonstrate that our method outperforms two-point methods under asynchronous updating. To the best of our knowledge, this is the first asynchronous zeroth-order distributed optimization method that is also supported by theoretical guarantees.
Submitted 28 September, 2021;
originally announced September 2021.
-
Safe Model-Based Reinforcement Learning for Systems with Parametric Uncertainties
Authors:
S M Nahid Mahmud,
Scott A Nivison,
Zachary I. Bell,
Rushikesh Kamalapurkar
Abstract:
Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn the optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop safe reinforcement learning method for deterministic nonlinear systems, with parametric uncertainties in the model, to learn approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
Submitted 5 October, 2021; v1 submitted 24 July, 2020;
originally announced July 2020.
-
A Switched Systems Approach to Path Following with Intermittent State Feedback
Authors:
Hsi-Yuan Chen,
Zachary I. Bell,
Patryk Deptula,
Warren E. Dixon
Abstract:
Autonomous agents are often tasked with operating in an area where feedback is unavailable. Inspired by such applications, this paper develops a novel switched systems-based control method for uncertain nonlinear systems with temporary loss of state feedback. To compensate for intermittent feedback, an observer is used while state feedback is available to reduce the estimation error, and a predictor is utilized to propagate the estimates while state feedback is unavailable. Based on the resulting subsystems, maximum and minimum dwell time conditions are developed via a Lyapunov-based switched systems analysis to relax the constraint of maintaining constant feedback. Using the dwell time conditions, a switching trajectory is developed to enter and exit the feedback denied region in a manner that ensures the overall switched system remains stable. A scheme for designing a switching trajectory with a smooth transition function is provided. Simulation and experimental results are presented to demonstrate the performance of control design.
Submitted 15 March, 2018;
originally announced March 2018.