Search | arXiv e-print repository

Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation

Authors: Patrick Benjamin, Alessandro Abate

Abstract: Recent works have provided algorithms by which decentralised agents, which may be connected via a communication network, can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system. However, these algorithms are given for tabular settings: this computationally limits the size of players' observation space, meaning that the algorithms are not able to handle anyt… ▽ More Recent works have provided algorithms by which decentralised agents, which may be connected via a communication network, can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system. However, these algorithms are given for tabular settings: this computationally limits the size of players' observation space, meaning that the algorithms are not able to handle anything but small state spaces, nor to generalise beyond policies depending on the ego player's state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the population's mean-field distribution in the observation for each player's policy, it is arguably unrealistic to assume that decentralised agents would have access to this global information: we therefore additionally provide new algorithms that allow agents to estimate the global empirical distribution based on a local neighbourhood, and to improve this estimate via communication over a given network. Our experiments showcase how the communication network allows decentralised agents to estimate the mean-field distribution for population-dependent policies, and that exchanging policy information helps networked agents to outperform both independent and even centralised agents in function-approximation settings, by an even greater margin than in tabular settings. △ Less

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.03093 [pdf, other]

Learning Provably Robust Policies in Uncertain Parametric Environments

Authors: Yannik Schnitzer, Alessandro Abate, David Parker

Abstract: We present a data-driven approach for learning MDP policies that are robust across stochastic environments whose transition probabilities are defined by parameters with an unknown distribution. We produce probably approximately correct (PAC) guarantees for the performance of these learned policies in a new, unseen environment over the unknown distribution. Our approach is based on finite samples o… ▽ More We present a data-driven approach for learning MDP policies that are robust across stochastic environments whose transition probabilities are defined by parameters with an unknown distribution. We produce probably approximately correct (PAC) guarantees for the performance of these learned policies in a new, unseen environment over the unknown distribution. Our approach is based on finite samples of the MDP environments, for each of which we build an approximation of the model as an interval MDP, by exploring a set of generated trajectories. We use the built approximations to synthesise a single policy that performs well (meets given requirements) across the sampled environments, and furthermore bound its risk (of not meeting the given requirements) when deployed in an unseen environment. Our procedure offers a trade-off between the guaranteed performance of the learned policy and the risk of not meeting the guarantee in an unseen environment. Our approach exploits knowledge of the environment's state space and graph structure, and we show how additional knowledge of its parametric structure can be leveraged to optimize learning and to obtain tighter guarantees from less samples. We evaluate our approach on a diverse range of established benchmarks, demonstrating that we can generate highly performing and robust policies, along with guarantees that tightly quantify their performance and the associated risk. △ Less

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2407.10971 [pdf, other]

Walking the Values in Bayesian Inverse Reinforcement Learning

Authors: Ondrej Bajgar, Alessandro Abate, Konstantinos Gatsis, Michael A. Osborne

Abstract: The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the c… ▽ More The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem - going from rewards to the Q values - at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight - and illustrate its advantages on several tasks. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: Published at the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

arXiv:2406.15753 [pdf, other]

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Authors: Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse

Abstract: In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main so… ▽ More In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. Our theoretical results highlight the importance of developing new ways to measure the quality of learned reward models. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 58 pages, 1 figure

arXiv:2406.10023 [pdf, other]

Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Authors: Luckeciano C. Melo, Panagiotis Tigas, Alessandro Abate, Yarin Gal

Abstract: Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the furth… ▽ More Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.17304 [pdf, ps, other]

Stochastic Omega-Regular Verification and Control with Supermartingales

Authors: Alessandro Abate, Mirco Giacobbe, Diptarko Roy

Abstract: We present for the first time a supermartingale certificate for $ω$-regular specifications. We leverage the Robbins & Siegmund convergence theorem to characterize supermartingale certificates for the almost-sure acceptance of Streett conditions on general stochastic processes, which we call Streett supermartingales. This enables effective verification and control of discrete-time stochastic dynami… ▽ More We present for the first time a supermartingale certificate for $ω$-regular specifications. We leverage the Robbins & Siegmund convergence theorem to characterize supermartingale certificates for the almost-sure acceptance of Streett conditions on general stochastic processes, which we call Streett supermartingales. This enables effective verification and control of discrete-time stochastic dynamical models with infinite state space under $ω$-regular and linear temporal logic specifications. Our result generalises reachability, safety, reach-avoid, persistence and recurrence specifications; our contribution applies to discrete-time stochastic dynamical models and probabilistic programs with discrete and continuous state spaces and distributions, and carries over to deterministic models and programs. We provide a synthesis algorithm for control policies and Streett supermartingales as proof certificates for $ω$-regular objectives, which is sound and complete for supermartingales and control policies with polynomial templates and any stochastic dynamical model whose post-expectation is expressible as a polynomial. We additionally provide an optimisation of our algorithm that reduces the problem to satisfiability modulo theories, under the assumption that templates and post-expectation are in piecewise linear form. We have built a prototype and have demonstrated the efficacy of our approach on several exemplar $ω$-regular verification and control synthesis problems. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: The conference version of this manuscript appeared at CAV'24

arXiv:2405.15723 [pdf, other]

Bisimulation Learning

Authors: Alessandro Abate, Mirco Giacobbe, Yannik Schnitzer

Abstract: We introduce a data-driven approach to computing finite bisimulations for state transition systems with very large, possibly infinite state space. Our novel technique computes stutter-insensitive bisimulations of deterministic systems, which we characterize as the problem of learning a state classifier together with a ranking function for each class. Our procedure learns a candidate state classifi… ▽ More We introduce a data-driven approach to computing finite bisimulations for state transition systems with very large, possibly infinite state space. Our novel technique computes stutter-insensitive bisimulations of deterministic systems, which we characterize as the problem of learning a state classifier together with a ranking function for each class. Our procedure learns a candidate state classifier and candidate ranking functions from a finite dataset of sample states; then, it checks whether these generalise to the entire state space using satisfiability modulo theory solving. Upon the affirmative answer, the procedure concludes that the classifier constitutes a valid stutter-insensitive bisimulation of the system. Upon a negative answer, the solver produces a counterexample state for which the classifier violates the claim, adds it to the dataset, and repeats learning and checking in a counterexample-guided inductive synthesis loop until a valid bisimulation is found. We demonstrate on a range of benchmarks from reactive verification and software model checking that our method yields faster verification results than alternative state-of-the-art tools in practice. Our method produces succinct abstractions that enable an effective verification of linear temporal logic without next operator, and are interpretable for system diagnostics. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.08232 [pdf, other]

Distributionally Robust Aggregation of Electric Vehicle Flexibility

Authors: Karan Mukhi, Chengrui Qu, Pengcheng You, Alessandro Abate

Abstract: We address the problem of characterising the aggregate flexibility in populations of electric vehicles (EVs) with uncertain charging requirements. Building on previous results that provide exact characterisations of the aggregate flexibility in populations with known charging requirements, in this paper we extend the aggregation methods so that charging requirements are uncertain, but sampled from… ▽ More We address the problem of characterising the aggregate flexibility in populations of electric vehicles (EVs) with uncertain charging requirements. Building on previous results that provide exact characterisations of the aggregate flexibility in populations with known charging requirements, in this paper we extend the aggregation methods so that charging requirements are uncertain, but sampled from a given distribution. In particular, we construct \textit{distributionally robust aggregate flexibility sets}, sets of aggregate charging profiles over which we can provide probabilistic guarantees that actual realised populations will be able to track. By leveraging measure concentration results that establish powerful finite sample guarantees, we are able to give tight bounds on these robust flexibility sets, even in low sample regimes that are well suited for aggregating small populations of EVs. We detail explicit methods of calculating these sets and provide numerical results that validate the theory developed here. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Comments: 6 pages, conference

arXiv:2405.06624 [pdf, other]

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

Abstract: Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these appro… ▽ More Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches. △ Less

Submitted 8 July, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.18813 [pdf, other]

Safe Reach Set Computation via Neural Barrier Certificates

Authors: Alessandro Abate, Sergiy Bogomolov, Alec Edwards, Kostiantyn Potomkin, Sadegh Soudjani, Paolo Zuliani

Abstract: We present a novel technique for online safety verification of autonomous systems, which performs reachability analysis efficiently for both bounded and unbounded horizons by employing neural barrier certificates. Our approach uses barrier certificates given by parameterized neural networks that depend on a given initial set, unsafe sets, and time horizon. Such networks are trained efficiently off… ▽ More We present a novel technique for online safety verification of autonomous systems, which performs reachability analysis efficiently for both bounded and unbounded horizons by employing neural barrier certificates. Our approach uses barrier certificates given by parameterized neural networks that depend on a given initial set, unsafe sets, and time horizon. Such networks are trained efficiently offline using system simulations sampled from regions of the state space. We then employ a meta-neural network to generalize the barrier certificates to state space regions that are outside the training set. These certificates are generated and validated online as sound over-approximations of the reachable states, thus either ensuring system safety or activating appropriate alternative actions in unsafe scenarios. We demonstrate our technique on case studies from linear models to nonlinear control-dependent models for online autonomous driving scenarios. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: IFAC Conference on Analysis and Design of Hybrid Systems

arXiv:2404.03314 [pdf, other]

Learning to Bid in Forward Electricity Markets Using a No-Regret Algorithm

Authors: Arega Getaneh Abate, Dorsa Majdi, Jalal Kazempour, Maryam Kamgarpour

Abstract: It is a common practice in the current literature of electricity markets to use game-theoretic approaches for strategic price bidding. However, they generally rely on the assumption that the strategic bidders have prior knowledge of rival bids, either perfectly or with some uncertainty. This is not necessarily a realistic assumption. This paper takes a different approach by relaxing such an assump… ▽ More It is a common practice in the current literature of electricity markets to use game-theoretic approaches for strategic price bidding. However, they generally rely on the assumption that the strategic bidders have prior knowledge of rival bids, either perfectly or with some uncertainty. This is not necessarily a realistic assumption. This paper takes a different approach by relaxing such an assumption and exploits a no-regret learning algorithm for repeated games. In particular, by using the \emph{a posteriori} information about rivals' bids, a learner can implement a no-regret algorithm to optimize her/his decision making. Given this information, we utilize a multiplicative weight-update algorithm, adapting bidding strategies over multiple rounds of an auction to minimize her/his regret. Our numerical results show that when the proposed learning approach is used the social cost and the market-clearing prices can be higher than those corresponding to the classical game-theoretic approaches. The takeaway for market regulators is that electricity markets might be exposed to greater market power of suppliers than what classical analysis shows. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.15398 [pdf]

An International and Multidisciplinary Teaching Experience with Real Industrial Team Project Development

Authors: Martin Mellado, Eduardo Vendrell, Filomena Ferrucci, Andrea Abate, Detlef Zuhlke, Bernard Riera

Abstract: This paper presents the design, objectives, experiences, and results of an international cooperation project funded by the European Commission in the context of the Erasmus Intensive Programme (IP, for short) designed to improve students' curricula. An IP is a short programme of study (minimum 2 weeks) that brings together university students and staff from at least three countries in order to enc… ▽ More This paper presents the design, objectives, experiences, and results of an international cooperation project funded by the European Commission in the context of the Erasmus Intensive Programme (IP, for short) designed to improve students' curricula. An IP is a short programme of study (minimum 2 weeks) that brings together university students and staff from at least three countries in order to encourage efficient and multinational teaching of specialist topics, which might otherwise not be taught at all. This project lasted for 6 years, covering two different editions, each one with three year duration. This project lasted for 6 years, covering two different editions, each one with three year duration. The first edition, named SAVRO (Simulation and Virtual Reality in Robotics for Industrial Assembly Processes) was held in the period 2008-2010, with the participation of three Universities, namely the Universitat Politecnica de Valencia (Spain), acting as IP coordinator, the Technische Universitat Kaiserslautern (Germany), and the Universita degli Studi di Salerno (Italy). The Universite de Reims Champagne-Ardenne (France) participated as a new partner in the subsequent edition (2011-2013) of the IP, renamed as HUMAIN (Human-Machine Interaction). Both editions of the teaching project were characterized by the same objectives and organizational aspects, aiming to provide educational initiatives based on active teaching through collaborative works between international institutions, involving industrial partners too. The aim of the paper is to illustrate the best practices that characterized the organization of our experience as well as to present some general recommendations and suggestions on how to devise computing academic curricula. △ Less

Submitted 17 February, 2024; originally announced March 2024.

Comments: 21 pages

arXiv:2403.06854 [pdf, other]

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

Authors: Joar Skalse, Alessandro Abate

Abstract: Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $π$). To do this, we need a behavioural model of how $π$ relates to $R$. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship bet… ▽ More Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $π$). To do this, we need a behavioural model of how $π$ relates to $R$. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship between a human's preferences and their behaviour is much more complex than any of these behavioural models. This means that the behavioural models are misspecified, which raises the concern that they may lead to systematic errors if applied to real data. In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as e.g.\ the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2401.15838 [pdf, other]

Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers

Authors: Alexandros E. Tzikas, Licio Romao, Mert Pilanci, Alessandro Abate, Mykel J. Kochenderfer

Abstract: Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literatur… ▽ More Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literature due to its fast convergence. In contrast to distributed optimization, distributed sampling allows for uncertainty quantification in Bayesian inference tasks. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. For our theoretical results, we use convex optimization tools to establish a fundamental inequality on the generated local sample iterates. This inequality enables us to show convergence of the distribution associated with these iterates to the underlying target distribution in Wasserstein distance. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods. △ Less

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.14811 [pdf, ps, other]

On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks

Authors: Joar Skalse, Alessandro Abate

Abstract: In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks; multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in this class can be expressed usi… ▽ More In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks; multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in this class can be expressed using a scalar, Markovian reward. Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes. We thereby contribute to a more complete understanding of what standard reward functions can and cannot express. In addition to this, we also call attention to modal problems as a new class of problems, since they have so far not been given any systematic treatment in the RL literature. We also briefly outline some approaches for solving some of the problems we discuss, by means of bespoke RL algorithms. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1974-1984, 2023

arXiv:2312.11314 [pdf, other]

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

Authors: Rohan Mitta, Hosein Hasanbeig, Jun Wang, Daniel Kroening, Yiannis Kantaros, Alessandro Abate

Abstract: This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. In a variety of RL applications the safety of the agent is particularly important, e.g. autonomous platforms or robots that work in proximity of humans. As enforcing safety during training might severely limit th… ▽ More This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. In a variety of RL applications the safety of the agent is particularly important, e.g. autonomous platforms or robots that work in proximity of humans. As enforcing safety during training might severely limit the agent's exploration, we propose here a new architecture that handles the trade-off between efficient progress and safety during exploration. As the exploration progresses, we update via Bayesian inference Dirichlet-Categorical models of the transition probabilities of the Markov decision process that describes the environment dynamics. This paper proposes a way to approximate moments of belief about the risk associated to the action selection policy. We construct those approximations, and prove the convergence results. We propose a novel method for leveraging the expectation approximations to derive an approximate bound on the confidence that the risk is below a certain level. This approach can be easily interleaved with RL and we present experimental results to showcase the performance of the overall architecture. △ Less

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.06344 [pdf, other]

Learning Robust Policies for Uncertain Parametric Markov Decision Processes

Authors: Luke Rickard, Alessandro Abate, Kostas Margellos

Abstract: Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state mode… ▽ More Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs. △ Less

Submitted 15 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 10 pages, accepted for oral presentation at L4DC

arXiv:2311.09793 [pdf, other]

Fossil 2.0: Formal Certificate Synthesis for the Verification and Control of Dynamical Models

Authors: Alec Edwards, Andrea Peruffo, Alessandro Abate

Abstract: This paper presents Fossil 2.0, a new major release of a software tool for the synthesis of certificates (e.g., Lyapunov and barrier functions) for dynamical systems modelled as ordinary differential and difference equations. Fossil 2.0 is much improved from its original release, including new interfaces, a significantly expanded certificate portfolio, controller synthesis and enhanced extensibili… ▽ More This paper presents Fossil 2.0, a new major release of a software tool for the synthesis of certificates (e.g., Lyapunov and barrier functions) for dynamical systems modelled as ordinary differential and difference equations. Fossil 2.0 is much improved from its original release, including new interfaces, a significantly expanded certificate portfolio, controller synthesis and enhanced extensibility. We present these new features as part of this tool paper. Fossil implements a counterexample-guided inductive synthesis (CEGIS) loop ensuring the soundness of the method. Our tool uses neural networks as templates to generate candidate functions, which are then formally proven by an SMT solver acting as an assertion verifier. Improvements with respect to the first release include a wider range of certificates, synthesis of control laws, and support for discrete-time models. △ Less

Submitted 16 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: HSCC 2024 Tool Paper

arXiv:2311.09786 [pdf, other]

doi 10.4204/EPTCS.395.10

Correct-by-Construction Control for Stochastic and Uncertain Dynamical Models via Formal Abstractions

Authors: Thom Badings, Nils Jansen, Licio Romao, Alessandro Abate

Abstract: Automated synthesis of correct-by-construction controllers for autonomous systems is crucial for their deployment in safety-critical scenarios. Such autonomous systems are naturally modeled as stochastic dynamical models. The general problem is to compute a controller that provably satisfies a given task, represented as a probabilistic temporal logic specification. However, factors such as stochas… ▽ More Automated synthesis of correct-by-construction controllers for autonomous systems is crucial for their deployment in safety-critical scenarios. Such autonomous systems are naturally modeled as stochastic dynamical models. The general problem is to compute a controller that provably satisfies a given task, represented as a probabilistic temporal logic specification. However, factors such as stochastic uncertainty, imprecisely known parameters, and hybrid features make this problem challenging. We have developed an abstraction framework that can be used to solve this problem under various modeling assumptions. Our approach is based on a robust finite-state abstraction of the stochastic dynamical model in the form of a Markov decision process with intervals of probabilities (iMDP). We use state-of-the-art verification techniques to compute an optimal policy on the iMDP with guarantees for satisfying the given specification. We then show that, by construction, we can refine this policy into a feedback controller for which these guarantees carry over to the dynamical model. In this short paper, we survey our recent research in this area and highlight two challenges (related to scalability and dealing with nonlinear dynamics) that we aim to address with our ongoing research. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: In Proceedings FMAS 2023, arXiv:2311.08987. arXiv admin note: text overlap with arXiv:2301.01526

Journal ref: EPTCS 395, 2023, pp. 144-152

arXiv:2310.01951 [pdf, other]

Probabilistic Reach-Avoid for Bayesian Neural Networks

Authors: Matthew Wicker, Luca Laurenti, Andrea Patane, Nicola Paoletti, Alessandro Abate, Marta Kwiatkowska

Abstract: Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: f… ▽ More Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a "target" state, while avoiding a set of "unsafe" states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability. △ Less

Submitted 3 October, 2023; originally announced October 2023.

Comments: 47 pages, 10 figures. arXiv admin note: text overlap with arXiv:2105.10134

arXiv:2309.15257 [pdf, other]

STARC: A General Framework For Quantifying Differences Between Reward Functions

Authors: Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

Abstract: In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use \emph{reward learning algorithms}, which attempt to \emph{learn} a reward fun… ▽ More In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use \emph{reward learning algorithms}, which attempt to \emph{learn} a reward function from data. However, the theoretical foundations of reward learning are not yet well-developed. In particular, it is typically not known when a given reward learning algorithm with high probability will learn a reward function that is safe to optimise. This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance. One of the roadblocks to deriving better theoretical guarantees is the lack of good methods for quantifying the difference between reward functions. In this paper we provide a solution to this problem, in the form of a class of pseudometrics on the space of all reward functions that we call STARC (STAndardised Reward Comparison) metrics. We show that STARC metrics induce both an upper and a lower bound on worst-case regret, which implies that our metrics are tight, and that any metric with the same properties must be bilipschitz equivalent to ours. Moreover, we also identify a number of issues with reward metrics proposed by earlier works. Finally, we evaluate our metrics empirically, to demonstrate their practical efficacy. STARC metrics can be used to make both theoretical and empirical analysis of reward learning algorithms both easier and more principled. △ Less

Submitted 11 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.06090 [pdf, other]

A General Verification Framework for Dynamical and Control Models via Certificate Synthesis

Authors: Alec Edwards, Andrea Peruffo, Alessandro Abate

Abstract: An emerging branch of control theory specialises in certificate learning, concerning the specification of a desired (possibly complex) system behaviour for an autonomous or control model, which is then analytically verified by means of a function-based proof. However, the synthesis of controllers abiding by these complex requirements is in general a non-trivial task and may elude the most expert c… ▽ More An emerging branch of control theory specialises in certificate learning, concerning the specification of a desired (possibly complex) system behaviour for an autonomous or control model, which is then analytically verified by means of a function-based proof. However, the synthesis of controllers abiding by these complex requirements is in general a non-trivial task and may elude the most expert control engineers. This results in a need for automatic techniques that are able to design controllers and to analyse a wide range of elaborate specifications. In this paper, we provide a general framework to encode system specifications and define corresponding certificates, and we present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks to provide candidate control and certificate functions, whilst using SMT-solvers to offer a formal guarantee of correctness. We test our framework by developing a prototype software tool, and assess its efficacy at verification via control and certificate synthesis over a large and varied suite of benchmarks. △ Less

Submitted 1 July, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.10587 [pdf, other]

Formal Analysis and Verification of Max-Plus Linear Systems

Authors: Muhammad Syifa'ul Mufid, Andrea Micheli, Alessandro Abate, Alessandro Cimatti

Abstract: Max-Plus Linear (MPL) systems are an algebraic formalism with practical applications in transportation networks, manufacturing and biological systems. In this paper, we investigate the problem of automatically analyzing the properties of MPL, taking into account both structural properties such as transient and cyclicity, and the open problem of user-defined temporal properties. We propose Time-Dif… ▽ More Max-Plus Linear (MPL) systems are an algebraic formalism with practical applications in transportation networks, manufacturing and biological systems. In this paper, we investigate the problem of automatically analyzing the properties of MPL, taking into account both structural properties such as transient and cyclicity, and the open problem of user-defined temporal properties. We propose Time-Difference LTL (TDLTL), a logic that encompasses the delays between the discrete time events governed by an MPL system, and characterize the problem of model checking TDLTL over MPL. We first consider a framework based on the verification of infinite-state transition systems, and propose an approach based on an encoding into model checking. Then, we leverage the specific features of MPL systems to devise a highly optimized, combinational approach based on Satisfiability Modulo Theory (SMT). We experimentally evaluate the features of the proposed approaches on a large set of benchmarks. The results show that the proposed approach substantially outperforms the state of the art competitors in expressiveness and effectiveness, and demonstrate the superiority of the combinational approach over the reduction to model checking. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: 28 pages (including appendixes)

arXiv:2307.15546 [pdf, other]

On the Trade-off Between Efficiency and Precision of Neural Abstraction

Authors: Alec Edwards, Mirco Giacobbe, Alessandro Abate

Abstract: Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models. They comprise a neural ODE and a certified upper bound on the error between the abstract neural network and the concrete dynamical model. So far neural abstractions have exclusively been obtained as neural networks consisting entirely of $ReLU$ activation functions, resulting in neura… ▽ More Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models. They comprise a neural ODE and a certified upper bound on the error between the abstract neural network and the concrete dynamical model. So far neural abstractions have exclusively been obtained as neural networks consisting entirely of $ReLU$ activation functions, resulting in neural ODE models that have piecewise affine dynamics, and which can be equivalently interpreted as linear hybrid automata. In this work, we observe that the utility of an abstraction depends on its use: some scenarios might require coarse abstractions that are easier to analyse, whereas others might require more complex, refined abstractions. We therefore consider neural abstractions of alternative shapes, namely either piecewise constant or nonlinear non-polynomial (specifically, obtained via sigmoidal activations). We employ formal inductive synthesis procedures to generate neural abstractions that result in dynamical models with these semantics. Empirically, we demonstrate the trade-off that these different neural abstraction templates have vis-a-vis their precision and synthesis time, as well as the time required for their safety verification (done via reachability computation). We improve existing synthesis techniques to enable abstraction of higher-dimensional models, and additionally discuss the abstraction of complex neural ODEs to improve the efficiency of reachability analysis for these models. △ Less

Submitted 2 October, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Appeared at QEST 2023. Added codebase link; corrected Eq. 11

arXiv:2307.05059 [pdf, ps, other]

doi 10.4204/EPTCS.379.17

On Imperfect Recall in Multi-Agent Influence Diagrams

Authors: James Fox, Matt MacDermott, Lewis Hammond, Paul Harrenstein, Alessandro Abate, Michael Wooldridge

Abstract: Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfec… ▽ More Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfect recall, however, a Nash equilibrium in behavioural policies may not exist. We overcome this by showing how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. We also analyse the computational complexity of key decision problems in MAIDs, and explore tractable cases. Finally, we describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable. △ Less

Submitted 11 July, 2023; originally announced July 2023.

Comments: In Proceedings TARK 2023, arXiv:2307.04005

Journal ref: EPTCS 379, 2023, pp. 201-220

arXiv:2306.02766 [pdf, other]

Networked Communication for Decentralised Agents in Mean-Field Games

Authors: Patrick Benjamin, Alessandro Abate

Abstract: We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture, with only a few reasonable assumptions about network structure, has sample guarantees bounded between those of the centralised- and independent-learning cases. We d… ▽ More We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture, with only a few reasonable assumptions about network structure, has sample guarantees bounded between those of the centralised- and independent-learning cases. We discuss how the sample guarantees of the three theoretical algorithms do not actually result in practical convergence. We therefore show that in practical settings where the theoretical parameters are not observed (leading to poor estimation of the Q-function), our communication scheme significantly accelerates convergence over the independent case (and often even the centralised case), without relying on the assumption of a centralised learner. We contribute further practical enhancements to all three theoretical algorithms, allowing us to present their first empirical demonstrations. Our experiments confirm that we can remove several of the theoretical assumptions of the algorithms, and display the empirical convergence benefits brought by our new networked communication. We additionally show that the networked approach has significant advantages, over both the centralised and independent alternatives, in terms of robustness to unexpected learning failures and to changes in population size. △ Less

Submitted 28 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2303.17618 [pdf, other]

Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version]

Authors: Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers

Abstract: We introduce an adaptive refinement procedure for smart, and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we u… ▽ More We introduce an adaptive refinement procedure for smart, and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique is prone to data-driven frameworks, but not restricted to. We also study properties of the above mentioned metric between Markov chains, which we believe could be of application for wider purpose. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques. △ Less

Submitted 30 October, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: This paper is an extended version of a CDC2023 submission

arXiv:2303.13657 [pdf, other]

Policy Evaluation in Distributional LQR

Authors: Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro Abate, Karl H. Johansson

Abstract: Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefu… ▽ More Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results. △ Less

Submitted 23 March, 2023; originally announced March 2023.

Comments: 12pages

arXiv:2302.13888 [pdf, other]

k-Prize Weighted Voting Games

Authors: Wei-Chen Lee, David Hyland, Alessandro Abate, Edith Elkind, Jiarui Gan, Julian Gutierrez, Paul Harrenstein, Michael Wooldridge

Abstract: We introduce a natural variant of weighted voting games, which we refer to as k-Prize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the i-th largest coalition (by the sum of weights of its members) wins the i-th largest prize, which is then shared among its members. We present four solution concepts… ▽ More We introduce a natural variant of weighted voting games, which we refer to as k-Prize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the i-th largest coalition (by the sum of weights of its members) wins the i-th largest prize, which is then shared among its members. We present four solution concepts to analyse the games in this class, and characterise the existence of stable outcomes in games with three players and two prizes, and in games with uniform prizes. We then explore the efficiency of stable outcomes in terms of Pareto optimality and utilitarian social welfare. Finally, we study the computational complexity of finding stable outcomes. △ Less

Submitted 2 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted to AAMAS 2023

arXiv:2301.11683 [pdf, other]

Neural Abstractions

Authors: Alessandro Abate, Alec Edwards, Mirco Giacobbe

Abstract: We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics. Neural networks have extensively been used before as approximators; in this work, we make a step further and use them for the first time as abstractions. For a given dynamical model, our method synthesises a neural network that overapproximates… ▽ More We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics. Neural networks have extensively been used before as approximators; in this work, we make a step further and use them for the first time as abstractions. For a given dynamical model, our method synthesises a neural network that overapproximates its dynamics by ensuring an arbitrarily tight, formally certified bound on the approximation error. For this purpose, we employ a counterexample-guided inductive synthesis procedure. We show that this produces a neural ODE with non-deterministic disturbances that constitutes a formal abstraction of the concrete model under analysis. This guarantees a fundamental property: if the abstract model is safe, i.e., free from any initialised trajectory that reaches an undesirable state, then the concrete model is also safe. By using neural ODEs with ReLU activation functions as abstractions, we cast the safety verification problem for nonlinear dynamical models into that of hybrid automata with affine dynamics, which we verify using SpaceEx. We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models. We additionally demonstrate and that it is effective on models that do not exhibit local Lipschitz continuity, which are out of reach to the existing technologies. △ Less

Submitted 27 January, 2023; originally announced January 2023.

Comments: NeurIPS 2022

arXiv:2301.06136 [pdf, other]

doi 10.4230/LIPIcs.CONCUR.2023.22

Quantitative Verification with Neural Networks

Authors: Alessandro Abate, Alec Edwards, Mirco Giacobbe, Hashan Punchihewa, Diptarko Roy

Abstract: We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analys… ▽ More We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate's validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques. △ Less

Submitted 11 March, 2024; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: The conference version of this manuscript appeared at CONCUR 2023

ACM Class: F.3.1; D.2.4

arXiv:2301.02324 [pdf, other]

doi 10.1016/j.artint.2023.103919

Reasoning about Causality in Games

Authors: Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, Michael Wooldridge

Abstract: Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's caus… ▽ More Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's causal hierarchy to the game-theoretic domain, or as extending Koller and Milch's multi-agent influence diagrams to the causal domain. We then consider three key questions: i) How can the (causal) dependencies in games - either between variables, or between strategies - be modelled in a uniform, principled manner? ii) How may causal queries be computed in causal games, and what assumptions does this require? iii) How do causal games compare to existing formalisms? To address question i), we introduce mechanised games, which encode dependencies between agents' decision rules and the distributions governing the game. In response to question ii), we present definitions of predictions, interventions, and counterfactuals, and discuss the assumptions required for each. Regarding question iii), we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support. Finally, we highlight possible applications of causal games, aided by an extensive open-source Python library. △ Less

Submitted 17 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: Published in Artificial Intelligence (2023)

arXiv:2301.01526 [pdf, other]

doi 10.1613/jair.1.14253

Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions

Authors: Thom Badings, Licio Romao, Alessandro Abate, David Parker, Hasan A. Poonawala, Marielle Stoelinga, Nils Jansen

Abstract: Controllers for dynamical systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise in a dynamical system, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distri… ▽ More Controllers for dynamical systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise in a dynamical system, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target, while also avoiding unsafe regions of the state space. First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is, with a user-specified confidence probability, robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP and compute a controller for which these guarantees carry over to the original control system. In addition, we develop a tailored computational scheme that reduces the complexity of the synthesis of these guarantees on the iMDP. Benchmarks on realistic control systems show the practical applicability of our method, even when the iMDP has hundreds of millions of transitions. △ Less

Submitted 4 January, 2023; originally announced January 2023.

Comments: To appear in the Journal of Artificial Intelligence Research (JAIR). arXiv admin note: text overlap with arXiv:2110.12662

Journal ref: Journal of Artificial Intelligence Research (JAIR) 76 (2023) 341-391

arXiv:2212.13769 [pdf, other]

doi 10.24963/ijcai.2022/476

Lexicographic Multi-Objective Reinforcement Learning

Authors: Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate

Abstract: In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorit… ▽ More In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, demonstrating their practical applicability. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms. △ Less

Submitted 28 December, 2022; originally announced December 2022.

Journal ref: IJCAI 2022; Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. Main Track, Pages 3430-3436

arXiv:2212.03201 [pdf, ps, other]

Misspecification in Inverse Reinforcement Learning

Authors: Joar Skalse, Alessandro Abate

Abstract: The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $π$. To do this, we need a model of how $π$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationsh… ▽ More The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $π$. To do this, we need a model of how $π$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models. △ Less

Submitted 24 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2023

arXiv:2212.00679 [pdf, other]

Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics

Authors: Luke Rickard, Thom Badings, Licio Romao, Alessandro Abate

Abstract: Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiabl… ▽ More Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiably satisfy probabilistic computation tree logic (PCTL) formulae. An MJLS consists of a finite set of stochastic linear dynamics and discrete jumps between these dynamics that are governed by a Markov decision process (MDP). We consider the cases where the transition probabilities of this MDP are either known up to an interval or completely unknown. Our approach is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS. We formalise this abstraction as an interval MDP (iMDP) for which we compute intervals of transition probabilities using sampling techniques from the so-called 'scenario approach', resulting in a probabilistically sound approximation. We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem. △ Less

Submitted 4 August, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: 15 pages, accepted to QEST

arXiv:2210.05989 [pdf, other]

Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty

Authors: Thom Badings, Licio Romao, Alessandro Abate, Nils Jansen

Abstract: Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models… ▽ More Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models exclusively capture aleatoric but not epistemic uncertainty, and thus require that model parameters are known precisely. Our contribution to overcoming this restriction is a novel abstraction-based controller synthesis method for continuous-state models with stochastic noise and uncertain parameters. By sampling techniques and robust analysis, we capture both aleatoric and epistemic uncertainty, with a user-specified confidence level, in the transition probability intervals of a so-called interval Markov decision process (iMDP). We synthesize an optimal policy on this iMDP, which translates (with the specified confidence level) to a feedback controller for the continuous model with the same performance guarantees. Our experimental benchmarks confirm that accounting for epistemic uncertainty leads to controllers that are more robust against variations in parameter values. △ Less

Submitted 7 December, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted at AAAI 2023

arXiv:2209.15320 [pdf, other]

Bounded Robustness in Reinforcement Learning via Lexicographic Objectives

Authors: Daniel Jarne Ornia, Licio Romao, Lewis Hammond, Manuel Mazo Jr., Alessandro Abate

Abstract: Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator in… ▽ More Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis. △ Less

Submitted 11 December, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.10341 [pdf, other]

LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning

Authors: Hosein Hasanbeig, Daniel Kroening, Alessandro Abate

Abstract: LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Buchi Automata (LDBA) to express a given linear temporal specification. A… ▽ More LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Buchi Automata (LDBA) to express a given linear temporal specification. A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA. Theoretical guarantees under proper assumptions ensure the convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability. We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL. Owing to the LDBA-guided exploration and LCRL model-free architecture, we observe robust performance, which also scales well when compared to standard RL approaches (whenever applicable to LTL specifications). Full instructions on how to execute all the case studies in this paper are provided on a GitHub page that accompanies the LCRL distribution www.github.com/grockious/lcrl. △ Less

Submitted 21 September, 2022; originally announced September 2022.

Comments: Evaluated and Accepted by the 19th International Conference on Quantitative Evaluation of Systems 2022

arXiv:2208.11838 [pdf, other]

Learning Task Automata for Reinforcement Learning using Hidden Markov Models

Authors: Alessandro Abate, Yousif Almulla, James Fox, David Hyland, Michael Wooldridge

Abstract: Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as su… ▽ More Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state `task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks. △ Less

Submitted 3 October, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: 14 pages, 7 figures, Accepted to the 26th European Conference on Artificial Intelligence (ECAI 2023)

arXiv:2208.06385

Low Emission Building Control with Zero-Shot Reinforcement Learning

Authors: Scott R. Jeen, Alessandro Abate, Jonathan M. Cullen

Abstract: Heating and cooling systems in buildings account for 31\% of global energy use, much of which are regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to buildin… ▽ More Heating and cooling systems in buildings account for 31\% of global energy use, much of which are regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building-specific simulators or data that cannot be expected for every building in the world. In response, we show it is possible to obtain emission-reducing policies without such knowledge a priori--a paradigm we call zero-shot building control. We combine ideas from system identification and model-based RL to create PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that a short period of active exploration is all that is required to build a performant model. In experiments across three varied building energy simulations, we show PEARL outperforms an existing RBC once, and popular RL baselines in all cases, reducing building emissions by as much as 31\% whilst maintaining thermal comfort. Our source code is available online via https://enjeeneer.io/projects/pearl . △ Less

Submitted 15 August, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

Comments: This paper should have been submitted as a replacement for arXiv:2206.14191

arXiv:2206.14191 [pdf, other]

doi 10.1609/aaai.v37i12.26668

Low Emission Building Control with Zero-Shot Reinforcement Learning

Authors: Scott R. Jeen, Alessandro Abate, Jonathan M. Cullen

Abstract: Heating and cooling systems in buildings account for 31% of global energy use, much of which are regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building… ▽ More Heating and cooling systems in buildings account for 31% of global energy use, much of which are regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building-specific simulators or data that cannot be expected for every building in the world. In response, we show it is possible to obtain emission-reducing policies without such knowledge a priori--a paradigm we call zero-shot building control. We combine ideas from system identification and model-based RL to create PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that a short period of active exploration is all that is required to build a performant model. In experiments across three varied building energy simulations, we show PEARL outperforms an existing RBC once, and popular RL baselines in all cases, reducing building emissions by as much as 31% whilst maintaining thermal comfort. Our source code is available online via https://enjeeneer.io/projects/pearl/ △ Less

Submitted 5 March, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted at AAAI 2023. Code available via https://enjeeneer.io/projects/pearl/

arXiv:2203.07475 [pdf, other]

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning

Authors: Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave

Abstract: It is often very challenging to manually design reward functions for complex, real-world tasks. To solve this, one can instead use reward learning to infer a reward function from data. However, there are often multiple reward functions that fit the data equally well, even in the infinite-data limit. This means that the reward function is only partially identifiable. In this work, we formally chara… ▽ More It is often very challenging to manually design reward functions for complex, real-world tasks. To solve this, one can instead use reward learning to infer a reward function from data. However, there are often multiple reward functions that fit the data equally well, even in the infinite-data limit. This means that the reward function is only partially identifiable. In this work, we formally characterise the partial identifiability of the reward function given several popular reward learning data sources, including expert demonstrations and trajectory comparisons. We also analyse the impact of this partial identifiability for several downstream tasks, such as policy optimisation. We unify our results in a framework for comparing data sources and downstream tasks by their invariances, with implications for the design and selection of data sources for reward learning. △ Less

Submitted 7 June, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: ICML 2023. 9 pages main paper, 26 pages total, 3 figures

ACM Class: I.2.6

arXiv:2110.12662 [pdf, other]

doi 10.1609/aaai.v36i9.21201

Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian Noise

Authors: Thom S. Badings, Alessandro Abate, Nils Jansen, David Parker, Hasan A. Poonawala, Marielle Stoelinga

Abstract: Controllers for autonomous systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modelled as process noise, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a… ▽ More Controllers for autonomous systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modelled as process noise, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel planning method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target. First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP, and compute a controller for which these guarantees carry over to the autonomous system. Realistic benchmarks show the practical applicability of our method, even when the iMDP has millions of states or transitions. △ Less

Submitted 13 December, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Journal ref: AAAI 2022 (distinguished paper)

arXiv:2105.10134 [pdf, other]

Certification of Iterative Predictions in Bayesian Neural Networks

Authors: Matthew Wicker, Luca Laurenti, Andrea Patane, Nicola Paoletti, Alessandro Abate, Marta Kwiatkowska

Abstract: We consider the problem of computing reach-avoid probabilities for iterative predictions made with Bayesian neural network (BNN) models. Specifically, we leverage bound propagation techniques and backward recursion to compute lower bounds for the probability that trajectories of the BNN model reach a given set of states while avoiding a set of unsafe states. We use the lower bounds in the context… ▽ More We consider the problem of computing reach-avoid probabilities for iterative predictions made with Bayesian neural network (BNN) models. Specifically, we leverage bound propagation techniques and backward recursion to compute lower bounds for the probability that trajectories of the BNN model reach a given set of states while avoiding a set of unsafe states. We use the lower bounds in the context of control and reinforcement learning to provide safety certification for given control policies, as well as to synthesize control policies that improve the certification bounds. On a set of benchmarks, we demonstrate that our framework can be employed to certify policies over BNNs predictions for problems of more than $10$ dimensions, and to effectively synthesize policies that significantly increase the lower bound on the satisfaction probability. △ Less

Submitted 19 June, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

Comments: Accepted, UAI 2021. 17 pages

arXiv:2104.14691 [pdf, other]

Grid-Free Computation of Probabilistic Safety with Malliavin Calculus

Authors: Francesco Cosentino, Harald Oberhauser, Alessandro Abate

Abstract: This work concerns continuous-time, continuous-space stochastic dynamical systems described by stochastic differential equations (SDE). It presents a new approach to compute probabilistic safety regions, namely sets of initial conditions of the SDE associated to trajectories that are safe with a probability larger than a given threshold. The approach introduces a functional that is minimised at th… ▽ More This work concerns continuous-time, continuous-space stochastic dynamical systems described by stochastic differential equations (SDE). It presents a new approach to compute probabilistic safety regions, namely sets of initial conditions of the SDE associated to trajectories that are safe with a probability larger than a given threshold. The approach introduces a functional that is minimised at the border of the probabilistic safety region, then solves an optimisation problem using techniques from Malliavin Calculus, which computes such region. Unlike existing results in the literature, the new approach allows one to compute probabilistic safety regions without gridding the state space of the SDE. △ Less

Submitted 10 January, 2023; v1 submitted 29 April, 2021; originally announced April 2021.

arXiv:2102.12855 [pdf, other]

doi 10.1109/LRA.2021.3101544

Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic

Authors: Mingyu Cai, Mohammadhosein Hasanbeig, Shaoping Xiao, Alessandro Abate, Zhen Kan

Abstract: This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP) with unknown transition probabilities over continuous state and action spaces. Linear temporal logic (LTL) is used to specify high-level tasks over infinite horizon, which can be converted into a limit deterministic generalized Büchi automaton (LDGBA) with several accepting sets.… ▽ More This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP) with unknown transition probabilities over continuous state and action spaces. Linear temporal logic (LTL) is used to specify high-level tasks over infinite horizon, which can be converted into a limit deterministic generalized Büchi automaton (LDGBA) with several accepting sets. The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP by incorporating a synchronous tracking-frontier function to record unvisited accepting sets of the automaton, and to facilitate the satisfaction of the accepting conditions. The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states and can overcome the issues of sparse rewards. Rigorous analysis shows that any RL method that optimizes the expected discounted return is guaranteed to find an optimal policy whose traces maximize the satisfaction probability. A modular deep deterministic policy gradient (DDPG) is then developed to generate such policies over continuous state and action spaces. The performance of our framework is evaluated via an array of OpenAI gym environments. △ Less

Submitted 23 January, 2022; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:2010.06797

Journal ref: IEEE Robotics and Automation Letters, 2021

arXiv:2102.05008 [pdf, other]

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Authors: Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

Abstract: Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibri… ▽ More Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria. △ Less

Submitted 9 February, 2021; originally announced February 2021.

Comments: Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

arXiv:2102.00582 [pdf, other]

Multi-Agent Reinforcement Learning with Temporal Logic Specifications

Authors: Lewis Hammond, Alessandro Abate, Julian Gutierrez, Michael Wooldridge

Abstract: In this paper, we study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour. From a learning perspective these specifications provide a rich formal language with which to capture tasks or objectives, while from a logic and automated verification perspective the introduction of learning capabili… ▽ More In this paper, we study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour. From a learning perspective these specifications provide a rich formal language with which to capture tasks or objectives, while from a logic and automated verification perspective the introduction of learning capabilities allows for practical applications in large, stochastic, unknown environments. The existing work in this area is, however, limited. Of the frameworks that consider full linear temporal logic or have correctness guarantees, all methods thus far consider only the case of a single temporal logic specification and a single agent. In order to overcome this limitation, we develop the first multi-agent reinforcement learning technique for temporal logic specifications, which is also novel in its ability to handle multiple specifications. We provide correctness and convergence guarantees for our main algorithm - ALMANAC (Automaton/Logic Multi-Agent Natural Actor-Critic) - even when using function approximation. Alongside our theoretical results, we further demonstrate the applicability of our technique via a set of preliminary experiments. △ Less

Submitted 9 February, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

Comments: Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

arXiv:2101.07491 [pdf, ps, other]

Automated Verification and Synthesis of Stochastic Hybrid Systems: A Survey

Authors: Abolfazl Lavaei, Sadegh Soudjani, Alessandro Abate, Majid Zamani

Abstract: Stochastic hybrid systems have received significant attentions as a relevant modelling framework describing many systems, from engineering to the life sciences: they enable the study of numerous applications, including transportation networks, biological systems and chemical reaction networks, smart energy and power grids, and beyond. Automated verification and policy synthesis for stochastic hybr… ▽ More Stochastic hybrid systems have received significant attentions as a relevant modelling framework describing many systems, from engineering to the life sciences: they enable the study of numerous applications, including transportation networks, biological systems and chemical reaction networks, smart energy and power grids, and beyond. Automated verification and policy synthesis for stochastic hybrid systems can be inherently challenging: this is due to the heterogeneity of their dynamics (presence of continuous and discrete components), the presence of uncertainty, and in some applications the large dimension of state and input sets. Over the past few years, a few hundred articles have investigated these models, and developed diverse and powerful approaches to mitigate difficulties encountered in the analysis and synthesis of such complex stochastic systems. In this survey, we overview the most recent results in the literature and discuss different approaches, including (in)finite abstractions, verification and synthesis for temporal logic specifications, stochastic similarity relations, (control) barrier certificates, compositional techniques, and a selection of results on continuous-time stochastic systems; we finally survey recently developed software tools that implement the discussed approaches. Throughout the manuscript we discuss a few open topics to be considered as potential future research directions: we hope that this survey will guide younger researchers through a comprehensive understanding of the various challenges, tools, and solutions in this enticing and rich scientific area. △ Less

Submitted 10 March, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

Showing 1–50 of 85 results for author: Abate, A