Search | arXiv e-print repository

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$

Authors: Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein

Abstract: Meta reinforcement learning (meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, they show poor asymptotic performance and struggle with out-of-distribution tasks because they rely on sequence models, such as recurrent neural networks or transformers, to process experiences rather than summarize them using general-purpose RL components such as value functions. In contrast, traditional RL algorithms are data-inefficient as they do not use domain knowledge, but they do converge to an optimal policy in the limit. We propose RL$^3$, a principled hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL. We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.

Submitted 26 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2301.05753 [pdf, ps, other]

Fairness and Sequential Decision Making: Limits, Lessons, and Opportunities

Authors: Samer B. Nashed, Justin Svegliato, Su Lin Blodgett

Abstract: As automated decision making and decision assistance systems become common in everyday life, research on the prevention or mitigation of potential harms that arise from decisions made by these systems has proliferated. However, various research communities have independently conceptualized these harms, envisioned potential applications, and proposed interventions. The result is a somewhat fractured landscape of literature focused generally on ensuring decision-making algorithms "do the right thing". In this paper, we compare and discuss work across two major subsets of this literature: algorithmic fairness, which focuses primarily on predictive systems, and ethical decision making, which focuses primarily on sequential decision making and planning. We explore how each of these settings has articulated its normative concerns, the viability of different techniques for these different settings, and how ideas from each setting may have utility for the other.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 10 pages

arXiv:2205.15462 [pdf, other]

Causal Explanations for Sequential Decision Making Under Uncertainty

Authors: Samer B. Nashed, Saaduddin Mahmud, Claudia V. Goldman, Shlomo Zilberstein

Abstract: We introduce a novel framework for causal explanations of stochastic, sequential decision-making systems built on the well-studied structural causal model paradigm for causal reasoning. This single framework can identify multiple, semantically distinct explanations for agent actions -- something not previously possible. In this paper, we establish exact methods and several approximation techniques for causal inference on Markov decision processes using this framework, followed by results on the applicability of the exact methods and some run time bounds. We discuss several scenarios that illustrate the framework's flexibility and the results of experiments with human subjects that confirm the benefits of this approach.

Submitted 10 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: 9 pages, 7 figures

arXiv:2007.15746 [pdf, other]

Laser2Vec: Similarity-based Retrieval for Robotic Perception Data

Authors: Samer B. Nashed

Abstract: As mobile robot capabilities improve and deployment times increase, tools to analyze the growing volume of data are becoming necessary. Current state-of-the-art logging, playback, and exploration systems are insufficient for practitioners seeking to discover systemic points of failure in robotic systems. This paper presents a suite of algorithms for similarity-based queries of robotic perception data and implements a system for storing 2D LiDAR data from many deployments cheaply and evaluating top-k queries for complete or partial scans efficiently. We generate compressed representations of laser scans via a convolutional variational autoencoder and store them in a database, where a light-weight dense network for distance function approximation is run at query time. Our query evaluator leverages the local continuity of the embedding space to generate evaluation orders that, in expectation, dominate full linear scans of the database. The accuracy, robustness, scalability, and efficiency of our system is tested on real-world data gathered from dozens of deployments and synthetic data generated by corrupting real data. We find our system accurately and efficiently identifies similar scans across a number of episodes where the robot encountered the same location, or similar indoor structures or objects.

Submitted 30 July, 2020; originally announced July 2020.

Comments: 6 pages

ACM Class: I.2.9

arXiv:1803.01378 [pdf, other]

Localization under Topological Uncertainty for Lane Identification of Autonomous Vehicles

Authors: Samer B. Nashed, David M. Ilstrup, Joydeep Biswas

Abstract: Autonomous vehicles (AVs) require accurate metric and topological location estimates for safe, effective navigation and decision-making. Although many high-definition (HD) roadmaps exist, they are not always accurate since public roads are dynamic, shaped unpredictably by both human activity and nature. Thus, AVs must be able to handle situations in which the topology specified by the map does not agree with reality. We present the Variable Structure Multiple Hidden Markov Model (VSM-HMM) as a framework for localizing in the presence of topological uncertainty, and demonstrate its effectiveness on an AV where lane membership is modeled as a topological localization process. VSM-HMMs use a dynamic set of HMMs to simultaneously reason about location within a set of most likely current topologies and therefore may also be applied to topological structure estimation as well as AV lane estimation. In addition, we present an extension to the Earth Mover's Distance which allows uncertainty to be taken into account when computing the distance between belief distributions on simplices of arbitrary relative sizes.

Submitted 4 March, 2018; originally announced March 2018.

Comments: 6 pages, to appear in ICRA 2018

arXiv:1711.08566 [pdf, other]

Human-in-the-Loop SLAM

Authors: Samer B. Nashed, Joydeep Biswas

Abstract: Building large-scale, globally consistent maps is a challenging problem, made more difficult in environments with limited access, sparse features, or when using data collected by novice users. For such scenarios, where state-of-the-art mapping algorithms produce globally inconsistent maps, we introduce a systematic approach to incorporating sparse human corrections, which we term Human-in-the-Loop Simultaneous Localization and Mapping (HitL-SLAM). Given an initial factor graph for pose graph SLAM, HitL-SLAM accepts approximate, potentially erroneous, and rank-deficient human input, infers the intended correction via expectation maximization (EM), back-propagates the extracted corrections over the pose graph, and finally jointly optimizes the factor graph including the human inputs as human correction factor terms, to yield globally consistent large-scale maps. We thus contribute an EM formulation for inferring potentially rank-deficient human corrections to mapping, and human correction factor extensions to the factor graphs for pose graph SLAM that result in a principled approach to joint optimization of the pose graph while simultaneously accounting for multiple forms of human correction. We present empirical results showing the effectiveness of HitL-SLAM at generating globally accurate and consistent maps even when given poor initial estimates of the map.

Submitted 22 November, 2017; originally announced November 2017.

Comments: AAAI 2018

Showing 1–6 of 6 results for author: Nashed, S B