
Representation Learning For Efficient Deep Multi-Agent Reinforcement Learning

Abstract: Sample efficiency remains a key challenge in multi-agent reinforcement learning (MARL). A promising approach is to learn a meaningful latent representation space through auxiliary learning objectives alongside the MARL objective to aid in learning a successful control policy. In our work, we present MAPO-LSO (Multi-Agent Policy Optimization with Latent Space Optimization), which applies a form of comprehensive representation learning devised to supplement MARL training. Specifically, MAPO-LSO proposes a multi-agent extension of transition dynamics reconstruction and self-predictive learning that constructs a latent state optimization scheme that can be trivially extended to current state-of-the-art MARL algorithms. Empirical results demonstrate that MAPO-LSO shows notable improvements in sample efficiency and learning performance compared to its vanilla MARL counterpart without any additional MARL hyperparameter tuning on a diverse suite of MARL tasks.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.18638 [pdf, other]

Reinforcement Learning Problem Solving with Large Language Models

Authors: Sina Gholamian, Domingo Huh

Abstract: Large Language Models (LLMs) encapsulate an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of a variety of Natural Language Processing (NLP) tasks. This has also facilitated a more accessible paradigm of conversation-based interactions between humans and AI systems to solve intended problems. However, one interesting avenue that shows untapped potential is the use of LLMs as Reinforcement Learning (RL) agents to enable conversational RL problem solving. Therefore, in this study, we explore the concept of formulating Markov Decision Process-based RL problems as LLM prompting tasks. We demonstrate how LLMs can be iteratively prompted to learn and optimize policies for specific RL tasks. In addition, we leverage the introduced prompting technique for episode simulation and Q-Learning, facilitated by LLMs. We then show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake" workflows.

Submitted 29 April, 2024; originally announced April 2024.
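
For reference, the Q-Learning mentioned in the abstract above conventionally refers to the tabular update rule sketched below. This is the standard textbook form only; how the paper maps it onto LLM prompts is not stated in the abstract. The symbols $\alpha$ (learning rate) and $\gamma$ (discount factor) are the usual notation and do not come from this listing.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```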

arXiv:2402.17002 [pdf, other]

Discovering Abstract Symbolic Relations by Learning Unitary Group Representations

Authors: Dongsung Huh

Abstract: We investigate a principled approach for symbolic operation completion (SOC), a minimal task for studying symbolic reasoning. While conceptually similar to matrix completion, SOC poses a unique challenge in modeling abstract relationships between discrete symbols. We demonstrate that SOC can be efficiently solved by a minimal model - a bilinear map - with a novel factorized architecture. Inspired by group representation theory, this architecture leverages matrix embeddings of symbols, modeling each symbol as an operator that dynamically influences others. Our model achieves perfect test accuracy on SOC with comparable or superior sample efficiency to Transformer baselines across most datasets, while boasting significantly faster learning speeds (100-1000$\times$). Crucially, the model exhibits an implicit bias towards learning general group structures, precisely discovering the unitary representations of underlying groups. This remarkable property not only confers interpretability but also has significant implications for automatic symmetry discovery in geometric deep learning. Overall, our work establishes group theory as a powerful guiding principle for discovering abstract algebraic structures in deep learning, and showcases matrix representations as a compelling alternative to traditional vector embeddings for modeling symbolic relationships.

Submitted 22 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.10256 [pdf, other]

Multi-agent Reinforcement Learning: A Comprehensive Survey

Authors: Dom Huh, Prasant Mohapatra

Abstract: Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications, where multiple agents must make decisions to achieve their objectives in a shared environment. Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation. This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML) and connecting them to recent advancements in multi-agent reinforcement learning (MARL), i.e. the research of data-driven decision-making within MAS. Therefore, the objective of this survey is to provide a comprehensive perspective along the various dimensions of MARL, shedding light on the unique opportunities that are presented in MARL applications while highlighting the inherent challenges that accompany this potential. Therefore, we hope that our work will not only contribute to the field by analyzing the current landscape of MARL but also motivate future directions with insights for deeper integration of concepts from related domains of GT and ML. With this in mind, this work delves into a detailed exploration of recent and past efforts of MARL and its related fields and describes prior solutions that were proposed and their limitations, as well as their applications.

Submitted 2 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2305.00604 [pdf, other]

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

Authors: Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen

Abstract: We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based on only the input to a respective layer without a substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive with first-order as well as second-order methods.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Published at ICLR 2023, Code @ https://github.com/Felix-Petersen/isaac, Video @ https://youtu.be/7RKRX-MdwqM

arXiv:2301.08864 [pdf, other]

Decentralized Multi-agent Filtering

Authors: Dom Huh, Prasant Mohapatra

Abstract: This paper addresses the considerations that come along with adopting decentralized communication for multi-agent localization applications in discrete state spaces. In this framework, we extend the original formulation of the Bayes filter, a foundational probabilistic tool for discrete state estimation, by appending a step of greedy belief sharing as a method to propagate information and improve local estimates' posteriors. We apply our work in a model-based multi-agent grid-world setting, where each agent maintains a belief distribution for every agent's state. Our results affirm the utility of our proposed extensions for decentralized collaborative tasks. The code base for this work is available in the following repo

Submitted 20 January, 2023; originally announced January 2023.
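
The discrete Bayes filter that the abstract above builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `predict` and `update` follow the standard textbook filter, while `share` is a hypothetical fusion rule (a normalized elementwise product) standing in for the "greedy belief sharing" step, whose actual form the abstract does not specify. All function names and the toy transition matrix are assumptions.

```python
import numpy as np


def predict(belief, transition):
    """Prediction step: push the belief through the motion model.

    transition[i, j] = P(next state = j | current state = i).
    """
    return belief @ transition


def update(belief, likelihood):
    """Correction step: weight by the observation likelihood, then renormalize."""
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:
        # The observation is impossible under the current belief; keep the prior.
        return belief
    return posterior / total


def share(own, received):
    """Hypothetical sharing step (an assumption, not the paper's rule):
    fuse two beliefs over the same state by a normalized elementwise product."""
    fused = own * received
    total = fused.sum()
    return own if total == 0.0 else fused / total


# Toy example: three states on a ring; the agent moves right with probability 0.8.
T = np.array([[0.2, 0.8, 0.0],
              [0.0, 0.2, 0.8],
              [0.8, 0.0, 0.2]])
belief = np.array([1.0, 0.0, 0.0])
belief = predict(belief, T)
belief = update(belief, np.array([0.9, 0.1, 0.1]))
belief = share(belief, np.array([0.5, 0.5, 0.0]))
print(belief)
```

The product rule in `share` treats the two beliefs as independent evidence, which is a simplifying choice made only for this sketch.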

arXiv:2205.14546 [pdf, other]

The Missing Invariance Principle Found -- the Reciprocal Twin of Invariant Risk Minimization

Authors: Dongsung Huh, Avinash Baidya

Abstract: Machine learning models often generalize poorly to out-of-distribution (OOD) data as a result of relying on features that are spuriously correlated with the label during training. Recently, the technique of Invariant Risk Minimization (IRM) was proposed to learn predictors that only use invariant features by conserving the feature-conditioned label expectation $\mathbb{E}_e[y|f(x)]$ across environments. However, more recent studies have demonstrated that IRM-v1, a practical version of IRM, can fail in various settings. Here, we identify a fundamental flaw of IRM formulation that causes the failure. We then introduce a complementary notion of invariance, MRI, based on conserving the label-conditioned feature expectation $\mathbb{E}_e[f(x)|y]$, which is free of this flaw. Further, we introduce a simplified, practical version of the MRI formulation called MRI-v1. We prove that for general linear problems, MRI-v1 guarantees invariant predictors given a sufficient number of environments. We also empirically demonstrate that MRI-v1 strongly out-performs IRM-v1 and consistently achieves near-optimal OOD generalization in image-based nonlinear problems.

Submitted 16 January, 2023; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022
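
The two invariance notions contrasted in the abstract above can be written side by side. This is a sketch that only restates the abstract's wording ("conserving ... across environments") as equalities between any two environments $e$ and $e'$; the set symbol $\mathcal{E}$ for the training environments is assumed notation, and the practical objectives IRM-v1 and MRI-v1 are not reproduced here.

```latex
\begin{align*}
\text{IRM:}\quad & \mathbb{E}_e[\,y \mid f(x)\,] = \mathbb{E}_{e'}[\,y \mid f(x)\,]
  && \forall\, e, e' \in \mathcal{E} \\
\text{MRI:}\quad & \mathbb{E}_e[\,f(x) \mid y\,] = \mathbb{E}_{e'}[\,f(x) \mid y\,]
  && \forall\, e, e' \in \mathcal{E}
\end{align*}
```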

arXiv:2106.13037 [pdf, other]

Mix and Mask Actor-Critic Methods

Authors: Dom Huh

Abstract: Shared feature spaces for actor-critic methods aim to capture generalized latent representations to be used by the policy and value function in the hope of a more stable and sample-efficient optimization. However, such a paradigm presents a number of challenges in practice, as parameters generating a shared representation must learn off two distinct objectives, resulting in competing updates and learning perturbations. In this paper, we present a novel feature-sharing framework to address these difficulties by introducing the mix and mask mechanisms and the distributional scalarization technique. These mechanisms behave dynamically to couple and decouple connected latent features variably between the policy and value function, while the distributional scalarization standardizes the two objectives using a probabilistic standpoint. From our experimental results, we demonstrate significant performance improvements compared to alternative methods using separate networks and networks with a shared backbone.

Submitted 24 June, 2021; originally announced June 2021.

arXiv:2101.00728 [pdf, other]

Synthetic Embedding-based Data Generation Methods for Student Performance

Authors: Dom Huh

Abstract: Given the inherent class imbalance issue within student performance datasets, samples belonging to the edges of the target class distribution pose a challenge for predictive machine learning algorithms to learn. In this paper, we introduce a general framework for synthetic embedding-based data generation (SEDG), a search-based approach to generate new synthetic samples using embeddings to correct the detrimental effects of class imbalances optimally. We compare the SEDG framework to past synthetic data generation methods, including deep generative models, and traditional sampling methods. In our results, we find SEDG to outperform the traditional re-sampling methods for deep neural networks and perform competitively for common machine learning classifiers on the student performance task in several standard performance metrics.

Submitted 3 January, 2021; originally announced January 2021.

arXiv:2007.16001 [pdf, other]

Greedy Bandits with Sampled Context

Authors: Dom Huh

Abstract: Bayesian strategies for contextual bandits have proved promising in single-state reinforcement learning tasks by modeling uncertainty using context information from the environment. In this paper, we propose Greedy Bandits with Sampled Context (GB-SC), a method for contextual multi-armed bandits to develop the prior from the context information using Thompson Sampling, and arm selection using an epsilon-greedy policy. The framework GB-SC allows for evaluation of context-reward dependency, as well as providing robustness for partially observable context vectors by leveraging the prior developed. Our experimental results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret, as well as insights on how each context subset affects decision-making.

Submitted 27 July, 2020; originally announced July 2020.

arXiv:2007.16000 [pdf, other]

Hierarchical BiGraph Neural Network as Recommendation Systems

Authors: Dom Huh

Abstract: Graph neural networks emerge as a promising modeling method for applications dealing with datasets that are best represented in the graph domain. Specifically, developing recommendation systems often requires addressing sparse structured data which often lacks the feature richness in either the user and/or item side and requires processing within the correct context for optimal performance. These datasets intuitively can be mapped to and represented as networks or graphs. In this paper, we propose the Hierarchical BiGraph Neural Network (HBGNN), a hierarchical approach of using GNNs as recommendation systems and structuring the user-item features using a bigraph framework. Our experimental results show competitive performance with current recommendation system methods and transferability.

Submitted 27 July, 2020; originally announced July 2020.

arXiv:2003.08743 [pdf, other]

Generative Multi-Stream Architecture For American Sign Language Recognition

Authors: Dom Huh, Sai Gurrapu, Frederick Olson, Huzefa Rangwala, Parth Pathak, Jana Kosecka

Abstract: With advancements in deep model architectures, tasks in computer vision can reach optimal convergence provided proper data preprocessing and model parameter initialization. However, training on datasets with low feature-richness for complex applications limits and impairs optimal convergence, keeping it below human performance. In past works, researchers have provided external sources of complementary data at the cost of supplementary hardware, which are fed in streams to counteract this limitation and boost performance. We propose a generative multi-stream architecture, eliminating the need for additional hardware with the intent to improve feature richness without risking impracticability. We also introduce the compact spatio-temporal residual block to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs comparably to the top C3D residual variant architecture, the pseudo-3D model, on the FASL-RGB dataset. Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.

Submitted 9 March, 2020; originally announced March 2020.

arXiv:1706.04698 [pdf, other]

Gradient Descent for Spiking Neural Networks

Authors: Dongsung Huh, Terrence J. Sejnowski

Abstract: Most studies on neural computation are based on network models of static neurons that produce analog output, despite the fact that information processing in the brain is predominantly carried out by dynamic neurons that produce discrete pulses called spikes. Research in spike-based computation has been impeded by the lack of an efficient supervised learning algorithm for spiking networks. Here, we present a gradient descent method for optimizing spiking network models by introducing a differentiable formulation of spiking networks and deriving the exact gradient calculation. For demonstration, we trained recurrent spiking networks on two dynamic tasks: one that requires optimizing fast (~millisecond) spike-based interactions for efficient encoding of information, and a delayed memory XOR task over extended duration (~second). The results show that our method indeed optimizes the spiking network dynamics on the time scale of individual spikes as well as behavioral time scales. In conclusion, our result offers a general-purpose supervised learning algorithm for spiking neural networks, thus advancing further investigations on spike-based computation.

Submitted 19 June, 2017; v1 submitted 14 June, 2017; originally announced June 2017.

arXiv:1506.07515 [pdf, other]

The Vector Space of Convex Curves: How to Mix Shapes

Authors: Dongsung Huh

Abstract: We present a novel, log-radius profile representation for convex curves and define a new operation for combining the shape features of curves. Unlike the standard, angle profile-based methods, this operation accurately combines the shape features in a visually intuitive manner. This method has implications in shape analysis as well as in investigating how the brain perceives and generates curved shapes and motions.

Submitted 24 June, 2015; originally announced June 2015.

Showing 1–14 of 14 results for author: Huh, D