-
Confidence Aware Inverse Constrained Reinforcement Learning
Authors:
Sriram Ganapathi Subramanian,
Guiliang Liu,
Mohammed Elmahgiubi,
Kasra Rezaee,
Pascal Poupart
Abstract:
In coming up with solutions to real-world problems, humans implicitly adhere to constraints that are too numerous and complex to be specified completely. However, reinforcement learning (RL) agents need these constraints to learn the correct optimal policy in these settings. The field of Inverse Constraint Reinforcement Learning (ICRL) deals with this problem and provides algorithms that aim to estimate the constraints from expert demonstrations collected offline. Practitioners prefer to know a measure of confidence in the estimated constraints, before deciding to use these constraints, which allows them to only use the constraints that satisfy a desired level of confidence. However, prior works do not allow users to provide the desired level of confidence for the inferred constraints. This work provides a principled ICRL method that can take a confidence level with a set of expert demonstrations and outputs a constraint that is at least as constraining as the true underlying constraint with the desired level of confidence. Further, unlike previous methods, this method allows a user to know if the number of expert trajectories is insufficient to learn a constraint with a desired level of confidence, and therefore collect more expert trajectories as required to simultaneously learn constraints with the desired level of confidence and a policy that achieves the desired level of performance.
Submitted 24 June, 2024;
originally announced June 2024.
-
How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?
Authors:
Agustinus Kristiadi,
Felix Strieth-Kalthoff,
Sriram Ganapathi Subramanian,
Vincent Fortuin,
Pascal Poupart,
Geoff Pleiss
Abstract:
Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior works on enhancing BO with expert feedback, such as by incorporating it in an offline or online but blocking (arrives at each BO iteration) manner, are incompatible with the spirit of self-driving labs. In this work, we study whether a small amount of randomly arriving expert feedback that is incorporated in a non-blocking manner can improve a BO campaign. To this end, we run an additional, independent computing thread on top of the BO loop to handle the feedback-gathering process. The gathered feedback is used to learn a Bayesian preference model that can readily be incorporated into the BO thread to steer its exploration-exploitation process. Experiments on toy and chemistry datasets suggest that even a few pieces of intermittent, asynchronous expert feedback can be useful for improving or constraining BO. This is especially relevant for improving self-driving labs, e.g., by making them more data-efficient and less costly.
Submitted 10 June, 2024;
originally announced June 2024.
-
Scaling Data Plane Verification with Intent-based Slicing
Authors:
Kuan-Yen Chou,
Santhosh Prabhu,
Giri Subramanian,
Wenxuan Zhou,
Aanand Nayyar,
Brighten Godfrey,
Matthew Caesar
Abstract:
Data plane verification has grown into a powerful tool to ensure network correctness. However, existing monolithic data plane models have high memory requirements with large networks, and the existing method of scaling out is too limited in expressiveness to capture practical network features. In this paper, we describe Scylla, a general data plane verifier that provides fine-grained scale-out without the need for a monolithic network model. Scylla creates models for what we call intent-based slices, each of which is constructed at a fine (rule-level) granularity with just enough to verify a given set of intents. The sliced models are retained in memory across a cluster and are incrementally updated in a distributed compute cluster in response to network updates. Our experiments show that Scylla makes the scaling problem more granular -- tied to the size of the intent-based slices rather than that of the overall network. This enables Scylla to verify large, complex networks in minimum units of work that are significantly smaller (in both memory and time) than past techniques, enabling fast scale-out verification with minimal resource requirement.
Submitted 31 May, 2024;
originally announced May 2024.
-
Information Compression in Dynamic Information Disclosure Games
Authors:
Dengwang Tang,
Vijay G. Subramanian
Abstract:
We consider a two-player dynamic information design problem between a principal and a receiver -- a game is played between the two agents on top of a Markovian system controlled by the receiver's actions, where the principal obtains and strategically shares some information about the underlying system with the receiver in order to influence their actions. In our setting, both players have long-term objectives, and the principal sequentially commits to their strategies instead of committing at the beginning. Further, the principal cannot directly observe the system state, but at every turn they can choose randomized experiments to observe the system partially. The principal can share details about the experiments with the receiver. For our analysis we impose the truthful disclosure rule: the principal is required to truthfully announce the details and the result of each experiment to the receiver immediately after the experiment result is revealed. Based on the received information, the receiver takes an action when it is their turn, with the action influencing the state of the underlying system. We show that there exist Perfect Bayesian equilibria in this game where both agents play Canonical Belief Based (CBB) strategies using a compressed version of their information, rather than full information, to choose experiments (for the principal) or actions (for the receiver). We also provide a backward inductive procedure to solve for an equilibrium in CBB strategies.
Submitted 18 March, 2024;
originally announced March 2024.
-
ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
Authors:
Chris Beeler,
Sriram Ganapathi Subramanian,
Kyle Sprague,
Nouha Chatti,
Colin Bellinger,
Mitchell Shahen,
Nicholas Paquin,
Mark Baula,
Amanuel Dawit,
Zihan Yang,
Xinkai Li,
Mark Crowley,
Isaac Tamblyn
Abstract:
This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents 'on-the-fly' by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involve challenges that are not commonly found in RL benchmarks and therefore offer a rich space to work in. We introduce a set of highly customizable and open-source RL environments, ChemGymRL, based on the standard OpenAI Gym template. ChemGymRL supports a series of interconnected virtual chemical benches where RL agents can operate and train. The paper introduces and details each of these benches using well-known chemical reactions as illustrative examples, and trains a set of standard RL algorithms in each of these benches. Finally, a discussion and comparison of the performance of several standard RL methods are provided, in addition to a list of directions for future work as a vision for the further development and usage of ChemGymRL.
Submitted 23 May, 2023;
originally announced May 2023.
-
Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning
Authors:
Sriram Ganapathi Subramanian,
Matthew E. Taylor,
Kate Larson,
Mark Crowley
Abstract:
Multi-agent reinforcement learning typically suffers from the problem of sample inefficiency, where learning suitable policies involves the use of many data samples. Learning from external demonstrators is a possible solution that mitigates this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture, and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms give better performances than baselines, can effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
Submitted 2 March, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
ABB-BERT: A BERT model for disambiguating abbreviations and contractions
Authors:
Prateek Kacker,
Andi Cupallari,
Aswin Gridhar Subramanian,
Nimit Jain
Abstract:
Abbreviations and contractions are commonly found in text across different domains. For example, doctors' notes contain many contractions that can be personalized based on their choices. Existing spelling correction models are not suitable to handle expansions because of many reductions of characters in words. In this work, we propose ABB-BERT, a BERT-based model, which deals with an ambiguous language containing abbreviations and contractions. ABB-BERT can rank them from thousands of options and is designed for scale. It is trained on Wikipedia text, and the algorithm allows it to be fine-tuned with little compute to get better performance for a domain or person. We are publicly releasing the training dataset for abbreviations and contractions derived from Wikipedia.
Submitted 8 July, 2022;
originally announced July 2022.
-
Decentralized Mean Field Games
Authors:
Sriram Ganapathi Subramanian,
Matthew E. Taylor,
Mark Crowley,
Pascal Poupart
Abstract:
Multiagent reinforcement learning algorithms have not been widely adopted in large-scale environments with many agents as they often scale poorly with the number of agents. Using mean field theory to aggregate agents has been proposed as a solution to this problem. However, almost all previous methods in this area make a strong assumption of a centralized system where all the agents in the environment learn the same policy and are effectively indistinguishable from each other. In this paper, we relax this assumption about indistinguishable agents and propose a new mean field system known as Decentralized Mean Field Games, where each agent can be quite different from others. All agents learn independent policies in a decentralized fashion, based on their local observations. We define a theoretical solution concept for this system and provide a fixed point guarantee for a Q-learning based algorithm in this system. A practical consequence of our approach is that we can address a 'chicken-and-egg' problem in empirical mean field reinforcement learning algorithms. Further, we provide Q-learning and actor-critic algorithms that use the decentralized mean field learning approach and give stronger performance compared to common baselines in this area. In our setting, agents do not need to be clones of each other and learn in a fully decentralized fashion. Hence, for the first time, we show the application of mean field learning methods in fully competitive environments, large-scale continuous action space environments, and other environments with heterogeneous agents. Importantly, we also apply the mean field method in a ride-sharing problem using a real-world dataset. We propose a decentralized solution to this problem, which is more practical than existing centralized training methods.
Submitted 13 April, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments
Authors:
Ken Ming Lee,
Sriram Ganapathi Subramanian,
Mark Crowley
Abstract:
Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.
Submitted 1 November, 2021;
originally announced November 2021.
-
Multi-Agent Advisor Q-Learning
Authors:
Sriram Ganapathi Subramanian,
Matthew E. Taylor,
Kate Larson,
Mark Crowley
Abstract:
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.
Submitted 1 March, 2023; v1 submitted 25 October, 2021;
originally announced November 2021.
-
On the benefits of being constrained when receiving signals
Authors:
Shih-Tang Su,
David Kempe,
Vijay G. Subramanian
Abstract:
We study a Bayesian persuasion setting in which the receiver is trying to match the (binary) state of the world. The sender's utility is partially aligned with the receiver's, in that conditioned on the receiver's action, the sender derives higher utility when the state of the world matches the action.
Our focus is on whether, in such a setting, being constrained helps a receiver. Intuitively, if the receiver can only take the sender's preferred action with a smaller probability, the sender might have to reveal more information, so that the receiver can take the action more specifically when the sender prefers it. We show that with a binary state of the world, this intuition indeed carries through: under very mild non-degeneracy conditions, a more constrained receiver will always obtain (weakly) higher utility than a less constrained one. Unfortunately, without additional assumptions, the result does not hold when there are more than two states in the world, which we show with an explicit example.
Submitted 25 October, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Bayesian Persuasion in Sequential Trials
Authors:
Shih-Tang Su,
Vijay G. Subramanian,
Grant Schoenebeck
Abstract:
We consider a Bayesian persuasion or information design problem where the sender tries to persuade the receiver to take a particular action via a sequence of signals. This we model by considering multi-phase trials with different experiments conducted based on the outcomes of prior experiments. In contrast to most of the literature, we consider the problem with constraints on signals imposed on the sender. This we achieve by fixing some of the experiments in an exogenous manner; these are called determined experiments. This modeling helps us understand real-world situations where this occurs: e.g., multi-phase drug trials where the FDA determines some of the experiments, funding of a startup by a venture capital firm, start-up acquisition by big firms where late-stage assessments are determined by the potential acquirer, multi-round job interviews where the candidates signal initially by presenting their qualifications but the rest of the screening procedures are determined by the interviewer. The non-determined experiments (signals) in the multi-phase trial are to be chosen by the sender in order to persuade the receiver best. With a binary state of the world, we start by deriving the optimal signaling policy in the only non-trivial configuration of a two-phase trial with binary-outcome experiments. We then generalize to multi-phase trials with binary-outcome experiments where the determined experiments can be placed at any chosen node in the trial tree. Here we present a dynamic programming algorithm to derive the optimal signaling policy that uses the two-phase trial solution's structural insights. We also contrast the optimal signaling policy structure with classical Bayesian persuasion strategies to highlight the impact of the signaling constraints on the sender.
Submitted 22 November, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
-
The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning
Authors:
Volodymyr Tkachuk,
Sriram Ganapathi Subramanian,
Matthew E. Taylor
Abstract:
Some reinforcement learning methods suffer from high sample complexity causing them to not be practical in real-world situations. $Q$-function reuse, a transfer learning method, is one way to reduce the sample complexity of learning, potentially improving usefulness of existing algorithms. Prior work has shown the empirical effectiveness of $Q$-function reuse for various environments when applied to model-free algorithms. To the best of our knowledge, there has been no theoretical work showing the regret of $Q$-function reuse when applied to the tabular, model-free setting. We aim to bridge the gap between theoretical and empirical work in $Q$-function reuse by providing some theoretical insights on the effectiveness of $Q$-function reuse when applied to the $Q$-learning with UCB-Hoeffding algorithm. Our main contribution is showing that in a specific case if $Q$-function reuse is applied to the $Q$-learning with UCB-Hoeffding algorithm it has a regret that is independent of the state or action space. We also provide empirical results supporting our theoretical findings.
Submitted 7 March, 2021;
originally announced March 2021.
-
Partially Observable Mean Field Reinforcement Learning
Authors:
Sriram Ganapathi Subramanian,
Matthew E. Taylor,
Mark Crowley,
Pascal Poupart
Abstract:
Traditional multi-agent reinforcement learning algorithms are not scalable to environments with more than a few agents, since these algorithms are exponential in the number of agents. Recent research has introduced successful methods to scale multi-agent reinforcement learning algorithms to many agent scenarios using mean field theory. Previous work in this field assumes that an agent has access to exact cumulative metrics regarding the mean field behaviour of the system, which it can then use to take its actions. In this paper, we relax this assumption and maintain a distribution to model the uncertainty regarding the mean field of the system. We consider two different settings for this problem. In the first setting, only agents in a fixed neighbourhood are visible, while in the second setting, the visibility of agents is determined at random based on distances. For each of these settings, we introduce a Q-learning based algorithm that can learn effectively. We prove that this Q-learning estimate stays very close to the Nash Q-value (under a common set of assumptions) for the first setting. We also empirically show our algorithms outperform multiple baselines in three different games in the MAgents framework, which supports large environments with many agents learning simultaneously to achieve possibly distinct goals.
Submitted 24 January, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
Maximum Reward Formulation In Reinforcement Learning
Authors:
Sai Krishna Gottipati,
Yashaswi Pathak,
Rohan Nuttall,
Sahir,
Raviteja Chunduru,
Ahmed Touati,
Sriram Ganapathi Subramanian,
Matthew E. Taylor,
Sarath Chandar
Abstract:
Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery, do not fit within this framework because an RL agent only needs to identify states (molecules) that achieve the highest reward within a trajectory and does not need to optimize for the expected cumulative return. In this work, we formulate an objective function to maximize the expected maximum reward along a trajectory, derive a novel functional form of the Bellman equation, introduce the corresponding Bellman operators, and provide a proof of convergence. Using this formulation, we achieve state-of-the-art results on the task of molecule generation that mimics a real-world drug discovery pipeline.
Submitted 18 December, 2023; v1 submitted 7 October, 2020;
originally announced October 2020.
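As a hedged illustration of the objective described in this abstract (maximizing the expected maximum reward along a trajectory rather than the cumulative return), the sketch below runs tabular value iteration on a tiny deterministic chain. The backup Q(s, a) = max(r, gamma * max_a' Q(s', a')) is one natural way to write a max-reward Bellman update for deterministic transitions and is an assumption here, not a transcription of the paper's operator. The chain MDP, `GAMMA`, `REWARD`, `step`, and `max_reward_q` are all hypothetical names chosen for illustration.

```python
GAMMA = 0.9      # discount factor (assumed value)
N_STATES = 4     # states 0..3 on a chain
TERMINAL = 3     # state 3 ends the episode

# REWARD[s][a]: reward for action a in state s (0 = stay, 1 = move right).
# Rewards are non-negative so that 0.0 is a valid initial Q estimate.
REWARD = [[0.0, 1.0], [0.0, 5.0], [0.0, 2.0]]


def step(state, action):
    """Deterministic transition: stay in place or move one state right."""
    return state if action == 0 else state + 1


def max_reward_q(n_sweeps=50):
    """Value iteration with a max-reward backup instead of a sum backup."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for s in range(TERMINAL):
            for a in (0, 1):
                nxt = step(s, a)
                # Discounted best max-reward obtainable from the next state.
                future = 0.0 if nxt == TERMINAL else GAMMA * max(q[nxt])
                # Keep the larger of the immediate reward and the discounted
                # best future reward, rather than adding them together.
                q[s][a] = max(REWARD[s][a], future)
    return q


if __name__ == "__main__":
    for s, values in enumerate(max_reward_q()[:TERMINAL]):
        print(s, values)
```

In this toy chain the value of moving right from state 0 is driven by the single largest (discounted) reward reachable later, not by the sum of rewards along the path, which is the distinction the abstract draws for molecule search.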
-
A review of machine learning applications in wildfire science and management
Authors:
Piyush Jain,
Sean C P Coogan,
Sriram Ganapathi Subramanian,
Mark Crowley,
Steve Taylor,
Mike D Flannigan
Abstract:
Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then the field has rapidly progressed congruently with the wide adoption of machine learning (ML) in the environmental sciences. Here, we present a scoping review of ML in wildfire science and management. Our objective is to improve awareness of ML among wildfire scientists and managers, as well as illustrate the challenging range of problems in wildfire science available to data scientists. We first present an overview of popular ML approaches used in wildfire science to date, and then review their use in wildfire science within six problem domains: 1) fuels characterization, fire detection, and mapping; 2) fire weather and climate change; 3) fire occurrence, susceptibility, and risk; 4) fire behavior prediction; 5) fire effects; and 6) fire management. We also discuss the advantages and limitations of various ML approaches and identify opportunities for future advances in wildfire science and management within a data science context. We identified 298 relevant publications, where the most frequently used ML methods included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. There exist opportunities to apply more current ML methods (e.g., deep learning and agent-based learning) in wildfire science. However, despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods requires sophisticated knowledge for their application. Finally, we stress that the wildfire research and management community plays an active role in providing relevant, high-quality data for use by practitioners of ML methods.
Submitted 19 August, 2020; v1 submitted 1 March, 2020;
originally announced March 2020.
-
Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning
Authors:
Sripad Krishna Devalla,
Tan Hung Pham,
Satish Kumar Panda,
Liang Zhang,
Giridhar Subramanian,
Anirudh Swaminathan,
Chin Zhi Yun,
Mohan Rajan,
Sujatha Mohan,
Ramaswami Krishnadas,
Vijayalakshmi Senthil,
John Mark S. de Leon,
Tin A. Tun,
Ching-Yu Cheng,
Leopold Schmetterer,
Shamira Perera,
Tin Aung,
Alexandre H. Thiery,
Michael J. A. Girard
Abstract:
Since the introduction of optical coherence tomography (OCT), it has been possible to study the complex 3D morphological changes of the optic nerve head (ONH) tissues that occur along with the progression of glaucoma. Although several deep learning (DL) techniques have been recently proposed for the automated extraction (segmentation) and quantification of these morphological changes, the device-specific nature of these techniques and the difficulty in preparing manual segmentations (training data) limit their clinical adoption. With several new manufacturers and next-generation OCT devices entering the market, the complexity in deploying DL algorithms clinically is only increasing. To address this, we propose a DL-based 3D segmentation framework that is easily translatable across OCT devices in a label-free manner (i.e. without the need to manually re-segment data for each device). Specifically, we developed 2 sets of DL networks. The first (referred to as the enhancer) was able to enhance OCT image quality from 3 OCT devices, and harmonized image characteristics across these devices. The second performed 3D segmentation of 6 important ONH tissue layers. We found that the use of the enhancer was critical for our segmentation network to achieve device independence. In other words, our 3D segmentation network trained on any of the 3 devices successfully segmented ONH tissue layers from the other two devices with high performance (Dice coefficients > 0.92). With such an approach, we could automatically segment images from new OCT devices without ever needing manual segmentation data from such devices.
Submitted 22 February, 2020;
originally announced February 2020.
-
Multi Type Mean Field Reinforcement Learning
Authors:
Sriram Ganapathi Subramanian,
Pascal Poupart,
Matthew E. Taylor,
Nidhi Hegde
Abstract:
Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field reinforcement learning, which is that all agents in the environment are playing nearly identical strategies and have the same goal. We conduct experiments on three different testbeds for the field of many-agent reinforcement learning, based on the standard MAgent framework. We consider two different kinds of mean field environments: a) games where agents belong to predefined types that are known a priori and b) games where the type of each agent is unknown and therefore must be learned based on observations. We introduce new algorithms for each type of game and demonstrate their superior performance over state-of-the-art algorithms that assume that all agents belong to the same type and over other baseline algorithms in the MAgent framework.
Submitted 21 June, 2022; v1 submitted 6 February, 2020;
originally announced February 2020.
-
Derandomized Load Balancing using Random Walks on Expander Graphs
Authors:
Dengwang Tang,
Vijay G. Subramanian
Abstract:
In a computing center with a huge number of machines, when a job arrives, a dispatcher needs to decide which machine to route this job to based on limited information. A classical method, called the power-of-$d$ choices algorithm, is to pick $d$ servers independently at random and dispatch the job to the least loaded server among the $d$ servers. In this paper, we analyze a low-randomness variant of this dispatching scheme, where $d$ queues are sampled through $d$ independent non-backtracking random walks on a $k$-regular graph $G$. Under certain assumptions on the graph $G$, we show that under this scheme, the dynamics of the queuing system converge to the same deterministic ordinary differential equation (ODE) as for the power-of-$d$ choices scheme. We also show that the system is stable under the proposed scheme, and the stationary distribution of the system converges to the fixed point of the ODE.
Submitted 21 April, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Balanced Allocation with Random Walk Based Sampling
Authors:
Dengwang Tang,
Vijay G. Subramanian
Abstract:
In the standard ball-in-bins experiment, a well-known scheme is to sample $d$ bins independently and uniformly at random and put the ball into the least loaded bin. It can be shown that this scheme yields a maximum load of $\log\log n/\log d+O(1)$ with high probability.
Subsequent work analyzed the model when, at each time, $d$ bins are sampled in some correlated or non-uniform way. However, the case when the sampling for different balls is correlated is rarely investigated. In this paper we propose three schemes for the ball-in-bins allocation problem. We assume that there is an underlying $k$-regular graph connecting the bins. The three schemes are variants of power-of-$d$ choices, except that the sampling of $d$ bins at each time is based on the locations of $d$ independently moving non-backtracking random walkers, with the positions of the random walkers being reset when certain events occur. We show that under some conditions for the underlying graph that can be summarized as the graph having large enough girth, all three schemes can perform as well as power-of-$d$, so that the maximum load is bounded by $\log\log n/\log d+O(1)$ with high probability.
Submitted 11 October, 2018; v1 submitted 5 October, 2018;
originally announced October 2018.
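The baseline scheme that the abstract above starts from (each ball samples $d$ bins independently and uniformly at random and goes to the least loaded one) can be sketched as a small simulation. This is a minimal illustration of that classical baseline only, not of the random-walk-based schemes the paper proposes; the function name `max_load_power_of_d`, its parameters, and the choice of sampling with replacement are assumptions made for this sketch.

```python
import random


def max_load_power_of_d(n_balls, n_bins, d, seed=0):
    """Simulate the classical power-of-d choices scheme (hypothetical helper).

    Each ball samples d bins independently and uniformly at random
    (with replacement, an assumption of this sketch) and is placed in
    the least loaded of the sampled bins. Returns (max load, loads).
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    loads = [0] * n_bins
    for _ in range(n_balls):
        # Sample d candidate bins independently and uniformly.
        candidates = [rng.randrange(n_bins) for _ in range(d)]
        # Place the ball in the candidate with the smallest current load.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return max(loads), loads
```

Calling the function with the same `n_balls` and `n_bins` for `d=1` and `d=2` gives a rough empirical feel for how the extra choice affects the maximum load; a single seeded run is only an illustration and says nothing about the high-probability bound itself.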
-
A Deep Learning Approach to Denoise Optical Coherence Tomography Images of the Optic Nerve Head
Authors:
Sripad Krishna Devalla,
Giridhar Subramanian,
Tan Hung Pham,
Xiaofei Wang,
Shamira Perera,
Tin A. Tun,
Tin Aung,
Leopold Schmetterer,
Alexandre H. Thiery,
Michael J. A. Girard
Abstract:
Purpose: To develop a deep learning approach to de-noise optical coherence tomography (OCT) B-scans of the optic nerve head (ONH).
Methods: Volume scans consisting of 97 horizontal B-scans were acquired through the center of the ONH using a commercial OCT device (Spectralis) for both eyes of 20 subjects. For each eye, single-frame (without signal averaging) and multi-frame (75x signal averaging) volume scans were obtained. A custom deep learning network was then designed and trained with 2,328 "clean B-scans" (multi-frame B-scans), and their corresponding "noisy B-scans" (clean B-scans + Gaussian noise) to de-noise the single-frame B-scans. The performance of the de-noising algorithm was assessed qualitatively and quantitatively on 1,552 B-scans using the signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and mean structural similarity index (MSSIM) metrics.
Results: The proposed algorithm successfully denoised unseen single-frame OCT B-scans. The denoised B-scans were qualitatively similar to their corresponding multi-frame B-scans, with enhanced visibility of the ONH tissues. The mean SNR increased from $4.02 \pm 0.68$ dB (single-frame) to $8.14 \pm 1.03$ dB (denoised). For all the ONH tissues, the mean CNR increased from $3.50 \pm 0.56$ (single-frame) to $7.63 \pm 1.81$ (denoised). The MSSIM increased from $0.13 \pm 0.02$ (single-frame) to $0.65 \pm 0.03$ (denoised) when compared with the corresponding multi-frame B-scans.
Conclusions: Our deep learning algorithm can denoise a single-frame OCT B-scan of the ONH in under 20 ms, thus offering a framework to obtain superior quality OCT B-scans with reduced scanning times and minimal patient discomfort.
Submitted 27 September, 2018;
originally announced September 2018.
-
Two-Dimensional Pattern Languages
Authors:
Henning Fernau,
Markus L. Schmid,
K. G. Subramanian
Abstract:
We introduce several classes of array languages obtained by generalising Angluin's pattern languages to the two-dimensional case. These classes of two-dimensional pattern languages are compared with respect to their expressive power and their closure properties are investigated.
Submitted 13 July, 2017;
originally announced July 2017.
-
Competitive Resource Allocation in HetNets: the Impact of Small-cell Spectrum Constraints and Investment Costs
Authors:
Cheng Chen,
Randall A. Berry,
Michael L. Honig,
Vijay G. Subramanian
Abstract:
Heterogeneous wireless networks with small-cell deployments in licensed and unlicensed spectrum bands are a promising approach for expanding wireless connectivity and service. As a result, wireless service providers (SPs) are adding small-cells to augment their existing macro-cell deployments. This added flexibility complicates network management, in particular, service pricing and spectrum allocations across macro- and small-cells. Further, these decisions depend on the degree of competition among SPs. Restrictions on shared spectrum access imposed by regulators, such as low power constraints that lead to small-cell deployments, along with the investment cost needed to add small cells to an existing network, also impact strategic decisions and market efficiency. If the revenue generated by small-cells does not cover the investment cost, then there will be no deployment even if it increases social welfare. We study the implications of such spectrum constraints and investment costs on resource allocation and pricing decisions by competitive SPs, along with the associated social welfare. Our results show that while the optimal resource allocation taking constraints and investment into account can be uniquely determined, adding those features with strategic SPs can have a substantial effect on the equilibrium market structure.
Submitted 6 August, 2017; v1 submitted 17 April, 2017;
originally announced April 2017.
-
The Impact of Small-Cell Bandwidth Requirements on Strategic Operators
Authors:
Cheng Chen,
Randall A. Berry,
Michael L. Honig,
Vijay G. Subramanian
Abstract:
Small-cell deployment in licensed and unlicensed spectrum is considered to be one of the key approaches to cope with the ongoing wireless data demand explosion. Compared to traditional cellular base stations with large transmission power, small-cells typically have relatively low transmission power, which makes them attractive for some spectrum bands that have strict power regulations, for example, the 3.5GHz band [1]. In this paper we consider a heterogeneous wireless network consisting of one or more service providers (SPs). Each SP operates in both macro-cells and small-cells, and provides service to two types of users: mobile and fixed. Mobile users can only associate with macro-cells whereas fixed users can connect to either macro- or small-cells. The SP charges a price per unit rate for each type of service. Each SP is given a fixed amount of bandwidth and splits it between macro- and small-cells. Motivated by bandwidth regulations, such as those for the 3.5GHz band, we assume a minimum amount of bandwidth has to be set aside for small-cells. We study the optimal pricing and bandwidth allocation strategies in both monopoly and competitive scenarios. In the monopoly scenario the strategy is unique. In the competitive scenario there exists a unique Nash equilibrium, which depends on the regulatory constraints. We also analyze the social welfare achieved, and compare it to that without the small-cell bandwidth constraints. Finally, we discuss implications of our results on the effectiveness of the minimum bandwidth constraint on influencing small-cell deployments.
Submitted 9 January, 2017;
originally announced January 2017.
-
A Descending Price Auction for Matching Markets
Authors:
Shih-Tang Su,
Jacob D. Abernethy,
Grant Schoenebeck,
Vijay G. Subramanian
Abstract:
This work presents a descending-price-auction algorithm to obtain the maximum market-clearing price vector (MCP) in unit-demand matching markets with m items by exploiting the combinatorial structure. With a shrewd choice of goods for which the prices are reduced in each step, the algorithm only uses the combinatorial structure, which avoids solving LPs and enjoys a strongly polynomial runtime of $O(m^4)$. Critical to the algorithm is determining the set of under-demanded goods for which we reduce the prices simultaneously in each step of the algorithm. This we accomplish by choosing the subset of goods that maximizes a skewness function, which makes the bipartite graph series converge to the combinatorial structure at the maximum MCP in $O(m^2)$ steps. A graph coloring algorithm is proposed to find the set of goods with the maximal skewness value that yields $O(m^4)$ complexity.
Submitted 4 November, 2017; v1 submitted 16 July, 2016;
originally announced July 2016.
-
The Impact of Unlicensed Access on Small-Cell Resource Allocation
Authors:
Cheng Chen,
Randall A. Berry,
Michael L. Honig,
Vijay G. Subramanian
Abstract:
Small cells deployed in licensed spectrum and unlicensed access via WiFi provide different ways of expanding wireless services to low mobility users. That reduces the demand for conventional macro-cellular networks, which are better suited for wide-area mobile coverage. The mix of these technologies seen in practice depends in part on the decisions made by wireless service providers that seek to maximize revenue, and allocations of licensed and unlicensed spectrum by regulators. To understand these interactions we present a model in which a service provider allocates available licensed spectrum across two separate bands, one for macro- and one for small-cells, in order to serve two types of users: mobile and fixed. We assume a service model in which the providers can charge a (different) price per unit rate for each type of service (macro- or small-cell); unlicensed access is free. With this setup we study how the addition of unlicensed spectrum affects prices and the optimal allocation of bandwidth across macro-/small-cells. We also characterize the optimal fraction of unlicensed spectrum when new bandwidth becomes available.
Submitted 21 January, 2016;
originally announced January 2016.
-
Efficient and Secure Routing Protocol for Wireless Sensor Networks through Optimal Power Control and Optimal Handoff-Based Recovery Mechanism
Authors:
Ganesh Subramanian
Abstract:
Advances in wireless sensor network (WSN) technology have made available small, low-cost sensors capable of sensing various types of physical and environmental conditions, processing data, and communicating wirelessly. In a WSN, the sensor nodes have a limited transmission range, and their processing and storage capabilities as well as their energy resources are also limited. The modified triple umpiring system (MTUS) has already shown better performance in wireless sensor networks. In this paper, we extend the MTUS by incorporating an optimal signal-to-noise ratio (SNR)-based power control mechanism and optimal handoff-based self-recovery features to form an efficient and secure routing protocol for WSNs. Extensive studies using the Glomosim-2.03 simulator show that the efficient and secure routing protocol (ESRP), with its optimal power control mechanism and handoff-based self-recovery, can significantly reduce power usage.
Submitted 24 December, 2015;
originally announced December 2015.
-
Explaining Snapshots of Network Diffusions: Structural and Hardness Results
Authors:
Georgios Askalidis,
Randall A. Berry,
Vijay G. Subramanian
Abstract:
Much research has been done on studying the diffusion of ideas or technologies on social networks including the \textit{Influence Maximization} problem and many of its variations. Here, we investigate a type of inverse problem. Given a snapshot of the diffusion process, we seek to understand if the snapshot is feasible for a given dynamic, i.e., whether there is a limited number of nodes whose initial adoption can result in the snapshot in finite time. While similar questions have been considered for epidemic dynamics, here, we consider this problem for variations of the deterministic Linear Threshold Model, which is more appropriate for modeling strategic agents. Specifically, we consider both sequential and simultaneous dynamics when deactivations are allowed and when they are not. Even though we show hardness results for all variations we consider, we show that the case of sequential dynamics with deactivations allowed is significantly harder than all others. In contrast, sequential dynamics make the problem trivial on cliques even though its complexity for simultaneous dynamics is unknown. We complement our hardness results with structural insights that can help better understand diffusions of social networks under various dynamics.
Submitted 25 April, 2014; v1 submitted 25 February, 2014;
originally announced February 2014.
-
One-dimensional Array Grammars and P Systems with Array Insertion and Deletion Rules
Authors:
Rudolf Freund,
Sergiu Ivanov,
Marion Oswald,
K. G. Subramanian
Abstract:
We consider the (one-dimensional) array counterpart of contextual as well as insertion and deletion string grammars and consider the operations of array insertion and deletion in array grammars. First we show that the emptiness problem for P systems with (one-dimensional) insertion rules is undecidable. Then we show computational completeness of P systems using (one-dimensional) array insertion and deletion rules even of norm one only. The main result of the paper exhibits computational completeness of one-dimensional array grammars using array insertion and deletion rules of norm at most two.
Submitted 5 September, 2013;
originally announced September 2013.
-
Comparative Statics On The Allocation Of Spectrum
Authors:
Vijay G. Subramanian,
Mike Honig,
Randy Berry
Abstract:
Allocation of spectrum is an important policy issue and decisions taken have ramifications for future growth of wireless communications and achieving universal connectivity. In this paper, on a common footing we compare the social welfare obtained from the allocation of new spectrum under different alternatives: to licensed providers in monopolistic, oligopolistic and perfectly competitive settings, and for unlicensed access. For this purpose we use mathematical models of competition in congestible resources. Initially we assume that any new bandwidth is available for free, but we also generalize our results to include investment decisions when prices are charged for bandwidth acquisition.
Submitted 21 July, 2013;
originally announced July 2013.
-
Searching and Bargaining with Middlemen
Authors:
Thanh Nguyen,
Vijay G. Subramanian,
Randall A. Berry
Abstract:
We study decentralized markets with the presence of middlemen, modeled by a non-cooperative bargaining game in trading networks. Our goal is to investigate how the network structure of the market and the role of middlemen influence the market's efficiency and fairness. We introduce the concept of limit stationary equilibrium in a general trading network and use it to analyze how competition among middlemen is influenced by the network structure, how endogenous delay emerges in trade and how surplus is shared between producers and consumers.
Submitted 7 July, 2013; v1 submitted 12 February, 2013;
originally announced February 2013.
-
Convexity Conditions for 802.11 WLANs
Authors:
Vijay G. Subramanian,
Douglas J. Leith
Abstract:
In this paper we characterise the maximal convex subsets of the (non-convex) rate region in 802.11 WLANs. In addition to being of intrinsic interest as a fundamental property of 802.11 WLANs, this characterisation can be exploited to allow the wealth of convex optimisation approaches to be applied to 802.11 WLANs.
Submitted 14 June, 2012;
originally announced June 2012.
-
Log-Convexity of Rate Region in 802.11e WLANs
Authors:
Douglas J. Leith,
Vijay G. Subramanian,
Ken R. Duffy
Abstract:
In this paper we establish the log-convexity of the rate region in 802.11 WLANs. This generalises previous results for Aloha networks and has immediate implications for optimisation based approaches to the analysis and design of 802.11 wireless networks.
Submitted 22 February, 2011;
originally announced February 2011.
-
Max-min Fairness in 802.11 Mesh Networks
Authors:
Douglas J. Leith,
Qizhi Cao,
Vijay G. Subramanian
Abstract:
In this paper we build upon the recent observation that the 802.11 rate region is log-convex and, for the first time, characterise max-min fair rate allocations for a large class of 802.11 wireless mesh networks. By exploiting features of the 802.11e/n MAC, in particular TXOP packet bursting, we are able to use this characterisation to establish a straightforward, practically implementable approach for achieving max-min throughput fairness. We demonstrate that this approach can be readily extended to encompass time-based fairness in multi-rate 802.11 mesh networks.
Submitted 3 March, 2010; v1 submitted 8 February, 2010;
originally announced February 2010.