- research-article, July 2018
DOP: Deep Optimistic Planning with Approximate Value Function Evaluation
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2210–2212
Research on reinforcement learning has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state ...
- research-article, July 2018
Multi-Armed Bandit Algorithms for Spare Time Planning of a Mobile Service Robot
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2195–2197
We assume that service robots will have spare time in between scheduled user requests, which they could use to perform additional unrequested services in order to learn a model of users' preferences and receive reward. However, a mobile service robot is ...
- research-article, July 2018
Recurrent Deep Multiagent Q-Learning for Autonomous Agents in Future Smart Grid
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2136–2138
The broker mechanism is widely applied to serve for interested parties to derive long-term policies to reduce costs or gain profits in smart grid. However, brokers are faced with a number of challenging problems such as balancing demand and supply from ...
- research-article, July 2018
An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2130–2132
This paper studies a variation of stochastic multi-armed bandit (MAB) problem where the agent knows a prior knowledge named Near-optimal Mean Reward (NoMR). We show that the cumulative regret of this bandit variation has a lower bound of Ω(1/Δ), ...
- research-article, July 2018
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2067–2069
During training, model-free reinforcement learning (RL) systems can explore actions that lead to harmful or costly consequences. Having a human "in the loop" and ready to intervene at all times can prevent these mistakes, but is prohibitively expensive ...
- research-article, July 2018
RAIL: Risk-Averse Imitation Learning
- Anirban Santara,
- Abhishek Naik,
- Balaraman Ravindran,
- Dipankar Das,
- Dheevatsa Mudigere,
- Sasikanth Avancha,
- Bharat Kaul
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2062–2063
Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's ...
- research-article, July 2018
Algorithms to Manage Load Shedding Events in Developing Countries
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2034–2036
Due to the limited generation capacity of power stations, many developing countries frequently resort to disconnecting large parts of the power grid from supply, a process termed load shedding. This leaves homes without electricity, causing them ...
- research-article, July 2018
Link-based Parameterized Micro-tolling Scheme for Optimal Traffic Management
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2013–2015
In the micro-tolling paradigm, different toll values are assigned to different links within a congestible traffic network. Self-interested agents then select minimal cost routes, where cost is a function of the travel time and tolls paid. A centralized ...
- research-article, July 2018
Leveraging Observational Learning for Exploration in Bandits
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2001–2003
Learning from a target has been tackled in the reinforcement learning (RL) setting [1, 7] as imitation learning, either through behaviour cloning or inverse RL. In the former, the agent regresses directly onto the policy of a target [5], while in the ...
- research-article, July 2018
Introspective Reinforcement Learning and Learning from Demonstration
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1992–1994
Reinforcement learning is a paradigm to model how an autonomous agent learns to maximise its cumulative reward by interacting with the environment. One challenge faced by reinforcement learning is that in many environments the reward signal is sparse, ...
- research-article, July 2018
Guiding Reinforcement Learning Exploration Using Natural Language
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1956–1958
In this work we present a technique for using natural language to help reinforcement learning generalize to unseen environments using neural machine translation techniques. These techniques are then integrated into policy shaping to make it more ...
- research-article, July 2018
Incident Prediction and Response Optimization
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1758–1760
In urban areas across the globe, incidents like crime, fire and accidents often result in massive losses of life and property. In such scenarios, quick response can minimize or prevent damage. Emergency responder services are eager to adopt mechanisms ...
- research-article, July 2018
Adaptive Dynamic Pricing for Market-based Allocation of Interdependent Commodities
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1755–1757
Ongoing digitization of all kinds of human enterprise is allowing sophisticated pricing strategies to be used in domains where previously this has not been feasible. In the mobility domain, commodities such as shared cars or electric vehicle charging ...
- research-article, July 2018
Utility Decomposition for Planning under Uncertainty for Autonomous Driving
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1731–1732
The objective of this research is to provide scalable decision making algorithms for autonomously navigating urban environments. The vehicle must plan in a stochastic environment with many entities to avoid, rapid changes in driver behavior, and partial ...
- research-article, July 2018
Decentralized Reinforcement Learning Inspired by Multiagent Systems
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1729–1730
Existence can perhaps be viewed as an exercise of searching high-dimensional, rugged, and approximated (using training data) landscapes for (often time-delayed) rewards. Bounded rationality imposes limits on the success of solutions that can be found by ...
- research-article, July 2018
Behavior Model Calibration for Epidemic Simulations
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1640–1648
Computational epidemiologists frequently employ large-scale agent-based simulations of human populations to study disease outbreaks and assess intervention strategies. The agents used in such simulations rarely capture the real-world decision-making of ...
- research-article, July 2018
Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1380–1387
Humans are able to understand and perform complex tasks by strategically structuring tasks into incremental steps or sub-goals. For a robot attempting to learn to perform a sequential task with critical subgoal states, these subgoal states can provide a ...
- research-article, July 2018
Multi-Armed Bandit Algorithms for Crowdsourcing Systems with Online Estimation of Workers' Ability
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1345–1352
Crowdsourcing systems have become a valuable solution for various organizations to outsource work on a temporary basis. Quality assurance in these systems remains a key issue due to the distributed setup of the crowdsourcing platforms and the absence of ...
- research-article, July 2018
Faster Policy Adaptation in Environments with Exogeneity: A State Augmentation Approach
AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Pages 1035–1043
The reinforcement learning literature typically assumes fixed state transition functions for the sake of tractability. However, in many real-world tasks, the state transition function changes over time, and this change may be governed by exogenous ...