Reinforcement learning for joint optimization of multiple rewards
Recommendations
Semi-Regenerative Processes with Unbounded Rewards
A semi-regenerative process (SRP) is combined with a reward structure such that the accumulated reward during [0, t] is the sum of a functional of the SRP and a functional of the embedded Markov renewal process (MRP). For the expected discounted return a ...
Continuous-Time Markov Decision Processes with Discounted Rewards: The Case of Polish Spaces
This paper deals with continuous-time Markov decision processes in Polish spaces, under an expected discounted reward criterion. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates ...
Optimal learning with non-Gaussian rewards
WSC '13: Proceedings of the 2013 Winter Simulation Conference: Simulation: Making Decisions in a Complex World
We propose a theoretical and computational framework for approximating the optimal policy in multi-armed bandit problems where the reward distributions are non-Gaussian. We first construct a probabilistic interpolation of the sequence of discrete-time ...
Publisher
JMLR.org
Qualifiers
- Research-article