Abstract
We propose a model-based learning algorithm, the Adaptive-resolution Reinforcement Learning (ARL) algorithm, that aims to solve the online, continuous state-space reinforcement learning problem in a deterministic domain. Our goal is to combine adaptive-resolution approximation schemes with efficient exploration in order to obtain polynomial learning rates. The proposed algorithm adaptively approximates the optimal value function by kernel-based averaging, moving from a coarse to a fine kernel-based representation of the state space; this enables finer resolution in the “important” areas of the state space and coarser resolution elsewhere. We consider an online learning approach in which these important areas are discovered online, using an uncertainty-interval exploration technique. In addition, we introduce an incremental variant of ARL (IARL), a more practical version of the original algorithm with reduced computational complexity at each stage. Polynomial learning rates, in terms of a mistake bound in a PAC framework, are established for these algorithms under appropriate continuity assumptions.
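To make the coarse-to-fine, kernel-based construction concrete, the following minimal Python sketch shows one way such a scheme could look. It is an illustration under stated assumptions, not the ARL algorithm of the paper: the Gaussian kernel, the optimistic fallback value V_MAX, the bandwidth-halving schedule, and all names (kernel_weights, KernelValue, refine) are hypothetical choices made only for this example.

# Hypothetical illustration (not the ARL pseudocode from the paper): a kernel-
# averaged value estimate over visited states, with optimistic values in
# unexplored regions and a coarse-to-fine bandwidth schedule.
import numpy as np

GAMMA = 0.95                    # discount factor (illustrative choice)
V_MAX = 1.0 / (1.0 - GAMMA)     # optimistic upper bound on the value

def kernel_weights(x, centers, bandwidth):
    """Normalized Gaussian kernel weights of state x w.r.t. stored centers."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    s = w.sum()
    return w / s if s > 1e-12 else np.zeros_like(w)

class KernelValue:
    """Kernel-based averaging approximation of the optimal value function.

    States far from every stored sample get the optimistic bound V_MAX,
    which plays the role of an uncertainty-driven exploration bonus.
    """
    def __init__(self, bandwidth):
        self.bandwidth = bandwidth      # current resolution (large = coarse)
        self.centers = []               # visited sample states
        self.values = []                # value estimates at those states

    def value(self, x):
        if not self.centers:
            return V_MAX
        w = kernel_weights(np.asarray(x, dtype=float),
                           np.array(self.centers), self.bandwidth)
        if w.sum() < 1e-12:             # no nearby samples: stay optimistic
            return V_MAX
        return float(w @ np.array(self.values))

    def update(self, x, reward, next_x):
        """Deterministic Bellman backup at a newly visited state."""
        backup = reward + GAMMA * self.value(next_x)
        self.centers.append(np.asarray(x, dtype=float))
        self.values.append(backup)

    def refine(self, factor=0.5):
        """Shrink the bandwidth: finer resolution where samples accumulate."""
        self.bandwidth *= factor

# Toy run on a 1-D deterministic chain, refining the resolution periodically.
vf = KernelValue(bandwidth=0.5)
x = np.array([0.0])
for step in range(20):
    next_x = x + 0.1                    # deterministic transition
    reward = 1.0 if next_x[0] > 1.5 else 0.0
    vf.update(x, reward, next_x)
    x = next_x
    if (step + 1) % 5 == 0:
        vf.refine()                     # coarse-to-fine refinement
print(round(vf.value(np.array([1.0])), 2))

The sketch only conveys the flavor of the approach: values default to an optimistic bound outside the sampled region, so a planner built on top of them would be drawn toward unexplored states, and the kernel bandwidth shrinks over time so that resolution increases where experience accumulates.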
Additional information
Editor: Roni Khardon.
Cite this article
Bernstein, A., Shimkin, N. Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains. Mach Learn 81, 359–397 (2010). https://doi.org/10.1007/s10994-010-5186-7