DOI: 10.5555/777092.777126
Article

Reinforcement learning for POMDPs based on action values and stochastic optimization

Published: 28 July 2002

Abstract

We present a new, model-free reinforcement learning algorithm for learning to control partially-observable Markov decision processes. The algorithm incorporates ideas from action-value based reinforcement learning approaches, such as Q-Learning, as well as ideas from the stochastic optimization literature. Key to our approach is a new definition of action value, which makes the algorithm theoretically sound for partially-observable settings. We show that special cases of our algorithm can achieve probability one convergence to locally optimal policies in the limit, or probably approximately correct hill-climbing to a locally optimal policy in a finite number of samples.
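The paper's own action-value definition is not reproduced on this page. As a rough illustration of the kind of action-value learning the algorithm builds on, here is a minimal memoryless Q-Learning baseline that keys its table on observations rather than latent states. This is a sketch under stated assumptions, not the authors' algorithm; the function names and the toy corridor environment are invented for the example:

```python
import random

class Corridor:
    """Toy 3-cell corridor: cells 0, 1, 2; observation = cell index.
    Actions: 0 = left, 1 = right. Reaching cell 2 ends the episode
    with reward +1. (Fully observable here, so plain Q-learning works;
    in a true POMDP, distinct latent states may share an observation.)"""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(2, self.s + 1) if a == 1 else max(0, self.s - 1)
        done = self.s == 2
        return self.s, (1.0 if done else 0.0), done

def q_learning_memoryless(env_step, env_reset, n_obs, n_actions,
                          episodes=500, alpha=0.1, gamma=0.95,
                          epsilon=0.1, seed=0):
    """Tabular Q-learning keyed on observations (a memoryless policy)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_obs)]
    for _ in range(episodes):
        obs = env_reset()
        done = False
        while not done:
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[obs])
                a = rng.choice([i for i, q in enumerate(Q[obs]) if q == best])
            obs2, r, done = env_step(a)
            # standard one-step TD backup on the observed transition
            target = r if done else r + gamma * max(Q[obs2])
            Q[obs][a] += alpha * (target - Q[obs][a])
            obs = obs2
    return Q

env = Corridor()
Q = q_learning_memoryless(env.step, env.reset, n_obs=3, n_actions=2)
```

On this toy problem the learned greedy policy moves right from every cell. When observations alias distinct latent states, however, this kind of observation-keyed update is no longer theoretically sound, which is the gap the paper's redefined action values are meant to address.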



Published In

Eighteenth National Conference on Artificial Intelligence
July 2002
1068 pages
ISBN:0262511290

Sponsors

  • NSF: National Science Foundation
  • Alberta Informatics Circle of Research Excellence (iCORE)
  • SIGAI: ACM Special Interest Group on Artificial Intelligence
  • Naval Research Laboratory
  • AAAI: American Association for Artificial Intelligence
  • NASA Ames Research Center
  • DARPA: Defense Advanced Research Projects Agency

Publisher

American Association for Artificial Intelligence

United States



Cited By

  • (2017) Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams. Autonomous Agents and Multi-Agent Systems 31(4), 821-860. DOI: 10.1007/s10458-016-9354-4. Online publication date: 1-Jul-2017.
  • (2016) Learning to Act Optimally in Partially Observable Multiagent Settings. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 1532-1533. DOI: 10.5555/2936924.2937241. Online publication date: 9-May-2016.
  • (2016) Reinforcement Learning in Partially Observable Multiagent Settings. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 530-538. DOI: 10.5555/2936924.2937002. Online publication date: 9-May-2016.
  • (2014) Team behavior in interactive dynamic influence diagrams with applications to ad hoc teams. Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, 1559-1560. DOI: 10.5555/2615731.2616061. Online publication date: 5-May-2014.
  • (2012) Induction and learning of finite-state controllers from simulation. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, 1203-1204. DOI: 10.5555/2343896.2343922. Online publication date: 4-Jun-2012.
  • (2011) Reinforcement learning through global stochastic search in N-MDPs. Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II, 326-340. DOI: 10.5555/2034117.2034139. Online publication date: 5-Sep-2011.
  • (2011) LearnPNP. RoboCup 2010, 418-429. DOI: 10.5555/1983806.1983843. Online publication date: 1-Jan-2011.
  • (2011) Reinforcement learning through global stochastic search in N-MDPs. Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II, 326-340. DOI: 10.1007/978-3-642-23783-6_21. Online publication date: 5-Sep-2011.
  • (2010) Improving the performance of complex agent plans through reinforcement learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, 723-730. DOI: 10.5555/1838206.1838302. Online publication date: 10-May-2010.
  • (2002) The thing that we tried didn't work very well. Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 154-161. DOI: 10.5555/2073876.2073895. Online publication date: 1-Aug-2002.
