
Intensive versus Non-intensive Actor-Critic Reinforcement Learning Algorithms

  • Conference paper
Artificial Intelligence and Soft Computing - ICAISC 2004 (ICAISC 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3070)


Abstract

Reinforcement learning algorithms usually employ the agent's consecutive actions to construct gradient estimators that adjust the agent's policy. The policy is thus the result of a kind of stochastic approximation. Because stochastic approximation converges slowly, such algorithms are usually much too slow to be employed in, e.g., real-time adaptive control.
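
A minimal sketch of such a non-intensive, stochastic-approximation update is given below. It assumes linear features, a unit-variance Gaussian policy, and illustrative step sizes; it is a generic actor-critic step, not the specific algorithm developed in the paper.

    import numpy as np

    def features(state):
        # Illustrative state features; the paper's parameterization is not shown here.
        return np.asarray(state, dtype=float)

    def incremental_actor_critic_step(theta, w, s, a, r, s_next,
                                      gamma=0.99, alpha_actor=1e-3, alpha_critic=1e-2):
        # One stochastic-approximation step: every transition nudges the critic
        # weights w and the policy parameters theta by a small amount.
        phi, phi_next = features(s), features(s_next)
        delta = r + gamma * (w @ phi_next) - w @ phi   # temporal-difference error
        w = w + alpha_critic * delta * phi             # critic: small TD(0) step
        # Actor: likelihood-ratio (score) of the executed action for a
        # unit-variance Gaussian policy with mean theta @ phi.
        score = (a - theta @ phi) * phi
        theta = theta + alpha_actor * delta * score
        return theta, w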

In this paper we analyze replacing stochastic approximation with estimation based on the entire available history of the agent-environment interaction. We design a reinforcement learning algorithm for continuous state/action domains that is orders of magnitude faster than the classical methods.
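
By contrast, an intensive variant re-estimates the critic and the policy adjustment from the whole recorded history at every step rather than from the latest transition alone. The sketch below uses an LSTD-style least-squares critic over all stored transitions purely to illustrate the idea; the estimators proposed in the paper itself are not reproduced here.

    import numpy as np

    def features(state):
        # Illustrative state features, as in the previous sketch.
        return np.asarray(state, dtype=float)

    def intensive_reestimate(history, theta, gamma=0.99, actor_step=0.1):
        # history: list of (s, a, r, s_next) transitions gathered so far.
        dim = len(features(history[0][0]))
        A, b = np.zeros((dim, dim)), np.zeros(dim)
        for s, a, r, s_next in history:
            phi, phi_next = features(s), features(s_next)
            A += np.outer(phi, phi - gamma * phi_next)   # LSTD-style normal equations
            b += phi * r
        w = np.linalg.solve(A + 1e-6 * np.eye(dim), b)   # critic fitted to the whole history
        # Actor: policy-gradient estimate averaged over every stored transition,
        # instead of a single-sample stochastic-approximation step.
        grad = np.zeros_like(theta)
        for s, a, r, s_next in history:
            phi = features(s)
            delta = r + gamma * (w @ features(s_next)) - w @ phi
            grad += delta * (a - theta @ phi) * phi
        return theta + actor_step * grad / len(history), w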


References

  1. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Trans. Syst., Man, Cybern. SMC-13, 834–846 (1983)

  2. Doya, K.: Reinforcement learning in continuous time and space. Neural Computation 12, 243–269 (2000)

  3. Konda, V.R., Tsitsiklis, J.N.: Actor-Critic Algorithms. SIAM Journal on Control and Optimization 42(4), 1143–1166 (2003)

  4. Lagoudakis, M.G., Parr, R.: Model-free least-squares policy iteration. In: Advances in Neural Information Processing Systems 14 (2002)

  5. Moore, A.W., Atkeson, C.G.: Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time. Machine Learning 13 (October 1993)

  6. Precup, D., Sutton, R.S., Singh, S.: Eligibility Traces for Off-Policy Policy Evaluation. In: Proceedings of the 17th International Conference on Machine Learning. Morgan Kaufmann, San Francisco (2000)

  7. Precup, D., Sutton, R.S., Dasgupta, S.: Off-policy temporal-difference learning with function approximation. In: Proceedings of the 18th International Conference on Machine Learning (2001)

  8. Sutton, R.S.: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. In: Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224. Morgan Kaufmann, San Francisco (1990)

  9. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  10. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: Advances in Neural Information Processing Systems, vol. 12, pp. 1057–1063. MIT Press, Cambridge (2000)

  11. Watkins, C.J.C.H., Dayan, P.: Q-Learning. Machine Learning 8, 279–292 (1992)

  12. Wawrzynski, P., Pacut, A.: A simple actor-critic algorithm for continuous environments (2003) (submitted for publication), available at http://home.elka.pw.edu.pl/~pwawrzyn

  13. Wawrzynski, P., Pacut, A.: Model-free off-policy reinforcement learning in continuous environment (2004) (submitted for publication), available at http://home.elka.pw.edu.pl/~pwawrzyn

  14. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wawrzynski, P., Pacut, A. (2004). Intensive versus Non-intensive Actor-Critic Reinforcement Learning Algorithms. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds) Artificial Intelligence and Soft Computing - ICAISC 2004. ICAISC 2004. Lecture Notes in Computer Science (LNAI), vol. 3070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24844-6_145


  • DOI: https://doi.org/10.1007/978-3-540-24844-6_145

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22123-4

  • Online ISBN: 978-3-540-24844-6

