
Intensive versus Non-intensive Actor-Critic Reinforcement Learning Algorithms

  • Conference paper
Artificial Intelligence and Soft Computing - ICAISC 2004 (ICAISC 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3070)


Abstract

Reinforcement learning algorithms usually employ the agent's consecutive actions to construct gradient estimators that adjust the agent's policy. The policy is thus the result of a kind of stochastic approximation. Because stochastic approximation converges slowly, such algorithms are usually much too slow to be employed in, e.g., real-time adaptive control.
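
A minimal sketch of such a non-intensive, stochastic-approximation update is given below. It assumes linear features, a unit-variance Gaussian policy, and illustrative step sizes; it is a generic actor-critic step, not the specific algorithm developed in the paper.

    import numpy as np

    def features(state):
        # Illustrative state features; the paper's parameterization is not shown here.
        return np.asarray(state, dtype=float)

    def incremental_actor_critic_step(theta, w, s, a, r, s_next,
                                      gamma=0.99, alpha_actor=1e-3, alpha_critic=1e-2):
        # One stochastic-approximation step: every transition nudges the critic
        # weights w and the policy parameters theta by a small amount.
        phi, phi_next = features(s), features(s_next)
        delta = r + gamma * (w @ phi_next) - w @ phi   # temporal-difference error
        w = w + alpha_critic * delta * phi             # critic: small TD(0) step
        # Actor: likelihood-ratio (score) of the executed action for a
        # unit-variance Gaussian policy with mean theta @ phi.
        score = (a - theta @ phi) * phi
        theta = theta + alpha_actor * delta * score
        return theta, w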

In this paper we analyze replacing stochastic approximation with estimation based on the entire available history of the agent-environment interaction. We design a reinforcement learning algorithm for continuous state/action domains that is orders of magnitude faster than the classical methods.
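
By contrast, an intensive variant re-estimates the critic and the policy adjustment from the whole recorded history at every step rather than from the latest transition alone. The sketch below uses an LSTD-style least-squares critic over all stored transitions purely to illustrate the idea; the estimators proposed in the paper itself are not reproduced here.

    import numpy as np

    def features(state):
        # Illustrative state features, as in the previous sketch.
        return np.asarray(state, dtype=float)

    def intensive_reestimate(history, theta, gamma=0.99, actor_step=0.1):
        # history: list of (s, a, r, s_next) transitions gathered so far.
        dim = len(features(history[0][0]))
        A, b = np.zeros((dim, dim)), np.zeros(dim)
        for s, a, r, s_next in history:
            phi, phi_next = features(s), features(s_next)
            A += np.outer(phi, phi - gamma * phi_next)   # LSTD-style normal equations
            b += phi * r
        w = np.linalg.solve(A + 1e-6 * np.eye(dim), b)   # critic fitted to the whole history
        # Actor: policy-gradient estimate averaged over every stored transition,
        # instead of a single-sample stochastic-approximation step.
        grad = np.zeros_like(theta)
        for s, a, r, s_next in history:
            phi = features(s)
            delta = r + gamma * (w @ features(s_next)) - w @ phi
            grad += delta * (a - theta @ phi) * phi
        return theta + actor_step * grad / len(history), w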


References

  1. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Trans. Syst., Man, Cybern. SMC-13, 834–846 (1983)

  2. Doya, K.: Reinforcement learning in continuous time and space. Neural Computation 12, 243–269 (2000)

  3. Konda, V.R., Tsitsiklis, J.N.: Actor-Critic Algorithms. SIAM Journal on Control and Optimization 42(4), 1143–1166 (2003)

  4. Lagoudakis, M.G., Parr, R.: Model-free least-squares policy iteration. In: Advances in Neural Information Processing Systems 14 (2002)

  5. Moore, A.W., Atkeson, C.G.: Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time. Machine Learning 13 (October 1993)

  6. Precup, D., Sutton, R.S., Singh, S.: Eligibility Traces for Off-Policy Policy Evaluation. In: Proceedings of the 17th International Conference on Machine Learning. Morgan Kaufmann, San Francisco (2000)

  7. Precup, D., Sutton, R.S., Dasgupta, S.: Off-policy temporal-difference learning with function approximation. In: Proceedings of the 18th International Conference on Machine Learning (2001)

  8. Sutton, R.S.: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. In: Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224. Morgan Kaufmann, San Francisco (1990)

  9. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  10. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: Advances in Neural Information Processing Systems, vol. 12, pp. 1057–1063. MIT Press, Cambridge (2000)

  11. Watkins, C.J.C.H., Dayan, P.: Q-Learning. Machine Learning 8, 279–292 (1992)

  12. Wawrzynski, P., Pacut, A.: A simple actor-critic algorithm for continuous environments (2003) (submitted for publication), available at http://home.elka.pw.edu.pl/~pwawrzyn

  13. Wawrzynski, P., Pacut, A.: Model-free off-policy reinforcement learning in continuous environment (2004) (submitted for publication), available at http://home.elka.pw.edu.pl/~pwawrzyn

  14. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wawrzynski, P., Pacut, A. (2004). Intensive versus Non-intensive Actor-Critic Reinforcement Learning Algorithms. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds) Artificial Intelligence and Soft Computing - ICAISC 2004. ICAISC 2004. Lecture Notes in Computer Science (LNAI), vol. 3070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24844-6_145


  • DOI: https://doi.org/10.1007/978-3-540-24844-6_145

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22123-4

  • Online ISBN: 978-3-540-24844-6

