Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

Author:

Ronald J. Williams

Machine Learning, Volume 8, Issue 3-4

Pages 229 - 256

https://doi.org/10.1007/BF00992696

Published: 01 May 1992

Abstract

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
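
The abstract does not spell out the update rule. As a rough illustration only, the sketch below assumes the commonly cited REINFORCE form, in which each weight moves by (learning rate) x (reward minus baseline) x (derivative of the log-probability of the sampled output), applied to a single Bernoulli-logistic unit. The toy reward function, the learning rate, the zero baseline, and all function names are hypothetical choices made for this example and are not taken from the article.

```python
import math
import random


def sigmoid(z):
    """Logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-z))


def toy_reward(x, y):
    """Hypothetical immediate-reinforcement task: reward 1.0 when the
    unit's binary output y matches the sign of the first input."""
    target = 1 if x[0] > 0 else 0
    return 1.0 if y == target else 0.0


def reinforce_step(w, x, reward_fn, rng, lr=0.1, baseline=0.0):
    """One update for a stochastic Bernoulli-logistic unit (assumed form).

    The unit fires (y = 1) with probability p = sigmoid(w . x). For this
    unit the derivative of ln Pr(y) with respect to w_i is (y - p) * x_i,
    so no separate gradient estimate has to be stored.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    y = 1 if rng.random() < p else 0          # sample the stochastic output
    r = reward_fn(x, y)                       # scalar reinforcement signal
    new_w = [wi + lr * (r - baseline) * (y - p) * xi
             for wi, xi in zip(w, x)]
    return new_w, y, r


def train(steps=2000, seed=0):
    """Run the toy task; x[1] is a constant bias input."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        x = [rng.choice([-1.0, 1.0]), 1.0]
        w, _, _ = reinforce_step(w, x, toy_reward, rng)
    return w
```

In this toy setup the weights change only on rewarded trials, because the reward is either 0 or 1 and the baseline is zero. A nonzero baseline would also produce updates on unrewarded trials.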

Index Terms

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

Machine Learning, Volume 8, Issue 3-4

May 1992

167 pages

ISSN:0885-6125

Editors:
Jaime G. Carbonell, Carnegie Mellon Univ., Pittsburgh, PA
Thomas Dietterich, Oregon State Univ., Corvallis

Publisher

Kluwer Academic Publishers

United States

Bibliometrics

Total Citations: 1,642
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Sep 2024

Cited By

Kikuta D, Ikeuchi H, Tajiri K, Toyama Y, Nakamura M, Nakano Y, Dastani M, Sichman J, Alechina N, Dignum V (2024). Electric Vehicle Routing for Emergency Power Supply with Deep Reinforcement Learning. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2336-2338. DOI: 10.5555/3635637.3663152. Online publication date: 6-May-2024.
https://dl.acm.org/doi/10.5555/3635637.3663152
Yang Y, Fan M, He C, Wang J, Huang H, Sartoretti G, Dastani M, Sichman J, Alechina N, Dignum V (2024). Attention-based Priority Learning for Limited Time Multi-Agent Path Finding. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1993-2001. DOI: 10.5555/3635637.3663063. Online publication date: 6-May-2024.
https://dl.acm.org/doi/10.5555/3635637.3663063
Wu Y, Fan M, Cao Z, Gao R, Hou Y, Sartoretti G, Dastani M, Sichman J, Alechina N, Dignum V (2024). Collaborative Deep Reinforcement Learning for Solving Multi-Objective Vehicle Routing Problems. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1956-1965. DOI: 10.5555/3635637.3663059. Online publication date: 6-May-2024.
https://dl.acm.org/doi/10.5555/3635637.3663059
Orzan N, Acar E, Grossi D, Rădulescu R, Dastani M, Sichman J, Alechina N, Dignum V (2024). Emergent Cooperation under Uncertain Incentive Alignment. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1521-1530. DOI: 10.5555/3635637.3663012. Online publication date: 6-May-2024.
https://dl.acm.org/doi/10.5555/3635637.3663012
Leung C, Hu S, Leung H, Dastani M, Sichman J, Alechina N, Dignum V (2024). The Stochastic Evolutionary Dynamics of Softmax Policy Gradient in Games. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1101-1109. DOI: 10.5555/3635637.3662966. Online publication date: 6-May-2024.
https://dl.acm.org/doi/10.5555/3635637.3662966
Yu F, Chen M, Xia X, Zhu D, Peng Q, Deng K (2024). Logistics Distribution Route Optimization With Time Windows Based on Multi-Agent Deep Reinforcement Learning. International Journal of Information Technologies and Systems Approach 17:1, pp. 1-23. DOI: 10.4018/IJITSA.342084. Online publication date: 16-Apr-2024.
https://dl.acm.org/doi/10.4018/IJITSA.342084
Meng S, Zhou J, Chen X, Liu Y, Lu F, Huang X (2024). Structure-Information-Based Reasoning over the Knowledge Graph: A Survey of Methods and Applications. ACM Transactions on Knowledge Discovery from Data 18:8, pp. 1-42. DOI: 10.1145/3671148. Online publication date: 16-Aug-2024.
https://dl.acm.org/doi/10.1145/3671148
Malagón M, Irurozki E, Ceberio J (2024). A Combinatorial Optimization Framework for Probability-Based Algorithms by Means of Generative Models. ACM Transactions on Evolutionary Learning and Optimization 4:3, pp. 1-28. DOI: 10.1145/3665650. Online publication date: 22-May-2024.
https://dl.acm.org/doi/10.1145/3665650
Heuillet A, Nasser A, Arioui H, Tabia H (2024). Efficient Automation of Neural Network Design: A Survey on Differentiable Neural Architecture Search. ACM Computing Surveys 56:11, pp. 1-36. DOI: 10.1145/3665138. Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3665138
Franceschelli G, Musolesi M (2024). Creativity and Machine Learning: A Survey. ACM Computing Surveys 56:11, pp. 1-41. DOI: 10.1145/3664595. Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3664595