
    Anton Schwartz

    While most Reinforcement Learning work utilizes temporal discounting to evaluate performance, the reasons for this are unclear. Is it out of desire or necessity? We argue that it is not out of desire, and seek to dispel the notion that temporal discounting is necessary by proposing a framework for undiscounted optimization. We present a metric of undiscounted performance and an algorithm for finding action policies that maximize that measure. The technique, which we call R-learning, is modelled after the popular Q-learning algorithm [17]. Initial experimental results are presented which attest to a great improvement over Q-learning in some simple cases.
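    The abstract does not reproduce the update rule itself; the following is a minimal tabular sketch of an R-learning-style average-reward update, in which rho estimates the average reward per step and R(s, a) holds relative action values. The environment interface (env.reset, env.step, env.actions), the learning rates alpha and beta, and the epsilon-greedy exploration are illustrative assumptions, not the paper's exact formulation.

    import random
    from collections import defaultdict

    def r_learning(env, steps=10000, alpha=0.05, beta=0.2, epsilon=0.1):
        # Sketch only: hypothetical env with reset(), actions(state), step(action).
        R = defaultdict(float)   # relative action values R(s, a)
        rho = 0.0                # running estimate of average reward per step
        s = env.reset()
        for _ in range(steps):
            actions = env.actions(s)
            greedy_a = max(actions, key=lambda x: R[(s, x)])
            a = random.choice(actions) if random.random() < epsilon else greedy_a
            s2, r = env.step(a)
            best_next = max(R[(s2, x)] for x in env.actions(s2))
            best_here = R[(s, greedy_a)]          # max_a R(s, a) before the update
            # move R(s, a) toward r - rho + max_a' R(s', a')
            R[(s, a)] += beta * (r - rho + best_next - R[(s, a)])
            # adjust the average-reward estimate only after greedy actions
            if a == greedy_a:
                rho += alpha * (r + best_next - best_here - rho)
            s = s2
        return R, rho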
    Reinforcement learning techniques address the problem of learning to select actions in unknown, dynamic environments. It is widely acknowledged that to be of use in complex domains, reinforcement learning techniques must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combinations, and many researchers have encountered failures in practice. In this paper we identify a prime source of such failures, namely a systematic overestimation of utility values. Using Watkins’ Q-Learning [18] as an example, we give a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail. Employing some of the most popular function approximators, we present experimental results which support the theoretical findings.
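    The overestimation the abstract refers to can be illustrated with a small simulation: when each action value is estimated with zero-mean noise, the maximum over the noisy estimates is biased upward even though every individual estimate is unbiased. This sketch is an illustration of that effect only, not a reproduction of the paper's experiments; the function name and its parameters are hypothetical.

    import random

    def max_overestimation(true_values, noise_scale=1.0, trials=100000):
        # average gap between the max of noisy estimates and the max of true values
        bias = 0.0
        for _ in range(trials):
            noisy = [q + random.gauss(0.0, noise_scale) for q in true_values]
            bias += max(noisy) - max(true_values)
        return bias / trials

    # Five actions with identical true value 0: each estimate is unbiased,
    # yet the maximum of the noisy estimates is biased upward
    # (roughly 1.16 * noise_scale for five independent normal errors).
    print(max_overestimation([0.0] * 5))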