ON THE WORST-CASE ANALYSIS OF TEMPORAL-DIFFERENCE LEARNING ALGORITHMS


October 1994

1994 Technical Report

Publisher:

University of California at Santa Cruz
Computer and Information Sciences Dept., 265 Applied Sciences Building, Santa Cruz, CA
United States

Published: 01 October 1994


Abstract

We study the worst-case behavior of a family of learning algorithms based on Sutton's method of temporal differences. In our on-line learning framework, learning takes place in a sequence of trials, and the goal of the learning algorithm is to estimate a discounted sum of all the reinforcements that will be received in the future. In this setting, we prove general upper bounds on the performance of a slightly modified version of Sutton's TD(λ) algorithm. These bounds are stated in terms of the performance of the best linear predictor on the given training sequence, and they are proved without any statistical assumptions about the process producing the learner's observed training sequence. We also prove lower bounds on the performance of any algorithm for this learning problem, and we give a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement.
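To make the on-line setting concrete, here is a minimal sketch in Python. It is not the modified algorithm analyzed in the report; it is the standard linear TD(λ) update with accumulating eligibility traces. It assumes that on each trial t the learner sees a feature vector x_t, predicts the dot product w·x_t, and then receives reinforcement r_t. It also assumes the quantity being estimated is y_t = Σ_{k≥t} γ^(k−t) r_k, truncated at the end of a finite sequence. The helper names (`discounted_targets`, `td_lambda`, `square_loss`) and the parameters `gamma`, `lam`, `eta` are illustrative choices, not names from the report.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def discounted_targets(rewards: Sequence[float], gamma: float) -> Vector:
    # Assumed target: y_t = sum_{k >= t} gamma**(k - t) * r_k, truncated at the
    # end of the finite trial sequence. Computed backwards in a single pass.
    targets = [0.0] * len(rewards)
    acc = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        acc = rewards[t] + gamma * acc
        targets[t] = acc
    return targets


def td_lambda(xs: Sequence[Sequence[float]], rewards: Sequence[float],
              gamma: float, lam: float, eta: float) -> Tuple[Vector, Vector]:
    # Standard on-line linear TD(lambda) with accumulating eligibility traces.
    # This is NOT the modified variant whose bounds the report proves.
    n = len(xs[0])
    w = [0.0] * n          # weights of the linear predictor
    e = [0.0] * n          # eligibility trace
    preds: Vector = []
    for t, x in enumerate(xs):
        pred = dot(w, x)   # prediction made before r_t is revealed
        preds.append(pred)
        # Bootstrap from the next observation; the end of the sequence is terminal.
        next_pred = dot(w, xs[t + 1]) if t + 1 < len(xs) else 0.0
        delta = rewards[t] + gamma * next_pred - pred   # temporal-difference error
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
        w = [wi + eta * delta * ei for wi, ei in zip(w, e)]
    return w, preds


def square_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    # Total squared error of the on-line predictions against the discounted sums.
    return sum((p - y) ** 2 for p, y in zip(preds, targets))
```

As an example, take a constant feature `[1.0]` and a reinforcement of 1.0 on every trial, with `gamma=0.5`. The prediction then drifts toward 2.0, which is the discounted sum 1 + 0.5 + 0.25 + …. Comparing `square_loss(preds, discounted_targets(rewards, gamma))` with the loss of the best fixed weight vector chosen in hindsight is the kind of relative comparison the abstract describes.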

Contributors

Robert Elias Schapire
Microsoft Research
Manfred Klaus Warmuth
Google LLC
