Online Bellman residual and temporal difference algorithms with predictive error guarantees

Published: 09 July 2016

Abstract

We establish connections between minimizing the Bellman residual or the temporal difference loss and worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our first analysis shows that, under a stability assumption, any no-regret online learning algorithm that minimizes the Bellman error ensures small prediction error. Our second analysis shows that applying the family of online mirror descent algorithms to the temporal difference loss also ensures small prediction error. No statistical assumptions are made on the sequence of observations, which may be non-Markovian or even adversarial. Our approach thus establishes a broad new family of provably sound algorithms and generalizes previous worst-case results for minimizing predictive error. We investigate the potential advantages of some members of this family both theoretically and empirically on benchmark problems.
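
To make the setting concrete, here is a minimal sketch, not taken from the paper, of the two per-trial updates the abstract contrasts, using linear function approximation. All names, the step size, the discount factor, and the toy observation stream are illustrative assumptions; the paper's actual algorithms and guarantees are more general. The first update takes a full gradient step on the squared Bellman residual (residual-gradient style); the second takes a TD(0)-style semi-gradient step, which corresponds to the gradient-descent instance of online mirror descent (i.e., mirror descent with the Euclidean mirror map).

    import numpy as np

    GAMMA, ETA = 0.9, 0.05  # discount factor and step size (illustrative choices)

    def bellman_residual_step(w, x, r, x_next):
        # Full-gradient step on the per-trial squared Bellman residual
        #   l_t(w) = (w.x_t - r_t - GAMMA * w.x_{t+1})^2
        delta = w @ x - r - GAMMA * (w @ x_next)
        return w - ETA * 2.0 * delta * (x - GAMMA * x_next)

    def td_step(w, x, r, x_next):
        # TD(0)-style semi-gradient step: same residual, but the gradient is
        # taken only through the current prediction w.x_t, not through the
        # bootstrapped target GAMMA * w.x_{t+1}.
        delta = w @ x - r - GAMMA * (w @ x_next)
        return w - ETA * delta * x

    # Toy run on an arbitrary observation stream. Note the updates consume only
    # the observed (x_t, r_t, x_{t+1}) triples, so no Markov assumption is
    # needed to execute them.
    rng = np.random.default_rng(0)
    d, T = 4, 1000
    w_br, w_td = np.zeros(d), np.zeros(d)
    x = rng.standard_normal(d)
    for t in range(T):
        x_next = rng.standard_normal(d)
        r = x.sum() + 0.1 * rng.standard_normal()  # stream could be adversarial
        w_br = bellman_residual_step(w_br, x, r, x_next)
        w_td = td_step(w_td, x, r, x_next)
        x = x_next

The only difference between the two updates is whether the gradient flows through the bootstrapped target; the paper's two analyses cover these two regimes under different conditions (no-regret plus stability for the Bellman residual, online mirror descent for the temporal difference loss).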


Cited By

  • (2018) Continuous-time value function approximation in reproducing kernel Hilbert spaces. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 2818-2829. 10.5555/3327144.3327205. Online publication date: 3-Dec-2018.


Information

      Published In

      IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence
      July 2016
      4277 pages
      ISBN:9781577357704

      Sponsors

      • Sony Corporation
      • Arizona State University
      • Microsoft
      • Facebook
      • AI Journal

      Publisher

      AAAI Press
