
A perspective on off-policy evaluation in reinforcement learning

  • Perspective
Frontiers of Computer Science


Author information


Corresponding author

Correspondence to Lihong Li.

Additional information

Lihong Li is a research scientist at Google Brain, USA. Previously, he held research positions at Yahoo! Research (Silicon Valley) and Microsoft Research (Redmond). His main research interests are in reinforcement learning, including contextual bandits, and other related problems in AI. His work has found applications in recommendation, advertising, Web search and conversation systems, and has won best paper awards at ICML, AISTATS and WSDM. He serves as area chair or senior program committee member at major AI/ML conferences such as AAAI, ICLR, ICML, IJCAI and NIPS/NeurIPS.


About this article


Cite this article

Li, L. A perspective on off-policy evaluation in reinforcement learning. Front. Comput. Sci. 13, 911–912 (2019). https://doi.org/10.1007/s11704-019-9901-7



  • DOI: https://doi.org/10.1007/s11704-019-9901-7