tutorial

Offline Evaluation and Optimization for Interactive Systems

Author:

Lihong Li

WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining

Pages 413 - 414

https://doi.org/10.1145/2684822.2697040

Published: 02 February 2015

Abstract

Evaluating and optimizing an interactive system (such as a search engine, recommender system, or advertising system) from historical data against a predefined online metric is challenging, especially when that metric is computed from user feedback such as clicks and payments. The key challenge is counterfactual in nature: we only observe a user's feedback for actions the system actually took, and we do not know how that user would have reacted to a different action. The gold standard for evaluating such metrics of a user-interacting system is the online A/B experiment (a.k.a. randomized controlled experiment), which can be expensive in terms of both time and engineering resources. Offline evaluation/optimization (sometimes referred to as off-policy learning in the literature) thus becomes critical: it aims to evaluate the same metrics without running (many) expensive A/B experiments on live users. One approach to offline evaluation is to build a user model that simulates user behavior (clicks, purchases, etc.) under various contexts, and then evaluate metrics of a system with this simulator. While straightforward and common in practice, such model-based approaches are only as reliable as the user model they are built on. Furthermore, it is often difficult to know a priori whether a user model is good enough to be trusted.
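The model-based approach described above can be sketched in a few lines. The following is only an illustration, not the tutorial's own code: the log format (context, action, click), the tabular click model, and the names fit_user_model, model_based_value, and always_b are all assumptions made for this example.

```python
from collections import defaultdict

def fit_user_model(logs):
    """Estimate a click probability for each (context, action) pair seen in the logs."""
    clicks = defaultdict(float)
    counts = defaultdict(int)
    for context, action, reward in logs:
        clicks[(context, action)] += reward
        counts[(context, action)] += 1
    return {key: clicks[key] / counts[key] for key in counts}

def model_based_value(logs, model, target_policy, default=0.0):
    """Average the model's predicted reward for the action the target policy would pick."""
    total = 0.0
    for context, _, _ in logs:
        action = target_policy(context)
        # Pairs never observed in the logs fall back to `default` (an assumption).
        total += model.get((context, action), default)
    return total / len(logs)

# Hypothetical logged interactions: (context, action shown, click).
logs = [
    ("sports", "A", 1), ("sports", "A", 0),
    ("sports", "B", 1), ("news", "A", 0),
    ("news", "B", 1), ("news", "B", 1),
]

model = fit_user_model(logs)

def always_b(context):
    """Hypothetical target policy that always shows action B."""
    return "B"

estimate = model_based_value(logs, model, always_b)
print(f"model-based estimate for always_b: {estimate:.3f}")
```

In this toy log, the predicted click rate for ("sports", "B") rests on a single observation, so the estimate inherits whatever error that one sample carries. This is the sense in which the approach is only as reliable as the user model.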

Recent years have seen growing interest in another solution to the offline evaluation problem. Using statistical techniques like importance sampling and doubly robust estimation, the approach can give unbiased estimates of metrics for a wide range of problems. It enjoys other benefits as well. For example, it often allows data scientists to obtain a confidence interval that quantifies the uncertainty in the estimate, and it does not require building user models, so it is more robust and easier to apply. These benefits make the approach particularly attractive in practice. Successful applications have been reported in the last few years by some of the industrial leaders.
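A minimal sketch of the two techniques named above is given here, under stated assumptions: each log record carries the probability with which the logging system chose its action (the propensity), that probability is nonzero for every action the target policy may choose, and the confidence interval uses a simple normal approximation. The function names, the log format, and the toy data are hypothetical and not taken from the tutorial.

```python
import math

def ips_estimate(logs, target_prob):
    """Importance sampling (inverse propensity scoring).

    Each logged reward is reweighted by target_prob(x, a) / logged propensity.
    Returns the estimate and the half-width of an approximate 95% normal
    confidence interval.
    """
    terms = [target_prob(x, a) / p * r for x, a, p, r in logs]
    n = len(terms)
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)

def dr_estimate(logs, target_prob, reward_model, actions):
    """Doubly robust: model prediction plus an importance-weighted residual correction."""
    terms = []
    for x, a, p, r in logs:
        baseline = sum(target_prob(x, b) * reward_model(x, b) for b in actions)
        correction = target_prob(x, a) / p * (r - reward_model(x, a))
        terms.append(baseline + correction)
    return sum(terms) / len(terms)

# Hypothetical logs: (context, action, logging propensity, click).
# The logging system is assumed to have picked A or B uniformly at random.
logs = [
    ("u1", "A", 0.5, 0), ("u1", "B", 0.5, 1),
    ("u2", "A", 0.5, 1), ("u2", "B", 0.5, 0),
    ("u3", "B", 0.5, 1), ("u3", "A", 0.5, 0),
]

def target_prob(x, a):
    """Hypothetical target policy: always show B."""
    return 1.0 if a == "B" else 0.0

def reward_model(x, a):
    """Deliberately crude reward model: predicts 0.5 everywhere."""
    return 0.5

ips_value, ips_half_width = ips_estimate(logs, target_prob)
dr_value = dr_estimate(logs, target_prob, reward_model, ["A", "B"])
print(f"IPS: {ips_value:.3f} +/- {ips_half_width:.3f}")
print(f"DR:  {dr_value:.3f}")
```

With a reward model that predicts zero everywhere, the doubly robust estimate reduces to the importance sampling estimate. A better reward model typically lowers the variance of the correction term without requiring the model itself to be correct.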

This tutorial gives a review of the basic theory and representative techniques. Applications of these techniques are illustrated through several case studies done at Microsoft and Yahoo!.

Cited By

Zhang W, Zhou T, Wang J, Xu J, Krishnapuram B, Shah M, Smola A, Aggarwal C, Shen D, Rastogi R (2016). Bid-aware Gradient Descent for Unbiased Learning with Censored Data in Display Advertising. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 665-674. DOI: 10.1145/2939672.2939713. Online publication date: 13-Aug-2016.
https://dl.acm.org/doi/10.1145/2939672.2939713
Narita Y, Yasui S, Yata K. Efficient Counterfactual Learning from Bandit Feedback. SSRN Electronic Journal. DOI: 10.2139/ssrn.3300346.
https://doi.org/10.2139/ssrn.3300346

Index Terms

Offline Evaluation and Optimization for Interactive Systems
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval

Recommendations

Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study
WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web

Optimizing an interactive system against a predefined online metric is particularly challenging, especially when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature: in the case of Web ...

Bridging the Gap Between User-centric and Offline Evaluation of Personalized Recommendation Systems
UMAP '18: Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization

In this paper, we propose to evaluate recommender systems by conducting both offline and user-centric evaluations, while considering multiple quality aspects in realistic settings. This comprehensive evaluation would provide insight on how to improve ...

A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation
RepSys '13: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation

Offline evaluations are the most common evaluation method for research paper recommender systems. However, no thorough discussion on the appropriateness of offline evaluations has taken place, despite some voiced criticism. We conducted a study in which ...

Information & Contributors

Information

Published In

WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining

February 2015

482 pages

ISBN: 9781450333177

DOI: 10.1145/2684822

General Chairs:
Xueqi Cheng (ICT, Chinese Academy of Sciences, China),
Hang Li (Huawei Technologies, China)

Program Chairs:
Evgeniy Gabrilovich (Google, USA),
Jie Tang (Tsinghua University, China)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Conference

WSDM 2015

WSDM 2015: Eighth ACM International Conference on Web Search and Data Mining

February 2 - 6, 2015

Shanghai, China

Acceptance Rates

WSDM '15 Paper Acceptance Rate: 39 of 238 submissions, 16%

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 206
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 21 Sep 2024
