DOI: 10.1145/2684822.2685311
research-article

Toward Predicting the Outcome of an A/B Experiment for Search Relevance

Published: 02 February 2015

Abstract

A standard approach to estimating online click-based metrics of a ranking function is to run it in a controlled experiment on live users. While reliable and popular in practice, configuring and running an online experiment is cumbersome and time-intensive. In this work, inspired by recent successes of offline evaluation techniques for recommender systems, we study an alternative that uses historical search logs to reliably predict online click-based metrics of a new ranking function, without actually running it on live users. To tackle novel challenges encountered in Web search, variations of the basic techniques are proposed. The first is to take advantage of the diversified behavior of a search engine over a long period of time to simulate randomized data collection, so that our approach can be used at very low cost. The second is to replace exact matching (of recommended items in previous work) by fuzzy matching (of search result pages) to increase data efficiency, via a better trade-off of bias and variance. Extensive experimental results based on large-scale real search data from a major commercial search engine in the US market demonstrate that our approach is promising and has potential for wide use in Web search.
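
The trade-off between exact and fuzzy matching can be illustrated with a small replay-style estimator. The sketch below is a simplified illustration under stated assumptions, not the authors' estimator: the log format `(query, page, clicked)`, the names `offline_click_rate`, `exact_match`, and `top_k_match`, and the use of top-k agreement as the fuzzy criterion are all hypothetical choices made for this example.

```python
from collections import defaultdict


def offline_click_rate(log, ranker, match):
    """Estimate the click rate of `ranker` from logged impressions.

    `log` is an iterable of (query, page, clicked) records, where `page`
    is a tuple of result IDs and `clicked` is 0 or 1 (assumed format).
    Only records whose logged page matches the page the new ranker would
    show are used. Per-query averages are weighted by query frequency.
    Returns (estimate, coverage); coverage is the fraction of logged
    impressions whose query had at least one matching record.
    """
    clicks = defaultdict(int)   # query -> clicks on matched records
    matched = defaultdict(int)  # query -> number of matched records
    total = defaultdict(int)    # query -> number of logged impressions
    for query, page, clicked in log:
        total[query] += 1
        if match(page, ranker(query)):
            matched[query] += 1
            clicks[query] += clicked
    covered = sum(total[q] for q in matched)
    if covered == 0:
        return None, 0.0
    weighted = sum(total[q] * clicks[q] / matched[q] for q in matched)
    return weighted / covered, covered / sum(total.values())


def exact_match(logged, proposed):
    """Pages match only if every position is identical."""
    return tuple(logged) == tuple(proposed)


def top_k_match(k):
    """Fuzzy criterion (an assumption): pages match if the top k agree."""
    def match(logged, proposed):
        return tuple(logged[:k]) == tuple(proposed[:k])
    return match


# Toy log: the engine showed varying pages for the same query over time.
toy_log = [
    ("q1", ("a", "b", "c"), 1),
    ("q1", ("a", "c", "b"), 0),
    ("q1", ("b", "a", "c"), 0),
    ("q2", ("x", "y", "z"), 1),
    ("q2", ("x", "z", "y"), 1),
    ("q2", ("y", "x", "z"), 0),
]
new_pages = {"q1": ("a", "b", "c"), "q2": ("x", "y", "w")}

exact = offline_click_rate(toy_log, new_pages.get, exact_match)
fuzzy = offline_click_rate(toy_log, new_pages.get, top_k_match(1))
```

On this toy log, exact matching uses a single record and covers only half of the impressions, while top-1 matching covers all of them. The price of the extra data is potential bias, because pages that differ below the first position are treated as interchangeable.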


    Published In

    WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining
    February 2015
    482 pages
    ISBN:9781450333177
    DOI:10.1145/2684822
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. contextual bandits
    2. evaluation
    3. experimentation
    4. information retrieval
    5. web search

    Qualifiers

    • Research-article

    Conference

    WSDM 2015

    Acceptance Rates

    WSDM '15 Paper Acceptance Rate 39 of 238 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 10
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 21 Sep 2024

    Citations

    Cited By

    • (2024) Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning. Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, pages 84-97. DOI: 10.1145/3643915.3644087. Online publication date: 15-Apr-2024
    • (2024) A/B testing. Journal of Systems and Software, 211:C. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024
    • (2023) Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality. DOI: 10.1145/3623640. Online publication date: 24-Sep-2023
    • (2023) Meta Policy Learning for Cold-Start Conversational Recommendation. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 222-230. DOI: 10.1145/3539597.3570443. Online publication date: 27-Feb-2023
    • (2022) KuaiRec. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 540-550. DOI: 10.1145/3511808.3557220. Online publication date: 17-Oct-2022
    • (2022) Offline Evaluation of Ranked Lists using Parametric Estimation of Propensities. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 622-632. DOI: 10.1145/3477495.3532032. Online publication date: 6-Jul-2022
    • (2021) Control variates for slate off-policy evaluation. Proceedings of the 35th International Conference on Neural Information Processing Systems, pages 3667-3679. DOI: 10.5555/3540261.3540541. Online publication date: 6-Dec-2021
    • (2021) UserSim: User Simulation via Supervised Generative Adversarial Network. Proceedings of the Web Conference 2021, pages 3582-3589. DOI: 10.1145/3442381.3450125. Online publication date: 19-Apr-2021
    • (2020) Keeping Dataset Biases out of the Simulation. Proceedings of the 14th ACM Conference on Recommender Systems, pages 190-199. DOI: 10.1145/3383313.3412252. Online publication date: 22-Sep-2020
    • (2019) Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 483-491. DOI: 10.1145/3289600.3291033. Online publication date: 30-Jan-2019
