Abstract
This study presents a theoretical analysis of the efficiency of interleaving, an online evaluation method for rankings. Although interleaving has already been applied to production systems, the source of its high efficiency has not been clarified in the literature. We begin by designing a simple interleaving method similar to ordinary interleaving methods. We then derive a condition under which this method is more efficient than A/B testing, and find that the condition holds when users leave the ranking depending on the relevance of the items they examine, a typical assumption made in click models. Finally, we perform experiments based on numerical analysis and user simulation, demonstrating that the theoretical results are consistent with the empirical results.
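To make the comparison in the abstract concrete, below is a minimal user-simulation sketch contrasting A/B testing with an interleaved comparison. Everything here is an illustrative assumption rather than the paper's exact setup: team-draft interleaving stands in for the paper's simple interleaving method, the user follows a cascade-style click model with relevance-dependent abandonment, and the rankings, relevance judgments, and probabilities are hypothetical.

```python
# Sketch: A/B testing vs. a team-draft interleaved comparison under a
# cascade-style user model. All constants and rankings are assumptions.
import random

CLICK_PROB = 0.7  # P(click | examined item is relevant) -- assumed
LEAVE_PROB = 0.6  # P(leave ranking | clicked)           -- assumed

RELEVANCE = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1}  # toy judgments
RANKING_A = ["a", "b", "c", "d"]  # hypothetical ranker A
RANKING_B = ["c", "a", "e", "d"]  # hypothetical ranker B (more relevant)


def simulate_clicks(ranking):
    """Scan top-down; click relevant items; maybe abandon after a click."""
    clicks = []
    for item in ranking:
        if RELEVANCE[item] and random.random() < CLICK_PROB:
            clicks.append(item)
            if random.random() < LEAVE_PROB:
                break  # relevance-dependent abandonment
    return clicks


def team_draft(rank_a, rank_b):
    """Team-draft interleaving: per round, the two rankers contribute
    their highest-ranked unused item in a random order."""
    sources = {"A": rank_a, "B": rank_b}
    pointers = {"A": 0, "B": 0}
    interleaved, team = [], {}
    while any(pointers[r] < len(sources[r]) for r in "AB"):
        for ranker in random.sample(["A", "B"], 2):
            src, i = sources[ranker], pointers[ranker]
            while i < len(src) and src[i] in team:
                i += 1  # skip items the other ranker already placed
            if i < len(src):
                interleaved.append(src[i])
                team[src[i]] = ranker
                i += 1
            pointers[ranker] = i
    return interleaved, team


def ab_trial():
    """One A/B impression: show a random ranker, count its clicks."""
    ranker = random.choice(["A", "B"])
    clicks = simulate_clicks(RANKING_A if ranker == "A" else RANKING_B)
    return ranker, len(clicks)


def interleaved_trial():
    """One interleaved impression: sign of B's click credit minus A's."""
    ranking, team = team_draft(RANKING_A, RANKING_B)
    credit = sum(1 if team[c] == "B" else -1 for c in simulate_clicks(ranking))
    return (credit > 0) - (credit < 0)


if __name__ == "__main__":
    random.seed(0)
    n = 20_000
    clicks, shown = {"A": 0, "B": 0}, {"A": 0, "B": 0}
    for _ in range(n):
        r, c = ab_trial()
        clicks[r] += c
        shown[r] += 1
    ab_diff = clicks["B"] / shown["B"] - clicks["A"] / shown["A"]
    il_pref = sum(interleaved_trial() for _ in range(n)) / n
    print(f"A/B mean click difference (B - A): {ab_diff:+.3f}")
    print(f"Interleaving mean preference for B: {il_pref:+.3f}")
```

The intuition this sketch illustrates: each interleaved impression is a within-user paired comparison, whereas A/B testing compares click counts across disjoint user populations, so the interleaved preference estimate typically stabilizes with fewer impressions. The relevance-dependent abandonment step is the click-model assumption under which the paper shows interleaving to be more efficient than A/B testing.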
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Iizuka, K., Morita, H., Kato, M.P. (2023). Theoretical Analysis on the Efficiency of Interleaved Comparisons. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13980. Springer, Cham. https://doi.org/10.1007/978-3-031-28244-7_29
DOI: https://doi.org/10.1007/978-3-031-28244-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28243-0
Online ISBN: 978-3-031-28244-7
eBook Packages: Computer Science; Computer Science (R0)