Abstract
This study presents a theoretical analysis of the efficiency of interleaving, an online evaluation method for rankings. Although interleaving has already been applied to production systems, the source of its high efficiency has not been clarified in the literature. We begin by designing a simple interleaving method similar to ordinary interleaving methods. We then derive a condition under which this method is more efficient than A/B testing, and find that the condition holds when users leave the ranking depending on the relevance of the items they examine, a typical assumption made in click models. Finally, we perform experiments based on numerical analysis and user simulation, demonstrating that the theoretical results are consistent with the empirical results.
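To make the comparison in the abstract concrete, below is a minimal user-simulation sketch contrasting A/B testing with an interleaved comparison. Everything here is an illustrative assumption rather than the paper's exact setup: team-draft interleaving stands in for the paper's simple interleaving method, the user follows a cascade-style click model with relevance-dependent abandonment, and the rankings, relevance judgments, and probabilities are hypothetical.

```python
# Sketch: A/B testing vs. a team-draft interleaved comparison under a
# cascade-style user model. All constants and rankings are assumptions.
import random

CLICK_PROB = 0.7  # P(click | examined item is relevant) -- assumed
LEAVE_PROB = 0.6  # P(leave ranking | clicked)           -- assumed

RELEVANCE = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1}  # toy judgments
RANKING_A = ["a", "b", "c", "d"]  # hypothetical ranker A
RANKING_B = ["c", "a", "e", "d"]  # hypothetical ranker B (more relevant)


def simulate_clicks(ranking):
    """Scan top-down; click relevant items; maybe abandon after a click."""
    clicks = []
    for item in ranking:
        if RELEVANCE[item] and random.random() < CLICK_PROB:
            clicks.append(item)
            if random.random() < LEAVE_PROB:
                break  # relevance-dependent abandonment
    return clicks


def team_draft(rank_a, rank_b):
    """Team-draft interleaving: per round, the two rankers contribute
    their highest-ranked unused item in a random order."""
    sources = {"A": rank_a, "B": rank_b}
    pointers = {"A": 0, "B": 0}
    interleaved, team = [], {}
    while any(pointers[r] < len(sources[r]) for r in "AB"):
        for ranker in random.sample(["A", "B"], 2):
            src, i = sources[ranker], pointers[ranker]
            while i < len(src) and src[i] in team:
                i += 1  # skip items the other ranker already placed
            if i < len(src):
                interleaved.append(src[i])
                team[src[i]] = ranker
                i += 1
            pointers[ranker] = i
    return interleaved, team


def ab_trial():
    """One A/B impression: show a random ranker, count its clicks."""
    ranker = random.choice(["A", "B"])
    clicks = simulate_clicks(RANKING_A if ranker == "A" else RANKING_B)
    return ranker, len(clicks)


def interleaved_trial():
    """One interleaved impression: sign of B's click credit minus A's."""
    ranking, team = team_draft(RANKING_A, RANKING_B)
    credit = sum(1 if team[c] == "B" else -1 for c in simulate_clicks(ranking))
    return (credit > 0) - (credit < 0)


if __name__ == "__main__":
    random.seed(0)
    n = 20_000
    clicks, shown = {"A": 0, "B": 0}, {"A": 0, "B": 0}
    for _ in range(n):
        r, c = ab_trial()
        clicks[r] += c
        shown[r] += 1
    ab_diff = clicks["B"] / shown["B"] - clicks["A"] / shown["A"]
    il_pref = sum(interleaved_trial() for _ in range(n)) / n
    print(f"A/B mean click difference (B - A): {ab_diff:+.3f}")
    print(f"Interleaving mean preference for B: {il_pref:+.3f}")
```

The intuition this sketch illustrates: each interleaved impression is a within-user paired comparison, whereas A/B testing compares click counts across disjoint user populations, so the interleaved preference estimate typically stabilizes with fewer impressions. The relevance-dependent abandonment step is the click-model assumption under which the paper shows interleaving to be more efficient than A/B testing.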
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Iizuka, K., Morita, H., Kato, M.P. (2023). Theoretical Analysis on the Efficiency of Interleaved Comparisons. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13980. Springer, Cham. https://doi.org/10.1007/978-3-031-28244-7_29
DOI: https://doi.org/10.1007/978-3-031-28244-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28243-0
Online ISBN: 978-3-031-28244-7
eBook Packages: Computer Science; Computer Science (R0)