DOI: 10.1145/2556195.2556256

Relative confidence sampling for efficient on-line ranker evaluation

Published: 24 February 2014
    Abstract

    A key challenge in information retrieval is that of on-line ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning to rank datasets.
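
    To make the setting concrete, the sketch below simulates the dueling-bandit loop the abstract describes: at each step a policy picks two rankers, an interleaved comparison (reduced here to a biased coin flip) declares a winner, and regret accrues whenever either chosen ranker is worse than the best one. The champion/challenger rule shown is a Thompson-sampling-style simplification in the spirit of RCS, not the paper's exact algorithm; the preference matrix p and the Beta(wins + 1, losses + 1) posteriors are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    K = 5
    # p[i, j]: probability that ranker i beats ranker j in an interleaved
    # comparison; known here only to simulate outcomes and measure regret.
    p = np.full((K, K), 0.5)
    for i in range(K):
        for j in range(i + 1, K):
            p[i, j] = 0.5 + 0.08 * (j - i)  # ranker 0 is best by construction
            p[j, i] = 1.0 - p[i, j]

    wins = np.zeros((K, K))  # wins[i, j]: number of times i has beaten j

    T = 10_000
    regret = 0.0
    for t in range(T):
        # Sample a plausible preference matrix from Beta posteriors over the
        # pairwise win probabilities (the Thompson-sampling step).
        theta = rng.beta(wins + 1, wins.T + 1)
        np.fill_diagonal(theta, 0.5)
        # Champion: a ranker that beats every other ranker under the sample;
        # if none exists, fall back to a uniformly random ranker.
        beats_all = (theta >= 0.5).all(axis=1)
        c = int(np.argmax(beats_all)) if beats_all.any() else int(rng.integers(K))
        # Challenger: the ranker whose sampled win rate against the champion
        # is highest (possibly the champion itself, in which case no regret
        # is incurred).
        d = int(np.argmax(theta[:, c]))
        # Interleaved comparison, reduced to a coin flip with bias p[c, d].
        if rng.random() < p[c, d]:
            wins[c, d] += 1
        else:
            wins[d, c] += 1
        # Dueling-bandit regret: average sub-optimality of the pair shown.
        regret += 0.5 * ((p[0, c] - 0.5) + (p[0, d] - 0.5))

    print(f"cumulative regret after {T} comparisons: {regret:.1f}")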




      Published In

      WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
      February 2014
      712 pages
      ISBN: 9781450323512
      DOI: 10.1145/2556195
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. evaluation
      2. implicit feedback
      3. on-line learning

      Qualifiers

      • Research-article

      Conference

      WSDM 2014

      Acceptance Rates

      WSDM '14 paper acceptance rate: 64 of 355 submissions, 18%
      Overall acceptance rate: 498 of 2,863 submissions, 17%



      Article Metrics

      • Downloads (last 12 months): 16
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 10 Aug 2024


      Cited By

      • (2023) Principled reinforcement learning with human feedback from pairwise or K-wise comparisons. Proceedings of the 40th International Conference on Machine Learning, 43037-43067. DOI: 10.5555/3618408.3620222. Online publication date: 23-Jul-2023.
      • (2022) Dirichlet–Luce choice model for learning from interactions. User Modeling and User-Adapted Interaction, 32(4), 611-648. DOI: 10.1007/s11257-022-09331-0. Online publication date: 4-Jun-2022.
      • (2020) MergeDTS. ACM Transactions on Information Systems, 38(4), 1-28. DOI: 10.1145/3411753. Online publication date: 10-Sep-2020.
      • (2020) Dueling bandit problems. Probability in the Engineering and Informational Sciences, 36(2), 264-275. DOI: 10.1017/S0269964820000601. Online publication date: 20-Nov-2020.
      • (2020) Counterfactual Online Learning to Rank. Advances in Information Retrieval, 415-430. DOI: 10.1007/978-3-030-45439-5_28. Online publication date: 8-Apr-2020.
      • (2019) Bandit algorithms in information retrieval evaluation and ranking. Journal of Physics: Conference Series, 1339, 012005. DOI: 10.1088/1742-6596/1339/1/012005. Online publication date: 16-Dec-2019.
      • (2019) Optimizing Ranking Models in an Online Setting. Advances in Information Retrieval, 382-396. DOI: 10.1007/978-3-030-15712-8_25. Online publication date: 7-Apr-2019.
      • (2018) Advancements in dueling bandits. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5502-5510. DOI: 10.5555/3304652.3304790. Online publication date: 13-Jul-2018.
      • (2018) Differentiable Unbiased Online Learning to Rank. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1293-1302. DOI: 10.1145/3269206.3271686. Online publication date: 17-Oct-2018.
      • (2016) Double Thompson sampling for dueling bandits. Proceedings of the 30th International Conference on Neural Information Processing Systems, 649-657. DOI: 10.5555/3157096.3157169. Online publication date: 5-Dec-2016.
