Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees

Published: 12 December 2016

Abstract

Learning-to-Rank models based on additive ensembles of regression trees have proven to be very effective for scoring query results returned by large-scale Web search engines. Unfortunately, the computational cost of scoring thousands of candidate documents by traversing large ensembles of trees is high. Thus, several works have investigated solutions aimed at improving the efficiency of document scoring by exploiting advanced features of modern CPUs and memory hierarchies. In this article, we present QuickScorer, a new algorithm that adopts a novel cache-efficient representation of a given tree ensemble, performs an interleaved traversal by means of fast bitwise operations, and supports ensembles of oblivious trees. An extensive and detailed experimental assessment is conducted on two standard Learning-to-Rank datasets and on a novel very large dataset we made publicly available for conducting significant efficiency tests. The experiments show unprecedented speedups over the best state-of-the-art baselines, ranging from 1.9× to 6.6×. The analysis of low-level profiling traces shows that QuickScorer's efficiency is due to its cache-aware approach, in terms of both data layout and access patterns, and to a control flow that entails very low branch misprediction rates.
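
As an illustration of the interleaved traversal described above, the following minimal Python sketch shows how per-node bitvectors can be combined with bitwise ANDs to identify the exit leaf of every tree in the ensemble. It is a simplified reconstruction for readability, not the paper's C++ implementation: the FalseNode and Ensemble containers, the use of arbitrary-precision Python integers as bitvectors, and the omission of the per-node mask precomputation are all assumptions introduced here.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class FalseNode:
        # Internal node whose test is x[feature] <= threshold.
        feature: int
        threshold: float
        tree: int   # index of the tree this node belongs to
        mask: int   # bitvector with 0s on the leaves of the node's left subtree,
                    # i.e., the leaves that become unreachable when the test is false

    @dataclass
    class Ensemble:
        nodes_by_feature: Dict[int, List[FalseNode]]  # nodes grouped by feature, thresholds ascending
        leaf_values: List[List[float]]                # leaf_values[t][l]: output of leaf l of tree t
        num_leaves: List[int]                         # number of leaves of each tree

    def score(x: List[float], ens: Ensemble) -> float:
        # One bitvector per tree; bit l (LSB = leftmost leaf) is 1 iff leaf l is still reachable.
        leafidx = [(1 << n) - 1 for n in ens.num_leaves]
        for f, nodes in ens.nodes_by_feature.items():
            for node in nodes:                       # thresholds in ascending order
                if x[f] <= node.threshold:           # this and all later tests on f are true,
                    break                            # so nothing else needs to be masked
                leafidx[node.tree] &= node.mask      # false test: hide the left-subtree leaves
        # The exit leaf of each tree is its leftmost (here: lowest) surviving bit.
        total = 0.0
        for t, bits in enumerate(leafidx):
            exit_leaf = (bits & -bits).bit_length() - 1
            total += ens.leaf_values[t][exit_leaf]
        return total

Processing the false nodes feature by feature, with thresholds sorted in ascending order, lets the inner loop stop at the first test that evaluates to true and keeps memory accesses largely sequential; this access pattern and branch behaviour are what the abstract refers to as the sources of cache friendliness and low branch misprediction rates.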

      Published In

      ACM Transactions on Information Systems, Volume 35, Issue 2
      April 2017, 232 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/3001595

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 December 2016
      Accepted: 01 August 2016
      Revised: 01 August 2016
      Received: 01 January 2016
      Published in TOIS Volume 35, Issue 2

      Author Tags

      1. Learning to rank
      2. additive ensembles of regression trees
      3. cache-awareness
      4. document scoring
      5. efficiency

      Qualifiers

      • Research-article
      • Research
      • Refereed
