DOI: 10.1145/3292500.3330676
Research article · Open access

Combining Decision Trees and Neural Networks for Learning-to-Rank in Personal Search

Published: 25 July 2019

Abstract

Decision Trees (DTs) like LambdaMART have been among the most effective learning-to-rank algorithms of the past decade. They typically work well with hand-crafted dense features (e.g., BM25 scores). Recently, Neural Networks (NNs) have shown impressive results in leveraging sparse and complex features (e.g., query and document keywords) directly when a large amount of training data is available. While there is a large body of work on using NNs for semantic matching between queries and documents, relatively little work has compared NNs with DTs on general learning-to-rank tasks, where dense features are also available and DTs can achieve state-of-the-art performance. In this paper, we study how to combine DTs and NNs to effectively bring together the benefits of both in the learning-to-rank setting. Specifically, we focus on personal search, where clicks serve as the primary labels (via unbiased learning-to-rank algorithms) and a very large amount of training data is readily available. Our combination methods are based on ensemble learning. We design 12 variants and compare them along two dimensions, ranking effectiveness and ease of deployment, using two of the largest personal search services: Gmail search and Google Drive search. We show that direct application of existing ensemble methods cannot achieve both. We thus design a novel method that uses NNs to compensate for DTs via boosting. We show that this method is not only easier to deploy but also gives comparable or better ranking accuracy.
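The boosting-style combination the abstract describes, in which an NN is trained to correct what the tree ensemble misses, can be sketched in a few lines. This is a toy pointwise-regression illustration under invented data and hyperparameters, not the paper's LambdaMART-based method: the "DT" stage below is a hand-rolled ensemble of regression stumps and the "NN" stage is a one-hidden-layer network fit to the trees' residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pointwise "ranking" data: dense features X, relevance-like labels y.
X = rng.normal(size=(200, 4))
y = X[:, 0] + 0.5 * np.tanh(3 * X[:, 1]) + 0.1 * rng.normal(size=200)

def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to residuals r; return a predictor."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r - np.where(left, lv, rv)) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda Z, j=j, t=t, lv=lv, rv=rv: np.where(Z[:, j] <= t, lv, rv)

# Stage 1: a small GBDT-style ensemble of stumps (stand-in for LambdaMART).
lr = 0.5
dt_pred = np.zeros_like(y)
for _ in range(30):
    dt_pred += lr * fit_stump(X, y - dt_pred)(X)

# Stage 2: a tiny one-hidden-layer NN fit by full-batch gradient descent
# to the residuals the tree ensemble could not capture.
resid = y - dt_pred
W1 = rng.normal(scale=0.5, size=(4, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=16);      b2 = 0.0
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    out = h @ W2 + b2                   # NN residual prediction
    g = 2 * (out - resid) / len(y)      # dLoss/dout for squared error
    gh = np.outer(g, W2) * (1 - h ** 2) # backprop through tanh
    W2 -= 0.1 * (h.T @ g); b2 -= 0.1 * g.sum()
    W1 -= 0.1 * (X.T @ gh); b1 -= 0.1 * gh.sum(axis=0)

# Final score: tree score plus the NN's correction (boosting).
nn_pred = np.tanh(X @ W1 + b1) @ W2 + b2
combined = dt_pred + nn_pred

mse_dt = ((y - dt_pred) ** 2).mean()
mse_comb = ((y - combined) ** 2).mean()
print("trees only:", mse_dt, "trees + NN correction:", mse_comb)
```

On this toy data the NN correction reduces training error relative to the trees alone; the paper's point is that this additive structure also keeps the tree-serving stack untouched, which is what makes the approach easy to deploy.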

Supplementary Material

MP4 File (p2032-li.mp4)



Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. decision trees
  2. ensemble methods
  3. learning to rank
  4. neural networks
  5. personal search

Conference

KDD '19

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Horse race rank prediction using learning-to-rank approaches. Korean Journal of Applied Statistics 37(2), 239–253. DOI: 10.5351/KJAS.2024.37.2.239
  • (2024) Ensemble Modelling for Predicting Fish Mortality. Applied Sciences 14(15), 6540. DOI: 10.3390/app14156540
  • (2023) RD-Suite. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 35748–35760. DOI: 10.5555/3666122.3667673
  • (2023) Tree based Progressive Regression Model for Watch-Time Prediction in Short-video Recommendation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4497–4506. DOI: 10.1145/3580305.3599919
  • (2023) Residual Fusion Models with Neural Networks for CTR Prediction. In 2023 8th International Conference on Computer Science and Engineering (UBMK), 1–4. DOI: 10.1109/UBMK59864.2023.10286706
  • (2023) Making Machine Learning More Energy Efficient by Bringing It Closer to the Sensor. IEEE Micro 43(6), 11–18. DOI: 10.1109/MM.2023.3316348
  • (2023) Gradient-Boosted Based Structured and Unstructured Learning. In Artificial Neural Networks and Machine Learning – ICANN 2023, 439–451. DOI: 10.1007/978-3-031-44213-1_37
  • (2022) Scale Calibration of Deep Ranking Models. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4300–4309. DOI: 10.1145/3534678.3539072
  • (2022) On Optimizing Top-K Metrics for Neural Ranking Models. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2303–2307. DOI: 10.1145/3477495.3531849
  • (2022) Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 285–294. DOI: 10.1109/CVPR52688.2022.00038
