MtCut: A Multi-Task Framework for Ranked List Truncation

Authors:

Hongming Piao

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

February 2022

Pages 1054 - 1062

https://doi.org/10.1145/3488560.3498466

Published: 15 February 2022

Abstract

Ranked list truncation aims to cut a ranked result list short according to user-defined objectives, balancing the overall utility of the retrieved results against the effort users spend examining them. Selecting an optimal cut-off position benefits various real-world applications, such as patent search and legal search. However, ranked lists carry significant retrieval bias. The result scores and the disorder of the document sequence make it difficult to judge the relevance between queries and documents, which limits the performance gains of existing methods. In this work, we investigate how retrieval bias affects truncation and propose a multi-task truncation model, MtCut. It employs two auxiliary tasks to compensate for the retrieval bias. We evaluate MtCut on two datasets, and the results show that it outperforms state-of-the-art methods on both F1-score and DCG metrics.
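To make the truncation objective concrete, the following is a minimal sketch of how a cut-off position can be scored with the two metrics named above. It is not the MtCut model itself. The helper names (`f1_at_k`, `dcg_at_k`, `best_cutoff`) and the example labels are hypothetical, and the DCG variant assumes a gain of -1 for non-relevant documents, a convention used in prior truncation work that the abstract does not specify.

```python
import math
from typing import Callable, List, Tuple


def f1_at_k(labels: List[int], k: int) -> float:
    """F1 of the top-k prefix, treating the prefix as the returned set."""
    total_relevant = sum(labels)
    hits = sum(labels[:k])
    if hits == 0 or total_relevant == 0:
        return 0.0
    precision = hits / k
    recall = hits / total_relevant
    return 2 * precision * recall / (precision + recall)


def dcg_at_k(labels: List[int], k: int, penalty: float = -1.0) -> float:
    """DCG of the top-k prefix.

    Assumption: relevant documents gain 1 and non-relevant documents gain
    `penalty`, so that returning the whole list is not trivially optimal.
    """
    return sum(
        (1.0 if rel else penalty) / math.log2(rank + 1)
        for rank, rel in enumerate(labels[:k], start=1)
    )


def best_cutoff(
    labels: List[int], metric: Callable[[List[int], int], float]
) -> Tuple[int, float]:
    """Return the (cut-off, score) pair that maximizes `metric` over all prefixes."""
    scores = [(k, metric(labels, k)) for k in range(1, len(labels) + 1)]
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical binary relevance labels, in ranked order.
    ranked_labels = [1, 1, 0, 1, 0, 0]
    print("best F1 cut-off:", best_cutoff(ranked_labels, f1_at_k))
    print("best DCG cut-off:", best_cutoff(ranked_labels, dcg_at_k))
```

On this toy list the two metrics prefer different cut-offs, which illustrates why the truncation target depends on the user-defined objective. A learned truncation model has to predict such a position without access to the true labels.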

Supplementary Material

MP4 File (WSDM22-fp495.mp4)

Presentation video of the multi-task truncation model, MtCut, which proposes a novel and efficient solution for ranked list truncation.

Cited By

Huang J, Zheng Z, Kang Y, Wang Z, Angélica L, Lattanzi S, Muñoz Medina A, Akoglu L, Gionis A, Vassilvitskii S. (2024). From Second to First: Mixed Censored Multi-Task Learning for Winning Price Prediction. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 295-303. Online publication date: 4-Mar-2024.
https://dl.acm.org/doi/10.1145/3616855.3635838

Ye F, Li S, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H. (2024). MileCut: A Multi-view Truncation Framework for Legal Case Retrieval. Proceedings of the ACM on Web Conference 2024, 1341-1349. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645349

Index Terms

MtCut: A Multi-Task Framework for Ranked List Truncation
1. Information systems
  1. Information retrieval

Published In

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

February 2022

1690 pages

ISBN:9781450391320

DOI:10.1145/3488560

General Chairs: K. Selcuk Candan, Arizona State University, USA; Huan Liu, Arizona State University, USA

Program Chairs: Leman Akoglu, Carnegie Mellon University, USA; Xin Luna Dong, Meta Platforms, Inc. (former Facebook), USA; Jiliang Tang, Michigan State University, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSFC

Conference

WSDM '22

WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining

February 21 - 25, 2022

AZ, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 205
Downloads (Last 12 months): 44
Downloads (Last 6 weeks): 3
