DOI: 10.1145/3488560.3498466
research-article

MtCut: A Multi-Task Framework for Ranked List Truncation

Published: 15 February 2022

    Abstract

    Ranked list truncation aims to cut off a ranked list of results at a position chosen according to a user-defined objective, balancing the overall utility of the retrieved results against the effort users spend examining them. Selecting the optimal cut-off position precisely brings benefits in many real-world applications, such as patent search and legal search. However, ranked lists exhibit significant retrieval bias: the result scores and the disordered document sequence make it difficult to judge the relevance between queries and documents, which limits the performance gains of existing methods. In this work, we investigate the characteristics of retrieval bias and how it alters truncation, and we propose a multi-task truncation model, MtCut, which employs two auxiliary tasks to compensate for the retrieval bias. In a practical evaluation on two datasets, MtCut outperforms state-of-the-art methods on both F1-score and DCG metrics.
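    To illustrate what "selecting a cut-off position" means under the F1 and DCG metrics named above, the Python sketch below scores every prefix of a toy ranked list and returns the best (oracle) cut-off. This is a minimal sketch under stated assumptions, not MtCut or the paper's exact metric definitions. The labels and the function names `f1_at_k`, `penalized_dcg_at_k` and `oracle_cutoff` are hypothetical. Recall is assumed to be measured against the relevant documents in the list. The DCG variant assumes a gain of +1 for relevant documents and -1 for non-relevant ones: with non-negative gains, plain DCG never decreases as k grows, so it would always favour keeping the whole list. A truncation model has to predict this position without seeing the labels; the sketch only computes the target such a model would aim for.

```python
import math
from typing import Callable, List, Optional, Tuple


def f1_at_k(labels: List[int], k: int, total_relevant: Optional[int] = None) -> float:
    """F1 of keeping the top-k documents, given binary relevance labels in rank order."""
    if total_relevant is None:
        # Assumption: recall is measured against the relevant documents in this list.
        total_relevant = sum(labels)
    hits = sum(labels[:k])
    if k == 0 or hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / total_relevant
    return 2 * precision * recall / (precision + recall)


def penalized_dcg_at_k(labels: List[int], k: int) -> float:
    """DCG of the top-k documents with an assumed gain of +1 (relevant) / -1 (non-relevant)."""
    return sum(
        (1.0 if rel else -1.0) / math.log2(rank + 2)  # rank is 0-based, so rank 0 -> log2(2)
        for rank, rel in enumerate(labels[:k])
    )


def oracle_cutoff(
    labels: List[int], metric: Callable[[List[int], int], float]
) -> Tuple[int, float]:
    """Return the cut-off k in 1..len(labels) that maximizes metric(labels, k)."""
    if not labels:
        raise ValueError("ranked list must not be empty")
    best_k, best_score = 1, metric(labels, 1)
    for k in range(2, len(labels) + 1):
        score = metric(labels, k)
        if score > best_score:  # strict '>' keeps the shortest list on ties
            best_k, best_score = k, score
    return best_k, best_score


labels = [1, 1, 0, 1, 0, 0]  # hypothetical relevance labels in rank order
print(oracle_cutoff(labels, f1_at_k))
print(oracle_cutoff(labels, penalized_dcg_at_k))
```

    On this toy list the F1 oracle keeps the top four documents, while the penalized DCG oracle stops after two, because the non-relevant document at rank 3 costs more than the relevant document at rank 4 recovers. The two metrics can therefore disagree about the best cut-off for the same list.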

    Supplementary Material

    MP4 File (WSDM22-fp495.mp4)
    Presentation video of the multi-task truncation model, MtCut, which proposes a novel and efficient solution for ranked list truncation.


    Cited By

    • (2024) From Second to First: Mixed Censored Multi-Task Learning for Winning Price Prediction. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 295-303. DOI: 10.1145/3616855.3635838. Online publication date: 4 March 2024.
    • (2024) MileCut: A Multi-view Truncation Framework for Legal Case Retrieval. Proceedings of the ACM on Web Conference 2024, 1341-1349. DOI: 10.1145/3589334.3645349. Online publication date: 13 May 2024.

      Information

      Published In

      ACM Conferences
      WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
      February 2022
      1690 pages
      ISBN:9781450391320
      DOI:10.1145/3488560
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. bias correction
      2. information retrieval
      3. multi-task learning
      4. ranked list truncation


      Funding Sources

      • NSFC

      Conference

      WSDM '22

      Acceptance Rates

      Overall Acceptance Rate 498 of 2,863 submissions, 17%


      Bibliometrics

      • Downloads (last 12 months): 44
      • Downloads (last 6 weeks): 3
