DOI: 10.1145/2499178.2499179
research-article

A Theoretical Analysis of Pseudo-Relevance Feedback Models

Published: 29 September 2013

Abstract

Our goal in this study is to compare several widely used pseudo-relevance feedback (PRF) models and to understand what explains their respective behavior. To do so, we first analyze how different PRF models behave, through the characteristics of the terms they select and through their performance on two widely used test collections. This analysis reveals that several well-known models, surprisingly, tend to select very common terms with low IDF (inverse document frequency). We then introduce several conditions that PRF models should satisfy regarding both the terms they select and the way they weight them, before examining whether standard PRF models satisfy these conditions. This study reveals that most models are deficient with respect to at least one condition, and that this deficiency explains the results of our analysis of the models' behavior, as well as some of the reported results on the respective performance of PRF models. Based on the PRF conditions, we finally propose possible corrections for the simple mixture model. The PRF models obtained after these corrections outperform their standard versions and yield state-of-the-art PRF models, which confirms the validity of our theoretical analysis.
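
To make the low-IDF observation concrete, here is a minimal sketch of how one might measure the mean IDF of the terms a feedback model selects. It is not the paper's method: the selector below is a deliberately naive, hypothetical one (raw term frequency in the feedback documents), the IDF variant log(N / df) is an assumption, and the toy collection and all function names are invented for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """IDF as log(N / df). This is one common variant, assumed here."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else float("inf")


def top_feedback_terms(feedback_docs, k):
    """Hypothetical naive selector: the k most frequent terms in the feedback set."""
    counts = Counter(t for d in feedback_docs for t in d)
    return [t for t, _ in counts.most_common(k)]


def mean_idf(terms, docs):
    """Average IDF of the selected terms over the whole collection."""
    return sum(idf(t, docs) for t in terms) / len(terms)


# Toy collection of tokenized documents (illustrative data only).
collection = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["the", "feedback", "model"],
]
feedback = collection[:2]  # pretend these were the top-ranked documents

selected = top_feedback_terms(feedback, k=1)
print(selected, mean_idf(selected, collection))
```

In this toy setting the frequency-only selector picks a term that occurs in every document, so its IDF is zero, which is the kind of behavior the abstract describes as undesirable.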

    Published In

    ICTIR '13: Proceedings of the 2013 Conference on the Theory of Information Retrieval
    September 2013
    148 pages
    ISBN:9781450321075
    DOI:10.1145/2499178
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • Findwise: Findwise AB
    • Google Inc.
    • Spinque: Spinque
    • Univ. of Copenhagen: University of Copenhagen
    • LARM: LARM Audio Research Archive
    • Royal School of Library and Information Science: Royal School of Library and Information Science
    • Yahoo! Labs

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Axiomatic Theory
    2. IR Theory
    3. Pseudo Relevance Feedback

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICTIR '13
    Sponsor:
    • Findwise
    • Spinque
    • Univ. of Copenhagen
    • LARM
    • Royal School of Library and Information Science

    Acceptance Rates

    ICTIR '13 Paper Acceptance Rate 11 of 51 submissions, 22%;
    Overall Acceptance Rate 235 of 527 submissions, 45%
