research-article

A Theoretical Analysis of Pseudo-Relevance Feedback Models

Authors:

Stéphane Clinchant,

Eric Gaussier

ICTIR '13: Proceedings of the 2013 Conference on the Theory of Information Retrieval

Pages 6 - 13

https://doi.org/10.1145/2499178.2499179

Published: 29 September 2013

Abstract

Our goal in this study is to compare several widely used pseudo-relevance feedback (PRF) models and to understand what explains their respective behavior. To do so, we first analyze how different PRF models behave, through the characteristics of the terms they select and through their performance on two widely used test collections. This analysis reveals that several well-known models, surprisingly, tend to select very common terms with low IDF (inverse document frequency). We then introduce several conditions that PRF models should satisfy regarding both the terms they select and the way they weight them, before studying whether standard PRF models satisfy these conditions. This study reveals that most models are deficient with respect to at least one condition, and that this deficiency explains the results of our analysis of the behavior of the models, as well as some of the results reported on the respective performance of PRF models. Based on the PRF conditions, we finally propose possible corrections for the simple mixture model. The PRF models obtained after these corrections outperform their standard versions and yield state-of-the-art PRF models, which confirms the validity of our theoretical analysis.
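The observation about low-IDF terms can be illustrated with a small diagnostic: compute the mean IDF of the expansion terms a feedback model selects and compare it across models. The following is a minimal sketch, not the authors' procedure. It assumes the plain definition IDF(w) = log(N / df(w)), which the abstract does not specify, and the toy collection, the term lists, and the helper names `idf` and `mean_idf` are hypothetical.

```python
import math
from collections import Counter


def idf(term, doc_freq, n_docs):
    """Plain IDF, log(N / df). Assumed definition; other variants exist."""
    return math.log(n_docs / doc_freq[term])


def mean_idf(selected_terms, doc_freq, n_docs):
    """Average IDF over the terms a (hypothetical) PRF model selected."""
    total = sum(idf(t, doc_freq, n_docs) for t in selected_terms)
    return total / len(selected_terms)


# Toy collection: each document is a list of tokens (illustrative only).
docs = [
    ["the", "retrieval", "model"],
    ["the", "feedback", "model"],
    ["the", "query", "expansion"],
    ["the", "pseudo", "feedback"],
]
n_docs = len(docs)

# Document frequency: number of documents containing each term.
doc_freq = Counter(term for doc in docs for term in set(doc))

# Two hypothetical sets of expansion terms, one made of frequent terms
# and one made of rare terms.
common_terms = ["the", "model"]
rare_terms = ["query", "expansion"]

print("mean IDF, common terms:", mean_idf(common_terms, doc_freq, n_docs))
print("mean IDF, rare terms:  ", mean_idf(rare_terms, doc_freq, n_docs))
```

A model whose selected terms have a markedly lower mean IDF than another's is favoring common terms, which is the kind of behavior the abstract describes as surprising.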

Index Terms

A Theoretical Analysis of Pseudo-Relevance Feedback Models
1. Information systems
  1. Information retrieval

Published In

ICTIR '13: Proceedings of the 2013 Conference on the Theory of Information Retrieval

September 2013

148 pages

ISBN:9781450321075

DOI:10.1145/2499178

Editors:
Oren Kurland (Technion - Israel Institute of Technology)
Donald Metzler (Google, USA)
Christina Lioma (University of Copenhagen, Denmark)
Birger Larsen (University of Copenhagen, Denmark)
Peter Ingwersen (University of Copenhagen, Denmark)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Findwise AB
Google Inc.
Spinque
University of Copenhagen
LARM Audio Research Archive
Royal School of Library and Information Science
Yahoo! Labs

In-Cooperation

SIGIR: ACM Special Interest Group on Information Retrieval
British Computer Society: BCS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICTIR '13

ICTIR '13: International Conference on the Theory of Information Retrieval

September 29 - October 2, 2013

Copenhagen, Denmark

Acceptance Rates

ICTIR '13 Paper Acceptance Rate 11 of 51 submissions, 22%;

Overall Acceptance Rate 235 of 527 submissions, 45%

Article Metrics

Total citations: 33
Total downloads: 275
Downloads (last 12 months): 5
Downloads (last 6 weeks): 0

Reflects downloads up to 15 Oct 2024
