DOI: 10.1145/1835449.1835559
research-article

Multilingual PRF: English lends a helping hand

Published: 19 July 2010

Abstract

In this paper, we present a novel approach to Pseudo-Relevance Feedback (PRF) called Multilingual PRF (MultiPRF). The key idea is to harness multilinguality. Given a query in one language, we use a second language to mitigate two well-known problems of PRF: (a) the expansion terms from PRF are primarily based on co-occurrence relationships with query terms, so other lexically and semantically related terms, such as morphological variants and synonyms, are not explicitly captured; and (b) PRF is sensitive to the quality of the initially retrieved top k documents and is thus not robust. In MultiPRF, a query in language L1 is translated into language L2, PRF is performed on a collection in language L2, and the resulting feedback model is translated from L2 back into L1. The final feedback model is obtained by combining the translated model with the original feedback model of the query in L1.
Experiments were performed on standard CLEF collections in languages with widely differing characteristics, namely French, German, Finnish and Hungarian, with English as the assisting language. We observe that MultiPRF outperforms PRF and is more robust, with consistent and significant improvements across these languages. A thorough analysis of the results reveals that the second language helps in obtaining both co-occurrence-based conceptual terms and lexically and semantically related terms. Additionally, the use of the second-language collection reduces the sensitivity to the performance of the initial retrieval, thereby making MultiPRF more robust.
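
The last two steps described in the abstract (translating the L2 feedback model back into L1 and combining it with the original L1 feedback model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: feedback models are plain term-to-probability dictionaries, the translation step uses a hypothetical probabilistic dictionary holding p(f | e), and the combination is a simple linear interpolation with a hypothetical weight beta. The function names translate_model and combine_models are likewise invented.

```python
from collections import defaultdict


def translate_model(model_l2, dictionary):
    """Map a term distribution over L2 terms onto L1 terms.

    model_l2:   {l2_term: probability}, a feedback model in language L2.
    dictionary: {l2_term: {l1_term: p(l1_term | l2_term)}}, an assumed
                probabilistic bilingual dictionary (hypothetical input).
    """
    translated = defaultdict(float)
    for e, p_e in model_l2.items():
        # L2 terms with no dictionary entry contribute nothing.
        for f, p_f_given_e in dictionary.get(e, {}).items():
            translated[f] += p_f_given_e * p_e
    total = sum(translated.values())
    if total == 0.0:
        return {}
    # Renormalise so the translated model is a proper distribution.
    return {f: p / total for f, p in translated.items()}


def combine_models(model_l1, translated, beta=0.5):
    """Linearly interpolate the L1 feedback model with the translated one.

    beta is an assumed interpolation weight on the translated model.
    """
    if not translated:
        # Nothing could be translated: fall back to the L1 model alone.
        return dict(model_l1)
    terms = set(model_l1) | set(translated)
    return {
        t: (1.0 - beta) * model_l1.get(t, 0.0) + beta * translated.get(t, 0.0)
        for t in terms
    }
```

If both inputs sum to one, the interpolated model also sums to one. Terms that appear only in the translated model receive non-zero weight, which is how, in this sketch, related terms missed by co-occurrence in L1 could enter the expansion.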

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN: 9781450301534
DOI: 10.1145/1835449
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. language models
  2. multilingual
  3. pseudo-relevance feedback
  4. query expansion

Qualifiers

  • Research-article

Conference

SIGIR '10

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2016) Query Expansion in Resource-Scarce Languages. ACM Transactions on Asian and Low-Resource Language Information Processing 16:2 (1-17). DOI: 10.1145/2997643. Online publication date: 18-Nov-2016
  • (2015) Multilingual information retrieval in the language modeling framework. Information Retrieval 18:3 (246-281). DOI: 10.1007/s10791-015-9255-1. Online publication date: 1-Jun-2015
  • (2014) Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416 (260-272). DOI: 10.5555/2964060.2964176. Online publication date: 13-Apr-2014
  • (2013) Modeling click-through based word-pairs for web search. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (483-492). DOI: 10.1145/2484028.2484082. Online publication date: 28-Jul-2013
  • (2012) Leveraging interlingual classification to improve web search. Proceedings of the 21st International Conference on World Wide Web (535-536). DOI: 10.1145/2187980.2188114. Online publication date: 16-Apr-2012
  • (2012) Combining Signals for Cross-Lingual Relevance Feedback. Information Retrieval Technology (356-365). DOI: 10.1007/978-3-642-35341-3_31. Online publication date: 2012
  • (2011) Expanding queries with term and phrase translations in patent retrieval. Proceedings of the Second international conference on Multidisciplinary information retrieval facility (16-29). DOI: 10.5555/2018142.2018147. Online publication date: 6-Jun-2011
  • (2011) Fractional similarity. Proceedings of the 33rd European conference on Advances in information retrieval (226-237). DOI: 10.5555/1996889.1996919. Online publication date: 18-Apr-2011
  • (2011) Enriching document representation via translation for improved monolingual information retrieval. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (853-862). DOI: 10.1145/2009916.2010030. Online publication date: 24-Jul-2011
  • (2011) Expanding Queries with Term and Phrase Translations in Patent Retrieval. Multidisciplinary Information Retrieval (16-29). DOI: 10.1007/978-3-642-21353-3_3. Online publication date: 2011
