DOI: 10.1145/3209978.3210147
short-paper

A New Term Frequency Normalization Model for Probabilistic Information Retrieval

Published: 27 June 2018

Abstract

In probabilistic BM25, term frequency normalization is one of the key components. It is controlled by the parameters $k_1$ and $b$, which need to be optimized for each data set. In this paper, we assume, and show empirically, that term frequency normalization should be specific to query length in order to optimize retrieval performance. Following this intuition, we first propose a new term frequency normalization that takes query length into account for probabilistic information retrieval, namely BM25$_{QL}$. BM25$_{QL}$ is then incorporated into the state-of-the-art models CRTER$_2$ and LDA-BM25, denoted CRTER$_2^{QL}$ and LDA-BM25$_{QL}$ respectively. A series of experiments shows that the proposed approaches BM25$_{QL}$, CRTER$_2^{QL}$ and LDA-BM25$_{QL}$ are comparable to BM25, CRTER$_2$ and LDA-BM25 with the optimal $b$ setting in terms of MAP on all the data sets.
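
For orientation, the role of $k_1$ and $b$ in the standard BM25 term frequency component can be sketched as below. This is the textbook Okapi form, not the BM25$_{QL}$ model proposed in the paper, whose formula the abstract does not give. The function name `bm25_tf` and the default values $k_1 = 1.2$ and $b = 0.75$ are illustrative assumptions, not settings taken from the paper.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard BM25 term-frequency component (textbook form, assumed here).

    tf          -- raw frequency of the term in the document
    doc_len     -- length of the document
    avg_doc_len -- average document length in the collection
    k1          -- saturation parameter (illustrative default)
    b           -- length-normalization strength in [0, 1] (illustrative default)
    """
    # b = 0 ignores document length; b = 1 normalizes fully by relative length.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Same term frequency in an average-length and in a long document.
    for b in (0.0, 0.5, 1.0):
        short = bm25_tf(3, doc_len=100, avg_doc_len=100, b=b)
        long_ = bm25_tf(3, doc_len=500, avg_doc_len=100, b=b)
        print(f"b={b}: average-length doc {short:.3f}, long doc {long_:.3f}")
```

The larger $b$ is, the more a long document is penalized for the same raw term frequency, which is why the abstract treats the choice of $b$ as something to optimize per data set.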

Cited By

  • (2019) A topic-based term frequency normalization framework to enhance probabilistic information retrieval. Computational Intelligence 36:2 (486-521). DOI: 10.1111/coin.12248. Online publication date: 20-Nov-2019

    Published In

    SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
    June 2018
    1509 pages
    ISBN:9781450356572
    DOI:10.1145/3209978

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bm25
    2. probabilistic model
    3. term frequency normalization

    Qualifiers

    • Short-paper

    Conference

    SIGIR '18
    Acceptance Rates

    SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 11
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 30 Aug 2024
