Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification

  • Conference paper
Advances in Information Retrieval Theory (ICTIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5766)

Abstract

Through BM25, the asymptotic term frequency quantification TF = tf/(tf+K), where tf is the within-document term frequency and K is a normalisation factor, became popular. This paper reports a finding regarding the meaning of the TF quantification: in the triangle of independence and subsumption, the TF quantification forms the altitude, that is, the middle between independent and subsumed events. We refer to this new assumption as semi-subsumed. While this finding of a well-defined probabilistic assumption solves the probabilistic interpretation of the BM25 TF quantification, it is also of wider impact regarding probability theory.
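To make the shape of the quantification TF = tf/(tf+K) concrete, here is a minimal Python sketch. It illustrates only the formula stated in the abstract, not the paper's derivation of the semi-subsumed assumption. The function name `tf_quantification` and the constant value K = 1.0 are hypothetical choices for illustration; K is treated here as a plain positive number.

```python
def tf_quantification(tf: float, K: float) -> float:
    """Asymptotic TF quantification from the abstract: TF = tf / (tf + K).

    tf: within-document term frequency (non-negative).
    K:  normalisation factor (assumed here to be a positive constant).
    """
    if tf < 0 or K <= 0:
        raise ValueError("expects tf >= 0 and K > 0")
    return tf / (tf + K)


if __name__ == "__main__":
    K = 1.0  # hypothetical value, chosen only for illustration
    for tf in (0, 1, 2, 5, 10, 100):
        # The value grows with tf but stays strictly below 1.
        print(tf, round(tf_quantification(tf, K), 4))
```

Under these assumptions the value is 0 for tf = 0, reaches 0.5 when tf equals K, and approaches but never reaches 1 as tf grows, which is the asymptotic behaviour the abstract refers to.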

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, H., Roelleke, T. (2009). Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_43

  • DOI: https://doi.org/10.1007/978-3-642-04417-5_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04416-8

  • Online ISBN: 978-3-642-04417-5

  • eBook Packages: Computer Science, Computer Science (R0)
