Abstract
The asymptotic term frequency quantification TF = tf/(tf+K), where tf is the within-document term frequency and K is a normalisation factor, became popular through BM25. This paper reports a finding regarding the meaning of the TF quantification: in the triangle of independence and subsumption, the TF quantification forms the altitude, that is, the middle between independent and subsumed events. We refer to this new assumption as semi-subsumed. This well-defined probabilistic assumption provides a probabilistic interpretation of the BM25 TF quantification, and it also has wider implications for probability theory.
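To make the asymptotic behaviour of the quantification concrete, the following minimal sketch evaluates TF = tf/(tf+K) for a few term frequencies. The function name bm25_tf and the default K = 1 are illustrative assumptions, not taken from the paper; K is treated as a plain constant here, whereas in BM25 it generally depends on document length.

```python
def bm25_tf(tf: float, k: float = 1.0) -> float:
    """Asymptotic TF quantification tf / (tf + K).

    `k` stands for the normalisation factor K; the default of 1.0 is an
    illustrative assumption, not a value prescribed by the paper.
    """
    if tf < 0:
        raise ValueError("tf must be non-negative")
    if k <= 0:
        raise ValueError("K must be positive")
    return tf / (tf + k)


if __name__ == "__main__":
    # The value rises quickly for small tf and then saturates below 1.
    for tf in (0, 1, 2, 5, 10, 100):
        print(f"tf={tf:>3}  TF={bm25_tf(tf):.3f}")
```

The value grows with tf but never reaches 1, so each additional occurrence of a term contributes less than the previous one.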
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, H., Roelleke, T. (2009). Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04416-8
Online ISBN: 978-3-642-04417-5
eBook Packages: Computer Science (R0)