Abstract
In this paper we explore various methods to estimate the collection parameter of the information-based models for ad hoc information retrieval. In previous studies, this parameter was set to the average number of documents in which the word under consideration appears. We introduce a fully formalized estimation method for both the log-logistic and the smoothed power law models, which leads to improved versions of these models in IR. Furthermore, we show that the previous setting of the collection parameter of the log-logistic model is a special case of the estimated value proposed here.
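The abstract does not spell out the formulas. As a rough illustration only, the previous setting can be sketched as follows, assuming the usual log-logistic formulation from the earlier literature on information-based models: the collection parameter of a word is its document frequency divided by the number of documents, and a term's weight is log((lambda + t) / lambda), with t a length-normalized term frequency. The function names and the default value of the normalization constant c are hypothetical, and the estimation method proposed in the paper is not reproduced here.

```python
import math


def collection_parameter(doc_freq: int, num_docs: int) -> float:
    """Previous setting (assumed form): fraction of documents containing the word."""
    return doc_freq / num_docs


def log_logistic_weight(tf: float, doc_len: float, avg_doc_len: float,
                        lam: float, c: float = 1.0) -> float:
    """Term weight under the assumed log-logistic model.

    t is the term frequency normalized by document length; c is a
    hypothetical default for the normalization constant.
    """
    t = tf * math.log(1.0 + c * avg_doc_len / doc_len)
    return math.log((lam + t) / lam)


# A word found in 10 of 1000 documents, occurring 3 times in an average-length document.
lam = collection_parameter(10, 1000)
weight = log_logistic_weight(tf=3, doc_len=100, avg_doc_len=100, lam=lam)
```

Under these assumptions, a smaller collection parameter (a rarer word) yields a larger weight for the same term frequency, which is why the way this parameter is estimated matters for ranking.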
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goswami, P., Gaussier, E. (2013). Estimation of the Collection Parameter of Information Models for IR. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_39
DOI: https://doi.org/10.1007/978-3-642-36973-5_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science (R0)