Abstract
We explore the utility of different types of topic models for retrieval purposes. Based on prior work, we describe several ways that topic models can be integrated into the retrieval process, and we evaluate the effectiveness of different types of topic models within those retrieval approaches. We show that: (1) topic models are effective for document smoothing; (2) more rigorous topic models such as Latent Dirichlet Allocation provide gains over cluster-based models; (3) more elaborate topic models that capture topic dependencies provide no additional gains; (4) smoothing documents by using their similar documents is as effective as smoothing them by using topic models; (5) query expansion should use topics discovered in the top feedback documents rather than coarse-grained topics from the whole corpus; (6) in general, incorporating topics from the feedback documents when building relevance models benefits performance more for queries that have more relevant documents.
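As an illustration of finding (1), the sketch below shows one common way to smooth a document language model with a topic model: a Dirichlet-smoothed unigram estimate is linearly interpolated with a topic-based estimate, the sum over topics z of P(w|z) P(z|D). The abstract does not state the exact formulation used in the paper, so this form, the function names, the parameters `mu` and `lam`, and all toy probabilities are assumptions made purely for illustration.

```python
import math
from collections import Counter

def dirichlet_prob(word, doc_counts, doc_len, coll_prob, mu):
    """Dirichlet-smoothed unigram estimate of P(word | document)."""
    return (doc_counts.get(word, 0) + mu * coll_prob[word]) / (doc_len + mu)

def topic_prob(word, doc_topic, topic_word):
    """Topic-model estimate: sum over topics z of P(word | z) * P(z | document)."""
    return sum(p_z * topic_word[z].get(word, 0.0) for z, p_z in doc_topic.items())

def smoothed_prob(word, doc_counts, doc_len, coll_prob, doc_topic, topic_word,
                  mu=1000.0, lam=0.7):
    """Interpolate the Dirichlet-smoothed estimate with the topic-model estimate."""
    base = dirichlet_prob(word, doc_counts, doc_len, coll_prob, mu)
    topical = topic_prob(word, doc_topic, topic_word)
    return lam * base + (1.0 - lam) * topical

def query_log_likelihood(query, doc_tokens, coll_prob, doc_topic, topic_word,
                         mu=1000.0, lam=0.7):
    """Score a document by the log-likelihood of the query under its smoothed model."""
    counts = Counter(doc_tokens)
    return sum(
        math.log(smoothed_prob(w, counts, len(doc_tokens), coll_prob,
                               doc_topic, topic_word, mu, lam))
        for w in query
    )

# Toy data (all values are made up for illustration).
coll_prob = {"car": 0.4, "engine": 0.3, "flower": 0.3}       # P(w | collection)
topic_word = {
    "vehicles": {"car": 0.6, "engine": 0.4},                 # P(w | z)
    "plants": {"flower": 1.0},
}
doc_tokens = ["car", "car"]                                  # never mentions "engine"
doc_topic = {"vehicles": 0.9, "plants": 0.1}                 # P(z | D), e.g. inferred by LDA

counts = Counter(doc_tokens)
p_plain = dirichlet_prob("engine", counts, len(doc_tokens), coll_prob, mu=2.0)
p_topic = smoothed_prob("engine", counts, len(doc_tokens), coll_prob,
                        doc_topic, topic_word, mu=2.0, lam=0.5)
score = query_log_likelihood(["engine"], doc_tokens, coll_prob,
                             doc_topic, topic_word, mu=2.0, lam=0.5)
```

In this toy setup the document never contains "engine", yet the topic-based component raises its probability because the document is mostly about the hypothetical "vehicles" topic. That is the intuition behind using topic models for document smoothing.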
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yi, X., Allan, J. (2009). A Comparative Study of Utilizing Topic Models for Information Retrieval. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_6
DOI: https://doi.org/10.1007/978-3-642-00958-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)