research-article

Group topic model: organizing topics into groups

Authors:

Tian Tian

Information Retrieval Journal, Volume 18, Issue 1

Pages 1–25

https://doi.org/10.1007/s10791-014-9244-9

Published: 01 February 2015

Abstract

Latent Dirichlet allocation (LDA) defines hidden topics to capture latent semantics in text documents. However, it assumes that all documents are represented by the same set of topics, which leads to the “forced topic” problem. To solve this problem, we developed group latent Dirichlet allocation (GLDA), which uses two kinds of topics: local topics and global topics. Highly related local topics are organized into groups that describe local semantics, whereas global topics are shared by all documents and describe background semantics. GLDA uses variational inference, with algorithms for both offline and online data. We evaluated the proposed model on topic modeling and document clustering, and our experimental results indicate that GLDA achieves performance competitive with state-of-the-art approaches.
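The abstract does not spell out GLDA's generative process, so what follows is only a toy illustration of the local/global idea, not the paper's model. In standard latent Dirichlet allocation every document draws its topic proportions from a Dirichlet over all topics. In the hypothetical sketch below, a document instead picks one group of local topics and mixes only that group's topics with the shared global topics. The function names, the symmetric prior `alpha`, and the uniform choice of group are assumptions made for illustration; word emission and the variational inference algorithms are omitted.

```python
import random
from typing import List, Tuple


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_topic_assignments(
    groups: List[List[int]],
    n_global: int,
    alpha: float,
    n_words: int,
    rng: random.Random,
) -> Tuple[int, List[int]]:
    """Hypothetical toy generator: one group of local topics plus global topics.

    groups   -- each inner list holds the ids of the local topics in one group
    n_global -- number of global topics; their ids follow all local topic ids
    Returns the chosen group index and one topic id per word position.
    """
    n_local = sum(len(g) for g in groups)
    global_ids = list(range(n_local, n_local + n_global))

    # Assumption: the document picks a single group uniformly at random.
    g = rng.randrange(len(groups))

    # Only this group's local topics and the shared global topics are available
    # to the document, unlike LDA, where every document mixes over all topics.
    allowed = groups[g] + global_ids
    theta = sample_dirichlet([alpha] * len(allowed), rng)

    # Draw a topic id for each word position from the document's proportions.
    z = rng.choices(allowed, weights=theta, k=n_words)
    return g, z


if __name__ == "__main__":
    rng = random.Random(42)
    groups = [[0, 1, 2], [3, 4]]  # two hypothetical groups of local topics
    g, z = generate_topic_assignments(groups, n_global=2, alpha=0.5, n_words=10, rng=rng)
    print(g, z)  # z uses only ids from groups[g] and the global ids 5 and 6
```

Restricting `allowed` is the only place this toy departs from LDA's document-level draw; setting `allowed` to every topic id recovers the usual LDA behaviour, in which any document may be forced to spread mass over topics unrelated to its content.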




Recommendations

Joint sentiment/topic model for sentiment analysis
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet ...
Twitter Opinion Topic Model: Extracting Product Opinions from Tweets by Leveraging Hashtags and Sentiment Lexicon
CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management

Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state-of-the-art is achieved with Latent Dirichlet Allocation (LDA)-based model. Although social media data like tweets are ...
Topic-Driven Multi-document Summarization
IALP '10: Proceedings of the 2010 International Conference on Asian Language Processing

This paper presents a topic-driven framework for generating a generic summary from multi-documents. Our approach is based on the intuition that, from the statistical point of view, the summary’s probability distribution over the topics should be ...


Published In


Information Retrieval Volume 18, Issue 1

Feb 2015

94 pages

ISSN: 1386-4564


© Springer Science+Business Media New York 2014.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 February 2015

Accepted: 27 August 2014

Received: 12 March 2014


Bibliometrics & Citations

Article Metrics

Total Citations: 5

Total Downloads: 0

Downloads (Last 12 months): 0

Downloads (Last 6 weeks): 0

Reflects downloads up to 04 Oct 2024

Cited By

Liu C, Yang S (2024). A two-stage clustering ensemble algorithm applicable to risk assessment of railway signaling faults. Expert Systems with Applications: An International Journal, 249(PA). DOI: 10.1016/j.eswa.2024.123500. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1016/j.eswa.2024.123500

Li X, Wang B, Wang Y, Ouyang J, Garg H, Thanh D (2023). Weakly supervised prototype topic model with discriminative seed words: modifying the category prior by self-exploring supervised signals. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 27(9), 5397–5410. DOI: 10.1007/s00500-022-07771-9. Online publication date: 6-Jan-2023.
https://dl.acm.org/doi/10.1007/s00500-022-07771-9

Yan R, Gao G (2019). Pseudo Topic Analysis for Boosting Pseudo Relevance Feedback. Web and Big Data, pp. 345–361. DOI: 10.1007/978-3-030-26072-9_26. Online publication date: 1-Aug-2019.
https://dl.acm.org/doi/10.1007/978-3-030-26072-9_26

Li X, Li C, Chi J, Ouyang J (2018). Short text topic modeling by exploring original documents. Knowledge and Information Systems, 56(2), 443–462. DOI: 10.1007/s10115-017-1099-0. Online publication date: 1-Aug-2018.
https://dl.acm.org/doi/10.1007/s10115-017-1099-0

Van Linh N, Anh N, Than K, Dang C (2017). An effective and interpretable method for document classification. Knowledge and Information Systems, 50(3), 763–793. DOI: 10.1007/s10115-016-0956-6. Online publication date: 1-Mar-2017.
https://dl.acm.org/doi/10.1007/s10115-016-0956-6
