DOI: 10.5555/2997189.2997285
Article

Online learning for Latent Dirichlet Allocation

Published: 06 December 2010

Abstract

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time.
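The abstract does not spell out the update rule, so the following is only a minimal sketch of what a stochastic natural-gradient step on the topic-word variational parameters might look like. It assumes a decaying step size of the form rho_t = (tau0 + t)^(-kappa) and a blended update between the current parameters and a minibatch estimate. The names `step_size`, `online_topic_update`, `lam`, `batch_stats`, `eta`, `tau0`, and `kappa` are illustrative assumptions, and the per-document E-step that would produce `batch_stats` is not implemented here.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size rho_t = (tau0 + t) ** -kappa (assumed schedule).

    With kappa in (0.5, 1], the sum of rho_t diverges while the sum of
    rho_t ** 2 converges, the usual stochastic-approximation conditions.
    """
    if not 0.5 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0.5, 1]")
    return (tau0 + t) ** (-kappa)


def online_topic_update(lam, batch_stats, t, num_docs, batch_size,
                        eta=0.01, tau0=1.0, kappa=0.7):
    """Blend current topic-word parameters with a minibatch estimate.

    lam         -- K x V nested list of current variational parameters
    batch_stats -- K x V nested list of expected topic-word counts summed
                   over the minibatch (hypothetical input; it would come
                   from a per-document E-step not shown in this sketch)
    num_docs    -- assumed total number of documents in the corpus
    """
    rho = step_size(t, tau0, kappa)
    # Rescale the minibatch counts as if the whole corpus looked like it.
    scale = num_docs / batch_size
    return [
        [(1.0 - rho) * l + rho * (eta + scale * s)
         for l, s in zip(lam_k, stats_k)]
        for lam_k, stats_k in zip(lam, batch_stats)
    ]


# Toy usage: 2 topics, 3 vocabulary words, a minibatch of 2 documents.
lam = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
stats = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
new_lam = online_topic_update(lam, stats, t=0, num_docs=100, batch_size=2)
```

Because each update touches only one minibatch, a loop of this shape could in principle process documents as they arrive in a stream, which is the setting the abstract describes.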

Published In

NIPS'10: Proceedings of the 24th International Conference on Neural Information Processing Systems - Volume 1
December 2010
2630 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

  • Article

Cited By

  • (2023) Assessment of the Quality of Topic Models for Information Retrieval Applications. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pages 265-274. DOI: 10.1145/3578337.3605118. Online publication date: 9-Aug-2023
  • (2023) A systematic review of the use of topic models for short text social media analysis. Artificial Intelligence Review 56:12, pages 14223-14255. DOI: 10.1007/s10462-023-10471-x. Online publication date: 1-May-2023
  • (2022) Darwin's Theory of Censorship. Proceedings of the 21st Workshop on Privacy in the Electronic Society, pages 103-108. DOI: 10.1145/3559613.3563206. Online publication date: 7-Nov-2022
  • (2022) ADPL. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 245-255. DOI: 10.1145/3477495.3531933. Online publication date: 6-Jul-2022
  • (2022) Unsupervised model for aspect categorization and implicit aspect extraction. Knowledge and Information Systems 64:6, pages 1625-1651. DOI: 10.1007/s10115-022-01678-5. Online publication date: 1-Jun-2022
  • (2021) Are Topics Interesting or Not? An LDA-based Topic-graph Probabilistic Model for Web Search Personalization. ACM Transactions on Information Systems 40:3, pages 1-24. DOI: 10.1145/3476106. Online publication date: 30-Dec-2021
  • (2021) Topic-based Video Analysis. ACM Computing Surveys 54:6, pages 1-34. DOI: 10.1145/3459089. Online publication date: 13-Jul-2021
  • (2021) Distributed Latent Dirichlet Allocation on Streams. ACM Transactions on Knowledge Discovery from Data 16:1, pages 1-20. DOI: 10.1145/3451528. Online publication date: 20-Jul-2021
  • (2021) Hierarchical Concept-Driven Language Model. ACM Transactions on Knowledge Discovery from Data 15:6, pages 1-22. DOI: 10.1145/3451167. Online publication date: 19-May-2021
  • (2021) Topic modelling of authentication events in an enterprise computer network. 2016 IEEE Conference on Intelligence and Security Informatics (ISI), pages 190-192. DOI: 10.1109/ISI.2016.7745466. Online publication date: 11-Mar-2021
