Abstract
Stochastic variational inference (SVI) can learn topic models from very large corpora. It optimizes the variational objective using stochastic natural gradient ascent with a decreasing learning rate. This rate is crucial to SVI's performance, yet in practice it is often tuned by hand. To address this, we develop an algorithm that adaptively tunes the learning rate at each iteration. The proposed algorithm uses the Kullback–Leibler (KL) divergence to measure the similarity between the variational distribution obtained from the noisy (mini-batch) update and the one obtained from the full-batch update, and then chooses the learning rate that minimizes this divergence. We apply our algorithm to two representative topic models: latent Dirichlet allocation and the hierarchical Dirichlet process. Experimental results indicate that our algorithm performs better and converges faster than commonly used learning-rate schedules.
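The abstract describes the adaptive rule only at a high level. As a rough illustration, not the authors' implementation, the sketch below assumes Dirichlet variational factors (as in LDA) and picks the learning rate by a simple grid search over the closed-form KL divergence between the noisily updated distribution and a batch-updated one; the batch update is assumed to be available here purely for illustration, and the function names and grid-search strategy are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(a, b):
    """Closed-form KL divergence KL(Dir(a) || Dir(b))."""
    a0, b0 = a.sum(), b.sum()
    return (gammaln(a0) - gammaln(b0)
            - np.sum(gammaln(a) - gammaln(b))
            + np.sum((a - b) * (digamma(a) - digamma(a0))))

def choose_rate(lam, lam_noisy, lam_batch, grid=np.linspace(0.01, 1.0, 100)):
    """Pick the learning rate whose noisy SVI step lands closest (in KL) to the batch update.

    lam       -- current variational parameter
    lam_noisy -- intermediate parameter estimated from one mini-batch
    lam_batch -- hypothetical full-batch update (assumed available for this sketch)
    """
    best_rho, best_kl = None, np.inf
    for rho in grid:
        lam_new = (1.0 - rho) * lam + rho * lam_noisy  # stochastic natural-gradient step
        kl = dirichlet_kl(lam_new, lam_batch)
        if kl < best_kl:
            best_rho, best_kl = rho, kl
    return best_rho

if __name__ == "__main__":
    lam       = np.array([2.0, 5.0, 1.0])
    lam_noisy = np.array([3.0, 4.0, 2.0])
    lam_batch = np.array([2.5, 4.5, 1.5])
    print("chosen learning rate:", choose_rate(lam, lam_noisy, lam_batch))
```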
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61170092, 61133011 and 61103091.
Cite this article
Li, XM., Ouyang, JH. Tuning the Learning Rate for Stochastic Variational Inference. J. Comput. Sci. Technol. 31, 428–436 (2016). https://doi.org/10.1007/s11390-016-1636-4