Abstract
Stochastic variational inference (SVI) can learn topic models from very large corpora. It optimizes the variational objective using stochastic natural gradient ascent with a decreasing learning rate. This rate is crucial to SVI's performance, yet in practice it is often tuned by hand. To address this, we develop an algorithm that adaptively tunes the learning rate at each iteration. The proposed algorithm uses the Kullback–Leibler (KL) divergence to measure the similarity between the variational distribution obtained from the noisy (mini-batch) update and the one obtained from the full-batch update, and then chooses the learning rate that minimizes this divergence. We apply our algorithm to two representative topic models: latent Dirichlet allocation and the hierarchical Dirichlet process. Experimental results indicate that our algorithm performs better and converges faster than commonly used learning-rate schedules.
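The abstract describes the adaptive rule only at a high level. As a rough illustration, not the authors' implementation, the sketch below assumes Dirichlet variational factors (as in LDA) and picks the learning rate by a simple grid search over the closed-form KL divergence between the noisily updated distribution and a batch-updated one; the batch update is assumed to be available here purely for illustration, and the function names and grid-search strategy are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(a, b):
    """Closed-form KL divergence KL(Dir(a) || Dir(b))."""
    a0, b0 = a.sum(), b.sum()
    return (gammaln(a0) - gammaln(b0)
            - np.sum(gammaln(a) - gammaln(b))
            + np.sum((a - b) * (digamma(a) - digamma(a0))))

def choose_rate(lam, lam_noisy, lam_batch, grid=np.linspace(0.01, 1.0, 100)):
    """Pick the learning rate whose noisy SVI step lands closest (in KL) to the batch update.

    lam       -- current variational parameter
    lam_noisy -- intermediate parameter estimated from one mini-batch
    lam_batch -- hypothetical full-batch update (assumed available for this sketch)
    """
    best_rho, best_kl = None, np.inf
    for rho in grid:
        lam_new = (1.0 - rho) * lam + rho * lam_noisy  # stochastic natural-gradient step
        kl = dirichlet_kl(lam_new, lam_batch)
        if kl < best_kl:
            best_rho, best_kl = rho, kl
    return best_rho

if __name__ == "__main__":
    lam       = np.array([2.0, 5.0, 1.0])
    lam_noisy = np.array([3.0, 4.0, 2.0])
    lam_batch = np.array([2.5, 4.5, 1.5])
    print("chosen learning rate:", choose_rate(lam, lam_noisy, lam_batch))
```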
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61170092, 61133011 and 61103091.
Cite this article
Li, XM., Ouyang, JH. Tuning the Learning Rate for Stochastic Variational Inference. J. Comput. Sci. Technol. 31, 428–436 (2016). https://doi.org/10.1007/s11390-016-1636-4