Abstract
Online and stochastic gradient methods have emerged as potent tools for large-scale optimization of both smooth and nonsmooth convex problems, drawn from the classes \(C^{1,1}(\mathbb {R}^p)\) and \(C^{1,0}(\mathbb {R}^p)\) respectively. However, to the best of our knowledge, few papers use incremental gradient methods to optimize the intermediate class of convex problems with Hölder continuous gradients, \(C^{1,v}(\mathbb {R}^p)\). To bridge the gap between methods for smooth and nonsmooth problems, we propose several online and stochastic universal gradient methods that do not require the actual degree of smoothness of the objective function to be known in advance. We extend the scope of problems considered in machine learning to Hölder continuous functions and propose a general family of first-order methods. Regret and convergence analyses show that our methods enjoy strong theoretical guarantees. For the first time, we establish algorithms that enjoy a linear convergence rate for convex functions with Hölder continuous gradients.
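For readers unfamiliar with the notation, the class \(C^{1,v}(\mathbb {R}^p)\) collects convex functions whose gradients are Hölder continuous of degree \(v \in [0,1]\); the display below states this standard definition (the constant \(M_v\) is generic notation assumed here, not a symbol introduced in the abstract), with \(v = 1\) recovering the smooth class \(C^{1,1}\) and \(v = 0\) the nonsmooth class \(C^{1,0}\):
\[
\|\nabla f(x) - \nabla f(y)\| \;\le\; M_v \,\|x - y\|^{v}, \qquad \forall\, x, y \in \mathbb {R}^p,\ v \in [0,1].
\]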
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Shi, Z., Liu, R. (2015). Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums in Machine Learning. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_29
DOI: https://doi.org/10.1007/978-3-319-18038-0_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18037-3
Online ISBN: 978-3-319-18038-0