Abstract
In recent years, stochastic gradient descent (SGD) has become one of the most important optimization algorithms in fields such as deep learning and reinforcement learning. However, computing the gradient over all coordinates at every iteration is prohibitive when the parameter vector is high-dimensional. For this reason, we propose a randomized block-coordinate Adam (RBC-Adam) online learning optimization algorithm. At each round, RBC-Adam randomly chooses a coordinate from a subset of the parameters, computes the corresponding partial gradient, and updates the parameters along the negative gradient direction. We further analyze the convergence of RBC-Adam and establish a regret bound of \(O(\sqrt{T})\), where \(T\) is the time horizon. The theoretical results are verified by simulated experiments on four public datasets, which also show that the computational cost of RBC-Adam is lower than that of other Adam variants.
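As a rough illustration of the update described in the abstract, the following minimal Python sketch applies an Adam-style step only to a randomly sampled block of coordinates. The function name rbc_adam_step, the grad_fn callback, the block_size parameter, and the state dictionary are illustrative assumptions for this sketch and do not reproduce the authors' implementation or hyperparameter choices.

```python
import numpy as np

def rbc_adam_step(theta, grad_fn, state, block_size,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One randomized block-coordinate Adam update (illustrative sketch).

    theta      : parameter vector (np.ndarray of shape (d,))
    grad_fn    : callable(theta, idx) -> partial gradient restricted to idx
    state      : dict with moment estimates "m", "v" (shape (d,)) and counter "t"
    block_size : number of coordinates updated in this round
    """
    d = theta.size
    # Sample a random block of coordinates; only this block is touched.
    idx = np.random.choice(d, size=block_size, replace=False)

    g = grad_fn(theta, idx)          # partial gradient on the sampled block
    state["t"] += 1
    t = state["t"]

    # Adam-style exponential moving averages, restricted to the block.
    state["m"][idx] = beta1 * state["m"][idx] + (1 - beta1) * g
    state["v"][idx] = beta2 * state["v"][idx] + (1 - beta2) * g ** 2

    # Bias-corrected moments; move along the negative gradient direction.
    m_hat = state["m"][idx] / (1 - beta1 ** t)
    v_hat = state["v"][idx] / (1 - beta2 ** t)
    theta[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta
```

Under these assumptions, only the sampled block's moment estimates and parameters are read and written in a given round, which is where the reduction in per-round computational cost relative to a full-coordinate Adam update would come from.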


Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61976243, No. 61971458, and No. U1604155, in part by the Scientific and Technological Innovation Team of Colleges and Universities in Henan Province under Grant No. 20IRTSTHN018, in part by the basic research projects in the University of Henan Province under Grant No. 19zx010, and in part by the Science and Technology Development Program of Henan Province under Grant No. 192102210284.
Ethics declarations
Conflict of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhou, Y., Zhang, M., Zhu, J. et al. A Randomized Block-Coordinate Adam online learning optimization algorithm. Neural Comput & Applic 32, 12671–12684 (2020). https://doi.org/10.1007/s00521-020-04718-9