Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

Wu, Yixin; Luo, Rui; Zhang, Chen; Wang, Jun; Yang, Yaodong


arXiv:2109.09833 (cs)

[Submitted on 20 Sep 2021]


Abstract: In this paper, we characterize the noise of stochastic gradients and analyze the noise-induced dynamics that arise when training deep neural networks with gradient-based optimizers. First, we show that the stochastic gradient noise has finite variance, so the classical Central Limit Theorem (CLT) applies; this indicates that the gradient noise is asymptotically Gaussian. This asymptotic result validates the widely accepted assumption of Gaussian noise. We clarify that the recently observed heavy tails in gradient noise may not be an intrinsic property but rather a consequence of insufficient mini-batch size: the gradient noise, which is a sum of a limited number of i.i.d. random variables, has not reached the asymptotic regime of the CLT and therefore deviates from Gaussian. We quantitatively measure the goodness of the Gaussian approximation of the noise, which supports our conclusion. Second, we analyze the noise-induced dynamics of stochastic gradient descent using the Langevin equation, which gives the momentum hyperparameter of the optimizer a physical interpretation. We then demonstrate the existence of the steady-state distribution of stochastic gradient descent and approximate that distribution at a small learning rate.
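
The batch-size argument in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experiment: it assumes the per-sample noise is Laplace-distributed (finite variance, heavier tails than a Gaussian) as a stand-in for real per-example gradient noise, and it uses excess kurtosis as an illustrative measure of non-Gaussianity rather than whatever goodness-of-fit measure the authors use. The function names `excess_kurtosis` and `minibatch_noise` are hypothetical.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis; close to 0 for a Gaussian sample."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


def minibatch_noise(batch_size, n_batches, rng):
    """Mini-batch mean of i.i.d. per-sample noise.

    Assumption: per-sample noise is Laplace(0, 1), a finite-variance,
    heavier-than-Gaussian stand-in for per-example gradient noise.
    """
    per_sample = rng.laplace(loc=0.0, scale=1.0, size=(n_batches, batch_size))
    return per_sample.mean(axis=1)


rng = np.random.default_rng(0)

# Estimate non-Gaussianity of the mini-batch noise at several batch sizes.
kurtosis_by_batch = {
    b: excess_kurtosis(minibatch_noise(b, 50_000, rng))
    for b in (1, 4, 16, 64)
}

for b, k in kurtosis_by_batch.items():
    print(f"batch size {b:3d}: excess kurtosis {k:+.3f}")
```

For i.i.d. noise with excess kurtosis κ, the mean of B samples has excess kurtosis κ/B, so under the Laplace assumption (κ = 3) the estimates should shrink toward zero roughly as 3/B. A small batch therefore looks heavy-tailed even though the underlying noise has finite variance and the CLT applies.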

Comments:	18 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2109.09833 [cs.LG]
	(or arXiv:2109.09833v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.09833

Submission history

From: Yixin Wu
[v1] Mon, 20 Sep 2021 20:39:14 UTC (861 KB)
