Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization

Zhang, Huishuai; Yu, Da; Yi, Mingyang; Chen, Wei; Liu, Tie-Yan

arXiv:1903.07120v2 (cs)

[Submitted on 17 Mar 2019 (v1), revised 30 May 2019 (this version, v2), latest version 31 Jan 2023 (v5)]

Abstract: The ResNet architecture has achieved great empirical success since its debut. Recent work established the convergence of learning over-parameterized ResNet with a scaling factor $\tau=1/L$ on the residual branch, where $L$ is the network depth. However, it is not clear how learning ResNet behaves for other values of $\tau$. In this paper, we fully characterize the convergence theory of gradient descent for learning over-parameterized ResNet with different values of $\tau$. Specifically, hiding logarithmic factors and constant coefficients, we show that for $\tau\le 1/\sqrt{L}$ gradient descent is guaranteed to converge to the global minima, and in particular when $\tau\le 1/L$ the convergence is independent of the network depth. Conversely, we show that for $\tau>L^{-\frac{1}{2}+c}$, the forward output grows at least at rate $L^c$ in expectation, and learning then fails because of gradient explosion for large $L$. This means the bound $\tau\le 1/\sqrt{L}$ is sharp for learning ResNet with arbitrary depth. To the best of our knowledge, this is the first work that studies learning ResNet over the full range of $\tau$.
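
The dependence of the forward output on $\tau$ can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a residual update of the form $h_l = \mathrm{ReLU}(h_{l-1} + \tau W_l h_{l-1})$ with i.i.d. Gaussian entries of variance $2/m$ in each $W_l$, and the width $m=32$, depth $L=64$, and function name `forward_norm_ratio` are all illustrative choices that do not appear in the abstract. It measures how much the norm of a unit-norm input changes after $L$ residual blocks at random initialization.

```python
import math
import random


def forward_norm_ratio(depth, width, tau, seed=0):
    """Return ||h_L|| / ||h_0|| for a randomly initialized residual stack.

    Assumed (hypothetical) update: h_l = ReLU(h_{l-1} + tau * W_l h_{l-1}),
    where each W_l has i.i.d. N(0, 2/width) entries drawn fresh per layer.
    """
    rng = random.Random(seed)
    # Unit-norm, non-negative input so the first ReLU does not clip it.
    h = [1.0 / math.sqrt(width)] * width
    sigma = math.sqrt(2.0 / width)
    for _ in range(depth):
        new_h = []
        for i in range(width):
            # i-th coordinate of W_l h, sampling row i of W_l on the fly.
            branch = sum(rng.gauss(0.0, sigma) * x for x in h)
            new_h.append(max(0.0, h[i] + tau * branch))
        h = new_h
    # The input has norm 1, so the output norm is the ratio itself.
    return math.sqrt(sum(x * x for x in h))


if __name__ == "__main__":
    depth, width = 64, 32
    settings = [
        ("tau = 1/L", 1.0 / depth),
        ("tau = 1/sqrt(L)", 1.0 / math.sqrt(depth)),
        ("tau = 1", 1.0),
    ]
    for label, tau in settings:
        ratio = forward_norm_ratio(depth, width, tau)
        print(f"{label:>16}: output/input norm ratio = {ratio:.3e}")
```

Under these assumptions, the ratio should stay close to 1 for $\tau=1/L$, remain moderate for $\tau=1/\sqrt{L}$, and blow up for $\tau=1$, which is the qualitative picture the abstract describes. A single random draw at small width is only suggestive and says nothing about the convergence guarantees themselves.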

Comments:	31 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1903.07120 [cs.LG]
	(or arXiv:1903.07120v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.07120

Submission history

From: Huishuai Zhang
[v1] Sun, 17 Mar 2019 16:15:56 UTC (346 KB)
[v2] Thu, 30 May 2019 05:45:07 UTC (163 KB)
[v3] Wed, 26 Jun 2019 08:03:07 UTC (163 KB)
[v4] Fri, 12 Jul 2019 08:33:44 UTC (163 KB)
[v5] Tue, 31 Jan 2023 03:40:35 UTC (982 KB)
