Are ResNets Provably Better than Linear Predictors?

Shamir, Ohad

arXiv:1804.06739v1 (cs)

[Submitted on 18 Apr 2018 (this version), latest version 27 Sep 2018 (v4)]

Abstract: A residual network (or ResNet) is a standard deep neural net architecture, with state-of-the-art performance across numerous applications. The main premise of ResNets is that they allow the training of each layer to focus on fitting just the residual of the previous layer's output and the target output. Thus, we should expect that the trained network is no worse than what we can obtain if we remove the residual layers and train a shallower network instead. However, due to the non-convexity of the optimization problem, it is not at all clear that ResNets indeed achieve this behavior, rather than getting stuck at some arbitrarily poor local minimum. In this paper, we rigorously prove that arbitrarily deep, nonlinear ResNets indeed exhibit this behavior, in the sense that the optimization landscape contains no local minima with value above what can be obtained with a linear predictor (namely a 1-layer network). Notably, we show this under minimal or no assumptions on the precise network architecture, data distribution, or loss function used. We also provide a quantitative analysis of second-order stationary points for this problem, and show that with a certain tweak to the architecture, training the network with standard stochastic gradient descent achieves an objective value no worse than any fixed linear predictor.
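
The expectation stated in the abstract, that a residual network should be no worse than the shallower model obtained by removing the residual layers, can be illustrated with a small numerical sketch. The architecture below is an assumption made for illustration only: a single residual block followed by a linear output, x ↦ w·(x + V·relu(θx)); the abstract itself does not specify this form, and the names `residual_predictor`, `V`, and `theta` are hypothetical. The sketch shows only that setting the residual branch weights to zero recovers a linear predictor exactly. It does not demonstrate the paper's result about local minima.

```python
import numpy as np

def residual_predictor(x, w, V, theta):
    """Hypothetical one-block residual predictor: w . (x + V @ relu(theta @ x))."""
    hidden = np.maximum(theta @ x, 0.0)  # nonlinear residual branch
    return w @ (x + V @ hidden)          # skip connection plus residual, then linear output

def mean_squared_loss(preds, targets):
    return 0.5 * np.mean((preds - targets) ** 2)

rng = np.random.default_rng(0)
d, k, n = 5, 8, 200                      # input dim, hidden width, sample count (arbitrary)
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)  # synthetic nonlinear target

# Best linear predictor under squared loss, via least squares.
w_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
linear_loss = mean_squared_loss(X @ w_lin, y)

# Residual network with the residual branch switched off (V = 0):
# whatever theta is, the prediction collapses to w_lin . x.
theta = rng.normal(size=(k, d))
V_zero = np.zeros((d, k))
resnet_preds = np.array([residual_predictor(x, w_lin, V_zero, theta) for x in X])
resnet_loss = mean_squared_loss(resnet_preds, y)

print(np.isclose(linear_loss, resnet_loss))
```

Under this assumed architecture, the set of functions the residual network can represent contains every linear predictor, so its global minimum is at most the best linear loss. The question the abstract raises is the harder one: whether local minima found by training can still end up above that level.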

Comments:	20 pages
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1804.06739 [cs.LG]
	(or arXiv:1804.06739v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1804.06739

Submission history

From: Ohad Shamir
[v1] Wed, 18 Apr 2018 14:06:15 UTC (145 KB)
[v2] Thu, 10 May 2018 16:10:51 UTC (146 KB)
[v3] Fri, 11 May 2018 10:58:10 UTC (146 KB)
[v4] Thu, 27 Sep 2018 10:30:26 UTC (146 KB)
