Towards an Empirically Guided Understanding of the Loss Landscape of Neural Networks
Open access
Author
Date
2022
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
One of the most important and ubiquitous building blocks of machine learning is gradient-based optimization. While it has contributed, and continues to contribute, to the vast majority of recent successes of deep neural networks, it comes both with some limitations and with potential for further improvements.
Catastrophic forgetting, the subject of the first two parts of this thesis, is one such limitation. It refers to the observation that when gradient-based learning algorithms are asked to learn different tasks sequentially, they overwrite knowledge from earlier tasks. In the machine learning community, several different ideas and formalisations of this problem are being investigated. One of the most difficult versions is a setting in which the use of data from earlier distributions is strictly forbidden. In this domain, an important line of work is that of so-called regularisation-based algorithms. Our first contribution is to unify a large family of these algorithms by showing that they all rely on the same theoretical idea to limit catastrophic forgetting. Not only had this connection been unknown, we also show that it is an accidental feature of at least some of the algorithms. To demonstrate the practical impact of these insights, we show how they can be used to make some algorithms more robust and performant across a variety of settings.
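To make "regularisation-based" concrete, the following is a minimal sketch of the shared idea behind this family of methods: a quadratic penalty (as in elastic-weight-consolidation-style algorithms) that anchors important parameters at the previous task's solution. The toy loss, importance weights and hyperparameters below are illustrative assumptions, not the specific algorithms analysed in the thesis.

```python
import numpy as np

# Hypothetical setup: w_star are the weights after the previous task,
# omega are per-parameter importance estimates (e.g. a diagonal Fisher in EWC-like methods),
# and lam trades off the new-task loss against staying close to the old solution.

NEW_TASK_OPT = np.array([1.0, 2.0])   # toy optimum of the current task

def new_task_grad(w):
    # Gradient of a stand-in quadratic loss for the current task.
    return w - NEW_TASK_OPT

def regularised_grad(w, w_star, omega, lam):
    # Gradient of: new_task_loss(w) + (lam / 2) * sum_i omega_i * (w_i - w_star_i)^2
    return new_task_grad(w) + lam * omega * (w - w_star)

w_star = np.array([0.0, 0.0])   # solution found on the previous task
omega = np.array([10.0, 0.1])   # first parameter deemed important, second one not
lam = 1.0

w = w_star.copy()
for _ in range(500):
    w -= 0.01 * regularised_grad(w, w_star, omega, lam)

# The important parameter stays near its old value, the unimportant one
# moves close to the new task's optimum.
print(w)
```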
The second part of the thesis uses tools from the first part and tackles a similar problem, but does so from a different angle. Namely, it focuses on the phenomenon of catastrophic forgetting – also known as the stability-plasticity dilemma – from the viewpoint of neuroscience. It proposes and analyses a simple synaptic learning rule based on the stochasticity of synaptic signal transmission and shows how this learning rule can alleviate catastrophic forgetting in model neural networks. Moreover, the learning rule's effects on energy-efficient information processing are investigated, extending prior work that explores computational roles of the aforementioned, somewhat mysterious, stochastic nature of synaptic signal transmission.
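The thesis's actual learning rule is not reproduced here; the toy below only illustrates the ingredient it builds on, stochastic synaptic transmission, where each synapse relays its signal only with some release probability. The network size, weights and release probabilities are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic synapses: on each trial a synapse transmits with probability p,
# so the effective weight is a random variable with mean p * w.
n_pre, n_post = 5, 3
w = rng.normal(size=(n_post, n_pre))                 # synaptic strengths (assumed values)
p = rng.uniform(0.2, 0.9, size=(n_post, n_pre))      # per-synapse release probabilities (assumed values)

def noisy_forward(x):
    release = rng.random((n_post, n_pre)) < p        # Bernoulli release events
    return (w * release) @ x

x = rng.normal(size=n_pre)
samples = np.stack([noisy_forward(x) for _ in range(1000)])

# The trial average approaches the deterministic output with weights p * w.
print(samples.mean(axis=0))
print((w * p) @ x)
```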
Finally, the third part of the thesis focuses on potential improvements of standard first-order gradient-based optimizers. One of the most successful lines of work in this area is that of Kronecker-factored optimizers, whose influence has reached beyond optimization to areas like Bayesian machine learning, catastrophic forgetting and meta-learning. Kronecker-factored optimizers are motivated by, and thought of as, approximations of natural gradient descent, a well-known second-order optimization method. We show that a host of empirical results contradict this view of the most prominent such optimizer, KFAC, as a second-order method and propose an alternative, fundamentally different theoretical explanation for its effectiveness. This not only gives important new insights into one of the most powerful optimizers for neural networks, but can also be used to derive a more efficient optimizer.
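For context, a minimal sketch of the Kronecker-factored structure these optimizers exploit: for a fully connected layer, the curvature (Fisher) matrix is approximated by a Kronecker product of an input-covariance factor A and an output-gradient-covariance factor G, so the preconditioned gradient can be computed as two small matrix solves instead of one large inversion. The dimensions, random data, damping and step size below are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected layer with weight matrix W of shape (out_dim, in_dim).
in_dim, out_dim, batch = 4, 3, 32
W = rng.normal(size=(out_dim, in_dim))
a = rng.normal(size=(batch, in_dim))    # layer inputs (activations), toy values
g = rng.normal(size=(batch, out_dim))   # backpropagated output gradients, toy values

# Kronecker factors: A ~ E[a a^T], G ~ E[g g^T].
A = a.T @ a / batch
G = g.T @ g / batch

# For a linear layer, the batch-averaged weight gradient has the form g^T a / batch.
grad_W = g.T @ a / batch

damping = 1e-2  # illustrative damping to keep the factors invertible
A_damped = A + damping * np.eye(in_dim)
G_damped = G + damping * np.eye(out_dim)

# KFAC-style preconditioning: approximate the curvature by a Kronecker product of A and G,
# so the preconditioned gradient is G^{-1} grad_W A^{-1} -- two small solves.
precond_grad = np.linalg.solve(G_damped, grad_W) @ np.linalg.inv(A_damped)

lr = 0.1
W -= lr * precond_grad
```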
Permanent link
https://doi.org/10.3929/ethz-b-000572885
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Optimization; Gradient Descent; Machine Learning; Continual learning
Organisational unit
03672 - Steger, Angelika / Steger, Angelika