Showing 1–9 of 9 results for author: Zucchet, N

Searching in archive cs.
  1. arXiv:2405.21064

    cs.LG cs.AI math.OC

    Recurrent neural networks: vanishing and exploding gradients are not the end of the story

    Authors: Nicolas Zucchet, Antonio Orvieto

    Abstract: Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, to overcome such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, ch…

    Submitted 31 May, 2024; originally announced May 2024.

  2. arXiv:2309.05858

    cs.LG cs.AI

    Uncovering mesa-optimization algorithms in Transformers

    Authors: Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

    Abstract: Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning…

    Submitted 11 September, 2023; originally announced September 2023.

  3. arXiv:2309.01775

    cs.LG cs.NE

    Gated recurrent neural networks discover attention

    Authors: Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento

    Abstract: Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement…

    Submitted 7 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

  4. arXiv:2305.15947

    cs.LG cs.NE

    Online learning of long-range dependencies

    Authors: Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento

    Abstract: Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks. However, current algorithms fall short of offline backpropagation by either not being scalable or failing to learn long-range dependencies. Here we present a high-performance online learning algorithm that merely doubles the memory and computational requirements of a single inference p…

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  5. arXiv:2209.07509

    cs.LG

    Random initialisations performing above chance and how to find them

    Authors: Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger

    Abstract: Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after t…

    Submitted 7 November, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, 14th Annual Workshop on Optimization for Machine Learning (OPT2022)

  6. arXiv:2207.01332

    cs.LG cs.NE

    The least-control principle for local learning at equilibrium

    Authors: Alexander Meulemans, Nicolas Zucchet, Seijin Kobayashi, Johannes von Oswald, João Sacramento

    Abstract: Equilibrium systems are a powerful way to express neural computations. As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning. Here, we present a new principle for learning such systems with a temporally- and spatially-local rule. Our pr…

    Submitted 31 October, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Published at NeurIPS 2022. 56 pages

    MSC Class: 68T07; ACM Class: I.2.6

  7. Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation

    Authors: Nicolas Zucchet, João Sacramento

    Abstract: This paper reviews gradient-based techniques to solve bilevel optimization problems. Bilevel optimization is a general way to frame the learning of systems that are implicitly defined through a quantity that they minimize. This characterization can be applied to neural networks, optimizers, algorithmic solvers and even physical systems, and allows for greater modeling flexibility compared to an ex…

    Submitted 27 October, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

  8. arXiv:2110.14402

    cs.LG cs.NE

    Learning where to learn: Gradient sparsity in meta and continual learning

    Authors: Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

    Abstract: Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterne…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Published at NeurIPS 2021

  9. arXiv:2104.01677

    cs.LG cs.NE q-bio.NC

    A contrastive rule for meta-learning

    Authors: Nicolas Zucchet, Simon Schug, Johannes von Oswald, Dominic Zhao, João Sacramento

    Abstract: Humans and other animals are capable of improving their learning performance as they solve related tasks from a given problem domain, to the point of being able to learn from extremely limited data. While synaptic plasticity is generically thought to underlie learning in the brain, the precise neural and synaptic mechanisms by which learning processes improve through experience are not well unders…

    Submitted 3 October, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

    Comments: 32 pages, 10 figures, published at NeurIPS 2022