Showing 1–11 of 11 results for author: Willi, T

Searching in archive cs.
  1. arXiv:2406.18420

    cs.LG cs.AI

    Mixture of Experts in a Mixture of RL settings

    Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

    Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's lea…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2402.08609

    cs.LG cs.AI

    Mixtures of Experts Unlock Parameter Scaling for Deep RL

    Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

    Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-…

    Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  3. arXiv:2402.05782

    cs.LG cs.AI cs.GT cs.MA

    Analysing the Sample Complexity of Opponent Shaping

    Authors: Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster

    Abstract: Learning in general-sum games often yields collectively sub-optimal results. Addressing this, opponent shaping (OS) methods actively guide the learning processes of other agents, empirically leading to improved individual and group performances in many settings. Early OS methods use higher-order derivatives to shape the learning of co-players, making them unsuitable for shaping multiple learning s…

    Submitted 8 February, 2024; originally announced February 2024.

    Journal ref: AAMAS 2024

  4. arXiv:2402.01088

    cs.GT cs.MA

    The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games

    Authors: Jake Levi, Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

    Abstract: The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to gen…

    Submitted 27 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 31 pages, 23 figures

  5. arXiv:2312.12568

    cs.AI

    Scaling Opponent Shaping to High Dimensional Games

    Authors: Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

    Abstract: In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes. To address this issue, opponent shaping (OS) methods explicitly learn to influence the learning dynamics of co-players and empirically lead to improved individual and collective outcomes. However, OS methods have only been evaluated in low-dimensional environments du…

    Submitted 10 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  6. arXiv:2312.12564

    cs.LG cs.GT cs.MA

    Leading the Pack: N-player Opponent Shaping

    Authors: Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel

    Abstract: Reinforcement learning solutions have great success in the 2-player general sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes, whilst also maximizing their reward. These methods have currently been limited to 2-player games. However, the real world inv…

    Submitted 26 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

  7. arXiv:2311.10090

    cs.LG cs.AI cs.MA

    JaxMARL: Multi-Agent RL Environments in JAX

    Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster

    Abstract: Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware accelerat…

    Submitted 19 December, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

  8. arXiv:2211.11030

    cs.LG cs.AI cs.CR

    Adversarial Cheap Talk

    Authors: Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster

    Abstract: Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, in…

    Submitted 11 July, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: To be published at ICML 2023. Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk

  9. arXiv:2205.01447

    cs.AI cs.MA

    Model-Free Opponent Shaping

    Authors: Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

    Abstract: In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anti…

    Submitted 4 November, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: ICML 2022 camera ready version. Code: https://github.com/luchris429/Model-Free-Opponent-Shaping

  10. arXiv:2203.04098

    cs.LG cs.AI cs.GT

    COLA: Consistent Learning with Opponent-Learning Awareness

    Authors: Timon Willi, Alistair Letcher, Johannes Treutlein, Jakob Foerster

    Abstract: Learning in general-sum games is unstable and frequently leads to socially undesirable (Pareto-dominated) outcomes. To mitigate this, Learning with Opponent-Learning Awareness (LOLA) introduced opponent shaping to this setting, by accounting for each agent's influence on their opponents' anticipated learning steps. However, the original LOLA formulation (and follow-up work) is inconsistent because…

    Submitted 27 June, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted @ ICML 2022

  11. arXiv:1906.05915

    cs.LG stat.ML

    Recurrent Neural Processes

    Authors: Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer

    Abstract: We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional indep…

    Submitted 5 November, 2019; v1 submitted 13 June, 2019; originally announced June 2019.