
Who Am I?

I'm a machine learning researcher, a Ph.D. candidate at the Université de Montréal co-advised by Yoshua Bengio and Hugo Larochelle, and a Student Research Scholar at Google Brain. My research interests span supervised, unsupervised and reinforcement learning -- all with an aim to build robust, reliable algorithms. Like a growing number of machine learning researchers, I come to the field from physics. I spent several sunny years at UC San Diego earning an M.S. in physics, co-advised by David Meyer and Gary Cottrell. Before that, I graduated from MIT in physics and collaborated on a directional dark matter detector: DMTPC.

2018-Now
Google Brain
Student Research Scholar
2017-Now
Université de Montréal
Ph.D. in Computer Science
Winter and Spring 2017
Google Brain
Internship on Generative Adversarial Networks (GANs)
Summer and Fall 2016
Google Accelerated Sciences
Internship on microscopy image semantic segmentation
2013-2017
UC San Diego
M.S. in Physics
2010-2013
Fidelity
Equity research associate
2008-2009
Cambridge University
Junior year abroad in physics
2006-2010
Massachusetts Institute of Technology
B.S. in Physics (8A for you MIT nerds)

Machine Learning

Revisiting Fundamentals of Experience Replay
William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (the replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay: greater capacity is found to substantially increase the performance of certain algorithms while leaving others unaffected. Counter-intuitively, we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial, while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across three deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.
ICML 2020
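For readers unfamiliar with the two quantities above: the replay ratio is simply the number of gradient updates performed per environment transition collected, and an uncorrected n-step return bootstraps from the value estimate n steps ahead without any off-policy correction. The sketch below only illustrates the latter, with hypothetical names; it is not the paper's code.

import numpy as np

def uncorrected_n_step_target(rewards, bootstrap_value, gamma=0.99):
    # Discounted sum of the next n rewards plus a bootstrapped value estimate,
    # with no importance-sampling correction for off-policy data.
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    return float(np.dot(discounts, rewards) + gamma ** n * bootstrap_value)

# Example: a 3-step target built from transitions drawn out of a replay buffer.
target = uncorrected_n_step_target([1.0, 0.0, 0.5], bootstrap_value=2.0)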
On Catastrophic Interference in Atari 2600 Games
William Fedus*, Dibya Ghosh*, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle
Model-free deep reinforcement learning algorithms suffer from poor sample efficiency -- learning reliable policies generally requires a vast amount of interaction with the environment. One hypothesis is that catastrophic interference between various segments within the environment is an issue. In this paper, we perform a large-scale empirical study of catastrophic interference in the Arcade Learning Environment and find that learning particular game segments frequently degrades performance on previously learned segments. In what we term the Memento observation, we show that an identically parameterized agent, spawned from a state where the original agent plateaued, reliably makes further progress. This phenomenon is general -- we find consistent performance boosts across architectures, learning algorithms and environments. Our results indicate that eliminating catastrophic interference can contribute towards improved performance and data efficiency of deep reinforcement learning algorithms.
NeurIPS 2019 BARL Workshop (Oral)
Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare
Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learning agent that can play from feedback alone. Our design recognizes and takes advantage of the structural characteristics of text-based games. We first propose a contextualisation mechanism, based on accumulated reward, which simplifies the learning problem and mitigates partial observability. We then study different methods that rely on the notion that most actions are ineffectual in any given situation, following Zahavy et al.'s idea of an admissible action. We evaluate these techniques in a series of text-based games of increasing difficulty based on the TextWorld framework, as well as the iconic game Zork. Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.
AAAI 2020 (Oral)
Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment
Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses on the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge, which has gathered a lot of interest from the exploration community, across the set of seven games identified by Bellemare et al. (2016) as challenging for exploration, and on easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than ϵ-greedy exploration.
ICML 2019 Workshop (Oral, Best Paper); ICLR 2020
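The general recipe being benchmarked here is reward shaping of the form r + β · bonus(s). As a toy illustration only (the bonuses evaluated in the paper are considerably more sophisticated, and all names below are hypothetical), a simple count-based bonus looks like this:

from collections import defaultdict

class CountBonus:
    # Toy count-based exploration bonus: beta / sqrt(N(s)).
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state):
        self.counts[state] += 1
        return self.beta / self.counts[state] ** 0.5

bonus = CountBonus()
shaped_reward = 1.0 + bonus(("room_1", "has_key"))  # extrinsic reward plus exploration bonus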
Hyperbolic Discounting and Learning over Multiple Horizons
William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle
Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting. We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques in RL. Additionally, and independent of hyperbolic discounting, we make a surprising discovery that simultaneously learning value functions over multiple time-horizons is an effective auxiliary task which often improves over a strong value-based RL agent, Rainbow.
RLDM 2019 (Oral, Best Paper)
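One way to see the connection between the two contributions above (my sketch of the idea in LaTeX, not text from the paper): a hyperbolic discount is an average of exponential discounts, so an agent that learns ordinary value functions over many horizons can recover a hyperbolic value by a weighted sum over them,

$$ \int_0^1 \gamma^{\,t}\, d\gamma = \frac{1}{1+t}, \qquad \text{and more generally} \qquad \int_0^1 \gamma^{\,kt}\, d\gamma = \frac{1}{1+kt}, $$

so evaluating value functions at a finite grid of discount factors $\gamma_i$ and summing them with appropriate weights approximates the hyperbolic value.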
Language GANs Falling Short
Massimo Caccia*, Lucas Caccia*, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin
Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial approaches for NLG, on the grounds that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model's conditional distributions. Second, we leverage the control over the quality / diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum, and find that MLE models consistently outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) the impact of exposure bias on sample quality is less severe than previously thought, and 2) temperature tuning provides a better quality / diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive.
NeurIPS 2018 Workshop; ICLR 2020
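The temperature knob referred to above simply rescales the model's logits before sampling; temperatures below 1 sharpen the conditional distribution (more quality, less diversity) and temperatures above 1 flatten it. A minimal sketch with hypothetical names, not the paper's code:

import numpy as np

def sample_with_temperature(logits, temperature=0.7, rng=np.random.default_rng()):
    # Convert temperature-scaled logits to probabilities and draw a token id.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))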
Deep Graph Infomax
Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm
We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs---both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.
ICLR 2019
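Concretely, DGI's objective can be read as a noise-contrastive one: a discriminator is trained to score (node embedding, graph summary) pairs from the real graph above pairs built from a corrupted graph. A minimal numpy sketch of such a loss, with hypothetical names and a plain bilinear discriminator (the paper uses GCN encoders):

import numpy as np

def dgi_style_loss(real_nodes, corrupted_nodes, summary, W):
    # real_nodes, corrupted_nodes: (n, d) patch embeddings; summary: (d,) readout; W: (d, d).
    def score(h):
        return h @ W @ summary                  # bilinear discriminator logits
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)           # numerically stable log(sigmoid(x))
    pos = log_sigmoid(score(real_nodes))        # positives: embeddings from the real graph
    neg = log_sigmoid(-score(corrupted_nodes))  # negatives: embeddings from a corrupted graph
    return -(pos.mean() + neg.mean())           # binary cross-entropy form of the objective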
Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio
In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and samples which (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.
ICLR 2019
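A rough sketch of how such a trace could be generated (everything here, including the backtrack_model interface, is hypothetical and only meant to make the idea concrete):

def recall_trace(backtrack_model, high_value_state, length=10):
    # Walk backwards from a high-value state by repeatedly sampling a
    # (previous state, action) pair from the backtracking model, then
    # reverse the list so the trace reads forward in time.
    trace, state = [], high_value_state
    for _ in range(length):
        prev_state, action = backtrack_model.sample(state)  # ~ p(s_prev, a_prev | s)
        trace.append((prev_state, action))
        state = prev_state
    return list(reversed(trace))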
Disentangling the independently controllable factors of variation by interacting with the world
Valentin Thomas*, Emmanuel Bengio*, William Fedus*, Jules Pondard, Philippe Beaudoin, Hugo Larochelle, Joelle Pineau, Doina Precup, Yoshua Bengio
It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.
NIPS 2017 Workshop
Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step
William Fedus*, Mihaela Rosca*, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, Ian Goodfellow
Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each player cannot reduce their cost without changing the other players' parameters. One useful approach for the theory of GANs is to show that a divergence between the training distribution and the model distribution obtains its minimum value at equilibrium. Several recent research directions have been motivated by the idea that this divergence is the primary guide for the learning process and that every step of learning should decrease the divergence. We show that this view is overly restrictive. During GAN training, the discriminator provides learning signal in situations where the gradients of the divergences between distributions would not be useful. We provide empirical counterexamples to the view of GAN training as divergence minimization. Specifically, we demonstrate that GANs are able to learn distributions in situations where the divergence minimization point of view predicts they would fail. We also show that gradient penalties motivated from the divergence minimization perspective are equally helpful when applied in other contexts in which the divergence minimization perspective does not predict they would be helpful. This contributes to a growing body of evidence that GAN training may be more usefully viewed as approaching Nash equilibria via trajectories that do not necessarily minimize a specific divergence at each step.
ICLR 2018
MaskGAN: Better Text Generation via Filling in the ______
William Fedus, Ian Goodfellow, Andrew M. Dai
Neural text generation models are often autoregressive language models or seq2seq models. Neural autoregressive and seq2seq models that generate text by sampling words sequentially, with each word conditioned on the previous word, are state-of-the-art for several machine translation and summarization benchmarks. These benchmarks are often defined by validation perplexity even though this is not a direct measure of sample quality. Language models are typically trained via maximum likelihood and most often with teacher forcing. Teacher forcing is well-suited to optimizing perplexity but can result in poor sample quality because generating text requires conditioning on sequences of words that were never observed at training time. We propose to improve sample quality using Generative Adversarial Networks (GANs), which explicitly train the generator to produce high quality samples and have shown a lot of success in image generation. GANs were originally designed to output differentiable values, so discrete language generation is challenging for them. We introduce an actor-critic conditional GAN that fills in missing text conditioned on the surrounding context. We show, qualitatively and quantitatively, evidence that this produces more realistic text samples compared to a maximum likelihood trained model.
ICLR 2018
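The in-filling setup above gives the generator a partially masked sentence and asks it to reconstruct the missing span from the surrounding context. A toy sketch of building such an example (hypothetical helper, not the paper's code):

import random

def mask_span(tokens, span_len=3, mask_token="<m>", rng=random.Random(0)):
    # Replace a random contiguous span with mask tokens, returning both the
    # masked context and the ground-truth span the generator should fill in.
    start = rng.randrange(0, max(1, len(tokens) - span_len))
    target = tokens[start:start + span_len]
    masked = tokens[:start] + [mask_token] * len(target) + tokens[start + span_len:]
    return masked, target

masked, target = mask_span("the quick brown fox jumps over the lazy dog".split())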
In-silico Labeling: Training Networks to Predict Fluorescent Labels in Unlabeled Images
E. Christiansen et al.
Microscopy is a central method in life sciences. Many popular methods, such as antibody labeling, are used to add physical fluorescent labels to specific cellular constituents. However, these approaches have significant drawbacks, including inconsistency; limitations in the number of simultaneous labels because of spectral overlap; and necessary perturbations of the experiment, such as fixing the cells, to generate the measurement. Here, we show that a computational machine-learning approach, which we call "in silico labeling" (ISL), reliably predicts some fluorescent labels from transmitted-light images of unlabeled fixed or live biological samples. ISL predicts a range of labels, such as those for nuclei, cell type (e.g., neural), and cell state (e.g., cell death). Because prediction happens in silico, the method is consistent, is not limited by spectral overlap, and does not disturb the experiment. ISL generates biological measurements that would otherwise be problematic or impossible to acquire.
Cell
Persistent Homology for Mobile Phone Data Analysis
William Fedus, Mike Gartner, Alex Georges, David A. Meyer, David Rideout
Topological data analysis is a new approach to analyzing the structure of high dimensional datasets. Persistent homology, specifically, generalizes hierarchical clustering methods to identify significant higher dimensional properties. In this project, we analyze mobile network data from Senegal to determine whether significant topological structure is present. We investigate two independent questions: whether the introduction of the Dakar motorway has any significant impact on the topological structure of the data, and how communities can be constructed using this method. We consider three independent metrics to compute the persistent homology. In two of these metrics, we see no topological change in the data given the introduction of the motorway; in the remaining metric, we see a possible indication of topological change. The behavior of clustering using the persistent homology calculation is sensitive to the choice of metric, and is similar in one case to the communities computed using modularity maximization.
NetMob 2015 Challenge
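For readers new to the tool: persistent homology takes a point cloud (or a distance matrix under one of the metrics above) and produces persistence diagrams whose long-lived features correspond to significant connected components (H0) and loops (H1). A minimal sketch assuming the open-source ripser Python package, which is not the code used in this project:

import numpy as np
from ripser import ripser  # assumes the `ripser` package is installed

points = np.random.default_rng(0).normal(size=(200, 2))  # toy point cloud
h0, h1 = ripser(points, maxdim=1)["dgms"]                # persistence diagrams for H0 and H1
# Long-lived H1 intervals (death minus birth) hint at significant loop structure.
lifetimes = h1[:, 1] - h1[:, 0] if len(h1) else np.array([])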

Physics

Background Rejection in the DMTPC Dark Matter Search Using Charge Signals
J. Lopez et al.
The Dark Matter Time Projection Chamber (DMTPC) collaboration is developing low-pressure gas TPC detectors for measuring WIMP-nucleon interactions. Optical readout with CCD cameras allows for the detection of the daily modulation in the direction of the dark matter wind, while several charge readout channels allow for the measurement of additional recoil properties. In this article, we show that adding the charge readout analysis to the CCD allows us to obtain a statistics-limited 90% C.L. upper limit on the e− rejection factor of 5.6 × 10−6 for recoils with energies between 40 and 200 keVee. In addition, requiring coincidence between charge signals and light in the CCD reduces CCD-specific backgrounds by more than two orders of magnitude.
Nuclear Instruments and Methods in Physics Research Section A
DMTPC: Dark matter detection with directional sensitivity.
J. Battat et al.
The Dark Matter Time Projection Chamber (DMTPC) experiment uses CF4 gas at low pressure (0.1 atm) to search for the directional signature of Galactic WIMP dark matter. We describe the DMTPC apparatus and summarize recent results from a 35.7 g-day exposure surface run at MIT. After nuclear recoil cuts are applied to the data, we find 105 candidate events in the energy range 80-200 keV, which is consistent with the expected cosmogenic neutron background. Using this data, we obtain a limit on the spin-dependent WIMP-proton cross-section of 2.0 × 10^-33 cm^2 at a WIMP mass of 115 GeV/c^2. This detector is currently deployed underground at the Waste Isolation Pilot Plant in New Mexico.
First Dark Matter Search Results from a Surface Run of the 10-L DMTPC Directional Dark Matter Detector.
S. Ahlen et al.
The Dark Matter Time Projection Chamber (DMTPC) is a low-pressure (75 Torr CF4) 10-liter detector capable of measuring the vector direction of nuclear recoils, with the goal of directional dark matter detection. In this paper we present the first dark matter limit from DMTPC. In an analysis window of 80-200 keV recoil energy, based on a 35.7 g-day exposure, we set a 90% C.L. upper limit on the spin-dependent WIMP-proton cross section of 2.0 × 10^-33 cm^2 for a 115 GeV/c^2 dark matter particle mass.
Physics Letters B
The case for a directional dark matter detector and the status of current experimental efforts.
S. Ahlen et al.
We present the case for a dark matter detector with directional sensitivity. This document was developed at the 2009 CYGNUS workshop on directional dark matter detection, and contains contributions from theorists and experimental groups in the field. We describe the need for a dark matter detector with directional sensitivity; each directional dark matter experiment presents their project's status; and we close with a feasibility study for scaling up to a one ton directional detector, which would cost around $150M.
International Journal of Modern Physics A

Random

Machine learning, simply explained: Distill.
Our unsupervised graph neural network, Deep Graph Infomax, written up in ZDNet.
Surf reports for some of my favorite spots around San Diego, CA: Blacks, Bird Rock, and Tourmaline. A brief intro to surf lingo.
Jumping my bike over -- arguably -- the most dangerous gap in the world
Great set of educational links, the No Excuse List
Hinton's Coursera lectures, still teeming with insight.
Prior coursework on Efficient Encoding Using Deep Neural Networks. In this report, we reproduce the results of Hinton and Salakhutdinov. We use Restricted Boltzmann machines to pre-train, and standard backpropagation to fine-tune a deep neural network to show that such a network can efficiently encode images of handwritten digits. We also construct another deep autoencoder using stacked autoencoders and compare the performance of the two autoencoders.
Prior coursework on entropy in classical and quantum information theory. This reviews classical information theory and then proceeds to its generalization to quantum information theory. Both Shannon and von Neumann entropy are discussed, making the connection to compressibility of a message stream and the generalization of compressibility to a quantum system.
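For reference, the two entropies discussed in that report, written in LaTeX: the Shannon entropy of a classical source and the von Neumann entropy of a density matrix ρ, which reduces to the Shannon entropy of ρ's eigenvalues λ_i:

$$ H(X) = -\sum_x p(x)\,\log_2 p(x), \qquad S(\rho) = -\operatorname{Tr}\!\left(\rho \log_2 \rho\right) = -\sum_i \lambda_i \log_2 \lambda_i. $$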