
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

Authors: Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko

Abstract: Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma's Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from their predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma's Revenge with sticky actions, while preserving performance in the non-sticky setting.

Submitted 14 July, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

Journal ref: In Proc. 40th International Conference on Machine Learning (ICML 2023)
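To make the "noisy TV" failure mode concrete, here is an idealized toy in Python. It is not BYOL-Hindsight: the dynamics, the function names (`world_step`, `squared_error`, `average_rewards`), and the assumption that the hindsight input is exactly the realized noise are hypothetical simplifications chosen only to show why conditioning on the unpredictable part of an outcome removes the spurious intrinsic reward.

```python
import random


def world_step(state, action, rng):
    """Hypothetical toy dynamics: a predictable part plus a 'noisy TV' term.

    Action 1 means 'watch the TV', which adds unpredictable Gaussian noise.
    Returns the realized outcome and the noise that was drawn.
    """
    predictable = 2 * state + action
    noise = rng.gauss(0.0, 1.0) if action == 1 else 0.0
    return predictable + noise, noise


def squared_error(prediction, outcome):
    """Curiosity-style intrinsic reward: squared prediction error."""
    return (outcome - prediction) ** 2


def average_rewards(action, n=10_000, seed=0):
    """Average intrinsic reward with and without a hindsight input.

    Both predictors already know the predictable dynamics. The plain predictor
    can only output the expected outcome; the (idealized) hindsight predictor
    also receives the realized noise as an extra input.
    """
    rng = random.Random(seed)
    plain_total, hindsight_total = 0.0, 0.0
    for _ in range(n):
        state = rng.randint(0, 5)
        outcome, noise = world_step(state, action, rng)
        expected = 2 * state + action
        plain_total += squared_error(expected, outcome)
        hindsight_total += squared_error(expected + noise, outcome)
    return plain_total / n, hindsight_total / n


if __name__ == "__main__":
    for action in (0, 1):
        plain, hindsight = average_rewards(action)
        print(f"action={action}: plain={plain:.3f}, hindsight={hindsight:.3f}")
```

With action 1 the plain reward stays near the noise variance (about 1) however well the dynamics are learned, so a curiosity agent keeps watching the TV; the hindsight-conditioned error is zero. The contribution described in the abstract, which this toy skips, is learning a representation that captures precisely the unpredictable aspects of each outcome; here that representation is simply handed the true noise.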

arXiv:2211.04236

Self-conditioned Embedding Diffusion for Text Generation

Authors: Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Abstract: Can continuous diffusion models bring the same performance breakthrough on natural language that they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows us to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models, while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 15 pages
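The abstract's first step, projecting discrete tokens into a continuous embedding space and diffusing there, can be sketched in pure Python. This is a hypothetical illustration, not the paper's model: the toy vocabulary, the random embedding table, the Gaussian corruption x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, and decoding by nearest embedding are all assumptions made for the sketch. The learned denoiser is omitted; `self_conditioned_input` only shows the general self-conditioning idea of feeding the model its previous clean estimate alongside the noisy input.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary
DIM = 8  # hypothetical embedding width


def make_embeddings(seed=0):
    """Random embedding table standing in for learned token embeddings."""
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def add_noise(x0, alpha_bar, rng):
    """Gaussian corruption of one embedding: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x0]


def self_conditioned_input(x_t, previous_estimate):
    """Denoiser input: the noisy embedding concatenated with the model's
    previous clean estimate (all zeros on the first step)."""
    if previous_estimate is None:
        previous_estimate = [0.0] * len(x_t)
    return x_t + previous_estimate


def nearest_token(x, table):
    """Map a continuous vector back to the closest token embedding."""
    return min(table, key=lambda tok: sum((a - b) ** 2 for a, b in zip(x, table[tok])))


def round_trip(tokens, alpha_bar, seed=0):
    """Embed tokens, corrupt them at noise level alpha_bar, decode by nearest neighbor."""
    table = make_embeddings(seed)
    rng = random.Random(seed + 1)
    noisy = [add_noise(table[tok], alpha_bar, rng) for tok in tokens]
    return [nearest_token(x, table) for x in noisy]
```

At `alpha_bar = 1.0` the round trip returns the input tokens exactly; as `alpha_bar` approaches 0 the decoded tokens become essentially random, which is the regime a learned denoiser has to invert.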

arXiv:2206.08332

BYOL-Explore: Exploration by Bootstrapped Prediction

Authors: Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

Abstract: We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a world representation, the world dynamics, and an exploration policy all together by optimizing a single prediction loss in the latent space with no additional auxiliary objective. We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark with visually-rich 3-D environments. On this benchmark, we solve the majority of the tasks purely through augmenting the extrinsic reward with BYOL-Explore's intrinsic reward, whereas prior work could only get off the ground with human demonstrations. As further evidence of the generality of BYOL-Explore, we show that it achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2103.16559

Broaden Your Views for Self-Supervised Video Learning

Authors: Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Ross Hemsley, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-Bastien Grill, Aäron van den Oord, Andrew Zisserman

Abstract: Most successful self-supervised learning methods are trained to align the representations of two independent views from the data. State-of-the-art methods in video are inspired by image techniques, where these two views are similarly extracted by cropping and augmenting the resulting crop. However, these methods miss a crucial element in the video domain: time. We introduce BraVe, a self-supervised learning framework for video. In BraVe, one of the views has access to a narrow temporal window of the video while the other view has broad access to the video content. Our models learn to generalise from the narrow view to the general content of the video. Furthermore, BraVe processes the views with different backbones, enabling the use of alternative augmentations or modalities in the broad view, such as optical flow, randomly convolved RGB frames, audio or their combinations. We demonstrate that BraVe achieves state-of-the-art results in self-supervised representation learning on standard video and audio classification benchmarks including UCF101, HMDB51, Kinetics, ESC-50 and AudioSet.

Submitted 19 October, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: This paper is an extended version of our ICCV-21 paper. It includes more results as well as a minor architectural variation that improves results.

arXiv:2010.10241

BYOL works even without batch statistics

Authors: Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko

Abstract: Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL (73.9% vs. 74.3% top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-50). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.

Submitted 20 October, 2020; originally announced October 2020.
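The batch-independent scheme named in this abstract combines group normalization (GN) with weight standardization (WS). A minimal pure-Python sketch of both follows; the function names, the `eps` value, and the list-of-lists layout are assumptions for illustration, and GN's learnable scale and shift are omitted. The property that matters for the paper's argument is that neither function reads statistics from other examples in the batch.

```python
import math


def weight_standardize(weights, eps=1e-5):
    """Weight standardization: rescale each output unit's incoming weights
    (one row) to zero mean and unit variance before they are used."""
    out = []
    for row in weights:
        mean = sum(row) / len(row)
        var = sum((w - mean) ** 2 for w in row) / len(row)
        out.append([(w - mean) / math.sqrt(var + eps) for w in row])
    return out


def group_norm(channels, num_groups, eps=1e-5):
    """Group normalization for ONE example.

    `channels` is a list of C channels, each a list of spatial activations.
    Channels are split into `num_groups` contiguous groups and each group is
    normalized with its own mean and variance, so no batch statistics are used.
    """
    c = len(channels)
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        out.extend([[(v - mean) / math.sqrt(var + eps) for v in ch] for ch in group])
    return out
```

Because each example is normalized on its own, its output does not change with the rest of the batch, which removes the channel through which BN could leak information about other views.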

arXiv:2007.12509

Monte-Carlo Tree Search as Regularized Policy Optimization

Authors: Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

Abstract: The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.

Submitted 24 July, 2020; originally announced July 2020.

Comments: Accepted to International Conference on Machine Learning (ICML), 2020
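As a sketch of what an exact solution to a regularized policy optimization problem can look like, assume the objective is to maximize q · pi - lam * KL(prior || pi) over the probability simplex, where q holds the search's action-value estimates, prior is the network policy, and lam > 0 is a regularization weight (these symbols and the direction of the KL are assumptions for this sketch). Setting the derivative of the Lagrangian to zero gives q_a + lam * prior_a / pi_a - alpha = 0, so pi_a = lam * prior_a / (alpha - q_a), with the scalar alpha chosen so the probabilities sum to 1. The total decreases in alpha, so bisection between alpha_min = max_a(q_a + lam * prior_a) (total at least 1) and alpha_max = max_a(q_a) + lam (total at most 1) finds it.

```python
def regularized_policy(q, prior, lam, iters=100):
    """Solve max_pi q.pi - lam * KL(prior || pi) over the probability simplex.

    Stationarity gives pi(a) = lam * prior(a) / (alpha - q(a)); the
    normalizing scalar alpha is found by bisection. Assumes lam > 0 and a
    strictly positive prior, so every denominator stays positive.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if any(p <= 0 for p in prior):
        raise ValueError("prior must be strictly positive")

    def total(alpha):
        return sum(lam * p / (alpha - qa) for qa, p in zip(q, prior))

    lo = max(qa + lam * p for qa, p in zip(q, prior))  # total(lo) >= 1
    hi = max(q) + lam                                  # total(hi) <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid  # probabilities too large: alpha must grow
        else:
            hi = mid
    policy = [lam * p / (hi - qa) for qa, p in zip(q, prior)]
    norm = sum(policy)  # absorbs the tiny remaining bisection gap
    return [x / norm for x in policy]
```

With a large `lam` the result stays close to the prior; with a small `lam` it concentrates on the highest-value action. How `lam` should be scheduled with the visit count is a separate design choice that this sketch does not cover.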

arXiv:2006.07733

Bootstrap your own latent: A new approach to self-supervised Learning

Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.

Submitted 10 September, 2020; v1 submitted 13 June, 2020; originally announced June 2020.
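The two mechanisms in this abstract, regressing the online prediction onto the target representation and updating the target network as a slow-moving average of the online network, fit in a few lines. A minimal pure-Python sketch under stated assumptions: vectors and parameters are plain lists, the loss is the squared error between L2-normalized vectors (equal to 2 - 2 * cosine similarity), and `tau=0.996` is an illustrative decay value. The networks, augmentations, and loss symmetrization are omitted.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length (guarding against zero norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def byol_loss(online_prediction, target_projection):
    """Squared error between L2-normalized vectors, i.e. 2 - 2 * cos(p, z).

    In a real implementation the target side is treated as a constant
    (no gradient flows into the target network).
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def ema_update(target_params, online_params, tau=0.996):
    """Target <- tau * target + (1 - tau) * online, applied parameter-wise."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

Identical directions give a loss of 0, orthogonal ones 2, and opposite ones 4; rescaling either vector leaves the loss unchanged because both sides are normalized first.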

arXiv:2004.14646

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Authors: Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar

Abstract: Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where building a representation of the unknown environment is crucial to solve the tasks. Here we introduce Prediction of Bootstrap Latents (PBL), a simple and flexible self-supervised representation learning algorithm for multitask deep RL. PBL builds on multistep predictive representations of future observations, and focuses on capturing structured information about environment dynamics. Specifically, PBL trains its representation by predicting latent embeddings of future observations. These latent embeddings are themselves trained to be predictive of the aforementioned representations. These predictions form a bootstrapping effect, allowing the agent to learn more about the key aspects of the environment dynamics. In addition, by defining prediction tasks completely in latent space, PBL provides the flexibility of using multimodal observations involving pixel images, language instructions, rewards and more. We show in our experiments that PBL delivers across-the-board improved performance over state-of-the-art deep RL agents in the DMLab-30 and Atari-57 multitask setting.

Submitted 30 April, 2020; originally announced April 2020.

arXiv:1902.07685

World Discovery Models

Authors: Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires, Jean-Bastien Grill, Florent Altché, Rémi Munos

Abstract: As humans, we are driven by a strong desire to seek novelty in our world. Also, upon observing a novel pattern, we are capable of refining our understanding of the world based on the new information: humans can discover their world. The outstanding ability of the human mind for discovery has led to many breakthroughs in science, art and technology. Here we investigate the possibility of building an agent capable of discovering its world using modern AI technology. In particular, we introduce NDIGO, Neural Differential Information Gain Optimisation, a self-supervised discovery model that aims at seeking new information to construct a global view of its world from partial and noisy observations. Our experiments on some controlled 2-D navigation tasks show that NDIGO outperforms state-of-the-art information-seeking methods in terms of the quality of the learned representation. The improvement in performance is particularly significant in the presence of white or structured noise where other information-seeking methods follow the noise instead of discovering their world.

Submitted 1 March, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

arXiv:1810.09365

Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning

Authors: Guillaume Devineau, Philip Polack, Florent Altché, Fabien Moutarde

Abstract: This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics, and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, control inputs are chosen as the steering angle of the front wheels, and the applied torque on each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track, shifting between long straight lines and tight curves. A comparison to conventional decoupled controllers on the same track is also provided.

Submitted 22 October, 2018; originally announced October 2018.

Comments: Published in the IEEE 2018 International Conference on Intelligent Transportation Systems (ITSC 2018)

arXiv:1801.07962

An LSTM Network for Highway Trajectory Prediction

Authors: Florent Altché, Arnaud de La Fortelle

Abstract: In order to drive safely and efficiently on public roads, autonomous vehicles will have to understand the intentions of surrounding vehicles, and adapt their own behavior accordingly. While experienced human drivers are generally good at inferring other vehicles' motion up to a few seconds in the future, most current Advanced Driving Assistance Systems (ADAS) are unable to perform such medium-term forecasts, and are usually limited to high-likelihood situations such as emergency braking. In this article, we present a first step towards consistent trajectory prediction by introducing a long short-term memory (LSTM) neural network, which is capable of accurately predicting future longitudinal and lateral trajectories for vehicles on highways. Unlike previous work focusing on a low number of trajectories collected from a few drivers, our network was trained and validated on the NGSIM US-101 dataset, which contains a total of 800 hours of recorded trajectories in various traffic densities, representing more than 6000 individual drivers.

Submitted 24 January, 2018; originally announced January 2018.

Comments: Presented at IEEE ITSC 2017

arXiv:1706.08046

An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles (Extended)

Authors: Florent Altché, Xiangjun Qian, Arnaud de La Fortelle

Abstract: Before reaching full autonomy, vehicles will gradually be equipped with more and more advanced driver assistance systems (ADAS), effectively rendering them semi-autonomous. However, current ADAS technologies seem unable to handle complex traffic situations, notably when dealing with vehicles arriving from the sides, either at intersections or when merging on highways. The high rate of accidents in these settings proves that they constitute difficult driving situations. Moreover, intersections and merging lanes are often the source of significant traffic congestion and, sometimes, deadlocks. In this article, we propose a cooperative framework to safely coordinate semi-autonomous vehicles in such settings, removing the risk of collision or deadlocks while remaining compatible with human driving. More specifically, we present a supervised coordination scheme that overrides control inputs from human drivers when they would result in an unsafe or blocked situation. To avoid unnecessary intervention and remain compatible with human driving, overriding only occurs when collisions or deadlocks are imminent. In this case, safe overriding controls are chosen while ensuring they deviate minimally from those originally requested by the drivers. Simulation results based on a realistic physics simulator show that our approach is scalable to real-world scenarios, and computations can be performed in real-time on a standard computer for up to a dozen simultaneous vehicles.

Submitted 25 June, 2017; originally announced June 2017.
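The override rule in this abstract (intervene only when needed, then apply the safe control closest to the driver's request) can be illustrated in its simplest form. Under a strong simplifying assumption that is not taken from the paper, namely that the safe longitudinal controls for one vehicle at one instant form a single interval supplied by a separate safety check, the minimal-deviation override is a clamp. The function name `supervise` and the interval input are hypothetical.

```python
def supervise(requested, safe_interval):
    """Return (applied_control, overridden).

    Hypothetical sketch: `safe_interval` = (low, high) is assumed to come from
    an external collision/deadlock check. Requests already inside it pass
    through untouched; otherwise the closest safe value is applied, which is
    the minimal-deviation choice for a one-dimensional interval.
    """
    low, high = safe_interval
    if low > high:
        raise ValueError("empty safe set: no safe control exists")
    if low <= requested <= high:
        return requested, False
    return min(max(requested, low), high), True
```

With several interacting vehicles the safe set ranges over joint controls rather than a single interval, so choosing the override becomes a genuine optimization problem; the clamp only illustrates the "deviate minimally" criterion and the pass-through when no intervention is needed.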

arXiv:1605.00026

A Distributed Model Predictive Control Framework for Road-Following Formation Control of Car-like Vehicles (Extended Version)

Authors: Xiangjun Qian, Florent Altché, Arnaud de La Fortelle, Fabien Moutarde

Abstract: This work presents a novel framework for the formation control of multiple autonomous ground vehicles in an on-road environment. Unique challenges of this problem lie in 1) the design of collision avoidance strategies with obstacles and with other vehicles in a highly structured environment, and 2) dynamic reconfiguration of the formation to handle different task specifications. In this paper, we design a local MPC-based tracking controller for each individual vehicle to follow a reference trajectory while satisfying various constraints (kinematics and dynamics, collision avoidance, etc.). The reference trajectory of a vehicle is computed from its leader's trajectory, based on a pre-defined formation tree. We use logic rules to organize the collision avoidance behaviors of member vehicles. Moreover, we propose a methodology to safely reconfigure the formation on-the-fly. The proposed framework has been validated using high-fidelity simulations.

Submitted 29 April, 2016; originally announced May 2016.

Comments: Extended version of the conference paper submitted to ICARCV'16

arXiv:1603.04610

doi 10.1109/IROS.2016.7759737

Time-optimal Coordination of Mobile Robots along Specified Paths

Authors: Florent Altché, Xiangjun Qian, Arnaud de La Fortelle

Abstract: In this paper, we address the problem of time-optimal coordination of mobile robots under kinodynamic constraints along specified paths. We propose a novel approach based on time discretization that leads to a mixed-integer linear programming (MILP) formulation. This problem can be solved using general-purpose MILP solvers in a reasonable time, resulting in a resolution-optimal solution. Moreover, unlike previous work found in the literature, our formulation allows an exact linear modeling (up to the discretization resolution) of second-order dynamic constraints. Extensive simulations are performed to demonstrate the effectiveness of our approach.

Submitted 5 April, 2017; v1 submitted 15 March, 2016; originally announced March 2016.

Comments: Published in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
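One way to see why time discretization keeps second-order constraints linear: assume each robot's progress along its fixed path is sampled as positions s_0, ..., s_K at a fixed step dt (an assumption for this sketch). Velocities and accelerations are then finite differences, v_k = (s_{k+1} - s_k) / dt and a_k = (v_{k+1} - v_k) / dt, so bounds on them are linear inequalities in the s_k, which a MILP solver can handle directly. The hypothetical checker below only verifies such bounds for a given schedule, assuming forward-only motion; it neither builds nor solves a MILP, and the integer part of such a formulation (for example, binary variables deciding which robot traverses a shared region first) is omitted.

```python
def check_schedule(positions, dt, v_max, a_max, tol=1e-9):
    """Check finite-difference velocity/acceleration bounds along a path.

    positions: arc-length positions s_0..s_K sampled every dt seconds.
    Each test below is linear in the positions, e.g.
    0 <= s[k+1] - s[k] <= v_max * dt for forward motion.
    Returns a list of human-readable violations (empty if feasible).
    """
    violations = []
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    for k, v in enumerate(velocities):
        if v < -tol or v > v_max + tol:
            violations.append(f"velocity {v:.3f} out of [0, {v_max}] at step {k}")
    for k, (v0, v1) in enumerate(zip(velocities, velocities[1:])):
        a = (v1 - v0) / dt
        if abs(a) > a_max + tol:
            violations.append(f"acceleration {a:.3f} exceeds {a_max} at step {k}")
    return violations
```

For example, `check_schedule([0.0, 0.5, 1.5, 2.5], dt=1.0, v_max=1.0, a_max=1.0)` reports no violations, while a jump from rest to 2 m per step breaks both the velocity and the acceleration bound.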

Showing 1–14 of 14 results for author: Altché, F