
Showing 1–39 of 39 results for author: Bacon, P

Searching in archive cs.
  1. arXiv:2407.18213  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    Exploring Scaling Trends in LLM Robustness

    Authors: Nikolaus Howe, Michał Zajac, Ian McKenzie, Oskar Hollinsworth, Tom Tseng, Pierre-Luc Bacon, Adam Gleave

    Abstract: Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates…

    Submitted 26 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 31 pages; edit fixed metadata typo (author name)

    ACM Class: I.2.7

  2. arXiv:2406.05953  [pdf, other]

    cs.LG

    Decoupling regularization from the action space

    Authors: Sobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon

    Abstract: Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that it can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that i…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2405.01616  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative Active Learning for the Search of Small-molecule Protein Binders

    Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

    Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecu…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2403.07688  [pdf, other]

    cs.LG cs.AI

    Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

    Authors: Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

    Abstract: When training deep neural networks, the phenomenon of dying neurons – units that become inactive or saturated, output zero during training – has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios. In this paper, we reassess this phenomenon, focusing on sparsity a…

    Submitted 12 March, 2024; originally announced March 2024.

  5. arXiv:2402.05290  [pdf, other]

    cs.LG cs.AI

    Do Transformer World Models Give Better Policy Gradients?

    Authors: Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

    Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long…

    Submitted 10 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Michel Ma and Pierluca D'Oro contributed equally

  6. arXiv:2401.08898  [pdf, other]

    cs.LG cs.AI

    Bridging State and History Representations: Understanding Self-Predictive RL

    Authors: Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon

    Abstract: Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared propertie…

    Submitted 21 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024 (Poster). Code is available at https://github.com/twni2016/self-predictive-rl

  7. arXiv:2312.14331  [pdf, other]

    cs.LG

    Maximum entropy GFlowNets with soft Q-learning

    Authors: Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon

    Abstract: Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper add…

    Submitted 2 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Journal ref: 2024 Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2593-2601

  8. arXiv:2310.15386  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    Course Correcting Koopman Representations

    Authors: Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin

    Abstract: Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over…

    Submitted 23 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  9. arXiv:2310.00166  [pdf, other]

    cs.AI cs.LG

    Motif: Intrinsic Motivation from Artificial Intelligence Feedback

    Authors: Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

    Abstract: Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM ove…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: The first two authors equally contributed - order decided by coin flip

  10. arXiv:2309.14597  [pdf, other]

    cs.LG

    Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

    Authors: Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

    Abstract: Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy param…

    Submitted 10 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 Accepted Paper. The first two authors contributed equally

  11. arXiv:2307.03864  [pdf, other]

    cs.LG

    When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

    Authors: Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon

    Abstract: Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful to solve problems that involve long-term dependencies, including in the RL domain. However, the u…

    Submitted 3 November, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 (Oral)

  12. arXiv:2306.09539  [pdf, other]

    cs.CL cs.LG

    Block-State Transformers

    Authors: Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin

    Abstract: State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks.…

    Submitted 30 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS'23 - Thirty-seventh Conference on Neural Information Processing Systems

  13. arXiv:2306.04620  [pdf, other]

    cs.LG q-bio.BM

    Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

    Authors: Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio

    Abstract: In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn…

    Submitted 29 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 14 pages

  14. arXiv:2209.06259  [pdf, other]

    cs.LG cs.AI

    Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization

    Authors: Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon

    Abstract: The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem can be framed as a global optimization problem where the objective is an expensive black-box function such that we can query large batches restricted with a limitation of a low number of rounds. Bayesian Optimization is a principled method for tackling this p…

    Submitted 13 September, 2022; originally announced September 2022.

  15. arXiv:2205.07802  [pdf, other]

    cs.LG cs.AI stat.ML

    The Primacy Bias in Deep Reinforcement Learning

    Authors: Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

    Abstract: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effec…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: ICML 2022; code at https://github.com/evgenii-nikishin/rl_with_resets

  16. arXiv:2203.01443  [pdf, other]

    cs.LG

    Continuous-Time Meta-Learning with Forward Mode Differentiation

    Authors: Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

    Abstract: Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differenti…

    Submitted 2 March, 2022; originally announced March 2022.

  17. arXiv:2202.10600  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Myriad: a real-world testbed to bridge trajectory optimization and deep learning

    Authors: Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon

    Abstract: We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments. The primary contributions of Myriad are threefold. First, Myriad provides machine learning practitioners access to trajectory optimization techniques for application within a typical automatic differentiation workflow. Second, Myriad presents many real-world optimal control problems, rangin…

    Submitted 26 January, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: Updated to match version accepted at NeurIPS 2022

  18. arXiv:2112.12228  [pdf, other]

    cs.LG

    Direct Behavior Specification via Constrained Reinforcement Learning

    Authors: Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal

    Abstract: The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, whi…

    Submitted 18 June, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

  19. arXiv:2110.05442  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Algorithmic Reasoners are Implicit Planners

    Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

    Abstract: Implicit planning has emerged as an elegant technique for combining learned models of the world with end-to-end model-free reinforcement learning. We study the class of implicit planners inspired by value iteration, an algorithm that is guaranteed to yield perfect policies in fully-specified tabular environments. We find that prior approaches either assume that the environment is provided in such…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: To appear at NeurIPS 2021 (Spotlight talk). 20 pages, 10 figures. arXiv admin note: text overlap with arXiv:2010.13146

  20. arXiv:2106.03273  [pdf, other]

    cs.LG cs.AI stat.ML

    Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

    Authors: Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

    Abstract: The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers. When the model class is misspecified or has a limited representational capacity, model parameters with high likelihood might not necessarily result in high performance of the agent on a downstream control task. To alleviate this problem, we…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: Code at https://github.com/evgenii-nikishin/omd

  21. arXiv:2103.06224  [pdf, ps, other]

    cs.LG cs.IT

    An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning

    Authors: Dilip Arumugam, Peter Henderson, Pierre-Luc Bacon

    Abstract: How do we formalize the challenge of credit assignment in reinforcement learning? Common intuition would draw attention to reward sparsity as a key contributor to difficult credit assignment and traditional heuristics would look to temporal recency for the solution, calling upon the classic eligibility trace. We posit that it is not the sparsity of the reward itself that causes difficulty in credi…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: Workshop on Biological and Artificial Reinforcement Learning (NeurIPS 2020)

  22. arXiv:2010.13146  [pdf, other]

    cs.LG cs.AI stat.ML

    XLVIN: eXecuted Latent Value Iteration Nets

    Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

    Abstract: Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics. This came with several limitations, however: the model is not incentivised in any way to perform meaningful planning computations, the underlying s…

    Submitted 6 December, 2020; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020 Deep Reinforcement Learning Workshop

  23. arXiv:2009.12604  [pdf, other]

    cs.LG cs.AI stat.ML

    Graph neural induction of value iteration

    Authors: Andreea Deac, Pierre-Luc Bacon, Jian Tang

    Abstract: Many reinforcement learning tasks can benefit from explicit planning based on an internal model of the environment. Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration. Such networks have so far been focused on restrictive environments (e.g. grid-worlds), and modelled the planning procedure only i…

    Submitted 26 September, 2020; originally announced September 2020.

    Comments: ICML GRL+ 2020

  24. arXiv:2007.02786  [pdf, other]

    cs.LG stat.ML

    TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

    Authors: Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

    Abstract: We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers. Our method, TDprop, computes a per parameter learning rate based on the diagonal preconditioning of the TD update rule. We show how this can be used in both $n$-step returns and TD($λ$). Our theoretical findings demonstrate that i…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Presented at the Theoretical Foundations of Reinforcement Learning workshop at ICML 2020

  25. arXiv:2002.11833  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Evaluation Networks

    Authors: Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

    Abstract: Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states. This approach opens up the possibility of performing direct gradient ascent in policy…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 12 pages, 11 figures

  26. arXiv:2001.00271  [pdf, other]

    cs.LG cs.AI stat.ML

    Options of Interest: Temporal Abstraction with Interest Functions

    Authors: Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

    Abstract: Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, be…

    Submitted 1 January, 2020; originally announced January 2020.

    Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

  27. arXiv:1912.05104  [pdf, other]

    cs.LG cs.AI stat.ML

    Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

    Authors: Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup

    Abstract: The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states, but may not be uniformly optimal for others, no matter where the agent starts from. Furthermore, to obtain unbiased gradient estimates, the starting point of the policy gradient…

    Submitted 10 December, 2019; originally announced December 2019.

    Comments: In Submission; Appeared at NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop

  28. arXiv:1910.09093  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    All-Action Policy Gradient Methods: A Numerical Integration Approach

    Authors: Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

    Abstract: While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] reducing the variance significantly compared to the REINFORCE estimator [Williams, 19…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: 9 pages, 2 figures. NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop

  29. arXiv:1910.06508  [pdf, other]

    cs.LG stat.ML

    Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

    Authors: Yao Liu, Pierre-Luc Bacon, Emma Brunskill

    Abstract: Off-policy policy estimators that use importance sampling (IS) can suffer from high variance in long-horizon domains, and there has been particular excitement over new IS methods that leverage the structure of Markov decision processes. We analyze the variance of the most popular approaches through the viewpoint of conditional Monte Carlo. Surprisingly, we find that in finite horizon MDPs there is…

    Submitted 5 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted by ICML 2020, 21 pages, 1 figure

  30. arXiv:1811.07004  [pdf, ps, other]

    cs.AI cs.LG

    The Barbados 2018 List of Open Issues in Continual Learning

    Authors: Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

    Abstract: We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: NIPS Continual Learning Workshop 2018

  31. arXiv:1802.03236  [pdf, other]

    cs.AI cs.LG stat.ML

    Learning Robust Options

    Authors: Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

    Abstract: Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Polic…

    Submitted 9 February, 2018; originally announced February 2018.

  32. arXiv:1712.00004  [pdf, other]

    cs.LG cs.AI

    Learnings Options End-to-End for Continuous Action Tasks

    Authors: Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

    Abstract: We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains…

    Submitted 29 November, 2017; originally announced December 2017.

  33. arXiv:1711.03817  [pdf, other]

    cs.AI

    Learning with Options that Terminate Off-Policy

    Authors: Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

    Abstract: A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal…

    Submitted 2 December, 2017; v1 submitted 10 November, 2017; originally announced November 2017.

    Comments: AAAI 2018

  34. arXiv:1709.06683  [pdf, other]

    cs.LG

    OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

    Authors: Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup

    Abstract: Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories…

    Submitted 24 November, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: Accepted to the Thirty-Second AAAI Conference On Artificial Intelligence (AAAI), 2018

  35. arXiv:1709.04571  [pdf, other]

    cs.AI

    When Waiting is not an Option : Learning Options with a Deliberation Cost

    Authors: Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

    Abstract: Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through…

    Submitted 13 September, 2017; originally announced September 2017.

  36. arXiv:1705.09322  [pdf, other]

    cs.LG

    Convergent Tree Backup and Retrace with Function Approximation

    Authors: Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

    Abstract: Off-policy learning is key to scaling up reinforcement learning as it allows to learn about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the \textsc{Tre…

    Submitted 22 October, 2018; v1 submitted 25 May, 2017; originally announced May 2017.

    Journal ref: ICML 2018, Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4955-4964, 2018

  37. arXiv:1612.00916  [pdf, ps, other]

    cs.AI

    A Matrix Splitting Perspective on Planning with Options

    Authors: Pierre-Luc Bacon, Doina Precup

    Abstract: We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations. Based on standard comparison theorems for matrix splittings, we then show how the asymptotic rate of convergence varies as a function of the inherent timescales of the options. This new per…

    Submitted 10 July, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

    Comments: The results presented in the previous version of this paper were found to be applicable only to "gating execution" and not "call-and-return". We made this distinction clear in the text and added an extension to the call-and-return model

  38. arXiv:1609.05140  [pdf, other]

    cs.AI

    The Option-Critic Architecture

    Authors: Pierre-Luc Bacon, Jean Harb, Doina Precup

    Abstract: Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new opt…

    Submitted 2 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

    Comments: Accepted to the Thirty-first AAAI Conference On Artificial Intelligence (AAAI), 2017

  39. arXiv:1511.06297  [pdf, other]

    cs.LG

    Conditional Computation in Neural Networks for faster models

    Authors: Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup

    Abstract: Deep learning has become the state-of-art tool in many applications, but the evaluation and training of deep models can be time-consuming and computationally expensive. The conditional computation approach has been proposed to tackle this problem (Bengio et al., 2013; Davis & Arel, 2013). It operates by selectively activating only parts of the network at a time. In this paper, we use reinforcement…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR 2016 submission, revised