Showing 1–23 of 23 results for author: Combes, R T D

Searching in archive cs.
  1. arXiv:2310.17139  [pdf, other]

    cs.LG

    Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

    Authors: Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche

    Abstract: While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  2. arXiv:2306.13085  [pdf, other]

    cs.LG cs.AI

    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

    Authors: Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche

    Abstract: Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior…

    Submitted 22 June, 2023; originally announced June 2023.

    Journal ref: Conference paper at ICLR 2023

  3. arXiv:2211.00863  [pdf, other]

    cs.LG cs.AI

    Behavior Prior Representation learning for Offline Reinforcement Learning

    Authors: Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet Des Combes, Romain Laroche

    Abstract: Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our…

    Submitted 27 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: ICLR 2023

  4. arXiv:2211.00247  [pdf, other]

    cs.LG cs.AI

    Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

    Authors: Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet Des Combes

    Abstract: Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reach a diverse set of objectives. How to \textit{specify} and \textit{ground} these goals in such a way that we can both reliably reach goals during training as well as generalize to new goals during evaluation remains an open area of research. Defining goals in…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  5. arXiv:2211.00164  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

    Authors: Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

    Abstract: Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenou…

    Submitted 13 August, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

    Comments: ICML 2023

  6. arXiv:2206.05229  [pdf, other]

    cs.LG

    Measuring the Carbon Intensity of AI in Cloud Instances

    Authors: Jesse Dodge, Taylor Prewitt, Remi Tachet Des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan

    Abstract: By providing unprecedented access to computational resources, cloud computing has enabled rapid growth in technologies such as machine learning, the computational demands of which incur a high energy cost and a commensurate carbon footprint. As a result, recent scholarship has called for better estimates of the greenhouse gas impact of AI: data scientists today do not have easy or reliable access…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2022

  7. arXiv:2206.01085  [pdf, other]

    cs.LG

    Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

    Authors: David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

    Abstract: Most theoretically motivated work in the offline reinforcement learning setting requires precise uncertainty estimates. This requirement restricts the algorithms derived in that work to the tabular and linear settings where such estimates exist. In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIB…

    Submitted 2 June, 2022; originally announced June 2022.

  8. arXiv:2205.13950  [pdf, other]

    cs.LG eess.SY

    Non-Markovian policies occupancy measures

    Authors: Romain Laroche, Remi Tachet des Combes, Jacob Buckman

    Abstract: A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state. The family of Markovian policies is broad enough to be interesting, yet simple enough to be amenable to analysis. However, RL often involves more complex policies: ensembles of policies, policies…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 9p+sup. mat

  9. arXiv:2104.03863  [pdf, other]

    cs.LG cs.CR stat.ML

    A single gradient step finds adversarial examples on random two-layers neural networks

    Authors: Sébastien Bubeck, Yeshwanth Cherapanamjeri, Gauthier Gidel, Rémi Tachet des Combes

    Abstract: Daniely and Schacham recently showed that gradient descent finds adversarial examples on random undercomplete two-layers ReLU neural networks. The term "undercomplete" refers to the fact that their proof only holds when the number of neurons is a vanishing fraction of the ambient dimension. We extend their result to the overcomplete case, where the number of neurons is larger than the dimension (y…

    Submitted 9 April, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Added a comment about universal adversarial perturbations. 18 pages, 7 figures

  10. arXiv:2102.05628  [pdf, ps, other]

    stat.ML cs.LG

    On the Regularity of Attention

    Authors: James Vuckovic, Aristide Baratin, Remi Tachet des Combes

    Abstract: Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e. the amount of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual defini…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Conference version of arXiv:2007.02876

  11. arXiv:2010.01069  [pdf, other]

    cs.LG cs.AI

    A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

    Authors: Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

    Abstract: We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $γ^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually…

    Submitted 26 January, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: AAMAS 2022

  12. arXiv:2009.05475  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial score matching and improved sampling for image generation

    Authors: Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, Ioannis Mitliagkas

    Abstract: Denoising Score Matching with Annealed Langevin Sampling (DSM-ALS) has recently found success in generative modeling. The approach works by first training a neural network to estimate the score of a distribution, and then using Langevin dynamics to sample from the data distribution assumed by the score network. Despite the convincing visual quality of samples, this method appears to perform worse…

    Submitted 10 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: Code at https://github.com/AlexiaJM/AdversarialConsistentScoreMatching

  13. arXiv:2007.02876  [pdf, ps, other]

    stat.ML cs.LG

    A Mathematical Theory of Attention

    Authors: James Vuckovic, Aristide Baratin, Remi Tachet des Combes

    Abstract: Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-atte…

    Submitted 20 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

  14. arXiv:2006.07217  [pdf, other]

    cs.LG stat.ML

    Deep Reinforcement and InfoMax Learning

    Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm

    Abstract: We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal repres…

    Submitted 16 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  15. arXiv:1911.05873  [pdf, ps, other]

    cs.LG stat.ML

    A Reduction from Reinforcement Learning to No-Regret Online Learning

    Authors: Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

    Abstract: We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learni…

    Submitted 1 January, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

  16. arXiv:1909.05236  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with an Estimated Baseline Policy

    Authors: Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

    Abstract: Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance. However, in many real-world applications such as d…

    Submitted 28 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: Published at AAMAS 2020

  17. arXiv:1907.05079  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with Soft Baseline Bootstrapping

    Authors: Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

    Abstract: Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance a…

    Submitted 11 July, 2019; originally announced July 2019.

    Comments: Accepted paper at ECML-PKDD2019

  18. arXiv:1901.09453  [pdf, other]

    cs.LG cs.AI stat.ML

    On Learning Invariant Representation for Domain Adaptation

    Authors: Han Zhao, Remi Tachet des Combes, Kun Zhang, Geoffrey J. Gordon

    Abstract: Due to the ability of deep neural nets to learn rich representations, recent advances in unsupervised domain adaptation have focused on learning domain-invariant features that achieve a small error on the source domain. The hope is that the learnt representation, together with the hypothesis learnt from the source domain, can generalize to the target domain. In this paper, we first construct a sim…

    Submitted 30 May, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: Compared with the last version, the current one adds a new corollary for the case of different feature transformations (encoders) on source/target domains. Fix a typo in Fig. 1

  19. arXiv:1812.05159  [pdf, other]

    cs.LG stat.ML

    An Empirical Study of Example Forgetting during Deep Neural Network Learning

    Authors: Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

    Abstract: Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a `forgetting event' to have occurred when an individual training example transitions from being classified correc…

    Submitted 15 November, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: ICLR 2019

  20. arXiv:1806.11525  [pdf, other]

    cs.CL cs.LG

    Counting to Explore and Generalize in Text-based Games

    Authors: Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler

    Abstract: We propose a recurrent RL agent with an episodic exploration mechanism that helps discovering good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty where the goal is to collect a coin located at the end of a chain of rooms. In contrast to previous text-based RL approaches, we observe that our agent learns policies that…

    Submitted 6 March, 2019; v1 submitted 29 June, 2018; originally announced June 2018.

  21. arXiv:1712.06924  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with Baseline Bootstrapping

    Authors: Romain Laroche, Paul Trichelair, Rémi Tachet des Combes

    Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstra…

    Submitted 7 June, 2019; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: accepted as a long oral at ICML2019

  22. arXiv:1505.03854  [pdf, ps, other]

    physics.soc-ph cs.SI

    Cities through the Prism of People's Spending Behavior

    Authors: Stanislav Sobolevsky, Izabela Sitko, Remi Tachet des Combes, Bartosz Hawelka, Juan Murillo Arias, Carlo Ratti

    Abstract: Scientific studies of society increasingly rely on digital traces produced by various aspects of human activity. In this paper, we use a relatively unexplored source of data, anonymized records of bank card transactions collected in Spain by a big European bank, in order to propose a new classification scheme of cities based on the economic behavior of their residents. First, we study how individu…

    Submitted 14 May, 2015; originally announced May 2015.

  23. arXiv:1405.4301  [pdf, ps, other]

    physics.soc-ph cs.SI q-fin.GN

    Mining Urban Performance: Scale-Independent Classification of Cities Based on Individual Economic Transactions

    Authors: Stanislav Sobolevsky, Izabela Sitko, Sebastian Grauwin, Remi Tachet des Combes, Bartosz Hawelka, Juan Murillo Arias, Carlo Ratti

    Abstract: Intensive development of urban systems creates a number of challenges for urban planners and policy makers in order to maintain sustainable growth. Running efficient urban policies requires meaningful urban metrics, which could quantify important urban characteristics including various aspects of an actual human behavior. Since a city size is known to have a major, yet often nonlinear, impact on t…

    Submitted 16 May, 2014; originally announced May 2014.

    Comments: 10 pages, 7 figures, to be published in the proceedings of ASE BigDataScience 2014 conference

    MSC Class: 62-07; 68U01 ACM Class: H.2.8; J.4