
Showing 1–50 of 124 results for author: Pineau, J

Searching in archive cs.
  1. arXiv:2403.07918  [pdf, other]

    cs.CY cs.AI cs.LG

    On the Societal Impact of Open Foundation Models

    Authors: Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan

    Abstract: Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on open foundation models, defined here as those with broadly available model weights (e.g. Llama 2, Stable Diffusion XL). We identify five distinctive properties (e.g. greater customizability, poor monitoring) of open foundation models that lead to bo…

    Submitted 27 February, 2024; originally announced March 2024.

  2. arXiv:2210.12574  [pdf, other]

    cs.CL cs.LG

    The Curious Case of Absolute Position Embeddings

    Authors: Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, Joelle Pineau, Dieuwke Hupkes, Adina Williams

    Abstract: Transformer language models encode the notion of word order using positional information. Most commonly, this positional information is represented by absolute position embeddings (APEs), that are learned from the pretraining data. However, in natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 Findings; 5 pages and 15 pages Appendix

  3. arXiv:2206.10658  [pdf, other]

    cs.CL cs.IR

    Questions Are All You Need to Train a Dense Passage Retriever

    Authors: Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer

    Abstract: We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires a…

    Submitted 2 April, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to TACL, pre MIT Press publication version

  4. arXiv:2204.07496  [pdf, other]

    cs.CL cs.IR

    Improving Passage Retrieval with Zero-Shot Question Generation

    Authors: Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer

    Abstract: We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or…

    Submitted 2 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: EMNLP 2022 camera-ready version. Code is available at: https://github.com/DevSinghSachan/unsupervised-passage-reranking

  5. arXiv:2203.03798   

    cs.LG cs.AI

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 25 April, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: This has been withdrawn as it is a new version of arXiv:2104.05025

  6. arXiv:2203.00097  [pdf]

    stat.ME cs.AI cs.LG econ.EM math.OC

    Estimating causal effects with optimization-based methods: A review and empirical comparison

    Authors: Martin Cousineau, Vedat Verter, Susan A. Murphy, Joelle Pineau

    Abstract: In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given. To achieve this balance, there exist a wide variety of metho…

    Submitted 28 February, 2022; originally announced March 2022.

    Comments: In Press, Corrected Proof

    Journal ref: European Journal of Operational Research, 2022, 14 pages

  7. arXiv:2202.07013  [pdf, other]

    cs.LG cs.AI cs.RO

    Robust Policy Learning over Multiple Uncertainty Sets

    Authors: Annie Xie, Shagun Sodhani, Chelsea Finn, Joelle Pineau, Amy Zhang

    Abstract: Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally d…

    Submitted 4 March, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Project webpage at https://sites.google.com/view/sirsa-public/home

  8. arXiv:2201.01836  [pdf, other]

    cs.LG cs.AI

    A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions

    Authors: Anthony GX-Chen, Veronica Chelu, Blake A. Richards, Joelle Pineau

    Abstract: Estimating value functions is a core component of reinforcement learning algorithms. Temporal difference (TD) learning algorithms use bootstrapping, i.e. they update the value function toward a learning target using value estimates at subsequent time-steps. Alternatively, the value function can be updated toward a learning target constructed by separately predicting successor features (SF)--a poli…

    Submitted 5 January, 2022; originally announced January 2022.

    Comments: 18 pages, 6 figures, 2 tables. Preprint. Accepted by AAAI-22

  9. arXiv:2110.06972  [pdf, other]

    cs.LG cs.AI

    Block Contextual MDPs for Continual Learning

    Authors: Shagun Sodhani, Franziska Meier, Joelle Pineau, Amy Zhang

    Abstract: In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics is implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the continual reinforcement learning scenario, the sequence of tasks is another source of nonstationarity. In this work, we propose to examine this continual reinf…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: 26 pages, Under Review

  10. arXiv:2106.10783  [pdf, other]

    cs.LG cs.AI

    OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

    Authors: Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

    Abstract: We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestim…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 26 pages, 11 figures, Accepted at ICML 2021

  11. arXiv:2106.10622  [pdf, other]

    cs.CL

    Do Encoder Representations of Generative Dialogue Models Encode Sufficient Information about the Task ?

    Authors: Prasanna Parthasarathi, Joelle Pineau, Sarath Chandar

    Abstract: Predicting the next utterance in dialogue is contingent on encoding of users' input text to generate appropriate and relevant response in data-driven approaches. Although the semantic and syntactic quality of the language generated is evaluated, more often than not, the encoded representation of input is not evaluated. As the representation of the encoder is essential for predicting the appropriat…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted at SIGDial 2021. arXiv admin note: substantial text overlap with arXiv:2008.10427

  12. arXiv:2106.10619  [pdf, other]

    cs.CL

    A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss

    Authors: Prasanna Parthasarathi, Mohamed Abdelsalam, Joelle Pineau, Sarath Chandar

    Abstract: Neural models trained for next utterance generation in dialogue task learn to mimic the n-gram sequences in the training set with training objectives like negative log-likelihood (NLL) or cross-entropy. Such commonly used training objectives do not foster generating alternate responses to a context. But, the effects of minimizing an alternate training objective that fosters a model to generate alt…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted at SIGDial 2021

  13. arXiv:2106.09065  [pdf, other]

    cs.CV cs.LG

    SPeCiaL: Self-Supervised Pretraining for Continual Learning

    Authors: Lucas Caccia, Joelle Pineau

    Abstract: This paper presents SPeCiaL: a method for unsupervised pretraining of representations tailored for continual learning. Our approach devises a meta-learning objective that differentiates through a sequential learning process. Specifically, we train a linear model over the representations to match different augmented views of the same image together, each view presented sequentially. The linear mode…

    Submitted 16 June, 2021; originally announced June 2021.

  14. arXiv:2106.03955  [pdf, other]

    cs.LG stat.ML

    Correcting Momentum in Temporal Difference Learning

    Authors: Emmanuel Bengio, Joelle Pineau, Doina Precup

    Abstract: A common optimization tool used in deep reinforcement learning is momentum, which consists in accumulating and discounting past gradients, reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, the loss itsel…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: NeurIPS Deep RL Workshop 2020

  15. arXiv:2106.00099  [pdf, other]

    cs.LG

    Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

    Authors: Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

    Abstract: We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the…

    Submitted 29 October, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

  16. arXiv:2104.07623  [pdf, other]

    cs.CL

    Sometimes We Want Translationese

    Authors: Prasanna Parthasarathi, Koustuv Sinha, Joelle Pineau, Adina Williams

    Abstract: Rapid progress in Neural Machine Translation (NMT) systems over the last few years has been driven primarily towards improving translation quality, and as a secondary focus, improved robustness to input perturbations (e.g. spelling and grammatical mistakes). While performance and robustness are important objectives, by over-focusing on these, we risk overlooking other important properties. In this…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 16 pages, 11 figures and 3 tables

  17. arXiv:2104.06644  [pdf, other]

    cs.CL cs.LG

    Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

    Authors: Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela

    Abstract: A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines. In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. To demonstrate this…

    Submitted 9 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: To appear at EMNLP 2021; 26 pages total (9 main, 6 reference and 11 Appendix)

  18. arXiv:2104.05025  [pdf, other]

    cs.LG

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 2 May, 2022; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: Accepted at ICLR 2022. Code available at www.github.com/pclucas14/AML

  19. arXiv:2103.08067  [pdf, other]

    cs.MA cs.AI

    Quasi-Equivalence Discovery for Zero-Shot Emergent Communication

    Authors: Kalesha Bullard, Douwe Kiela, Franziska Meier, Joelle Pineau, Jakob Foerster

    Abstract: Effective communication is an important skill for enabling information exchange in multi-agent settings and emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. Since, by definition, these settings involve arbitrary encoding of information, typically they do not allow for the learned protocols to generalize beyond training partners…

    Submitted 22 June, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: 14 pages

  20. arXiv:2102.09850  [pdf, other]

    cs.LG cs.AI cs.RO

    Model-Invariant State Abstractions for Model-Based Reinforcement Learning

    Authors: Manan Tomar, Amy Zhang, Roberto Calandra, Matthew E. Taylor, Joelle Pineau

    Abstract: Accuracy and generalization of dynamics models is key to the success of model-based reinforcement learning (MBRL). As the complexity of tasks increases, so does the sample inefficiency of learning accurate dynamics models. However, many complex tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a…

    Submitted 7 June, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

  21. arXiv:2102.07097  [pdf, other]

    cs.LG cs.AI

    Domain Adversarial Reinforcement Learning

    Authors: Bonnie Li, Vincent François-Lavet, Thang Doan, Joelle Pineau

    Abstract: We consider the problem of generalization in reinforcement learning where visual aspects of the observations might differ, e.g. when there are different backgrounds or change in contrast, brightness, etc. We assume that our agent has access to only a few of the MDPs from the MDP distribution during training. The performance of the agent is then reported on new unknown test domains drawn from the d…

    Submitted 14 February, 2021; originally announced February 2021.

  22. arXiv:2102.06177  [pdf, other]

    cs.LG cs.AI cs.RO

    Multi-Task Reinforcement Learning with Context-based Representations

    Authors: Shagun Sodhani, Amy Zhang, Joelle Pineau

    Abstract: The benefit of multi-task learning over single-task learning relies on the ability to use relations across tasks to improve performance on any single task. While sharing representations is an important mechanism to share information across tasks, its success depends on how well the structure underlying the tasks is captured. In some real-world situations, we have access to metadata, or additional…

    Submitted 10 June, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Accepted at the 38th International Conference on Machine Learning (ICML 2021). 17 pages, 4 figures, 20 tables

  23. arXiv:2102.03419  [pdf, other]

    cs.AI cs.CL cs.IR cs.LG cs.SI

    Exploring the Limits of Few-Shot Link Prediction in Knowledge Graphs

    Authors: Dora Jambor, Komal Teru, Joelle Pineau, William L. Hamilton

    Abstract: Real-world knowledge graphs are often characterized by low-frequency relations - a challenge that has prompted an increasing interest in few-shot link prediction methods. These methods perform link prediction for a set of new relations, unseen during training, given only a few example facts of each relation at test time. In this work, we perform a systematic study on a spectrum of models derived b…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: code available at https://github.com/dorajam/few-shot-link-prediction-paper

    Journal ref: European Chapter of the ACL (EACL), 2021

  24. arXiv:2101.04909  [pdf, other]

    cs.CV cs.LG

    COVID-19 Prognosis via Self-Supervised Representation Learning and Multi-Image Prediction

    Authors: Anuroop Sriram, Matthew Muckley, Koustuv Sinha, Farah Shamout, Joelle Pineau, Krzysztof J. Geras, Lea Azour, Yindalon Aphinyanaphongs, Nafissa Yakubova, William Moore

    Abstract: The rapid spread of COVID-19 cases in recent months has strained hospital resources, making rapid and accurate triage of patients presenting to emergency departments a necessity. Machine learning techniques using clinical data such as chest X-rays have been used to predict which patients are most at risk of deterioration. We consider the task of predicting two types of patient deterioration based…

    Submitted 24 January, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

  25. arXiv:2101.00010  [pdf, other]

    cs.CL cs.LG

    UnNatural Language Inference

    Authors: Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams

    Abstract: Recent investigations into the inner-workings of state-of-the-art large-scale pre-trained Transformer-based Natural Language Understanding (NLU) models indicate that they appear to know humanlike syntax, at least to some extent. We provide novel evidence that complicates this claim: we find that state-of-the-art Natural Language Inference (NLI) models assign the same labels to permuted examples as…

    Submitted 10 June, 2021; v1 submitted 30 December, 2020; originally announced January 2021.

    Comments: Accepted at ACL 2021 (Long Paper), 9 pages + Appendix

  26. arXiv:2012.02055  [pdf, other]

    cs.RO cs.LG

    Intervention Design for Effective Sim2Real Transfer

    Authors: Melissa Mozifian, Amy Zhang, Joelle Pineau, David Meger

    Abstract: The goal of this work is to address the recent success of domain randomization and data augmentation for the sim2real setting. We explain this success through the lens of causal inference, positioning domain randomization and data augmentation as interventions on the environment which encourage invariance to irrelevant features. Such interventions include visual perturbations that have no effect o…

    Submitted 3 December, 2020; originally announced December 2020.

  27. arXiv:2010.15896  [pdf, other]

    cs.MA cs.AI

    Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations

    Authors: Kalesha Bullard, Franziska Meier, Douwe Kiela, Joelle Pineau, Jakob Foerster

    Abstract: Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings. Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting is that it does not allow for the emergent protocols to generalize beyond the training partners. Furthermore, so far eme…

    Submitted 3 December, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

  28. arXiv:2010.03691  [pdf, other]

    cs.LG

    Regularized Inverse Reinforcement Learning

    Authors: Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau

    Abstract: Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions. Regularized IRL applies strongly convex regularizers to the learner's policy in order to avoid the expert's behavior being rationalized by arbitrary constant rewards, also known as degenerate solutions. We propose tractable soluti…

    Submitted 2 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: 26 pages, 7 figures

  29. arXiv:2009.13579  [pdf, other]

    cs.LG stat.ML

    Novelty Search in Representational Space for Sample Efficient Exploration

    Authors: Ruo Yu Tao, Vincent François-Lavet, Joelle Pineau

    Abstract: We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exp…

    Submitted 15 April, 2022; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: 10 pages + references + appendix. Oral presentation at NeurIPS 2020

  30. arXiv:2008.11811  [pdf, other]

    cs.LG math.OC stat.ML

    Constrained Markov Decision Processes via Backward Value Functions

    Authors: Harsh Satija, Philip Amortila, Joelle Pineau

    Abstract: Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety or resources). In standard RL, the agent is incentivized to explore any behavior as long as it maximizes rewards, but in the real world, undesired behavior can d…

    Submitted 26 August, 2020; originally announced August 2020.

  31. arXiv:2008.10427  [pdf, other]

    cs.CL cs.AI

    How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics

    Authors: Prasanna Parthasarathi, Joelle Pineau, Sarath Chandar

    Abstract: Though generative dialogue modeling is widely seen as a language modeling task, the task demands an agent to have a complex natural language understanding of its input text to carry a meaningful interaction with an user. The automatic metrics used evaluate the quality of the generated text as a proxy to the holistic interaction of the agent. Such metrics were earlier shown to not correlate with th…

    Submitted 24 August, 2020; originally announced August 2020.

  32. arXiv:2007.07206  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Robust State Abstractions for Hidden-Parameter Block MDPs

    Authors: Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau

    Abstract: Many control tasks exhibit similar dynamics that can be modeled as having common latent structure. Hidden-Parameter Markov Decision Processes (HiP-MDPs) explicitly model this structure to improve sample efficiency in multi-task settings. However, this setting makes strong assumptions on the observability of the state that limit its application in real-world scenarios with rich observation spaces.…

    Submitted 11 February, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Accepted at the 9th International Conference on Learning Representations. 22 pages, 14 figures

  33. arXiv:2007.02786  [pdf, other]

    cs.LG stat.ML

    TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

    Authors: Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

    Abstract: We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers. Our method, TDprop, computes a per parameter learning rate based on the diagonal preconditioning of the TD update rule. We show how this can be used in both $n$-step returns and TD($λ$). Our theoretical findings demonstrate that i…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Presented at the Theoretical Foundations of Reinforcement Learning workshop at ICML 2020

  34. arXiv:2007.01516  [pdf, other]

    cs.LG q-bio.GN stat.AP stat.ML

    Deep interpretability for GWAS

    Authors: Deepak Sharma, Audrey Durand, Marc-André Legault, Louis-Philippe Lemieux Perreault, Audrey Lemaçon, Marie-Pierre Dubé, Joelle Pineau

    Abstract: Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In these studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large geneti…

    Submitted 3 July, 2020; originally announced July 2020.

    Comments: Accepted at ICML 2020 workshop on ML Interpretability for Scientific Discovery

  35. arXiv:2006.13258  [pdf, other]

    cs.LG cs.AI stat.ML

    Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

    Authors: Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Christopher Pal, Derek Nowrouzezahrai

    Abstract: Adversarial Imitation Learning alternates between learning a discriminator -- which tells apart expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can fool this discriminator. This alternated optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning.…

    Submitted 16 April, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Journal ref: Advances in Neural Information Processing Systems 33 (2020)

  36. arXiv:2005.06616  [pdf, other]

    cs.CY cs.AI cs.CL cs.HC cs.LG

    A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

    Authors: Iulian Vlad Serban, Varun Gupta, Ekaterina Kochmar, Dung D. Vu, Robert Belfer, Joelle Pineau, Aaron Courville, Laurent Charlin, Yoshua Bengio

    Abstract: We present Korbit, a large-scale, open-domain, mixed-interface, dialogue-based intelligent tutoring system (ITS). Korbit uses machine learning, natural language processing and reinforcement learning to provide interactive, personalized learning online. Korbit has been designed to easily scale to thousands of subjects, by automating, standardizing and simplifying the content creation process. Unlik…

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: 6 pages, 1 figure, 1 table, accepted for publication in the 21st International Conference on Artificial Intelligence in Education (AIED 2020)

    ACM Class: I.2.0; I.2.1; I.2.7; K.3.1; G.4

  37. arXiv:2005.03648  [pdf, other]

    cs.LG cs.AI stat.ML

    Plan2Vec: Unsupervised Representation Learning by Latent Plans

    Authors: Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra

    Abstract: In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning. Plan2vec constructs a weighted graph on an image dataset using near-neighbor distances, and then extrapolates this local metric to a global embedding by distilling path-integral over planned path. When applied to control, plan2vec offers a way to learn goal-conditioned…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: code available at https://geyang.github.io/plan2vec

    Journal ref: Proceedings of Machine Learning Research, the 2nd Annual Conference on Learning for Dynamics and Control (2020) Volume 120, 1-12

  38. arXiv:2005.02431  [pdf, other]

    cs.CL cs.AI

    Automated Personalized Feedback Improves Learning Gains in an Intelligent Tutoring System

    Authors: Ekaterina Kochmar, Dung Do Vu, Robert Belfer, Varun Gupta, Iulian Vlad Serban, Joelle Pineau

    Abstract: We investigate how automated, data-driven, personalized feedback in a large-scale intelligent tutoring system (ITS) improves student learning outcomes. We propose a machine learning approach to generate personalized feedback, which takes individual needs of students into account. We utilize state-of-the-art machine learning and natural language processing techniques to provide the students with pe…

    Submitted 7 May, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

    Comments: To be published in Proceedings of the 21st International Conference on Artificial Intelligence in Education (AIED 2020)

  39. arXiv:2005.00583  [pdf, other]

    cs.CL cs.LG

    Learning an Unreferenced Metric for Online Dialogue Evaluation

    Authors: Koustuv Sinha, Prasanna Parthasarathi, Jasmine Wang, Ryan Lowe, William L. Hamilton, Joelle Pineau

    Abstract: Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or need a human-generated reference response during inference, making it infeasible for online evaluation. Here, we prop…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020, 5 pages

  40. arXiv:2003.12206  [pdf, other]

    cs.LG stat.ML

    Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

    Authors: Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, Hugo Larochelle

    Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible res…

    Submitted 30 December, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: To appear at JMLR, 16 pages + Appendix

  41. arXiv:2003.06560  [pdf, other]

    cs.LG stat.ML

    Evaluating Logical Generalization in Graph Neural Networks

    Authors: Koustuv Sinha, Shagun Sodhani, Joelle Pineau, William L. Hamilton

    Abstract: Recent research has highlighted the role of relational inductive biases in building learning agents that can generalize and reason in a compositional manner. However, while relational learning algorithms such as graph neural networks (GNNs) show promise, we do not understand how effectively these approaches can adapt to new tasks. In this work, we study the task of logical generalization using GNN…

    Submitted 14 March, 2020; originally announced March 2020.

  42. arXiv:2003.06350  [pdf, other]

    cs.LG stat.ML

    Interference and Generalization in Temporal Difference Learning

    Authors: Emmanuel Bengio, Joelle Pineau, Doina Precup

    Abstract: We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: Submitted to ICML 2020. 20 pages, 14 figures

  43. arXiv:2003.06016  [pdf, other]

    cs.LG cs.AI stat.ML

    Invariant Causal Prediction for Block MDPs

    Authors: Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup

    Abstract: Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal infe…

    Submitted 11 June, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: Accepted to ICML 2020. 16 pages, 8 figures

  44. arXiv:2003.04108  [pdf, other]

    cs.LG stat.ML

    Stable Policy Optimization via Off-Policy Divergence Regularization

    Authors: Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

    Abstract: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a wide range of challenging tasks, there is room for improvement in the stabilization of the policy learning and how the off-policy data are used. In this paper we…

    Submitted 19 June, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Journal ref: Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 2020

  45. arXiv:2002.10525  [pdf, other]

    cs.MA cs.LG

    Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic

    Authors: Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau

    Abstract: Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems where we seek to recover both policies for our agents and reward functions that promote expert-like behavior. While MA-AIRL has promising results on cooperative and competitive tasks, it is sample-inefficient and has only been validated empirically for small…

    Submitted 24 February, 2020; originally announced February 2020.

  46. arXiv:2002.05651  [pdf, other]

    cs.CY cs.LG

    Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning

    Authors: Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau

    Abstract: Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research. We introduce a framework that makes this easier by providing a simple interface for tracking realtime energy consumption and carbon emissions, as well as generating standardized online appendices. Utilizing this framework, we create a leaderboard for energy effic…

    Submitted 29 November, 2022; v1 submitted 31 January, 2020; originally announced February 2020.

    Comments: Published in JMLR: https://jmlr.org/papers/v21/20-312.html

  47. arXiv:2002.02863  [pdf, other]

    cs.LG stat.ML

    Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces

    Authors: Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle Pineau, Doina Precup, Guillaume Rabusseau

    Abstract: We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box mode…

    Submitted 15 October, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  48. arXiv:2002.01093  [pdf, other]

    cs.CL cs.AI cs.LG cs.MA stat.ML

    On the interaction between supervision and self-play in emergent communication

    Authors: Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

    Abstract: A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training. However, recent work suggests that current machine learning methods are too data inefficient to be trained in this way from scratch. In this paper, we investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency: imi…

    Submitted 22 June, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: The first two authors contributed equally. Accepted at ICLR 2020

  49. arXiv:1911.09033  [pdf, other]

    cs.LG cs.CV stat.ML

    Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

    Authors: Eric Crawford, Joelle Pineau

    Abstract: The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e. without access to annotated training videos) since this will allow agents to begin operating in new environments with minimal human assis…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted at AAAI 2020. Code: https://github.com/e2crawfo/silot. Visualizations: https://sites.google.com/view/silot

  50. arXiv:1911.08019  [pdf, other]

    cs.LG cs.CV stat.ML

    Online Learned Continual Compression with Adaptive Quantization Modules

    Authors: Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

    Abstract: We introduce and study the problem of Online Continual Compression, where one attempts to simultaneously learn to compress and store a representative dataset from a non-i.i.d. data stream, while only observing each sample once. A naive application of auto-encoders in this setting encounters a major challenge: representations derived from earlier encoder states must be usable by later decoder states…

    Submitted 20 August, 2020; v1 submitted 18 November, 2019; originally announced November 2019.