Showing 1–17 of 17 results for author: Ghiassian, S

Searching in archive cs.
  1. arXiv:2405.00747  [pdf, other]

    cs.LG cs.AI

    Soft Preference Optimization: Aligning Language Models to Expert Distributions

    Authors: Arsalan Sharifnassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans

    Abstract: We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model. SPO optimizes model outputs directly over a preference dataset through a natural loss function that integrates preference loss with a regularization term across the model's entire output distribution rather than l…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

  2. arXiv:2404.02649  [pdf, other]

    cs.LG

    On the Importance of Uncertainty in Decision-Making with Large Language Models

    Authors: Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

    Abstract: We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural…

    Submitted 13 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Published in TMLR (07/2024). 12 pages of main content, 25 pages with references and appendix

  3. arXiv:2403.06826  [pdf, other]

    cs.LG cs.AI stat.ML

    In-context Exploration-Exploitation for Reinforcement Learning

    Authors: Zhenwen Dai, Federico Tomasi, Sina Ghiassian

    Abstract: In-context learning is a promising approach for online policy learning of offline reinforcement learning (RL) methods, which can be achieved at inference time without gradient optimization. However, this method is hindered by significant computational costs resulting from the gathering of large training trajectory sets and the need to train large Transformer models. We address this challenge by in…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Published at ICLR 2024

  4. arXiv:2210.14361  [pdf, other]

    cs.LG cs.AI

    Auxiliary task discovery through generate-and-test

    Authors: Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard Sutton, Jun Luo, Adam White

    Abstract: In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning. Auxiliary tasks tend to improve data efficiency by forcing the agent to learn auxiliary prediction and control objectives in addition to the main task of maximizing reward, and thus producing better representations. Typically these tasks are designed by people. M…

    Submitted 20 July, 2024; v1 submitted 25 October, 2022; originally announced October 2022.

  5. arXiv:2203.10172  [pdf, other]

    cs.LG

    Importance Sampling Placement in Off-Policy Temporal-Difference Methods

    Authors: Eric Graves, Sina Ghiassian

    Abstract: A central challenge to applying many off-policy reinforcement learning algorithms to real world problems is the variance introduced by importance sampling. In off-policy learning, the agent learns about a different policy than the one being executed. To account for the difference, importance sampling ratios are often used, but they can increase variance in the algorithms and reduce the rate of learning.…

    Submitted 16 June, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures

  6. arXiv:2109.05110  [pdf, other]

    cs.LG cs.AI

    An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

    Authors: Sina Ghiassian, Richard S. Sutton

    Abstract: Many off-policy prediction learning algorithms have been proposed in the past decade, but it remains unclear which algorithms learn faster than others. We empirically compare 11 off-policy prediction learning algorithms with linear function approximation on two small tasks: the Rooms task, and the High Variance Rooms task. The tasks are designed such that learning fast in them is challenging. In t…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 13 pages

  7. arXiv:2106.00922  [pdf, other]

    cs.LG cs.AI

    An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

    Authors: Sina Ghiassian, Richard S. Sutton

    Abstract: Off-policy prediction -- learning the value function for one policy from data generated while following another policy -- is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD($λ$), Vtrace…

    Submitted 11 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  8. arXiv:2102.07686  [pdf, other]

    cs.LG cs.AI stat.ML

    Does the Adam Optimizer Exacerbate Catastrophic Forgetting?

    Authors: Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton

    Abstract: Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear how exactly the phenomenon should be quantified, and, moreover, to what degree all of the choices we make when designing learni…

    Submitted 9 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 9 pages in main text + 3 pages of references + 16 pages of appendices, 6 figures in main text + 21 figures in appendices, 6 tables in appendices; source code available at https://github.com/dylanashley/catastrophic-forgetting/tree/arxiv

    ACM Class: I.2.6

  9. arXiv:2011.04590  [pdf, other]

    cs.AI

    From Eye-blinks to State Construction: Diagnostic Benchmarks for Online Representation Learning

    Authors: Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard Sutton, Elliot Ludvig, Adam White

    Abstract: We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning. Experiments in classical conditioning show that animals such as rabbits, pigeons, and dogs can make long temporal associations that enable multi-step prediction. To replicate this remarkable ability, an agent must construct an internal state repre…

    Submitted 10 October, 2022; v1 submitted 9 November, 2020; originally announced November 2020.

  10. arXiv:2007.00611  [pdf, other]

    cs.LG cs.AI stat.ML

    Gradient Temporal-Difference Learning with Regularized Corrections

    Authors: Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

    Abstract: It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an ea…

    Submitted 17 September, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Appeared in Proceedings of the 37th International Conference on Machine Learning (ICML2020)

  11. arXiv:2004.07229  [pdf]

    q-bio.MN cs.LG q-bio.QM stat.ML

    Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19

    Authors: Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, JJ Patten, Robert Davey, Joseph Loscalzo, Albert-László Barabási

    Abstract: The current pandemic has highlighted the need for methodologies that can quickly and reliably prioritize clinically approved compounds for their potential effectiveness for SARS-CoV-2 infections. In the past decade, network medicine has developed and validated multiple predictive algorithms for drug repurposing, exploiting the sub-cellular network-based relationship between a drug's targets and di…

    Submitted 9 August, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  12. arXiv:2003.07417  [pdf, other]

    cs.LG cs.AI cs.NE

    Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks

    Authors: Sina Ghiassian, Banafsheh Rafiee, Yat Long Lo, Adam White

    Abstract: Reinforcement learning systems require good representations to work well. For decades practical success in reinforcement learning was limited to small domains. Deep reinforcement learning systems, on the other hand, are scalable, not dependent on domain specific prior knowledge and have been successfully used to play Atari, in 3D navigation from pixels, and to control high degree of freedom robots…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: 10 pages; Accepted to AAMAS 2020

  13. arXiv:1910.13213  [pdf, other]

    cs.AI cs.LG

    Overcoming Catastrophic Interference in Online Reinforcement Learning with Dynamic Self-Organizing Maps

    Authors: Yat Long Lo, Sina Ghiassian

    Abstract: Using neural networks in the reinforcement learning (RL) framework has achieved notable successes. Yet, neural networks tend to forget what they learned in the past, especially when they learn online and fully incrementally, a setting in which the weights are updated after each sample is received and the sample is then discarded. Under this setting, an update can lead to overly global generalizati…

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: 9 Pages, 7 Figures, NeurIPS Workshop on Biological and Artificial Reinforcement Learning, 2019

    Journal ref: Biological and Artificial RL Workshop at NeurIPS 2019

  14. arXiv:1903.00194  [pdf, other]

    cs.AI cs.LG

    Should All Temporal Difference Learning Use Emphasis?

    Authors: Xiang Gu, Sina Ghiassian, Richard S. Sutton

    Abstract: Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy training but it is different from conventional TD learning even under on-policy training. A simple counterexample provided back in 2017 pointed to a potential class…

    Submitted 1 March, 2019; originally announced March 2019.

  15. arXiv:1811.02597  [pdf, other]

    cs.LG cs.AI stat.ML

    Online Off-policy Prediction

    Authors: Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

    Abstract: This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the prediction…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: 68 pages

  16. arXiv:1805.07476  [pdf, other]

    cs.LG cs.AI stat.ML

    Two geometric input transformation methods for fast online reinforcement learning with neural nets

    Authors: Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton

    Abstract: We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks in an incremental manner, without the computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable learning interference in each node's learning behavior. We propose redu…

    Submitted 6 September, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: 16 pages

  17. arXiv:1705.04185  [pdf, other]

    cs.AI cs.LG

    A First Empirical Study of Emphatic Temporal Difference Learning

    Authors: Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton

    Abstract: In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under off-policy training (Sutton, Mah…

    Submitted 12 May, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

    Comments: 5 pages, Accepted to NIPS Continual Learning and Deep Networks workshop, 2016