
Showing 1–13 of 13 results for author: Dwaracherla, V

Searching in archive cs.
  1. arXiv:2402.00396  [pdf, other]

    cs.LG cs.AI cs.CL stat.ME stat.ML

    Efficient Exploration for LLMs

    Authors: Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy

    Abstract: We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demo…

    Submitted 4 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024

  2. arXiv:2302.09205  [pdf, other]

    cs.LG cs.AI

    Approximate Thompson Sampling via Epistemic Neural Networks

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs acro…

    Submitted 17 February, 2023; originally announced February 2023.

  3. arXiv:2207.00137  [pdf, other]

    cs.LG

    Robustness of Epinets against Distributional Shifts

    Authors: Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy

    Abstract: Recent work introduced the epinet as a new approach to uncertainty modeling in deep learning. An epinet is a small neural network added to traditional neural networks, which, together, can produce predictive distributions. In particular, using an epinet can greatly improve the quality of joint predictions across multiple inputs, a measure of how well a neural network knows what it does not know. I…

    Submitted 30 June, 2022; originally announced July 2022.

  4. arXiv:2206.03633  [pdf, other]

    cs.LG cs.AI stat.ML

    Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

    Authors: Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

    Abstract: In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been proposed for training ensembles, and conflicting views prevail with regards to the importance of various ingredients of these approaches. In this paper…

    Submitted 7 June, 2022; originally announced June 2022.

  5. arXiv:2202.13509  [pdf, other]

    stat.ML cs.AI cs.LG

    Evaluating High-Order Predictive Distributions in Deep Learning

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Most work on supervised learning research has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive distributions with inputs sampled i.i.d. from the testing distribution. With low-dimensional inputs, these methods distinguish agents that effectively estimate u…

    Submitted 27 February, 2022; originally announced February 2022.

  6. arXiv:2110.04629  [pdf, other]

    cs.LG cs.AI stat.ML

    The Neural Testbed: Evaluating Joint Predictions

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

    Abstract: Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a…

    Submitted 1 November, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  7. arXiv:2107.09224  [pdf, ps, other]

    cs.LG stat.ML

    From Predictions to Decisions: The Importance of Joint Predictive Distributions

    Authors: Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy

    Abstract: A fundamental challenge for any intelligent system is prediction: given some inputs, can you predict corresponding outcomes? Most work on supervised learning has focused on producing accurate marginal predictions for each input. However, we show that for a broad class of decision problems, accurate joint predictions are required to deliver good performance. In particular, we establish several resu…

    Submitted 23 May, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

  8. arXiv:2107.08924  [pdf, other]

    cs.LG cs.AI stat.ML

    Epistemic Neural Networks

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joint predictions, but the computational costs of training large ensembles can become prohibitive. We introduce the epinet: an architecture that can supplement any…

    Submitted 17 May, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

  9. arXiv:2103.04047  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning, Bit by Bit

    Authors: Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

    Abstract: Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We discuss concepts and regret analysis that together offer principled guidance. This line of thi…

    Submitted 4 May, 2023; v1 submitted 6 March, 2021; originally announced March 2021.

  10. arXiv:2006.07464  [pdf, other]

    cs.LG math.OC stat.ML

    Hypermodels for Exploration

    Authors: Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

    Abstract: We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gain…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2020

  11. arXiv:2002.07282  [pdf, other]

    cs.LG cs.AI stat.ML

    Langevin DQN

    Authors: Vikranth Dwaracherla, Benjamin Van Roy

    Abstract: Algorithms that tackle deep exploration -- an important challenge in reinforcement learning -- have relied on epistemic uncertainty representation through ensembles or other hypermodels, exploration bonuses, or visitation count distributions. An open question is whether deep exploration can be achieved by an incremental reinforcement learning algorithm that tracks a single point estimate, without…

    Submitted 23 February, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

    Comments: 5 figures, 14 pages

  12. arXiv:1912.10577  [pdf, other]

    cs.LG cs.AI stat.ML

    Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

    Authors: Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla

    Abstract: It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationa…

    Submitted 19 March, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

    Comments: 17 pages, 4 figures, Proceedings of the 34th AAAI Conference on Artificial Intelligence

  13. arXiv:1804.05195  [pdf, other]

    cs.RO cs.CV cs.LG

    Motion-based Object Segmentation based on Dense RGB-D Scene Flow

    Authors: Lin Shao, Parth Shah, Vikranth Dwaracherla, Jeannette Bohg

    Abstract: Given two consecutive RGB-D images, we propose a model that estimates a dense 3D motion field, also known as scene flow. We take advantage of the fact that in robot manipulation scenarios, scenes often consist of a set of rigidly moving objects. Our model jointly estimates (i) the segmentation of the scene into an unknown but finite number of objects, (ii) the motion trajectories of these objects…

    Submitted 24 July, 2018; v1 submitted 14 April, 2018; originally announced April 2018.

    Comments: Accepted to IEEE Robotics and Automation Letters and selected by IROS'18 Program Committee for presentation at the Conference