Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–10 of 10 results for author: Delalleau, O

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.11704  [pdf, other

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia, :, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek , et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be… ▽ More

    Submitted 6 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.08673  [pdf, ps, other

    cs.CL cs.AI cs.LG

    HelpSteer2: Open-source dataset for training top-performing reward models

    Authors: Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev

    Abstract: High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer need to be updated to remain effective for reward modeling. Methods… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2405.01481  [pdf, other

    cs.CL cs.AI cs.LG

    NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

    Authors: Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev

    Abstract: Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to a thous… ▽ More

    Submitted 3 September, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 pages, 4 figures, Accepted to COLM 2024

  4. arXiv:2311.09528  [pdf, other

    cs.CL cs.AI cs.LG

    HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

    Authors: Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev

    Abstract: Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful responses only due to their length). To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various as… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

  5. arXiv:2306.00867  [pdf, other

    cs.LG cs.AI

    IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

    Authors: Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau

    Abstract: Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: Short version published at ICRA 2024 (https://tinyurl.com/icra24-iqltdmpc)

  6. arXiv:2010.02838  [pdf, other

    cs.LG cs.DC math.OC

    A Closer Look at Codistillation for Distributed Training

    Authors: Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

    Abstract: Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous data-parallel stochastic gradient descent methods, where different model replicas average their gradients (or parameters) at every iteration and thus maintain i… ▽ More

    Submitted 25 July, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Under review

  7. arXiv:1912.11077  [pdf, other

    cs.LG cs.AI stat.ML

    Discrete and Continuous Action Representation for Practical RL in Video Games

    Authors: Olivier Delalleau, Maxim Peter, Eloi Alonso, Adrien Logut

    Abstract: While most current research in Reinforcement Learning (RL) focuses on improving the performance of the algorithms in controlled environments, the use of RL under constraints like those met in the video game industry is rarely studied. Operating under such constraints, we propose Hybrid SAC, an extension of the Soft Actor-Critic algorithm able to handle discrete, continuous and parameterized action… ▽ More

    Submitted 23 December, 2019; originally announced December 2019.

    Comments: Presented at the AAAI-20 Workshop on Reinforcement Learning in Games

    ACM Class: I.2.6

  8. arXiv:1910.04054  [pdf, other

    cs.LG cs.DC cs.NI stat.ML

    MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

    Authors: Viswanath Sivakumar, Olivier Delalleau, Tim Rocktäschel, Alexander H. Miller, Heinrich Küttler, Nantas Nardelli, Mike Rabbat, Joelle Pineau, Sebastian Riedel

    Abstract: Effective network congestion control strategies are key to keeping the Internet (or any large computer network) operational. Network congestion control has been dominated by hand-crafted heuristics for decades. Recently, ReinforcementLearning (RL) has emerged as an alternative to automatically optimize such control strategies. Research so far has primarily considered RL interfaces which block the… ▽ More

    Submitted 26 May, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: Workshop on ML for Systems at NeurIPS 2019

  9. arXiv:1605.02688  [pdf, other

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano , et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu… ▽ More

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  10. arXiv:1209.0521  [pdf, other

    cs.LG stat.ML

    Efficient EM Training of Gaussian Mixtures with Missing Data

    Authors: Olivier Delalleau, Aaron Courville, Yoshua Bengio

    Abstract: In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore in this paper is the use of a generative model (a mixture of Gaussians) to compute the conditional expectation of the missing variables given the observed variables. Since training a Gaus… ▽ More

    Submitted 8 January, 2018; v1 submitted 3 September, 2012; originally announced September 2012.