Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–17 of 17 results for author: Raiman, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.19055  [pdf, other

    cs.SE cs.AI cs.LG

    Effective Large Language Model Debugging with Best-first Tree Search

    Authors: Jialin Song, Jonathan Raiman, Bryan Catanzaro

    Abstract: Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental difference with how an LLM writes code, compared to a human programmer, is that it cannot consistently spot and fix bugs. Debugging is a crucial skill for pr… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

  2. arXiv:2406.11857  [pdf, other

    cs.CY cs.AI

    AI Royalties -- an IP Framework to Compensate Artists & IP Holders for AI-Generated Content

    Authors: Pablo Ducru, Jonathan Raiman, Ronaldo Lemos, Clay Garner, George He, Hanna Balcha, Gabriel Souto, Sergio Branco, Celina Bottino

    Abstract: This article investigates how AI-generated content can disrupt central revenue streams of the creative industries, in particular the collection of dividends from intellectual property (IP) rights. It reviews the IP and copyright questions related to the input and output of generative AI systems. A systematic method is proposed to assess whether AI-generated outputs, especially images, infringe pre… ▽ More

    Submitted 5 April, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures, submitted to AAAI

  3. CircuitVAE: Efficient and Scalable Latent Circuit Optimization

    Authors: Jialin Song, Aidan Swope, Robert Kirby, Rajarshi Roy, Saad Godil, Jonathan Raiman, Bryan Catanzaro

    Abstract: Automatically designing fast and space-efficient digital circuits is challenging because circuits are discrete, must exactly implement the desired logic, and are costly to simulate. We address these challenges with CircuitVAE, a search algorithm that embeds computation graphs in a continuous space and optimizes a learned surrogate of physical simulation by gradient descent. By carefully controllin… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference (DAC) 2024; the first two authors contributed equally

  4. arXiv:2405.17428  [pdf, other

    cs.CL cs.AI cs.IR cs.LG

    NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

    Authors: Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

    Abstract: Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT or T5-based embedding models in general-purpose text embedding tasks, including dense vector-based retrieval. In this work, we introduce the NV-Embed model with a variety of architectural designs and training procedures to significantly enhance the performance of LLM as a versatile embedding model, whil… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2311.00176  [pdf, other

    cs.CL

    ChipNeMo: Domain-Adapted LLMs for Chip Design

    Authors: Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai , et al. (17 additional authors not shown)

    Abstract: ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We e… ▽ More

    Submitted 4 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Updated results for ChipNeMo-70B model

  6. PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning

    Authors: Rajarshi Roy, Jonathan Raiman, Neel Kant, Ilyas Elkin, Robert Kirby, Michael Siu, Stuart Oberman, Saad Godil, Bryan Catanzaro

    Abstract: In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for const… ▽ More

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: ACM/IEEE Design Automation Conference (DAC), 2021, pp. 853-858

  7. arXiv:2011.11472  [pdf, other

    cs.LG stat.ML

    Generative Adversarial Simulator

    Authors: Jonathan Raiman

    Abstract: Knowledge distillation between machine learning models has opened many new avenues for parameter count reduction, performance improvements, or amortizing training time when changing architectures between the teacher and student network. In the case of reinforcement learning, this technique has also been applied to distill teacher policies to students. Until now, policy distillation required access… ▽ More

    Submitted 23 November, 2020; originally announced November 2020.

    Comments: Presented at the Beyond "Tabula Rasa" in Reinforcement Learning (BeTR-RL): Agents that remember, adapt, and generalize ICLR 2020 workshop

  8. arXiv:1912.06721  [pdf, other

    cs.CL cs.LG

    Long-Term Planning and Situational Awareness in OpenAI Five

    Authors: Jonathan Raiman, Susan Zhang, Filip Wolski

    Abstract: Understanding how knowledge about the world is represented within model-free deep reinforcement learning methods is a major challenge given the black box nature of its learning process within high-dimensional observation and action spaces. AlphaStar and OpenAI Five have shown that agents can be trained without any explicit hierarchical macro-actions to reach superhuman skill in games that require… ▽ More

    Submitted 13 December, 2019; originally announced December 2019.

  9. arXiv:1912.06719  [pdf, other

    cs.CL cs.LG

    Neural Network Surgery with Sets

    Authors: Jonathan Raiman, Susan Zhang, Christy Dennison

    Abstract: The cost to train machine learning models has been increasing exponentially, making exploration and research into the correct features and architecture a costly or intractable endeavor at scale. However, using a technique named "surgery" OpenAI Five was continuously trained to play the game DotA 2 over the course of 10 months through 20 major changes in features and architecture. Surgery transfers… ▽ More

    Submitted 19 March, 2020; v1 submitted 13 December, 2019; originally announced December 2019.

  10. arXiv:1912.06680  [pdf, other

    cs.LG stat.ML

    Dota 2 with Large Scale Deep Reinforcement Learning

    Authors: OpenAI, :, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang , et al. (2 additional authors not shown)

    Abstract: On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learnin… ▽ More

    Submitted 13 December, 2019; originally announced December 2019.

  11. arXiv:1802.01021  [pdf, other

    cs.CL

    DeepType: Multilingual Entity Linking by Neural Type System Evolution

    Authors: Jonathan Raiman, Olivier Raiman

    Abstract: The wealth of structured (e.g. Wikidata) and unstructured data about the world available today presents an incredible opportunity for tomorrow's Artificial Intelligence. So far, integration of these two different modalities is a difficult process, involving many decisions concerning how best to represent the information so that it will be captured or useful, and hand-labeling large amounts of data… ▽ More

    Submitted 3 February, 2018; originally announced February 2018.

    Comments: Presented at AAAI 2018

  12. arXiv:1710.07654  [pdf, other

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

    Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller

    Abstract: We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common erro… ▽ More

    Submitted 22 February, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018. (v3 changed paper title)

  13. arXiv:1709.02828  [pdf, other

    cs.CL

    Globally Normalized Reader

    Authors: Jonathan Raiman, John Miller

    Abstract: Rapid progress has been made towards question answering (QA) systems that can extract answers from text. Existing neural approaches make use of expensive bi-directional attention mechanisms or score all possible answer spans, limiting scalability. We propose instead to cast extractive QA as an iterative search problem: select the answer's sentence, start word, and end word. This representation red… ▽ More

    Submitted 8 September, 2017; originally announced September 2017.

    Comments: Presented at EMNLP 2017

  14. arXiv:1705.08947  [pdf, other

    cs.CL

    Deep Voice 2: Multi-Speaker Neural Text-to-Speech

    Authors: Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

    Abstract: We introduce a technique for augmenting neural text-to-speech (TTS) with lowdimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-ofthe-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline with Deep Voice 1, but constr… ▽ More

    Submitted 20 September, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted in NIPS 2017

  15. arXiv:1702.07825  [pdf, other

    cs.CL cs.LG cs.NE cs.SD

    Deep Voice: Real-time Neural Text-to-Speech

    Authors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi

    Abstract: We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency predi… ▽ More

    Submitted 7 March, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

    Comments: Submitted to ICML 2017

  16. arXiv:1512.02595  [pdf, other

    cs.CL

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh , et al. (9 additional authors not shown)

    Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our app… ▽ More

    Submitted 8 December, 2015; originally announced December 2015.

  17. arXiv:1506.08251  [pdf, other

    cs.LG

    Occam's Gates

    Authors: Jonathan Raiman, Szymon Sidor

    Abstract: We present a complimentary objective for training recurrent neural networks (RNN) with gating units that helps with regularization and interpretability of the trained model. Attention-based RNN models have shown success in many difficult sequence to sequence classification problems with long and short term dependencies, however these models are prone to overfitting. In this paper, we describe how… ▽ More

    Submitted 26 June, 2015; originally announced June 2015.

    Comments: In review at NIPS