
Showing 1–10 of 10 results for author: Achiam, J

Searching in archive cs.
  1. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  2. arXiv:2207.14157  [pdf, other]

    cs.SE cs.AI

    A Hazard Analysis Framework for Code Synthesis Large Language Models

    Authors: Heidy Khlaaf, Pamela Mishkin, Joshua Achiam, Gretchen Krueger, Miles Brundage

    Abstract: Codex, a large language model (LLM) trained on a variety of codebases, exceeds the previous state of the art in its capacity to synthesize and generate code. Although Codex provides a plethora of benefits, models that may generate code on such scale have significant limitations, alignment problems, the potential to be misused, and the possibility to increase the rate of progress in technical field…

    Submitted 25 July, 2022; originally announced July 2022.

  3. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements

  4. arXiv:2007.03964  [pdf, other]

    math.OC cs.AI cs.LG

    Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

    Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel

    Abstract: Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, leads to constraint-violating behavior during agent training. We address this shortcoming by proposing a novel Lagrange multiplier update method that utilizes derivatives of the constraint function. W…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  5. arXiv:1903.08894  [pdf, other]

    cs.LG cs.AI

    Towards Characterizing Divergence in Deep Q-Learning

    Authors: Joshua Achiam, Ethan Knight, Pieter Abbeel

    Abstract: Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well-under…

    Submitted 21 March, 2019; originally announced March 2019.

  6. arXiv:1807.10299  [pdf, other]

    cs.AI

    Variational Option Discovery Algorithms

    Authors: Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel

    Abstract: We explore methods for option discovery based on variational inference and make two algorithmic contributions. First: we highlight a tight connection between variational option discovery methods and variational autoencoders, and introduce Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection. In VALOR, the policy encodes contexts from a noi…

    Submitted 26 July, 2018; originally announced July 2018.

  7. arXiv:1803.02999  [pdf, other]

    cs.LG

    On First-Order Meta-Learning Algorithms

    Authors: Alex Nichol, Joshua Achiam, John Schulman

    Abstract: This paper considers meta-learning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution. We analyze a family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for…

    Submitted 22 October, 2018; v1 submitted 8 March, 2018; originally announced March 2018.

  8. arXiv:1705.10528  [pdf, other]

    cs.LG

    Constrained Policy Optimization

    Authors: Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

    Abstract: For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015, Lillicrap et al.,…

    Submitted 30 May, 2017; originally announced May 2017.

    Comments: Accepted to ICML 2017

  9. arXiv:1703.01732  [pdf, other]

    cs.LG

    Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

    Authors: Joshua Achiam, Shankar Sastry

    Abstract: Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as $ε$-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress.…

    Submitted 6 March, 2017; originally announced March 2017.

    Comments: Appeared in Deep RL Workshop at NIPS 2016

  10. arXiv:1602.09118  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Easy Monotonic Policy Iteration

    Authors: Joshua Achiam

    Abstract: A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or $Q$-function may fail to improve performance---or worse, actually cause the policy performance to degrade. Prior work has addressed this for policy iteration by deriving tight…

    Submitted 29 February, 2016; originally announced February 2016.