
Showing 1–33 of 33 results for author: Lambert, N

Searching in archive cs.
  1. arXiv:2406.18495  [pdf, other]

    cs.CL

    WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

    Authors: Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri

    Abstract: We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together, WildGuard serves the increasing needs for automatic safety moderation and evaluation of LLM interactions, providing a one-stop tool with enhanced a…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Third and fourth authors contributed equally

  2. arXiv:2406.09279  [pdf, other]

    cs.CL

    Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

    Authors: Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

    Abstract: Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Preprint

  3. arXiv:2405.15802  [pdf]

    cs.SE cs.AI

    Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence

    Authors: Adrien Basdevant, Camille François, Victor Storchan, Kevin Bankston, Ayah Bdeir, Brian Behlendorf, Merouane Debbah, Sayash Kapoor, Yann LeCun, Mark Surman, Helen King-Turvey, Nathan Lambert, Stefano Maffulli, Nik Marda, Govind Shivkumar, Justine Tunney

    Abstract: Over the past year, there has been a robust debate about the benefits and risks of open sourcing foundation models. However, this discussion has often taken place at a high level of generality or with a narrow focus on specific technical attributes. In part, this is because defining open source for foundation models has proven tricky, given its significant differences from traditional software dev…

    Submitted 17 May, 2024; originally announced May 2024.

  4. arXiv:2405.01511  [pdf, other]

    cs.CL

    D2PO: Discriminator-Guided DPO with Response Evaluation Models

    Authors: Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett

    Abstract: Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, there is an open question of whether there remain practical advantages of using a discriminator, like a reward model, to evaluate respon…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 20 pages, 12 figures

  5. arXiv:2404.10271  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY cs.GT

    Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

    Authors: Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker

    Abstract: Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level prin…

    Submitted 4 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 15 pages, 4 figures

    MSC Class: 68T01; 68T50; 91B14; 91B12 ACM Class: I.2.0; I.2.7; K.4.2; I.2.m; J.4

  6. arXiv:2403.13787  [pdf, other]

    cs.LG

    RewardBench: Evaluating Reward Models for Language Modeling

    Authors: Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi

    Abstract: Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little study that focuses on evaluation of those models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. Resources for reward model training a…

    Submitted 8 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 44 pages, 19 figures, 12 tables

  7. arXiv:2402.16827  [pdf, other]

    cs.CL cs.LG

    A Survey on Data Selection for Language Models

    Authors: Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang

    Abstract: A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the am…

    Submitted 8 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Paper list available at https://github.com/alon-albalak/data-selection-survey

  8. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  9. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  10. arXiv:2311.10702  [pdf, other]

    cs.CL

    Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

    Authors: Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

    Abstract: Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU 2, a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user pr…

    Submitted 19 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: technical report; fixed zephyr numbers

  11. arXiv:2311.00168  [pdf, other]

    cs.LG

    The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

    Authors: Nathan Lambert, Roberto Calandra

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) more capable in complex settings. RLHF proceeds as collecting human preference data, training a reward model on said data, and optimizing a base ML model with respect to said reward for extrinsic evaluation metrics (e.g. MMLU, GSM8k). RLHF relies on many assumptions about how…

    Submitted 1 February, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: 11 pages, 5 figures

  12. arXiv:2310.16944  [pdf, other]

    cs.LG cs.CL

    Zephyr: Direct Distillation of LM Alignment

    Authors: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

    Abstract: We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Start…

    Submitted 25 October, 2023; originally announced October 2023.

  13. arXiv:2310.13595  [pdf, other]

    cs.CY

    The History and Risks of Reinforcement Learning and Human Feedback

    Authors: Nathan Lambert, Thomas Krendl Gilbert, Tom Zick

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization. This approach, which operates at the intersection of many stakeholders and academic disciplines,…

    Submitted 28 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 14 pages, 3 figures

  14. arXiv:2310.06253  [pdf, other]

    cs.LG

    A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

    Authors: Ran Wei, Nathan Lambert, Anthony McDonald, Alfredo Garcia, Roberto Calandra

    Abstract: Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable by learning an explicit model of the environment. While the capabilities of MBRL agents have significantly improved in recent years, how to best learn the model is still an unresolved question. The majority of MBRL algorithms aim at training the model to make accurate predictions about th…

    Submitted 6 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  15. arXiv:2308.00862  [pdf, ps, other]

    cs.CY

    Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings

    Authors: Sarah Shoker, Andrew Reddie, Sarah Barrington, Ruby Booth, Miles Brundage, Husanjot Chahal, Michael Depp, Bill Drexel, Ritwik Gupta, Marina Favaro, Jake Hecla, Alan Hickey, Margarita Konaev, Kirthi Kumar, Nathan Lambert, Andrew Lohn, Cullen O'Keefe, Nazneen Rajani, Michael Sellitto, Robert Trager, Leah Walker, Alexa Wehsener, Jessica Young

    Abstract: Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and the interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Securit…

    Submitted 3 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

  16. Enjeux de communication dans la multireprésentation cartographique reproductible

    Authors: Nicolas Lambert, Timothée Giraud, Ronan Ysebaert

    Abstract: This chapter deepens cartographic communication through a cartographic multirepresentation exercise. Using a single dataset on World population data, the chapter presents a series of 13 different maps to illustrate how mapping is primarily a matter of choices and methods.

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: in French language

    Journal ref: Communication cartographique, ISTE Group, pp.73-102, 2022

  17. arXiv:2212.05129  [pdf, other]

    cs.AI cs.LG

    Measuring Data

    Authors: Margaret Mitchell, Alexandra Sasha Luccioni, Nathan Lambert, Marissa Gerchick, Angelina McMillan-Major, Ezinwanne Ozoani, Nazneen Rajani, Tristan Thrush, Yacine Jernite, Douwe Kiela

    Abstract: We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along common dimensions that support comparison. Several lines of research have proposed what we refer to as measurements, with differing terminology; we bring some of t…

    Submitted 13 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

  18. arXiv:2204.10817  [pdf, other]

    cs.LG cs.CY

    Reward Reports for Reinforcement Learning

    Authors: Thomas Krendl Gilbert, Nathan Lambert, Sarah Dean, Tom Zick, Aaron Snoswell

    Abstract: Building systems that are good for society in the face of complex societal effects requires a dynamic approach. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance…

    Submitted 19 March, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

  19. arXiv:2203.09637  [pdf, other]

    cs.LG

    Investigating Compounding Prediction Errors in Learned Dynamics Models

    Authors: Nathan Lambert, Kristofer Pister, Roberto Calandra

    Abstract: Accurately predicting the consequences of agents' actions is a key prerequisite for planning in robotic control. Model-based reinforcement learning (MBRL) is one paradigm which relies on the iterative learning and prediction of state-action transitions to solve a task. Deep MBRL has become a popular candidate, using a neural network to learn a dynamics model that predicts with each pass from high-…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 25 pages, 19 figures

  20. arXiv:2202.05716  [pdf]

    cs.LG cs.CY

    Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems

    Authors: Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert

    Abstract: In the long term, reinforcement learning (RL) is considered by many AI theorists to be the most promising path to artificial general intelligence. This places RL practitioners in a position to design systems that have never existed before and lack prior documentation in law and policy. Public agencies could intervene on complex dynamics that were previously too opaque to deliberate about, and long…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: 60 pages

    Journal ref: Center for Long Term Cybersecurity Whitepaper Series Feb. 2022; see release https://cltc.berkeley.edu/2022/02/08/reward-reports/

  21. arXiv:2201.11861  [pdf, other]

    cs.LG

    The Challenges of Exploration for Offline Reinforcement Learning

    Authors: Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

    Abstract: Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is…

    Submitted 18 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  22. arXiv:2108.13606  [pdf, other]

    cs.RO

    BotNet: A Simulator for Studying the Effects of Accurate Communication Models on Multi-agent and Swarm Control

    Authors: Mark Selden, Jason Zhou, Felipe Campos, Nathan Lambert, Daniel Drew, Kristofer S. J. Pister

    Abstract: Decentralized control in multi-robot systems is dependent on accurate and reliable communication between agents. Important communication factors, such as latency and packet delivery ratio, are strong functions of the number of agents in the network. Findings from studies of mobile and high node-count radio-frequency (RF) mesh networks have only been transferred to the domain of multi-robot systems…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: 9 pages, 8 figures

  23. Axes for Sociotechnical Inquiry in AI Research

    Authors: Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick

    Abstract: The development of artificial intelligence (AI) technologies has far exceeded the investigation of their relationship with society. Sociotechnical inquiry is needed to mitigate the harms of new technologies whose potential impacts remain poorly understood. To date, subfields of AI research develop primarily individual views on their relationship with sociotechnics, while tools for external investi…

    Submitted 26 April, 2021; originally announced May 2021.

    Comments: 9 pages, 1 figure

  24. arXiv:2104.10159  [pdf, other]

    cs.AI eess.SY

    MBRL-Lib: A Modular Library for Model-based Reinforcement Learning

    Authors: Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra

    Abstract: Model-based reinforcement learning is a compelling framework for data-efficient learning of agents that interact with the world. This family of algorithms has many subcomponents that need to be carefully selected and tuned. As a result the entry-bar for researchers to approach the field and to deploy it in real-world tasks can be daunting. In this paper, we present MBRL-Lib -- a machine learning l…

    Submitted 20 April, 2021; originally announced April 2021.

  25. arXiv:2102.13651  [pdf, other]

    cs.LG cs.AI cs.NE eess.SY

    On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

    Authors: Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

    Abstract: Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a result, they often possess tens of hyperparameters and architectural choices. For this reason, MBRL typically requires significant human expertise before it can…

    Submitted 26 February, 2021; originally announced February 2021.

    Comments: 19 pages, accepted by AISTATS 2021

  26. arXiv:2102.04255  [pdf, other]

    cs.CY cs.AI

    AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks

    Authors: McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick

    Abstract: Despite interest in communicating ethical problems and social contexts within the undergraduate curriculum to advance Public Interest Technology (PIT) goals, interventions at the graduate level remain largely unexplored. This may be due to the conflicting ways through which distinct Artificial Intelligence (AI) research tracks conceive of their interface with social contexts. In this paper we trac…

    Submitted 4 February, 2021; originally announced February 2021.

    Comments: 8 Pages

  27. arXiv:2012.09156  [pdf, other]

    cs.LG cs.RO

    Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

    Authors: Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra

    Abstract: Accurately predicting the dynamics of robotic systems is crucial for model-based control and reinforcement learning. The most common way to estimate dynamics is by fitting a one-step ahead prediction model and using it to recursively propagate the predicted state distribution over long horizons. Unfortunately, this approach is known to compound even small prediction errors, making long-term predic…

    Submitted 31 August, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: 8 pages, +4 pages appendix

  28. Nonholonomic Yaw Control of an Underactuated Flying Robot with Model-based Reinforcement Learning

    Authors: Nathan Lambert, Craig Schindler, Daniel Drew, Kristofer Pister

    Abstract: Nonholonomic control is a candidate to control nonlinear systems with path-dependent states. We investigate an underactuated flying micro-aerial-vehicle, the ionocraft, that requires nonholonomic control in the yaw-direction for complete attitude control. Deploying an analytical control law involves substantial engineering design and is sensitive to inaccuracy in the system model. With specific as…

    Submitted 12 January, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: 7 pages, 1 page appendix

    Journal ref: IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 455-461, April 2021

  29. arXiv:2004.13194  [pdf, other]

    cs.RO

    Learning for Microrobot Exploration: Model-based Locomotion, Sparse-robust Navigation, and Low-power Deep Classification

    Authors: Nathan O. Lambert, Farhan Toddywala, Brian Liao, Eric Zhu, Lydia Lee, Kristofer S. J. Pister

    Abstract: Building intelligent autonomous systems at any scale is challenging. The sensing and computation constraints of a microrobot platform make the problems harder. We present improvements to learning-based methods for on-board learning of locomotion, classification, and navigation of microrobots. We show how simulated locomotion can be achieved with model-based reinforcement learning via on-board sens…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: 6 pages; 2 pages appendices

  30. arXiv:2002.04523  [pdf, other]

    cs.LG cs.RO stat.ML

    Objective Mismatch in Model-based Reinforcement Learning

    Authors: Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra

    Abstract: Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks. Recent work in MBRL has mostly focused on using more advanced function approximators and planning schemes, with little development of the general framework. In this paper, we identify a fundamental issue of the standard MBRL framework -- what we call the ob…

    Submitted 18 April, 2021; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: 9 pages, 2 pages references, 5 pages appendices

    Journal ref: Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:761-770, 2020

  31. arXiv:1909.12324  [pdf, other]

    cs.RO

    Learning Generalizable Locomotion Skills with Hierarchical Reinforcement Learning

    Authors: Tianyu Li, Nathan Lambert, Roberto Calandra, Franziska Meier, Akshara Rai

    Abstract: Learning to locomote to arbitrary goals on hardware remains a challenging problem for reinforcement learning. In this paper, we present a hierarchical learning framework that improves sample-efficiency and generalizability of locomotion skills on real-world robots. Our approach divides the problem of goal-oriented locomotion into two sub-problems: learning diverse primitives skills, and using mode…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: Submitted to 2020 ICRA

  32. arXiv:1901.03737  [pdf, other]

    cs.RO cs.LG

    Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning

    Authors: Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S. J. Pister

    Abstract: Designing effective low-level robot controllers often entails platform-specific implementations that require manual heuristic parameter tuning, significant system knowledge, or long design times. With the rising number of robotic and mechatronic systems deployed across areas ranging from industrial automation to intelligent toys, the need for a general approach to generating low-level controllers i…

    Submitted 19 July, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

    Comments: Accepted to IROS and RA-L, 2019. For more information, see the website: https://sites.google.com/berkeley.edu/mbrl-quadrotor/. 9 pages, 12 figures

  33. arXiv:0802.1362  [pdf, ps, other]

    cs.GT

    Complexity of Combinatorial Market Makers

    Authors: Yiling Chen, Lance Fortnow, Nicolas Lambert, David M. Pennock, Jennifer Wortman

    Abstract: We analyze the computational complexity of market maker pricing algorithms for combinatorial prediction markets. We focus on Hanson's popular logarithmic market scoring rule market maker (LMSR). Our goal is to implicitly maintain correct LMSR prices across an exponentially large outcome space. We examine both permutation combinatorics, where outcomes are permutations of objects, and Boolean comb…

    Submitted 10 February, 2008; originally announced February 2008.

    ACM Class: J.4