
Showing 1–16 of 16 results for author: Frostig, R

Searching in archive cs.
  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2209.10780  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation

    Authors: Xuesu Xiao, Tingnan Zhang, Krzysztof Choromanski, Edward Lee, Anthony Francis, Jake Varley, Stephen Tu, Sumeet Singh, Peng Xu, Fei Xia, Sven Mikael Persson, Dmitry Kalashnikov, Leila Takayama, Roy Frostig, Jie Tan, Carolina Parada, Vikas Sindhwani

    Abstract: Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints from Model Predictive Control (MPC). Our approach…

    Submitted 23 September, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

  3. arXiv:2204.10923  [pdf, other]

    cs.PL

    You Only Linearize Once: Tangents Transpose to Gradients

    Authors: Alexey Radul, Adam Paszke, Roy Frostig, Matthew Johnson, Dougal Maclaurin

    Abstract: Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the…

    Submitted 6 December, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

  4. arXiv:2203.17193  [pdf, other]

    cs.LG stat.ML

    Learning from many trajectories

    Authors: Stephen Tu, Roy Frostig, Mahdi Soltanolkotabi

    Abstract: We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Ou…

    Submitted 31 January, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

  5. arXiv:2105.15183  [pdf, other]

    cs.LG math.NA stat.ML

    Efficient and Modular Implicit Differentiation

    Authors: Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

    Abstract: Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization layers, and in bi-level problems suc…

    Submitted 12 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: V3: added more related work and Jacobian precision figure

  6. arXiv:2105.09469  [pdf, other]

    cs.PL cs.LG

    Decomposing reverse-mode automatic differentiation

    Authors: Roy Frostig, Matthew J. Johnson, Dougal Maclaurin, Adam Paszke, Alexey Radul

    Abstract: We decompose reverse-mode automatic differentiation into (forward-mode) linearization followed by transposition. Doing so isolates the essential difference between forward- and reverse-mode AD, and simplifies their joint implementation. In particular, once forward-mode AD rules are defined for every primitive operation in a source language, only linear primitives require an additional transpositio…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Presented at the LAFI 2021 workshop at POPL, 17 January 2021

  7. arXiv:1905.10360  [pdf, other]

    cs.LG cs.DS stat.ML

    The advantages of multiple classes for reducing overfitting from test set reuse

    Authors: Vitaly Feldman, Roy Frostig, Moritz Hardt

    Abstract: Excessive reuse of holdout data can lead to overfitting. However, there is little concrete evidence of significant overfitting due to holdout reuse in popular multiclass benchmarks today. Known results show that, in the worst-case, revealing the accuracy of $k$ adaptively chosen classifiers on a data set of size $n$ allows to create a classifier with bias of $\Theta(\sqrt{k/n})$ for any binary predicti…

    Submitted 24 May, 2019; originally announced May 2019.

  8. arXiv:1811.03600  [pdf, other]

    cs.LG stat.ML

    Measuring the Effects of Data Parallelism on Neural Network Training

    Authors: Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

    Abstract: Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by…

    Submitted 18 July, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Journal ref: Journal of Machine Learning Research 20 (2019) 1-49

  9. arXiv:1703.07872  [pdf, other]

    cs.LG

    Random Features for Compositional Kernels

    Authors: Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer

    Abstract: We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels. The compositional kernels we use are inspired by the structure of convolutional neural networks and kernels. The resulting scheme yields sparse and efficiently computable features. Each random feature can be represented as an algebraic expression over a small number of (random) paths in a compositio…

    Submitted 22 March, 2017; originally announced March 2017.

  10. arXiv:1608.03100  [pdf, other]

    stat.ML cs.LG

    Estimation from Indirect Supervision with Linear Moments

    Authors: Aditi Raghunathan, Roy Frostig, John Duchi, Percy Liang

    Abstract: In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation. In this paper, we bypass both obstacles for a class of what we call linear indirectly-supervised problems. Our approach is simple: we solve a linear system to estim…

    Submitted 10 August, 2016; originally announced August 2016.

    Comments: 12 pages, 7 figures, extended and updated version of our paper appearing in ICML 2016

  11. arXiv:1602.06872  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Principal Component Projection Without Principal Component Analysis

    Authors: Roy Frostig, Cameron Musco, Christopher Musco, Aaron Sidford

    Abstract: We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependenc…

    Submitted 26 November, 2019; v1 submitted 22 February, 2016; originally announced February 2016.

  12. arXiv:1602.05897  [pdf, other]

    cs.LG cs.AI cs.CC cs.DS stat.ML

    Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

    Authors: Amit Daniely, Roy Frostig, Yoram Singer

    Abstract: We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good s…

    Submitted 19 May, 2017; v1 submitted 18 February, 2016; originally announced February 2016.

  13. arXiv:1506.07512  [pdf, other]

    stat.ML cs.DS cs.LG

    Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

    Authors: Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

    Abstract: We develop a family of accelerated stochastic algorithms that minimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework based on the classical proximal point algorithm. Namely, we provide several a…

    Submitted 24 June, 2015; originally announced June 2015.

  14. arXiv:1412.6606  [pdf, other]

    stat.ML cs.LG

    Competing with the Empirical Risk Minimizer in a Single Pass

    Authors: Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

    Abstract: In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or…

    Submitted 25 February, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

  15. arXiv:1408.2270  [pdf, ps, other]

    cs.DS cs.CC math.OC

    A sub-constant improvement in approximating the positive semidefinite Grothendieck problem

    Authors: Roy Frostig, Sida I. Wang

    Abstract: Semidefinite relaxations are a powerful tool for approximately solving combinatorial optimization problems such as MAX-CUT and the Grothendieck problem. By exploiting a bounded rank property of extreme points in the semidefinite cone, we make a sub-constant improvement in the approximation ratio of one such problem. Precisely, we describe a polynomial-time algorithm for the positive semidefinite G…

    Submitted 10 August, 2014; originally announced August 2014.

  16. arXiv:1312.6205  [pdf, other]

    stat.ML cs.LG

    Relaxations for inference in restricted Boltzmann machines

    Authors: Sida I. Wang, Roy Frostig, Percy Liang, Christopher D. Manning

    Abstract: We propose a relaxation-based approximate inference algorithm that samples near-MAP configurations of a binary pairwise Markov random field. We experiment on MAP inference tasks in several restricted Boltzmann machines. We also use our underlying sampler to estimate the log-partition function of restricted Boltzmann machines and compare against other sampling-based methods.

    Submitted 2 January, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: ICLR 2014 workshop track submission