Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–30 of 30 results for author: Brock, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2404.07839  [pdf, other

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti , et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2310.16764  [pdf, other

    cs.CV cs.LG cs.NE

    ConvNets Match Vision Transformers at Scale

    Authors: Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De

    Abstract: Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to datasets on the web-scale. We challenge this belief by evaluating a performant ConvNet architecture pre-trained on JFT-4B, a large labelled dataset of images often used for training foundation models. We consider pre-training compute budge… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

  5. arXiv:2302.10322  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

    Authors: Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh

    Abstract: Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which… ▽ More

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: ICLR 2023

  6. arXiv:2302.03130  [pdf, other

    cs.LG cs.CV

    Spatial Functa: Scaling Functa to ImageNet Classification and Generation

    Authors: Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz, Hyunjik Kim

    Abstract: Neural fields, also known as implicit neural representations, have emerged as a powerful means to represent complex signals of various modalities. Based on this Dupont et al. (2022) introduce a framework that views neural fields as data, termed *functa*, and proposes to do deep learning directly on this dataset of neural fields. In this work, we show that the proposed framework faces limitations w… ▽ More

    Submitted 9 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  7. Accessible Interactive Maps for Visually Impaired Users

    Authors: Julie Ducasse, Anke Brock, Christophe Jouffrais

    Abstract: Tactile maps are commonly used to give visually impaired users access to geographical representations. Although those relief maps are efficient tools for acquisition of spatial knowledge, they present several limitations and issues such as the need to read braille. Several research projects have been led during the past three decades in order to improve access to maps using interactive technologie… ▽ More

    Submitted 31 August, 2022; originally announced August 2022.

    Journal ref: Mobility in Visually Impaired People - Fundamentals and ICT Assistive Technologies, Springer, 2018

  8. Editorial: Special Issue on Collaborative Aspects of Open Data in Software EngineeringJohan

    Authors: Johan Linåker, Per Runeson, Anneke Zuiderwijk, Amanda Brock

    Abstract: High-quality data has become increasingly important to software engineers in designing and implementing today's software, for example, as an input to machine-learning algorithms and visualisation- and analytics-based features. Open data - i.e., data shared under a licence that gives users the right to study, process, and distribute the data to anyone and for any purpose - offers a mechanism to add… ▽ More

    Submitted 31 July, 2022; originally announced August 2022.

  9. arXiv:2206.14503  [pdf, other

    cs.GR

    Parallel Compositing of Volumetric Depth Images for Interactive Visualization of Distributed Volumes at High Frame Rates

    Authors: Aryaman Gupta, Pietro Incardona, Anton Brock, Guido Reina, Steffen Frey, Stefan Gumhold, Ulrik Günther, Ivo F. Sbalzarini

    Abstract: We present a parallel compositing algorithm for Volumetric Depth Images (VDIs) of large three-dimensional volume data. Large distributed volume data are routinely produced in both numerical simulations and experiments, yet it remains challenging to visualize them at smooth, interactive frame rates. VDIs are view-dependent piecewise constant representations of volume data that offer a potential sol… ▽ More

    Submitted 12 January, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: 12 pages, 12 figures. Submitted to EGPGV 2023

  10. arXiv:2204.14198  [pdf, other

    cs.CV cs.AI cs.LG

    Flamingo: a Visual Language Model for Few-Shot Learning

    Authors: Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals , et al. (2 additional authors not shown)

    Abstract: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily i… ▽ More

    Submitted 15 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: 54 pages. In Proceedings of Neural Information Processing Systems (NeurIPS) 2022

  11. arXiv:2112.04426  [pdf, other

    cs.CL cs.LG

    Improving language models by retrieving from trillions of tokens

    Authors: Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan , et al. (3 additional authors not shown)

    Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to d… ▽ More

    Submitted 7 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Fix incorrect reported numbers in Table 14

  12. arXiv:2111.12124  [pdf, ps, other

    cs.SD eess.AS

    Towards Learning Universal Audio Representations

    Authors: Luyu Wang, Pauline Luc, Yan Wu, Adria Recasens, Lucas Smaira, Andrew Brock, Andrew Jaegle, Jean-Baptiste Alayrac, Sander Dieleman, Joao Carreira, Aaron van den Oord

    Abstract: The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learni… ▽ More

    Submitted 23 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  13. arXiv:2107.14795  [pdf, other

    cs.LG cs.CL cs.CV cs.SD eess.AS

    Perceiver IO: A General Architecture for Structured Inputs & Outputs

    Authors: Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joāo Carreira

    Abstract: A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data f… ▽ More

    Submitted 15 March, 2022; v1 submitted 30 July, 2021; originally announced July 2021.

    Comments: ICLR 2022 camera ready. Code: https://dpmd.ai/perceiver-code

  14. arXiv:2105.13343  [pdf, other

    cs.LG cs.CV

    Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

    Authors: Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

    Abstract: In computer vision, it is standard practice to draw a single sample from the data augmentation procedure for each unique image in the mini-batch. However recent work has suggested drawing multiple samples can achieve higher test accuracies. In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held out da… ▽ More

    Submitted 24 February, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

  15. Skillful Precipitation Nowcasting using Deep Generative Models of Radar

    Authors: Suman Ravuri, Karel Lenc, Matthew Willson, Dmitry Kangin, Remi Lam, Piotr Mirowski, Megan Fitzsimons, Maria Athanassiadou, Sheleem Kashem, Sam Madge, Rachel Prudden, Amol Mandhane, Aidan Clark, Andrew Brock, Karen Simonyan, Raia Hadsell, Niall Robinson, Ellen Clancy, Alberto Arribas, Shakir Mohamed

    Abstract: Precipitation nowcasting, the high-resolution forecasting of precipitation up to two hours ahead, supports the real-world socio-economic needs of many sectors reliant on weather-dependent decision-making. State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates, and struggle to capture important non-linear events such as convective initi… ▽ More

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: 46 pages, 17 figures, 2 tables

  16. arXiv:2103.03206  [pdf, other

    cs.CV cs.AI cs.LG cs.SD eess.AS

    Perceiver: General Perception with Iterative Attention

    Authors: Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira

    Abstract: Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. T… ▽ More

    Submitted 22 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: ICML 2021

  17. arXiv:2102.06171  [pdf, other

    cs.CV cs.LG stat.ML

    High-Performance Large-Scale Image Recognition Without Normalization

    Authors: Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

    Abstract: Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for l… ▽ More

    Submitted 11 February, 2021; originally announced February 2021.

  18. arXiv:2101.08692  [pdf, other

    cs.LG cs.CV stat.ML

    Characterizing signal propagation to close the performance gap in unnormalized ResNets

    Authors: Andrew Brock, Soham De, Samuel L. Smith

    Abstract: Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to… ▽ More

    Submitted 27 January, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: Published as a conference paper at ICLR 2021

  19. arXiv:2010.15040  [pdf, other

    stat.ML cs.LG

    Training Generative Adversarial Networks by Solving Ordinary Differential Equations

    Authors: Chongli Qin, Yan Wu, Jost Tobias Springenberg, Andrew Brock, Jeff Donahue, Timothy P. Lillicrap, Pushmeet Kohli

    Abstract: The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly st… ▽ More

    Submitted 28 November, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

  20. arXiv:2010.10241  [pdf, ps, other

    stat.ML cs.CV cs.LG

    BYOL works even without batch statistics

    Authors: Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko

    Abstract: Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids co… ▽ More

    Submitted 20 October, 2020; originally announced October 2020.

  21. arXiv:2004.02967  [pdf, other

    cs.LG cs.CV cs.NE stat.ML

    Evolving Normalization-Activation Layers

    Authors: Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le

    Abstract: Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical function… ▽ More

    Submitted 17 July, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  22. arXiv:1902.00465  [pdf, other

    cs.LG cs.AI cs.DC stat.ML

    TF-Replicator: Distributed Machine Learning for Researchers

    Authors: Peter Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, Frederic Besse, Andy Brock, Aidan Clark, Sergio Gómez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov

    Abstract: We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchr… ▽ More

    Submitted 1 February, 2019; originally announced February 2019.

  23. arXiv:1809.11096  [pdf, other

    cs.LG stat.ML

    Large Scale GAN Training for High Fidelity Natural Image Synthesis

    Authors: Andrew Brock, Jeff Donahue, Karen Simonyan

    Abstract: Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenabl… ▽ More

    Submitted 25 February, 2019; v1 submitted 28 September, 2018; originally announced September 2018.

  24. arXiv:1711.01297  [pdf, other

    stat.ML cs.LG

    Implicit Weight Uncertainty in Neural Networks

    Authors: Nick Pawlowski, Andrew Brock, Matthew C. H. Lee, Martin Rajchl, Ben Glocker

    Abstract: Modern neural networks tend to be overconfident on unseen, noisy or incorrectly labelled data and do not produce meaningful uncertainty measures. Bayesian deep learning aims to address this shortcoming with variational approximations (such as Bayes by Backprop or Multiplicative Normalising Flows). However, current approaches have limitations regarding flexibility and scalability. We introduce Baye… ▽ More

    Submitted 25 May, 2018; v1 submitted 3 November, 2017; originally announced November 2017.

    Comments: Submitted to NIPS 2018, under review

  25. arXiv:1708.05344  [pdf, other

    cs.LG

    SMASH: One-Shot Model Architecture Search through HyperNetworks

    Authors: Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

    Abstract: Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively… ▽ More

    Submitted 17 August, 2017; originally announced August 2017.

  26. arXiv:1706.04983  [pdf, other

    stat.ML cs.LG

    FreezeOut: Accelerate Training by Progressively Freezing Layers

    Authors: Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

    Abstract: The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time dur… ▽ More

    Submitted 18 June, 2017; v1 submitted 15 June, 2017; originally announced June 2017.

    Comments: Extended Abstract

  27. arXiv:1609.07093  [pdf, other

    cs.LG cs.CV cs.NE stat.ML

    Neural Photo Editing with Introspective Adversarial Networks

    Authors: Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

    Abstract: The increasingly photorealistic sample quality of generative image models suggests their feasibility in applications beyond image generation. We present the Neural Photo Editor, an interface that leverages the power of generative neural networks to make large, semantically coherent changes to existing images. To tackle the challenge of achieving accurate reconstructions without loss of feature qua… ▽ More

    Submitted 6 February, 2017; v1 submitted 22 September, 2016; originally announced September 2016.

    Comments: 10 pages, 7 figures, 3 tables

  28. arXiv:1608.04236  [pdf, other

    cs.CV cs.HC cs.LG stat.ML

    Generative and Discriminative Voxel Modeling with Convolutional Neural Networks

    Authors: Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

    Abstract: When working with three-dimensional data, choice of representation is key. We explore voxel-based models, and present evidence for the viability of voxellated representations in applications including shape modeling and object classification. Our key contributions are methods for training voxel-based variational autoencoders, a user interface for exploring the latent space learned by the autoencod… ▽ More

    Submitted 16 August, 2016; v1 submitted 15 August, 2016; originally announced August 2016.

    Comments: 9 pages, 5 figures, 2 tables

  29. Interactive audio-tactile maps for visually impaired people

    Authors: Anke Brock, Christophe Jouffrais

    Abstract: Visually impaired people face important challenges related to orientation and mobility. Indeed, 56% of visually impaired people in France declared having problems concerning autonomous mobility. These problems often mean that visually impaired people travel less, which influences their personal and professional life and can lead to exclusion from society. Therefore this issue presents a social cha… ▽ More

    Submitted 3 December, 2015; originally announced December 2015.

    Comments: \<http://dl.acm.org/citation.cfm?id=J956\&CFID=730680571\&CFTOKEN=17044974\>. \<10.1145/2850440.2850441\&gt

    Journal ref: ACM SIGACCESS Accessibility and Computing (ACM Digital Library), Association for Computing Machinery (ACM), 2015, pp.3-12.

  30. Design and User Satisfaction of Interactive Maps for Visually Impaired People

    Authors: Anke Brock, Philippe Truillet, Bernard Oriola, Delphine Picard, Christophe Jouffrais

    Abstract: Multimodal interactive maps are a solution for presenting spatial information to visually impaired people. In this paper, we present an interactive multimodal map prototype that is based on a tactile paper map, a multi-touch screen and audio output. We first describe the different steps for designing an interactive map: drawing and printing the tactile paper map, choice of multi-touch technology,… ▽ More

    Submitted 19 July, 2012; originally announced July 2012.

    Journal ref: ICCHP 2012 (2012) 544-551