
Showing 1–11 of 11 results for author: Thomas, A W

Searching in archive cs.
  1. arXiv:2403.17844 [pdf, other]

    cs.LG

    Mechanistic Design and Scaling of Hybrid Architectures

    Authors: Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

    Abstract: The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling law…

    Submitted 19 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  2. arXiv:2310.12109 [pdf, other]

    cs.LG

    Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

    Authors: Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré

    Abstract: Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 (Oral)

  3. arXiv:2302.06646 [pdf, other]

    cs.LG

    Simple Hardware-Efficient Long Convolutions for Sequence Modeling

    Authors: Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. We find that a key requirement to achieving high performance…

    Submitted 13 February, 2023; originally announced February 2023.

  4. arXiv:2212.14052 [pdf, other]

    cs.LG cs.CL

    Hungry Hungry Hippos: Towards Language Modeling with State Space Models

    Authors: Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

    Abstract: State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between S…

    Submitted 28 April, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: ICLR 2023 Camera-Ready (Notable-top-25% / Spotlight)

  5. arXiv:2206.00649 [pdf, other]

    q-bio.NC cs.LG

    Differentiable programming for functional connectomics

    Authors: Rastko Ciric, Armin W. Thomas, Oscar Esteban, Russell A. Poldrack

    Abstract: Mapping the functional connectome has the potential to uncover key insights into brain organisation. However, existing workflows for functional connectomics are limited in their adaptability to new data, and principled workflow design is a challenging combinatorial problem. We introduce a new analytic paradigm and software toolbox that implements common operations used in functional connectomics a…

    Submitted 31 May, 2022; originally announced June 2022.

    Comments: 12 pages, 6 figures (Supplement: 10 pages, 3 figures). For associated code, see https://github.com/rciric/hypercoil

  6. arXiv:2205.15581 [pdf, other]

    q-bio.NC cs.LG

    Comparing interpretation methods in mental state decoding analyses with deep learning models

    Authors: Armin W. Thomas, Christopher Ré, Russell A. Poldrack

    Abstract: Deep learning (DL) models find increasing application in mental state decoding, where researchers seek to understand the mapping between mental states (e.g., perceiving fear or joy) and brain activity by identifying those brain regions (and networks) whose activity allows them to accurately identify (i.e., decode) these states. Once a DL model has been trained to accurately decode a set of mental state…

    Submitted 14 October, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 27 pages, 5 main figures

  7. arXiv:2111.01562 [pdf, other]

    q-bio.NC cs.LG

    Evaluating deep transfer learning for whole-brain cognitive decoding

    Authors: Armin W. Thomas, Ulman Lindenberger, Wojciech Samek, Klaus-Robert Müller

    Abstract: Research in many fields has shown that transfer learning (TL) is well-suited to improve the performance of deep learning (DL) models in datasets with small numbers of samples. This empirical success has triggered interest in the application of TL to cognitive decoding analyses with functional neuroimaging data. Here, we systematically evaluate TL for the application of DL models to the decoding of…

    Submitted 1 November, 2021; originally announced November 2021.

  8. arXiv:2108.07258 [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  9. arXiv:2108.06896 [pdf]

    cs.LG stat.ME

    Challenges for cognitive decoding using deep learning methods

    Authors: Armin W. Thomas, Christopher Ré, Russell A. Poldrack

    Abstract: In cognitive decoding, researchers aim to characterize a brain region's representations by identifying the cognitive states (e.g., accepting/rejecting a gamble) that can be identified from the region's activity. Deep learning (DL) methods are highly promising for cognitive decoding, with their unmatched ability to learn versatile representations of complex data. Yet, their widespread application i…

    Submitted 16 August, 2021; originally announced August 2021.

  10. arXiv:1907.01953 [pdf, other]

    eess.IV cs.LG stat.ML

    Deep Transfer Learning For Whole-Brain fMRI Analyses

    Authors: Armin W. Thomas, Klaus-Robert Müller, Wojciech Samek

    Abstract: The application of deep learning (DL) models to the decoding of cognitive states from whole-brain functional Magnetic Resonance Imaging (fMRI) data is often hindered by the small sample size and high dimensionality of these datasets. This is especially true in clinical settings, where patient data are scarce. In this work, we demonstrate that transfer learning represents a solution to this problem. Particular…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: 8 pages, 3 figures

  11. arXiv:1810.09945 [pdf, other]

    cs.LG cs.CV cs.NE q-bio.NC stat.ML

    Analyzing Neuroimaging Data Through Recurrent Deep Learning Models

    Authors: Armin W. Thomas, Hauke R. Heekeren, Klaus-Robert Müller, Wojciech Samek

    Abstract: The application of deep learning (DL) models to neuroimaging data poses several challenges, due to the high dimensionality, low sample size and complex temporo-spatial dependency structure of these datasets. Furthermore, DL models act as black-box models, impeding insight into the association of cognitive state and brain activity. To approach these challenges, we introduce the DeepLight framew…

    Submitted 5 April, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

    Comments: 36 pages, 9 figures