
Showing 1–21 of 21 results for author: Fu, D Y

Searching in archive cs.
  1. arXiv:2402.07440  [pdf, other]

    cs.IR cs.LG

    Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

    Authors: Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré

    Abstract: Retrieval pipelines--an integral component of many machine learning systems--perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval perform…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  2. arXiv:2402.05099  [pdf, other]

    cs.LG

    Hydragen: High-Throughput LLM Inference with Shared Prefixes

    Authors: Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini

    Abstract: Transformer-based large language models (LLMs) are now deployed to hundreds of millions of users. LLM inference is commonly performed on batches of sequences that share a prefix, such as few-shot examples or a chatbot system prompt. Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matri…

    Submitted 13 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  3. arXiv:2311.05908  [pdf, other]

    cs.LG

    FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

    Authors: Daniel Y. Fu, Hermann Kumbong, Eric Nguyen, Christopher Ré

    Abstract: Convolution models with long filters have demonstrated state-of-the-art reasoning abilities in many long-sequence tasks but lag behind the most optimized Transformers in wall-clock time. A major bottleneck is the Fast Fourier Transform (FFT)--which allows long convolutions to run in $O(N \log N)$ time in sequence length $N$ but has poor hardware utilization. In this paper, we study how to optimize t…

    Submitted 10 November, 2023; originally announced November 2023.

  4. arXiv:2310.18780  [pdf, other]

    cs.LG cs.AI eess.SP

    Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

    Authors: Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Ré, Stefano Ermon, Yoshua Bengio

    Abstract: Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input se…

    Submitted 28 October, 2023; originally announced October 2023.

  5. arXiv:2310.12109  [pdf, other]

    cs.LG

    Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

    Authors: Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré

    Abstract: Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 (Oral)

  6. arXiv:2303.06865  [pdf, other]

    cs.LG cs.AI cs.PF

    FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

    Authors: Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

    Abstract: The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generat…

    Submitted 12 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  7. arXiv:2302.10866  [pdf, other]

    cs.LG cs.CL

    Hyena Hierarchy: Towards Larger Convolutional Language Models

    Authors: Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

    Abstract: Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attentio…

    Submitted 19 April, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Additional details

  8. arXiv:2302.06646  [pdf, other]

    cs.LG

    Simple Hardware-Efficient Long Convolutions for Sequence Modeling

    Authors: Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. We find that a key requirement to achieving high performance…

    Submitted 13 February, 2023; originally announced February 2023.

  9. arXiv:2212.14052  [pdf, other]

    cs.LG cs.CL

    Hungry Hungry Hippos: Towards Language Modeling with State Space Models

    Authors: Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

    Abstract: State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between S…

    Submitted 28 April, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: ICLR 2023 Camera-Ready (Notable-top-25% / Spotlight)

  10. arXiv:2206.05252  [pdf, other]

    cs.CV

    Lost in Transmission: On the Impact of Networking Corruptions on Video Machine Learning Models

    Authors: Trenton Chang, Daniel Y. Fu

    Abstract: We study how networking corruptions--data corruptions caused by networking errors--affect video machine learning (ML) models. We discover apparent networking corruptions in Kinetics-400, a benchmark video ML dataset. In a simulation study, we investigate (1) what artifacts networking corruptions cause, (2) how such artifacts affect ML models, and (3) whether standard robustness methods can mitigat…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: 12 pages, 12 figures (with supplemental: 34 pages)

  11. arXiv:2205.14135  [pdf, other]

    cs.LG

    FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

    Authors: Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

    Abstract: Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -…

    Submitted 23 June, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

  12. arXiv:2204.08173  [pdf, other]

    cs.CL cs.LG

    TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

    Authors: Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré

    Abstract: Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there a…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted to Findings of ACL 2022

  13. arXiv:2204.07596  [pdf, other]

    stat.ML cs.LG

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

    Authors: Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré

    Abstract: An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them,…

    Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: ICML 2022 Camera Ready

  14. arXiv:2203.13270  [pdf, other]

    stat.ML cs.LG

    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

    Authors: Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré

    Abstract: Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudol…

    Submitted 1 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: UAI 2022 Camera Ready

  15. arXiv:2008.06007  [pdf, other]

    cs.CY cs.MM

    Analyzing Who and What Appears in a Decade of US Cable TV News

    Authors: James Hong, Will Crichton, Haotian Zhang, Daniel Y. Fu, Jacob Ritchie, Jeremy Barenholtz, Ben Hannel, Xinwei Yao, Michaela Murray, Geraldine Moriba, Maneesh Agrawala, Kayvon Fatahalian

    Abstract: Cable TV news reaches millions of U.S. households each day, meaning that decisions about who appears on the news and what stories get covered can profoundly influence public opinion and discourse. We analyze a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, we detect faces…

    Submitted 24 January, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: Published in KDD 2021 as "Analysis of Faces in a Decade of US Cable TV News". ArXiv draft: 14 pages, 22 figures (15 pages, 16 figures in supplemental materials)

  16. arXiv:2006.15168  [pdf, other]

    stat.ML cs.LG

    Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

    Authors: Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré

    Abstract: Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision (WS), wherein models can be trained with noisy sources of signal instead of hand-labeled data. But WS relies on training downstream deep networks…

    Submitted 26 June, 2020; originally announced June 2020.

  17. arXiv:2002.11955  [pdf, other]

    stat.ML cs.LG

    Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

    Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré

    Abstract: Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive,…

    Submitted 15 July, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  18. arXiv:1910.09505  [pdf, other]

    stat.ML cs.CV cs.LG

    Multi-Resolution Weak Supervision for Sequential Data

    Authors: Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré

    Abstract: Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in w…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019 (Conference on Neural Information Processing Systems)

  19. arXiv:1910.02993  [pdf, other]

    cs.DB cs.CL cs.CV cs.IR

    Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

    Authors: Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian

    Abstract: Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility…

    Submitted 7 October, 2019; originally announced October 2019.

  20. arXiv:1809.07684  [pdf, other]

    cs.DC

    Automatic Parallelization of Sequential Programs

    Authors: Peter Kraft, Amos Waterland, Daniel Y. Fu, Anitha Gollamudi, Shai Szulanski, Margo Seltzer

    Abstract: Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then speculatively executing those future computations. Although that prior work demonstrated scaling, it did not demonstrate speedup, because it ran entirely in emulation. We t…

    Submitted 29 July, 2018; originally announced September 2018.

  21. arXiv:1804.08667  [pdf, other]

    cs.MA cs.AI

    Influencing Flock Formation in Low-Density Settings

    Authors: Daniel Y. Fu, Emily S. Wang, Peter M. Krafft, Barbara J. Grosz

    Abstract: Flocking is a coordinated collective behavior that results from local sensing between individual agents that have a tendency to orient towards each other. Flocking is common among animal groups and might also be useful in robotic swarms. In the interest of learning how to control flocking behavior, recent work in the multiagent systems literature has explored the use of influencing agents for guid…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: 9 pages, 5 figures, accepted to AAMAS 2018