Showing 1–50 of 98 results for author: Rush, A M

  1. arXiv:2407.17469  [pdf, other]

    cs.CL cs.AI

    I Could've Asked That: Reformulating Unanswerable Questions

    Authors: Wenting Zhao, Ge Gao, Claire Cardie, Alexander M. Rush

    Abstract: When seeking information from unfamiliar documents, users frequently pose questions that cannot be answered by the documents. While existing large language models (LLMs) identify these unanswerable questions, they do not assist users in reformulating their questions, thereby reducing their overall utility. We curate CouldAsk, an evaluation benchmark composed of existing and new datasets for docume…

    Submitted 24 July, 2024; originally announced July 2024.

  2. arXiv:2406.16635  [pdf, other]

    cs.LG cs.AI cs.CL

    ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

    Authors: Yash Akhauri, Ahmed F AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M Rush, Safeen Huda, Mohamed S Abdelfattah

    Abstract: The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs because the permanent removal of attention heads or neurons from LLMs can significantly degrade accuracy. Prior work has attempted to model contextual sparsity us…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2404.01626  [pdf, other]

    cs.CL cs.IR

    Entity Disambiguation via Fusion Entity Decoding

    Authors: Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li

    Abstract: Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training a…

    Submitted 7 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL'24 main

  4. arXiv:2401.13660  [pdf, other]

    cs.CL cs.LG

    MambaByte: Token-free Selective State Space Model

    Authors: Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush

    Abstract: Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting, standard autoregressive Transformers scale poorly as the effective memory required grows with sequence length. The recent development of the Mamba state space model (SSM) offers an appealing alternat…

    Submitted 2 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  5. arXiv:2311.18257  [pdf, other]

    cs.CV cs.LG

    Diffusion Models Without Attention

    Authors: Jing Nathan Yan, Jiatao Gu, Alexander M. Rush

    Abstract: In recent advancements in high-fidelity image generation, Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a key player. However, their application at high resolutions presents significant computational challenges. Current methods, such as patchifying, expedite processes in UNet and Transformer architectures but at the expense of representational capacity. Addressing this, we intro…

    Submitted 30 November, 2023; originally announced November 2023.

  6. arXiv:2311.13647  [pdf, other]

    cs.CL cs.LG

    Language Model Inversion

    Authors: John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush

    Abstract: Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompt…

    Submitted 22 November, 2023; originally announced November 2023.

  7. arXiv:2311.08390  [pdf, other]

    cs.CL

    Predicting Text Preference Via Structured Comparative Reasoning

    Authors: Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky

    Abstract: Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning. While approaches like Chain-of-Thought improve accuracy in many other settings, they struggle to consistently distinguish the similarities and differences of complex texts. We introduce SC, a prompting approach that predicts text pref…

    Submitted 1 July, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  8. arXiv:2311.00430  [pdf, other]

    cs.CL cs.SD eess.AS

    Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

    Authors: Sanchit Gandhi, Patrick von Platen, Alexander M. Rush

    Abstract: As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging. In this work, we leverage pseudo-labelling to assemble a large-scale open-source dataset which we use to distill the Whisper model into a smaller variant, called Distil-Whisper. Using a simple word error rate (WER) heuristic, we select…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 30 pages, 2 figures, 25 tables

  9. arXiv:2310.17140  [pdf, other]

    cs.CL cs.AI

    Symbolic Planning and Code Generation for Grounded Dialogue

    Authors: Justin T. Chiu, Wenting Zhao, Derek Chen, Saujas Vaduguru, Alexander M. Rush, Daniel Fried

    Abstract: Large language models (LLMs) excel at processing and generating both text and code. However, LLMs have had limited applicability in grounded task-oriented dialogue as they are difficult to steer toward task objectives and fail to handle novel grounding. We present a modular and interpretable grounded dialogue system that addresses these shortcomings by composing LLMs with a symbolic planner and gr…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  10. arXiv:2310.16944  [pdf, other]

    cs.LG cs.CL

    Zephyr: Direct Distillation of LM Alignment

    Authors: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

    Abstract: We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Start…

    Submitted 25 October, 2023; originally announced October 2023.

  11. arXiv:2310.14034  [pdf, other]

    cs.CL cs.LG

    Tree Prompting: Efficient Task Adaptation without Fine-Tuning

    Authors: John X. Morris, Chandan Singh, Alexander M. Rush, Jianfeng Gao, Yuntian Deng

    Abstract: Prompting language models (LMs) is the main interface for applying them to new tasks. However, for smaller LMs, prompting provides low accuracy compared to gradient-based finetuning. Tree Prompting is an approach to prompting which builds a decision tree of prompts, linking multiple LM calls together to solve a task. At inference time, each call to the LM is determined by efficiently routing the o…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: Both first authors contributed equally; accepted to EMNLP 2023

  12. arXiv:2310.06816  [pdf, other]

    cs.CL cs.LG

    Text Embeddings Reveal (Almost) As Much As Text

    Authors: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush

    Abstract: How much private information do text embeddings reveal about the original text? We investigate the problem of embedding \textit{inversion}, reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when reembedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  13. arXiv:2309.14396  [pdf, other]

    cs.SE cs.LG cs.PL

    Guess & Sketch: Language Model Guided Transpilation

    Authors: Celine Lee, Abdulrahman Mahmoud, Michal Kurek, Simone Campanoni, David Brooks, Stephen Chong, Gu-Yeon Wei, Alexander M. Rush

    Abstract: Maintaining legacy software requires many software and systems engineering hours. Assembly code programs, which demand low-level control over the computer machine state and have no variable names, are particularly difficult for humans to analyze. Existing conventional program translators guarantee correctness, but are hand-engineered for the source and target programming languages in question. Lea…

    Submitted 15 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  14. arXiv:2306.16527  [pdf, other]

    cs.IR cs.CV

    OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

    Authors: Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

    Abstract: Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks. However, the datasets used to train these models have not been released, and the collection process has not been fully specified. We introduce the OBELICS dataset, an open web-scale filtered dataset of interleaved image-text documen…

    Submitted 21 August, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  15. arXiv:2305.16264  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling Data-Constrained Language Models

    Authors: Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, Colin Raffel

    Abstract: The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the…

    Submitted 25 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 50 pages (9 main), 39 figures, 15 tables

  16. arXiv:2305.14618  [pdf, other]

    cs.CL cs.AI

    Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations

    Authors: Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush

    Abstract: Abductive reasoning aims to find plausible explanations for an event. This style of reasoning is critical for commonsense tasks where there are often multiple plausible explanations. Existing approaches for abductive reasoning in natural language processing (NLP) often rely on manually generated annotations for supervision; however, such annotations can be subjective and biased. Instead of using d…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: accepted at ACL'23

  17. arXiv:2305.14237  [pdf, ps, other]

    cs.CL cs.AI

    HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision

    Authors: Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush

    Abstract: Explainable multi-hop question answering (QA) not only predicts answers but also identifies rationales, i.e. subsets of input sentences used to derive the answers. This problem has been extensively studied under the supervised setting, where both answer and rationale annotations are given. Because rationale annotations are expensive to collect and not always available, recent efforts have been de…

    Submitted 23 May, 2023; originally announced May 2023.

  18. arXiv:2212.10544  [pdf, other]

    cs.CL cs.LG

    Pretraining Without Attention

    Authors: Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

    Abstract: Transformers have been essential to pretraining success in NLP. While other architectures have been used, downstream accuracy is either significantly worse, or requires attention layers to match standard benchmarks such as GLUE. This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs). Our proposed model, Bidirectional Gated S…

    Submitted 8 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  19. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  20. arXiv:2210.13763  [pdf, other]

    cs.NI cs.LG

    Teal: Learning-Accelerated Optimization of WAN Traffic Engineering

    Authors: Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu

    Abstract: The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Te…

    Submitted 19 May, 2024; v1 submitted 25 October, 2022; originally announced October 2022.

  21. arXiv:2210.13352  [pdf, other]

    cs.CL cs.SD eess.AS

    ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition

    Authors: Sanchit Gandhi, Patrick von Platen, Alexander M. Rush

    Abstract: Speech recognition applications cover a range of different audio and text distributions, with different speaking styles, background noise, transcription punctuation and character casing. However, many speech recognition systems require dataset-specific tuning (audio filtering, punctuation removal and normalisation of casing), therefore assuming a-priori knowledge of both the audio and text distrib…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 25 pages, 1 figure, submitted to ICLR 2023

  22. arXiv:2210.11528  [pdf, other]

    cs.CL

    Unsupervised Text Deidentification

    Authors: John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush

    Abstract: Deidentification seeks to anonymize textual data prior to distribution. Automatic deidentification primarily uses supervised named entity recognition from human-labeled data points. We propose an unsupervised deidentification method that masks words that leak personally-identifying information. The approach utilizes a specially trained reidentification model to identify individuals from redacted p…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  23. arXiv:2210.08444  [pdf, other]

    cs.CL cs.LG stat.ML

    Model Criticism for Long-Form Text Generation

    Authors: Yuntian Deng, Volodymyr Kuleshov, Alexander M. Rush

    Abstract: Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of the generated text. Model criticism compares the distributions between real and generated…

    Submitted 16 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  24. arXiv:2210.05147  [pdf, other]

    cs.LG cs.CL cs.CV

    Markup-to-Image Diffusion Models with Scheduled Sampling

    Authors: Yuntian Deng, Noriyuki Kojima, Alexander M. Rush

    Abstract: Building on recent advances in image generation, we present a fully data-driven approach to rendering markup into images. The approach is based on diffusion models, which parameterize the distribution of data using a sequence of denoising operations on top of a Gaussian noise distribution. We view the diffusion denoising process as a sequential decision making process, and show that it exhibits co…

    Submitted 11 October, 2022; originally announced October 2022.

  25. arXiv:2210.01970  [pdf, other]

    cs.LG

    Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

    Authors: Leandro von Werra, Lewis Tunstall, Abhishek Thakur, Alexandra Sasha Luccioni, Tristan Thrush, Aleksandra Piktus, Felix Marty, Nazneen Rajani, Victor Mustar, Helen Ngo, Omar Sanseviero, Mario Šaško, Albert Villanova, Quentin Lhoest, Julien Chaumond, Margaret Mitchell, Alexander M. Rush, Thomas Wolf, Douwe Kiela

    Abstract: Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce Evaluate and Evaluation on the Hub --a set of tools to facilitate the evaluation of models and datasets in ML. Evaluate is a library to support best practices for measurements, metrics, and comparisons of data and models. Its goal is to support…

    Submitted 6 October, 2022; v1 submitted 30 September, 2022; originally announced October 2022.

  26. arXiv:2210.01848  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.NC stat.ML

    Explaining Patterns in Data with Language Models via Interpretable Autoprompting

    Authors: Chandan Singh, John X. Morris, Jyoti Aneja, Alexander M. Rush, Jianfeng Gao

    Abstract: Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explainin…

    Submitted 26 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: The two first authors contributed equally

  27. arXiv:2208.07852  [pdf, other]

    cs.CL cs.HC cs.LG

    Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models

    Authors: Hendrik Strobelt, Albert Webson, Victor Sanh, Benjamin Hoover, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush

    Abstract: State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for new tasks requires experimentation. Different prompt templates wit…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 9 pages content, 2 pages references

  28. arXiv:2202.01279  [pdf, other]

    cs.LG cs.CL

    PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

    Authors: Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Alan Fries, Maged S. Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, et al. (2 additional authors not shown)

    Abstract: PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges…

    Submitted 29 March, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: ACL 2022 Demo

  29. arXiv:2201.02715  [pdf, other]

    cs.CL

    Low-Rank Constraints for Fast Inference in Structured Models

    Authors: Justin T. Chiu, Yuntian Deng, Alexander M. Rush

    Abstract: Structured distributions, i.e. distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by the high computational and memory complexity with respect to the size of the latent representations. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCF…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: 22 pages. Published at NeurIPS 2021

  30. GenNI: Human-AI Collaboration for Data-Backed Text Generation

    Authors: Hendrik Strobelt, Jambay Kinley, Robert Krueger, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush

    Abstract: Table2Text systems generate textual output based on structured data utilizing machine learning. These systems are essential for fluent natural language interfaces in tools such as virtual assistants; however, left to generate freely these ML systems often produce misleading or unexpected outputs. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI colla…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: IEEE VIS 2021

    ACM Class: I.2.7; H.5.2

  31. arXiv:2110.08207  [pdf, other]

    cs.LG cs.CL

    Multitask Prompted Training Enables Zero-Shot Task Generalization

    Authors: Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, et al. (16 additional authors not shown)

    Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale,…

    Submitted 17 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ICLR 2022 Spotlight (with extended discussion)

  32. arXiv:2109.06387  [pdf, other]

    cs.CL cs.LG

    Rationales for Sequential Predictions

    Authors: Keyon Vafa, Yuntian Deng, David M. Blei, Alexander M. Rush

    Abstract: Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain. We consider model explanations through rationales, subsets of context that can explain individual model predictions. We find sequential rationales by solving a combinatorial optimization: the best rationale is the smallest subset of input tokens that would predict the same output as the f…

    Submitted 17 November, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Appeared in the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)

  33. arXiv:2109.04838  [pdf, other]

    cs.LG cs.CL

    Block Pruning For Faster Transformers

    Authors: François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush

    Abstract: Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation methods are proven for speeding up inference. We introduce a block pruning approach targeting both small and fast models. Our approach extends structured method…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021. Code, hyper-parameters, evaluation results and checkpoints available at https://github.com/huggingface/nn_pruning

    ACM Class: I.2.6; I.2.7

  34. arXiv:2109.02846  [pdf, other]

    cs.CL

    Datasets: A Community Library for Natural Language Processing

    Authors: Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, et al. (7 additional authors not shown)

    Abstract: The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: EMNLP Demo 2021

  35. arXiv:2104.03514  [pdf, other]

    cs.CL

    Low-Complexity Probing via Finding Subnetworks

    Authors: Steven Cao, Victor Sanh, Alexander M. Rush

    Abstract: The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations. This approach can detect properties encoded in the model, but at the cost of adding new parameters that may learn the task directly. We instead propose a subtractive pruning-based probe, where we find an existing subnetwor…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: NAACL-HLT 2021

  36. arXiv:2103.08493  [pdf, other]

    cs.LG

    How Many Data Points is a Prompt Worth?

    Authors: Teven Le Scao, Alexander M. Rush

    Abstract: When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction. Proponents of prompting have argued that prompts provide a method for injecting task-specific guidance, which is beneficial in low-data regimes. We aim to quantify this benefit through rigorous testing of prompts in a fair setting: comparing prompted and head…

    Submitted 6 April, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: NAACL HLT 2021

  37. arXiv:2102.13196  [pdf, other]

    cs.LG cs.CL

    Named Tensor Notation

    Authors: David Chiang, Alexander M. Rush, Boaz Barak

    Abstract: We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers of machine learning models from the burden of keeping track of the order of axes and the purpose of each. The notation makes it easy to lift operations on low-order tensors to higher order ones, for example, from images to minibatches of images, or from an attention mechanism to multiple a…

    Submitted 17 January, 2023; v1 submitted 25 February, 2021; originally announced February 2021.

    Journal ref: TMLR, January 2023

  38. arXiv:2012.07463  [pdf, other]

    cs.CL cs.LG

    Parameter-Efficient Transfer Learning with Diff Pruning

    Authors: Demi Guo, Alexander M. Rush, Yoon Kim

    Abstract: While task-specific finetuning of pretrained networks has led to significant empirical advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task, memory-constrained settings. We propose diff pruning as a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework. This approach views finetuning as learning a task-speci…

    Submitted 9 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: ACL 2021

  39. arXiv:2012.01300  [pdf, other]

    cs.CL cs.LG

    Learning from others' mistakes: Avoiding dataset biases without modeling them

    Authors: Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush

    Abstract: State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We consider cases where the bias issues may not be explicitly identified, and show a method for t…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: 15 pages, 6 figures, 6 tables

  40. arXiv:2011.14244  [pdf, other]

    cs.CL cs.AI cs.LG

    Latent Template Induction with Gumbel-CRFs

    Authors: Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, Alexander M. Rush

    Abstract: Learning to control the structure of sentences is a challenging problem in text generation. Existing work either relies on simple deterministic approaches or RL-based hard structures. We explore the use of structured variational autoencoders to infer latent templates for sentence generation using a soft, continuous relaxation in order to utilize reparameterization for training. Specifically, we pr…

    Submitted 28 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020 camera ready

  41. arXiv:2011.14203  [pdf, other]

    cs.AR cs.CL

    EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

    Authors: Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei

    Abstract: Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimi…

    Submitted 5 September, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: 12 pages plus references. Paper to appear at the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021)

  42. arXiv:2011.09039  [pdf, other]

    cs.CL cs.LG

    Sequence-Level Mixed Sample Data Augmentation

    Authors: Demi Guo, Yoon Kim, Alexander M. Rush

    Abstract: Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language. This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems. Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set. We connect thi…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020

  43. arXiv:2011.04743  [pdf, other]

    cs.CL cs.CR

    Adversarial Semantic Collisions

    Authors: Congzheng Song, Alexander M. Rush, Vitaly Shmatikov

    Abstract: We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks which rely on analyzing the meaning and similarity of texts-- including paraphrase identification, document retrieval, response suggestion, and extractive summariz…

    Submitted 9 November, 2020; originally announced November 2020.

  44. arXiv:2011.04640  [pdf, other]

    cs.CL cs.LG

    Scaling Hidden Markov Language Models

    Authors: Justin T. Chiu, Alexander M. Rush

    Abstract: The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling da…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: 9 pages, accepted as a short paper at EMNLP 2020

    Journal ref: EMNLP 2020

  45. arXiv:2010.13002  [pdf, other]

    cs.CL cs.AI

    Pre-trained Summarization Distillation

    Authors: Sam Shleifer, Alexander M. Rush

    Abstract: Recent state-of-the-art approaches to summarization utilize large pre-trained Transformer models. Distilling these models to smaller student models has become critically important for practical use; however, there are many different distillation methods proposed by the NLP literature. Recent work on distilling BERT for classification and regression tasks shows strong performance using direct knowle…

    Submitted 28 October, 2020; v1 submitted 24 October, 2020; originally announced October 2020.

  46. arXiv:2008.09249  [pdf, other]

    cs.CL

    GRIT: Generative Role-filler Transformers for Document-level Event Entity Extraction

    Authors: Xinya Du, Alexander M. Rush, Claire Cardie

    Abstract: We revisit the classic problem of document-level role-filler entity extraction (REE) for template filling. We argue that sentence-level approaches are ill-suited to the task and introduce a generative transformer-based encoder-decoder framework (GRIT) that is designed to model context at the document level: it can make extraction decisions across sentence boundaries; is implicitly aware of noun ph…

    Submitted 28 January, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: To appear in EACL 2021; Code is available at https://github.com/xinyadu/grit_doc_event_entity

  47. arXiv:2007.12238  [pdf, other]

    cs.HC cs.GL

    MiniConf -- A Virtual Conference Framework

    Authors: Alexander M. Rush, Hendrik Strobelt

    Abstract: MiniConf is a framework for hosting virtual academic conferences motivated by the sudden inability for these events to be hosted globally. The framework is designed to be global and asynchronous, interactive, and to promote browsing and discovery. We developed the system to be sustainable and maintainable, in particular ensuring that it is open-source, easy to set up, and scalable on minimal hardwa…

    Submitted 10 July, 2020; originally announced July 2020.

  48. arXiv:2006.01112  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Cascaded Text Generation with Markov Transformers

    Authors: Yuntian Deng, Alexander M. Rush

    Abstract: The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficien…

    Submitted 5 December, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  49. arXiv:2005.07683  [pdf, other]

    cs.CL cs.LG

    Movement Pruning: Adaptive Sparsity by Fine-Tuning

    Authors: Victor Sanh, Thomas Wolf, Alexander M. Rush

    Abstract: Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning.…

    Submitted 23 October, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: 14 pages, 6 figures, 3 tables. Published at NeurIPS 2020. Code: huggingface.co/mvp

  50. arXiv:2005.04560  [pdf, other]

    cs.CL cs.AI cs.LG

    Posterior Control of Blackbox Generation

    Authors: Xiang Lisa Li, Alexander M. Rush

    Abstract: Text generation often requires high-precision output that obeys task-specific rules. This fine-grained control is difficult to enforce with off-the-shelf deep learning models. In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach. Under this formulation, task-specific knowledge can be encoded through a range…

    Submitted 9 May, 2020; originally announced May 2020.

    Comments: Accepted for publication at ACL 2020