
Showing 1–50 of 80 results for author: Kiela, D

  1. arXiv:2402.09906  [pdf, other]

    cs.CL cs.AI cs.LG

    Generative Representational Instruction Tuning

    Authors: Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela

    Abstract: All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B…

    Submitted 17 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 66 pages (16 main), 25 figures, 34 tables

  2. arXiv:2402.01306  [pdf, other]

    cs.LG cs.AI

    KTO: Model Alignment as Prospect Theoretic Optimization

    Authors: Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela

    Abstract: Kahneman & Tversky's prospect theory tells us that humans perceive random variables in a biased but well-defined manner (1992); for example, humans are famously loss-averse. We show that objectives for aligning LLMs with human feedback implicitly incorporate many of these biases; the success of these objectives (e.g., DPO) over cross-entropy minimization can partly be ascribed to them…

    Submitted 2 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  3. arXiv:2401.05300  [pdf, other]

    cs.CL cs.AI

    I am a Strange Dataset: Metalinguistic Tests for Language Models

    Authors: Tristan Thrush, Jared Moore, Miguel Monares, Christopher Potts, Douwe Kiela

    Abstract: Statements involving metalinguistic self-reference ("This paper has six sections.") are prevalent in many domains. Can large language models (LLMs) handle such language? In this paper, we present "I am a Strange Dataset", a new dataset for addressing this question. There are two subtasks: generation and verification. In generation, models continue statements like "The penultimate word in this sent…

    Submitted 10 January, 2024; originally announced January 2024.

  4. arXiv:2311.15108  [pdf, other]

    cs.CV cs.AI

    Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision

    Authors: Nicholas Lui, Bryan Chia, William Berrios, Candace Ross, Douwe Kiela

    Abstract: Computer vision models have been known to encode harmful biases, leading to the potentially unfair treatment of historically marginalized groups, such as people of color. However, there remains a lack of datasets balanced along demographic traits that can be used to evaluate the downstream fairness of these models. In this work, we demonstrate that diffusion models can be leveraged to create such…

    Submitted 11 February, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: The Appendix can be found at https://bit.ly/dp-appendix; Added link to code and fixed formatting (Feb 10 2024)

  5. arXiv:2311.11944  [pdf, other]

    cs.CL cs.AI cs.CE stat.ML

    FinanceBench: A New Benchmark for Financial Question Answering

    Authors: Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

    Abstract: FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are intended to be clear-cut and straightforward to answer…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Dataset is available at: https://huggingface.co/datasets/PatronusAI/financebench

  6. arXiv:2309.08638  [pdf, other]

    cs.CL

    Anchor Points: Benchmarking Models with Much Fewer Examples

    Authors: Rajan Vivek, Kawin Ethayarajh, Diyi Yang, Douwe Kiela

    Abstract: Modern language models often exhibit powerful but brittle behavior, leading to the development of larger and more diverse benchmarks to reliably assess their behavior. Here, we suggest that model performance can be benchmarked and elucidated with much smaller evaluation sets. We first show that in six popular language classification benchmarks, model confidence in the correct class on many pairs o…

    Submitted 18 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to EACL 2024 Main Conference. Code will be released at: https://github.com/rvivek3/AnchorPoints

  7. arXiv:2306.16527  [pdf, other]

    cs.IR cs.CV

    OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

    Authors: Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

    Abstract: Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks. However, the datasets used to train these models have not been released, and the collection process has not been fully specified. We introduce the OBELICS dataset, an open web-scale filtered dataset of interleaved image-text documen…

    Submitted 21 August, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  8. arXiv:2306.16410  [pdf, other]

    cs.CL cs.CV

    Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

    Authors: William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, Amanpreet Singh

    Abstract: We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recog…

    Submitted 28 June, 2023; originally announced June 2023.

  9. arXiv:2303.12582  [pdf, other]

    cs.CL

    AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages

    Authors: Chris Chinenye Emezue, Sanchit Gandhi, Lewis Tunstall, Abubakar Abid, Josh Meyer, Quentin Lhoest, Pete Allen, Patrick Von Platen, Douwe Kiela, Yacine Jernite, Julien Chaumond, Merve Noyan, Omar Sanseviero

    Abstract: The advancement of speech technologies has been remarkable, yet its integration with African languages remains limited due to the scarcity of African speech corpora. To address this issue, we present AfroDigits, a minimalist, community-driven dataset of spoken digits for African languages, currently covering 38 African languages. As a demonstration of the practical applications of AfroDigits, we c…

    Submitted 3 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted to the AfricaNLP Workshop at ICLR 2023

  10. arXiv:2302.06976  [pdf, other]

    cs.CL

    Investigating Multi-source Active Learning for Natural Language Inference

    Authors: Ard Snijders, Douwe Kiela, Katerina Margatina

    Abstract: In recent years, active learning has been successfully applied to an array of NLP tasks. However, prior work often assumes that training and test data are drawn from the same distribution. This is problematic, as in real-life settings data may stem from several sources of varying relevance and quality. We show that four popular active learning schemes fail to outperform random selection when appli…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 23 pages. Accepted for publication at the European Chapter of the Association for Computational Linguistics (EACL) 2023

  11. arXiv:2212.05129  [pdf, other]

    cs.AI cs.LG

    Measuring Data

    Authors: Margaret Mitchell, Alexandra Sasha Luccioni, Nathan Lambert, Marissa Gerchick, Angelina McMillan-Major, Ezinwanne Ozoani, Nazneen Rajani, Tristan Thrush, Yacine Jernite, Douwe Kiela

    Abstract: We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along common dimensions that support comparison. Several lines of research have proposed what we refer to as measurements, with differing terminology; we bring some of t…

    Submitted 13 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

  12. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  13. arXiv:2210.01970  [pdf, other]

    cs.LG

    Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

    Authors: Leandro von Werra, Lewis Tunstall, Abhishek Thakur, Alexandra Sasha Luccioni, Tristan Thrush, Aleksandra Piktus, Felix Marty, Nazneen Rajani, Victor Mustar, Helen Ngo, Omar Sanseviero, Mario Šaško, Albert Villanova, Quentin Lhoest, Julien Chaumond, Margaret Mitchell, Alexander M. Rush, Thomas Wolf, Douwe Kiela

    Abstract: Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce Evaluate and Evaluation on the Hub, a set of tools to facilitate the evaluation of models and datasets in ML. Evaluate is a library to support best practices for measurements, metrics, and comparisons of data and models. Its goal is to support…

    Submitted 6 October, 2022; v1 submitted 30 September, 2022; originally announced October 2022.

  14. arXiv:2207.10062  [pdf, other]

    cs.LG

    DataPerf: Benchmarks for Data-Centric AI Development

    Authors: Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, et al. (20 additional authors not shown)

    Abstract: Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing datase…

    Submitted 13 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  15. arXiv:2205.12586  [pdf, other]

    cs.CL cs.AI

    Perturbation Augmentation for Fairer NLP

    Authors: Rebecca Qian, Candace Ross, Jude Fernandes, Eric Smith, Douwe Kiela, Adina Williams

    Abstract: Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask whether training on demographically perturbed data leads to fairer language models. We collect a large dataset of human annotated text perturbations and train a neural perturbation model, which we show outperforms heuristic alternatives. We find that (i)…

    Submitted 12 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  16. arXiv:2204.03162  [pdf, other]

    cs.CV cs.CL

    Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

    Authors: Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross

    Abstract: We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two images and two captions, the goal is to match them correctly, but crucially, both captions contain a completely identical set of words, only in a different order. The dataset was carefully hand-curated by expert annot…

    Submitted 22 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  17. arXiv:2204.01906  [pdf, other]

    cs.CL cs.AI

    Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

    Authors: Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela

    Abstract: We introduce Dynatask: an open source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model-in-the-loop data collection with crowdworkers. Dynatask is integrated with Dynabench, a research platform for rethinking benchmarking in AI that facilitates human a…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: ACL System Demos 2022

  18. arXiv:2112.09062  [pdf, other]

    cs.CL

    Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants

    Authors: Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela

    Abstract: In Dynamic Adversarial Data Collection (DADC), human annotators are tasked with finding examples that models struggle to predict correctly. Models trained on DADC-collected training data have been shown to be more robust in adversarial and out-of-domain settings, and are considerably harder for humans to fool. However, DADC is more time-consuming than traditional data collection and thus more cost…

    Submitted 17 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  19. arXiv:2112.04482  [pdf, other]

    cs.CV cs.CL

    FLAVA: A Foundational Language And Vision Alignment Model

    Authors: Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela

    Abstract: State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks. Generally, such models are often either cross-modal (contrastive) or multi-modal (with earlier fusion) but not both; and they often only target specific modalities or tasks. A promising direction would be to use a single holistic u…

    Submitted 29 March, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: CVPR 2022

  20. arXiv:2110.08514  [pdf, other]

    cs.CL cs.LG

    Analyzing Dynamic Adversarial Training Data in the Limit

    Authors: Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela

    Abstract: To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena. Dynamic adversarial data collection (DADC), where annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC over 1-3 rounds can…

    Submitted 26 September, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: ACL Findings 2022

  21. arXiv:2109.03939  [pdf, other]

    cs.CL cs.AI

    What's Hidden in a One-layer Randomly Weighted Transformer?

    Authors: Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney

    Abstract: We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance, without ever modifying the weight initializations, on machine translation tasks. To find subnetworks for one-layer randomly weighted neural networks, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a o…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 (short)

  22. arXiv:2106.06052  [pdf, other]

    cs.CL cs.AI

    Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

    Authors: Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela

    Abstract: We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform. Our platform evaluates NLP models directly instead of relying on self-reported metrics or predictions on a single dataset. Under this paradigm, models are submitted to be evaluated in the cloud, circumventing the issues of reproducibi…

    Submitted 20 May, 2021; originally announced June 2021.

  23. arXiv:2106.02280  [pdf, other]

    cs.CV cs.CL

    Human-Adversarial Visual Question Answering

    Authors: Sasha Sheng, Amanpreet Singh, Vedanuj Goswami, Jose Alberto Lopez Magana, Wojciech Galuba, Devi Parikh, Douwe Kiela

    Abstract: Performance on the most commonly used Visual Question Answering dataset (VQA v2) is starting to approach human accuracy. However, in interacting with state-of-the-art VQA models, it is clear that the problem is far from being solved. In order to stress test VQA models, we benchmark them against human-adversarial examples. Human subjects interact with a state-of-the-art VQA model, and for each imag…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 22 pages, 13 figures. First two authors contributed equally

  24. arXiv:2106.00872  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study

    Authors: Divyansh Kaushik, Douwe Kiela, Zachary C. Lipton, Wen-tau Yih

    Abstract: In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions. Researchers hope that models trained on these more challenging datasets will rely less on superficial patterns, and thus be less brittle. However, despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produ…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021

  25. arXiv:2105.11447  [pdf, other]

    cs.CL cs.LG stat.ML

    True Few-Shot Learning with Language Models

    Authors: Ethan Perez, Douwe Kiela, Kyunghyun Cho

    Abstract: Pretrained language models (LMs) perform well on many tasks even when learning from a few examples, but prior work uses many held-out examples to tune various aspects of learning, such as hyperparameters, training objectives, and natural language templates ("prompts"). Here, we evaluate the few-shot ability of LMs when such held-out examples are unavailable, a setting we call true few-shot learnin…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: Code at https://github.com/ethanjperez/true_few_shot

  26. arXiv:2104.14337  [pdf, other]

    cs.CL cs.AI

    Dynabench: Rethinking Benchmarking in NLP

    Authors: Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams

    Abstract: We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary model…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  27. arXiv:2104.13733  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Gradient-based Adversarial Attacks against Text Transformers

    Authors: Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela

    Abstract: We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of nat…

    Submitted 15 April, 2021; originally announced April 2021.

  28. Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

    Authors: Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela

    Abstract: Despite recent progress, state-of-the-art question answering models remain vulnerable to a variety of adversarial attacks. While dynamic adversarial data collection, in which a human annotator tries to write examples that fool a model-in-the-loop, can improve model robustness, this process is expensive which limits the scale of the collected data. In this work, we are the first to use synthetic ad…

    Submitted 15 March, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8830-8848. Association for Computational Linguistics

  29. arXiv:2104.08108  [pdf, other]

    cs.CV cs.CL

    Cross-Modal Retrieval Augmentation for Multi-Modal Classification

    Authors: Shir Gur, Natalia Neverova, Chris Stauffer, Ser-Nam Lim, Douwe Kiela, Austin Reiter

    Abstract: Recent advances in using retrieval components over external knowledge sources have shown impressive results for a variety of downstream tasks in natural language processing. Here, we explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering (VQA). First, we train a novel alignment model for embedding images and cap…

    Submitted 16 April, 2021; originally announced April 2021.

  30. arXiv:2104.07567  [pdf, other]

    cs.CL cs.AI

    Retrieval Augmentation Reduces Hallucination in Conversation

    Authors: Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston

    Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures, recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020), for knowledge-grounded dialog…

    Submitted 15 April, 2021; originally announced April 2021.

  31. arXiv:2104.06644  [pdf, other]

    cs.CL cs.LG

    Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

    Authors: Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela

    Abstract: A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines. In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. To demonstrate this…

    Submitted 9 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: To appear at EMNLP 2021; 26 pages total (9 main, 6 reference and 11 Appendix)

  32. arXiv:2103.08067  [pdf, other]

    cs.MA cs.AI

    Quasi-Equivalence Discovery for Zero-Shot Emergent Communication

    Authors: Kalesha Bullard, Douwe Kiela, Franziska Meier, Joelle Pineau, Jakob Foerster

    Abstract: Effective communication is an important skill for enabling information exchange in multi-agent settings and emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. Since, by definition, these settings involve arbitrary encoding of information, typically they do not allow for the learned protocols to generalize beyond training partners…

    Submitted 22 June, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: 14 pages

  33. arXiv:2103.03872  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Rissanen Data Analysis: Examining Dataset Characteristics via Description Length

    Authors: Ethan Perez, Douwe Kiela, Kyunghyun Cho

    Abstract: We introduce a method to determine if a certain capability helps to achieve an accurate model of given data. We view labels as being generated from the inputs by a program composed of subroutines with different capabilities, and we posit that a subroutine is useful if and only if the minimal program that invokes it is shorter than the one that does not. Since minimum program length is uncomputable…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: Code at https://github.com/ethanjperez/rda along with a script to run RDA on your own dataset

  34. arXiv:2012.15761  [pdf, other]

    cs.CL cs.LG

    Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

    Authors: Bertie Vidgen, Tristan Thrush, Zeerak Waseem, Douwe Kiela

    Abstract: We present a human-and-model-in-the-loop process for dynamically generating datasets and training better performing and more robust hate detection models. We provide a new dataset of ~40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. It includes ~15,000 challenging perturbations and each hateful entry has fine-grained labels for the type and ta…

    Submitted 3 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  35. arXiv:2012.15349  [pdf, other]

    cs.CL

    DynaSent: A Dynamic Benchmark for Sentiment Analysis

    Authors: Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela

    Abstract: We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilitates human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers, a…

    Submitted 30 December, 2020; originally announced December 2020.

  36. arXiv:2012.15045  [pdf, other]

    cs.CL

    Reservoir Transformers

    Authors: Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

    Abstract: We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance,…

    Submitted 1 June, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: ACL 2021

  37. arXiv:2012.13391  [pdf, other]

    cs.CL cs.AI cs.LG

    I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

    Authors: Yixin Nie, Mary Williamson, Mohit Bansal, Douwe Kiela, Jason Weston

    Abstract: To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues. We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with…

    Submitted 28 December, 2020; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: 15 pages

  38. arXiv:2012.13354  [pdf, other]

    cs.CL

    To what extent do human explanations of model behavior align with actual model behavior?

    Authors: Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams

    Abstract: Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior. Using Natural Language Inference (NLI) as a case study, we investigate the extent to which human-generated explanations of models' inference decisions align with how models actually make these decisions. More specifically, we defin…

    Submitted 16 September, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: To appear in the Proceedings of BlackBox NLP 2021

  39. arXiv:2010.15896  [pdf, other]

    cs.MA cs.AI

    Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations

    Authors: Kalesha Bullard, Franziska Meier, Douwe Kiela, Joelle Pineau, Jakob Foerster

    Abstract: Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings. Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting is that it does not allow for the emergent protocols to generalize beyond the training partners. Furthermore, so far eme…

    Submitted 3 December, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

  40. arXiv:2010.12729  [pdf, other]

    cs.CL

    ANLIzing the Adversarial Natural Language Inference Dataset

    Authors: Adina Williams, Tristan Thrush, Douwe Kiela

    Abstract: We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We propose a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. We…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: 33 pages, 1 figure, 24 tables

  41. arXiv:2009.12789  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning Optimal Representations with the Decodable Information Bottleneck

    Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

    Abstract: We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked…

    Submitted 16 July, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: Accepted at NeurIPS 2020

  42. arXiv:2009.12756  [pdf, other]

    cs.CL

    Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

    Authors: Wenhan Xiong, Xiang Lorraine Li, Srini Iyer, Jingfei Du, Patrick Lewis, William Yang Wang, Yashar Mehdad, Wen-tau Yih, Sebastian Riedel, Douwe Kiela, Barlas Oğuz

    Abstract: We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be ap…

    Submitted 19 February, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

  43. arXiv:2005.11401  [pdf, other]

    cs.CL cs.LG

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

    Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for…

    Submitted 12 April, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Accepted at NeurIPS 2020

  44. arXiv:2005.04790  [pdf, other]

    cs.AI cs.CL cs.CV

    The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

    Authors: Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

    Abstract: This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate…

    Submitted 7 April, 2021; v1 submitted 10 May, 2020; originally announced May 2020.

    Comments: NeurIPS 2020

  45. arXiv:2005.00614  [pdf, other]

    cs.CL

    Multi-Dimensional Gender Bias Classification

    Authors: Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

    Abstract: Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to,…

    Submitted 1 May, 2020; originally announced May 2020.

  46. arXiv:2002.09758  [pdf, other]

    cs.CL cs.AI cs.LG

    Unsupervised Question Decomposition for Question Answering

    Authors: Ethan Perez, Patrick Lewis, Wen-tau Yih, Kyunghyun Cho, Douwe Kiela

    Abstract: We aim to improve question answering (QA) by decomposing hard questions into simpler sub-questions that existing QA systems are capable of answering. Since labeling questions with decompositions is cumbersome, we take an unsupervised approach to produce sub-questions, also enabling us to leverage millions of questions from the internet. Specifically, we propose an algorithm for One-to-N Unsupervis…

    Submitted 6 October, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: EMNLP 2020 Camera-Ready. Code available at https://github.com/facebookresearch/UnsupervisedDecomposition

  47. arXiv:2002.02878  [pdf, other]

    cs.AI cs.CL stat.ML

    I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

    Authors: Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam

    Abstract: Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the div…

    Submitted 10 February, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  48. arXiv:2002.01093  [pdf, other]

    cs.CL cs.AI cs.LG cs.MA stat.ML

    On the interaction between supervision and self-play in emergent communication

    Authors: Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

    Abstract: A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training. However, recent work suggests that current machine learning methods are too data inefficient to be trained in this way from scratch. In this paper, we investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency: imi…

    Submitted 22 June, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: The first two authors contributed equally. Accepted at ICLR 2020

  49. arXiv:1911.09194  [pdf, other]

    cs.AI cs.CL cs.LG

    Generating Interactive Worlds with Text

    Authors: Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktaschel, Arthur Szlam, Jason Weston

    Abstract: Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT. We introdu…

    Submitted 4 December, 2019; v1 submitted 20 November, 2019; originally announced November 2019.

  50. arXiv:1911.03842  [pdf, other]

    cs.CL

    Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation

    Authors: Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, Jason Weston

    Abstract: Models often easily learn biases present in the training data, and their predictions directly reflect this bias. We analyze gender bias in dialogue data, and examine how this bias is actually amplified in subsequent generative chit-chat dialogue models. We measure gender bias in six existing dialogue datasets, and focus on the most biased one, the multi-player text-based fantasy adventure dataset…

    Submitted 16 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.