
Showing 1–50 of 140 results for author: Van Durme, B

  1. arXiv:2407.07778  [pdf, other]

    cs.CL

    WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment

    Authors: Jiefu Ou, Arda Uzunoglu, Benjamin Van Durme, Daniel Khashabi

    Abstract: AI systems make decisions in physical environments through primitive actions or affordances that are accessed via API calls. While deploying AI agents in the real world involves numerous high-level actions, existing embodied simulators offer a limited set of domain-salient APIs. This naturally brings up the questions: how many primitive actions (APIs) are needed for a versatile embodied agent, and…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ACL 2024 NLRSE, 8 pages

  2. arXiv:2407.03572  [pdf, other]

    cs.CL

    Core: Robust Factual Precision Scoring with Informative Sub-Claim Identification

    Authors: Zhengping Jiang, Jingyu Zhang, Nathaniel Weir, Seth Ebner, Miriam Wanner, Kate Sanders, Daniel Khashabi, Anqi Liu, Benjamin Van Durme

    Abstract: Hallucinations -- the generation of untrue claims -- pose a challenge to the application of large language models (LLMs) [1], thereby motivating the development of metrics to evaluate factual precision. We observe that popular metrics using the Decompose-Then-Verify framework, such as FActScore [2], can be manipulated by adding obvious or repetitive claims to artificially inflate scores. We expand…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2406.17186  [pdf, other]

    cs.CL cs.CY

    CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

    Authors: Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme

    Abstract: Legal professionals need to write analyses that rely on citations to relevant precedents, i.e., previous case decisions. Intelligent systems assisting legal professionals in writing such documents provide great benefits but are challenging to design. Such systems need to help locate, summarize, and reason over salient precedents in order to be useful. To enable systems for such tasks, we work with…

    Submitted 27 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.14764  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation

    Authors: William Fleshman, Benjamin Van Durme

    Abstract: Large language models (LLMs) fine-tuned for text-retrieval have demonstrated state-of-the-art results across several information retrieval (IR) benchmarks. However, supervised training for improving these models requires numerous labeled examples, which are generally unavailable or expensive to acquire. In this work, we explore the effectiveness of extending reverse engineered adaptation to the co…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.14739  [pdf, other]

    cs.CL

    Learning to Retrieve Iteratively for In-Context Learning

    Authors: Yunmo Chen, Tongfei Chen, Harsh Jhamtani, Patrick Xia, Richard Shin, Jason Eisner, Benjamin Van Durme

    Abstract: We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. Finding an optimal portfolio of retrieved items is a combinatorial optimization problem, generally considered NP-hard. This approach provides a learned approximation to such a solution, meeting specific task requirements under a given family of large language models…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.09646  [pdf, other]

    cs.CV cs.AI

    A Survey of Video Datasets for Grounded Event Understanding

    Authors: Kate Sanders, Benjamin Van Durme

    Abstract: While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, vi…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2405.15007  [pdf, other]

    cs.CL cs.AI cs.LG

    RE-Adapt: Reverse Engineered Adaptation of Large Language Models

    Authors: William Fleshman, Benjamin Van Durme

    Abstract: We introduce RE-Adapt, an approach to fine-tuning large language models on new domains without degrading any pre-existing instruction-tuning. We reverse engineer an adapter which isolates what an instruction-tuned model has learned beyond its corresponding pretrained base model. Importantly, this requires no additional data or training. We can then fine-tune the base model on a new domain and read…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2404.08417  [pdf, other]

    cs.LG cs.AI cs.CL

    AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees

    Authors: William Fleshman, Aleem Khan, Marc Marone, Benjamin Van Durme

    Abstract: Large language models (LLMs) are increasingly capable of completing knowledge-intensive tasks by recalling information from a static pretraining corpus. Here we are concerned with LLMs in the context of evolving data requirements. For instance: batches of new data that are introduced periodically; subsets of data with user-based access controls; or requirements on dynamic removal of documents with…

    Submitted 12 April, 2024; originally announced April 2024.

  9. arXiv:2404.04298  [pdf, other]

    cs.AI cs.CL cs.LG

    SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses

    Authors: Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, Daniel Khashabi

    Abstract: Can LLMs continually improve their previous outputs for better results? An affirmative answer would require LLMs to be better at discriminating among previously generated alternatives than at generating initial responses. We explore the validity of this hypothesis in practice. We first introduce a unified framework that allows us to compare the generative and discriminative capability of any model o…

    Submitted 4 April, 2024; originally announced April 2024.

  10. arXiv:2404.03862  [pdf, other]

    cs.CL

    Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

    Authors: Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi

    Abstract: For humans to trust the fluent generations of large language models (LLMs), they must be able to verify their correctness against trusted, external sources. Recent efforts aim to increase verifiability through citations of retrieved documents or post-hoc provenance. However, such citations are prone to mistakes that further complicate their verifiability. To address these limitations, we tackle th…

    Submitted 4 April, 2024; originally announced April 2024.

  11. arXiv:2403.15246  [pdf, other]

    cs.IR cs.CL cs.LG

    FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

    Authors: Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini

    Abstract: Modern Language Models (LMs) are capable of following long and complex instructions that enable a large and diverse set of user requests. While Information Retrieval (IR) models use these LMs as the backbone of their architectures, virtually none of them allow users to provide detailed instructions alongside queries, thus limiting their ability to satisfy complex information needs. In this work, w…

    Submitted 7 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  12. arXiv:2403.12958  [pdf, other]

    cs.CL

    Dated Data: Tracing Knowledge Cutoffs in Large Language Models

    Authors: Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

    Abstract: Released Large Language Models (LLMs) are often paired with a claimed knowledge cutoff date, or the dates at which training data was gathered. Such information is crucial for applications where the LLM must provide up-to-date information. However, this statement only scratches the surface: do all resources in the training data share the same knowledge cutoff date? Does the model's demonstrated kno…

    Submitted 19 March, 2024; originally announced March 2024.

  13. arXiv:2403.11905  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    Tur[k]ingBench: A Challenge Benchmark for Web Agents

    Authors: Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi

    Abstract: Recent chatbots have demonstrated impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages, where text is intertwined with other modalities and tasks are accomplished in the form of various complex interactions. Can state-of-the-art multi-modal models generalize to such comp…

    Submitted 21 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2403.11903  [pdf, other]

    cs.CL

    A Closer Look at Claim Decomposition

    Authors: Miriam Wanner, Seth Ebner, Zhengping Jiang, Mark Dredze, Benjamin Van Durme

    Abstract: As generated text becomes more commonplace, it is increasingly important to evaluate how well-supported such text is by external knowledge sources. Many approaches for evaluating textual support rely on some method for decomposing text into its individual subclaims which are scored against a trusted reference. We investigate how various methods of claim decomposition -- especially LLM-based method…

    Submitted 18 March, 2024; originally announced March 2024.

  15. arXiv:2403.04746  [pdf, other]

    cs.CL cs.AI cs.LG

    LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

    Authors: Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su

    Abstract: Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has be…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Code and data available at https://github.com/microsoft/simulated-trial-and-error

  16. arXiv:2402.19467  [pdf, other]

    cs.CL cs.AI cs.CV

    TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning

    Authors: Kate Sanders, Nathaniel Weir, Benjamin Van Durme

    Abstract: It is challenging to perform question-answering over complex, multimodal content such as television clips. This is in part because current video-language models rely on single-modality reasoning, have lowered performance on long inputs, and lack interpretability. We propose TV-TREES, the first multimodal entailment tree generator. TV-TREES serves as an approach to video understanding that promotes…

    Submitted 10 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 9 pages, preprint

    ACM Class: I.2.7; I.2.10

  17. arXiv:2402.18678  [pdf, other]

    cs.CL

    RORA: Robust Free-Text Rationale Evaluation

    Authors: Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu

    Abstract: Free-text rationales play a pivotal role in explainable NLP, bridging the knowledge and reasoning gaps behind a model's decision-making. However, due to the diversity of potential reasoning paths and a corresponding lack of definitive ground truth, their evaluation remains a challenge. Existing evaluation metrics rely on the degree to which a rationale supports a target label, but we find these fa…

    Submitted 14 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  18. arXiv:2402.14798  [pdf, other]

    cs.CL cs.AI

    Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

    Authors: Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme

    Abstract: Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy…

    Submitted 27 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  19. arXiv:2402.01172  [pdf, other]

    cs.CL cs.SD eess.AS

    Streaming Sequence Transduction through Dynamic Compression

    Authors: Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, Philipp Koehn

    Abstract: We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrat…

    Submitted 2 February, 2024; originally announced February 2024.

  20. arXiv:2401.16209  [pdf, other]

    cs.CL cs.AI

    MultiMUC: Multilingual Template Filling on MUC-4

    Authors: William Gantt, Shabnam Behzad, Hannah YoungEun An, Yunmo Chen, Aaron Steven White, Benjamin Van Durme, Mahsa Yarmohammadi

    Abstract: We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian. We obtain automatic translations from a strong multilingual machine translation system and manually project the original English annotations into each target language. For all la…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: EACL 2024

  21. arXiv:2401.08417  [pdf, other]

    cs.CL

    Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

    Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim

    Abstract: Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We…

    Submitted 2 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICML 2024

  22. arXiv:2401.06715  [pdf, other]

    cs.CL cs.AI

    Reframing Tax Law Entailment as Analogical Reasoning

    Authors: Xinrui Zou, Ming Zhang, Nathaniel Weir, Benjamin Van Durme, Nils Holzenberger

    Abstract: Statutory reasoning refers to the application of legislative provisions to a series of case facts described in natural language. We re-frame statutory reasoning as an analogy task, where each instance of the analogy task involves a combination of two instances of statutory reasoning. This increases the dataset size by two orders of magnitude, and introduces an element of interpretability. We show…

    Submitted 12 January, 2024; originally announced January 2024.

  23. arXiv:2312.17249  [pdf, other]

    cs.CL cs.AI cs.LG

    Do Androids Know They're Only Dreaming of Electric Sheep?

    Authors: Sky CH-Wang, Benjamin Van Durme, Jason Eisner, Chris Kedzie

    Abstract: We design probes trained on the internal representations of a transformer language model to predict its hallucinatory behavior on three grounded generation tasks. To train the probes, we annotate for span-level hallucination on both sampled (organic) and manually edited (synthetic) reference outputs. Our probes are narrowly trained and we find that they are sensitive to their training domain: they…

    Submitted 8 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: ACL 2024 (Findings) Camera-Ready

  24. arXiv:2311.09796  [pdf, other]

    cs.CL cs.AI

    Interpreting User Requests in the Context of Natural Language Standing Instructions

    Authors: Nikita Moghe, Patrick Xia, Jacob Andreas, Jason Eisner, Benjamin Van Durme, Harsh Jhamtani

    Abstract: Users of natural language interfaces, generally powered by Large Language Models (LLMs), often must repeat their preferences each time they make a similar request. We describe an approach to LLM-based dialogue modeling in which persistent user constraints and preferences -- collectively termed standing instructions -- are used as additional context for such interfaces. For example, when a user states "I'm h…

    Submitted 7 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Updated with results from LLaMA-2

  25. arXiv:2311.09693  [pdf, other]

    cs.CL cs.AI

    BLT: Can Large Language Models Handle Basic Legal Text?

    Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme

    Abstract: We find that the best publicly available LLMs like GPT-4, Claude, and PaLM 2 currently perform poorly at basic legal text handling. We introduce a benchmark consisting of tasks that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs' poor performance on this benchmark casts into doubt…

    Submitted 28 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    ACM Class: I.2.1; I.2.7; J.7

  26. arXiv:2311.08620  [pdf, other]

    cs.CL cs.LG

    Toucan: Token-Aware Character Level Language Modeling

    Authors: William Fleshman, Benjamin Van Durme

    Abstract: Character-level language models obviate the need for separately trained tokenizers, but efficiency suffers from longer sequence lengths. Learning to combine character representations into tokens has made training these models more efficient, but they still require decoding characters individually. We propose Toucan, an augmentation to character-level models to make them "token-aware". Comparing ou…

    Submitted 14 November, 2023; originally announced November 2023.

  27. arXiv:2311.05601  [pdf, other]

    cs.CL

    FAMuS: Frames Across Multiple Sources

    Authors: Siddharth Vashishtha, Alexander Martin, William Gantt, Benjamin Van Durme, Aaron Steven White

    Abstract: Understanding event descriptions is a central aspect of language processing, but current approaches focus overwhelmingly on single sentences or documents. Aggregating information about an event across documents can offer a much richer understanding. To this end, we present FAMuS, a new corpus of Wikipedia passages that report on some event, paired with underlying, genre-diverse (non-…

    Submitted 9 November, 2023; originally announced November 2023.

  28. arXiv:2311.02310  [pdf, other]

    cs.CL

    Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles

    Authors: Weiting Tan, Haoran Xu, Lingfeng Shen, Shuyue Stella Li, Kenton Murray, Philipp Koehn, Benjamin Van Durme, Yunmo Chen

    Abstract: Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap comparing their performance with the few-shot setting. In this paper, we investigate the factors contributing…

    Submitted 3 November, 2023; originally announced November 2023.

  29. arXiv:2310.14495  [pdf, other]

    cs.CL cs.AI

    InstructExcel: A Benchmark for Natural Language Instruction in Excel

    Authors: Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, Elnaz Nouri

    Abstract: With the evolution of Large Language Models (LLMs), we can solve increasingly complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so we introduce a new large-scale be…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023, 18 pages

  30. arXiv:2310.13793  [pdf, other]

    cs.CL cs.LG

    A Unified View of Evaluation Metrics for Structured Prediction

    Authors: Yunmo Chen, William Gantt, Tongfei Chen, Aaron Steven White, Benjamin Van Durme

    Abstract: We present a conceptual framework that unifies a variety of evaluation metrics for different structured prediction tasks (e.g. event and relation extraction, syntactic and semantic parsing). Our framework requires representing the outputs of these tasks as objects of certain data types, and derives metrics through matching of common substructures, possibly followed by normalization. We demonstrate…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 Main Track

  31. arXiv:2310.03991  [pdf, other]

    cs.CL

    SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

    Authors: Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov

    Abstract: Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence…

    Submitted 22 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to NAACL 24 Main

  32. arXiv:2310.02409  [pdf, other]

    cs.CL cs.AI cs.LG

    Dodo: Dynamic Contextual Compression for Decoder-only LMs

    Authors: Guanghui Qin, Corby Rosset, Ethan C. Chau, Nikhil Rao, Benjamin Van Durme

    Abstract: Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLaMA can be adap…

    Submitted 13 June, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ACL 2024 camera-ready. 15 pages and 7 figures

    ACM Class: I.2.7; I.2.6

  33. arXiv:2310.01732  [pdf, other]

    cs.CL cs.AI cs.LG

    Nugget: Neural Agglomerative Embeddings of Text

    Authors: Guanghui Qin, Benjamin Van Durme

    Abstract: Embedding text sequences is a widespread requirement in modern language understanding. Existing approaches focus largely on constant-size representations. This is problematic, as the amount of information contained in text often varies with the length of the input. We propose a solution called Nugget, which encodes language into a representation based on a dynamically selected subset of input toke…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: Appeared at ICML 2023

    ACM Class: I.2.7; I.2.6

    Journal ref: ICML 2023

  34. arXiv:2309.13075  [pdf, other]

    cs.AI cs.CL cs.LG

    SCREWS: A Modular Framework for Reasoning with Revisions

    Authors: Kumar Shridhar, Harsh Jhamtani, Hao Fang, Benjamin Van Durme, Jason Eisner, Patrick Xia

    Abstract: Large language models (LLMs) can improve their accuracy on various tasks through iteratively refining and revising their output based on feedback. We observe that these revisions can introduce errors, in which case it is better to roll back to a previous result. Further, revisions are typically homogeneous: they use the same reasoning method that produced the initial answer, which may not correct…

    Submitted 20 September, 2023; originally announced September 2023.

  35. arXiv:2309.09992  [pdf]

    cs.AI cs.CL

    OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?

    Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme

    Abstract: The authors explain where OpenAI got the tax law example in its livestream demonstration of GPT-4, why GPT-4 got the wrong answer, and how it fails to reliably calculate taxes.

    Submitted 7 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages

    ACM Class: I.2.7; I.2.0

    Journal ref: 180 TAX NOTES FEDERAL 1101 (AUG. 14, 2023)

  36. arXiv:2309.08541  [pdf, other]

    cs.IR cs.AI cs.CL

    When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets

    Authors: Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme, Arman Cohan, Luca Soldaini

    Abstract: Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular retrieval models, dataset domains, or query types. To answer this, we conduct the first comprehensive analysis of LM-based expansion. We find t…

    Submitted 26 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: EACL 2024 camera ready

  37. arXiv:2307.07049  [pdf, other]

    cs.CL

    MegaWika: Millions of reports and their sources across 50 diverse languages

    Authors: Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme

    Abstract: To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials. We process this dataset for a myriad of applications, going beyond the initial Wikipedia citation extraction and web scraping of content, including translating no…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Submitted to ACL, 2023

    ACM Class: I.2.7

  38. arXiv:2307.03153  [pdf, other]

    cs.IR cs.CV cs.MM

    MultiVENT: Multilingual Videos of Events with Aligned Natural Text

    Authors: Kate Sanders, David Etter, Reno Kriz, Benjamin Van Durme

    Abstract: Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-s…

    Submitted 6 July, 2023; originally announced July 2023.

  39. arXiv:2306.16722  [pdf, other]

    cs.CL cs.AI

    Evaluating Paraphrastic Robustness in Textual Entailment Models

    Authors: Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme, Adam Poliak

    Abstract: We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experime…

    Submitted 29 June, 2023; originally announced June 2023.

  40. arXiv:2306.00824  [pdf, other]

    cs.CL

    Zero and Few-shot Semantic Parsing with Ambiguous Inputs

    Authors: Elias Stengel-Eskin, Kyle Rawlins, Benjamin Van Durme

    Abstract: Despite the frequent challenges posed by ambiguity when representing meaning via natural language, it is often ignored or deliberately removed in tasks mapping language to formally-designed representations, which generally assume a one-to-one mapping between linguistic and formal representations. We attempt to address this shortcoming by introducing AmP, a framework, dataset, and challenge for tra…

    Submitted 22 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 Camera Ready

  41. arXiv:2305.14659  [pdf, other]

    cs.CL

    InteractiveIE: Towards Assessing the Strength of Human-AI Collaboration in Improving the Performance of Information Extraction

    Authors: Ishani Mondal, Michelle Yuan, Anandhavelu N, Aparna Garimella, Francis Ferraro, Andrew Blair-Stanek, Benjamin Van Durme, Jordan Boyd-Graber

    Abstract: Learning template-based information extraction from documents is a crucial yet difficult task. Prior template-based IE approaches assume foreknowledge of the domain templates; however, real-world IE does not have predefined schemas and is a figure-out-as-you-go phenomenon. To quickly bootstrap templates in a real-world setting, we need to induce template slots from documents with zero or minimal…

    Submitted 17 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Version 2

  42. arXiv:2305.13993  [pdf, other]

    cs.CL

    Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

    Authors: Haoran Xu, Weiting Tan, Shuyue Stella Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton Murray

    Abstract: Incorporating language-specific (LS) modules is a proven method to boost performance in multilingual machine translation. This approach bears similarity to Mixture-of-Experts (MoE) because it does not inflate FLOPs. However, the scalability of this approach to hundreds of languages (experts) tends to be unmanageable due to the prohibitive number of parameters introduced by full-rank matrices in fu…

    Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at the main conference of EMNLP 2023

  43. arXiv:2305.13252  [pdf, other]

    cs.CL cs.AI

    "According to ...": Prompting Language Models Improves Quoting from Pre-Training Data

    Authors: Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

    Abstract: Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data. Inspired by the journalistic device of "according to sources", we propose according-to prompting: directing LLMs to ground responses against previously observed text. To quantify this grounding, we propose a novel evaluation metric (QUIP-Score) that measures the extent to which model-p…

    Submitted 26 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to EACL 2024

  44. arXiv:2305.08677  [pdf, other]

    cs.CL

    Natural Language Decomposition and Interpretation of Complex Utterances

    Authors: Harsh Jhamtani, Hao Fang, Patrick Xia, Eran Levy, Jacob Andreas, Ben Van Durme

    Abstract: Designing natural language interfaces has historically required collecting supervised data to translate user requests into carefully designed intent representations. This requires enumerating and labeling a long tail of user requests, which is challenging. At the same time, large language models (LLMs) encode knowledge about goals and plans that can help conversational assistants interpret user re…

    Submitted 8 January, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

  45. arXiv:2305.07614  [pdf, other]

    cs.IR cs.CL

    NevIR: Negation in Neural Information Retrieval

    Authors: Orion Weller, Dawn Lawrie, Benjamin Van Durme

    Abstract: Negation is a common everyday phenomenon and has been a consistent area of weakness for language models (LMs). Although the Information Retrieval (IR) community has adopted LMs as the backbone of modern IR architectures, there has been little to no research on how negation impacts neural IR. We therefore construct a straightforward benchmark on this theme: asking IR models to rank two…

    Submitted 26 February, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to EACL 2024

  46. arXiv:2303.16857  [pdf, other]

    cs.CL

    Did You Mean...? Confidence-based Trade-offs in Semantic Parsing

    Authors: Elias Stengel-Eskin, Benjamin Van Durme

    Abstract: We illustrate how a calibrated model can help balance common trade-offs in task-oriented parsing. In a simulated annotator-in-the-loop experiment, we show that well-calibrated confidence scores allow us to balance cost with annotator load, improving accuracy with a small number of interactions. We then examine how confidence scores can help optimize the trade-off between usability and safety. We s…

    Submitted 20 October, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023, Camera ready. arXiv admin note: substantial text overlap with arXiv:2211.07443

  47. arXiv:2303.03919  [pdf, other]

    cs.LG cs.CL

    Data Portraits: Recording Foundation Model Training Data

    Authors: Marc Marone, Benjamin Van Durme

    Abstract: Foundation models are trained on increasingly immense and opaque datasets. Even though these models are now key in AI system building, it can be difficult to answer the straightforward question: has the model already encountered a given example during training? We therefore propose a widespread adoption of Data Portraits: artifacts that record training data and allow for downstream inspection. Firs…

    Submitted 14 December, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks

  48. arXiv:2302.06100  [pdf, other]

    cs.CL cs.AI

    Can GPT-3 Perform Statutory Reasoning?

    Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme

    Abstract: Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper, we explore the capabilities of the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thoug…

    Submitted 10 May, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: 10 pages

  49. arXiv:2212.10618  [pdf, ps, other]

    cs.CL

    Ontologically Faithful Generation of Non-Player Character Dialogues

    Authors: Nathaniel Weir, Ryan Thomas, Randolph D'Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani

    Abstract: We introduce a language generation task grounded in a popular video game environment. KNUDGE (KNowledge Constrained User-NPC Dialogue GEneration) requires models to produce trees of dialogue between video game characters that accurately reflect quest and entity specifications stated in natural language. KNUDGE is constructed from side quest dialogues drawn directly from game data of Obsidian Enter…

    Submitted 13 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  50. arXiv:2212.10019  [pdf, other]

    cs.CL

    When Do Decompositions Help for Machine Reading?

    Authors: Kangda Wei, Dawn Lawrie, Benjamin Van Durme, Yunmo Chen, Orion Weller

    Abstract: Answering complex questions often requires multi-step reasoning in order to obtain the final answer. Most research into decompositions of complex questions involves open-domain systems, which have shown success in using these decompositions for improved retrieval. In the machine reading setting, however, the question of when decompositions are helpful remains understudied. We conduct experiments on…

    Submitted 20 December, 2022; originally announced December 2022.