
Showing 1–32 of 32 results for author: Thorne, J

Searching in archive cs.
  1. arXiv:2406.06424  [pdf, other]

    cs.CV

    Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

    Authors: Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

    Abstract: Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the al…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.02657  [pdf, other]

    cs.CL cs.AI cs.LG

    Block Transformer: Global-to-Local Language Modeling for Fast Inference

    Authors: Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun

    Abstract: This paper presents the Block Transformer architecture which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step. Thereby, this KV cache IO becomes a significant bottleneck in batch inferenc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures, 5 tables

  3. arXiv:2403.12862  [pdf, other]

    cs.CL

    Epistemology of Language Models: Do Language Models Have Holistic Knowledge?

    Authors: Minsu Kim, James Thorne

    Abstract: This paper investigates the inherent knowledge in language models from the perspective of epistemological holism. The purpose of this paper is to explore whether LLMs exhibit characteristics consistent with epistemological holism. These characteristics suggest that core knowledge, such as general scientific knowledge, each plays a specific role, serving as the foundation of our knowledge system an…

    Submitted 19 March, 2024; originally announced March 2024.

  4. arXiv:2403.10900  [pdf, other]

    cs.CL

    BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English

    Authors: Sheikh Shafayat, H M Quamran Hasan, Minhajur Rahman Chowdhury Mahim, Rifki Afina Putri, James Thorne, Alice Oh

    Abstract: In this study, we introduce BEnQA, a dataset comprising parallel Bengali and English exam questions for middle and high school levels in Bangladesh. Our dataset consists of approximately 5K questions covering several subjects in science with different types of questions, including factual, application, and reasoning-based questions. We benchmark several Large Language Models (LLMs) with our parall…

    Submitted 16 March, 2024; originally announced March 2024.

  5. arXiv:2403.07691  [pdf, other]

    cs.CL cs.AI

    ORPO: Monolithic Preference Optimization without Reference Model

    Authors: Jiwoo Hong, Noah Lee, James Thorne

    Abstract: While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building…

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint

  6. arXiv:2403.06412  [pdf, other]

    cs.CL

    CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

    Authors: Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh

    Abstract: Despite the rapid development of large language models (LLMs) for the Korean language, there remains an obvious lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge. Because many existing Korean benchmark datasets are derived from the English counterparts through translation, they often overlook the different cultural contexts. For the few benchmark datasets…

    Submitted 4 July, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  7. arXiv:2402.13482  [pdf, other]

    cs.CL cs.AI cs.LG

    Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks

    Authors: Minju Seo, Jinheon Baek, James Thorne, Sung Ju Hwang

    Abstract: Despite large successes of recent language models on diverse tasks, they suffer from severe performance degeneration in low-resource settings with limited training data available. Many existing works tackle this problem by generating synthetic data from the training data and then training models on them, recently using Large Language Models (LLMs). However, in low-resource settings, the amount of…

    Submitted 20 February, 2024; originally announced February 2024.

  8. arXiv:2402.02418  [pdf, other]

    cs.IR cs.LG

    eXplainable Bayesian Multi-Perspective Generative Retrieval

    Authors: EuiYul Song, Philhoon Oh, Sangryul Kim, James Thorne

    Abstract: Modern deterministic retrieval pipelines prioritize achieving state-of-the-art performance but often lack interpretability in decision-making. These models face challenges in assessing uncertainty, leading to overconfident predictions. To overcome these limitations, we integrate uncertainty calibration and interpretability into a retrieval pipeline. Specifically, we introduce Bayesian methodologie…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 15 pages, 7 figures

    MSC Class: 94C06 ACM Class: H.3.3

  9. arXiv:2401.16979  [pdf, other]

    cs.IR

    Re3val: Reinforced and Reranked Generative Retrieval

    Authors: EuiYul Song, Sangryul Kim, Haeju Lee, Joonkee Kim, James Thorne

    Abstract: Generative retrieval models encode pointers to information in a corpus as an index within the model's parameters. These models serve as part of a larger pipeline, where retrieved information conditions generation for knowledge-intensive NLP tasks. However, we identify two limitations: the generative retrieval does not account for contextual information. Secondly, the retrieval can't be tuned for t…

    Submitted 23 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 17 pages, 4 figures, Findings of the Association for Computational Linguistics: EACL 2023

    MSC Class: 94C06 ACM Class: H.3.3

  10. arXiv:2311.00321  [pdf, other]

    cs.CL

    HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning

    Authors: Yongjin Yang, Joonkee Kim, Yujin Kim, Namgyu Ho, James Thorne, Se-young Yun

    Abstract: With the proliferation of social media, accurate detection of hate speech has become critical to ensure safety online. To combat nuanced forms of hate speech, it is important to identify and thoroughly explain hate speech to help users understand its harmful effects. Recent benchmarks have attempted to tackle this issue by training generative models on free-text annotations of implications in hate…

    Submitted 22 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: Findings of EMNLP 2023; The first three authors contribute equally

  11. arXiv:2310.18077  [pdf, other]

    cs.CL cs.AI

    Detrimental Contexts in Open-Domain Question Answering

    Authors: Philhoon Oh, James Thorne

    Abstract: For knowledge intensive NLP tasks, it has been widely accepted that accessing more information is a contributing factor to improvements in the model's end-to-end performance. However, counter-intuitively, too much context can have a negative impact on the model when evaluated on common question answering (QA) datasets. In this paper, we analyze how passages can have a detrimental effect on retriev…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  12. arXiv:2310.18076  [pdf, other]

    cs.CL cs.AI

    Knowledge Corpus Error in Question Answering

    Authors: Yejoon Lee, Philhoon Oh, James Thorne

    Abstract: Recent works in open-domain question answering (QA) have explored generating context passages from large language models (LLMs), replacing the traditional retrieval step in the QA pipeline. However, it is not well understood why generated passages can be more effective than retrieved ones. This study revisits the conventional formulation of QA and introduces the concept of knowledge corpus error.…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  13. arXiv:2310.08491  [pdf, other]

    cs.CL cs.LG

    Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

    Authors: Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, Minjoon Seo

    Abstract: Recently, using a powerful proprietary Large Language Model (LLM) (e.g., GPT-4) as an evaluator for long-form responses has become the de facto standard. However, for practitioners with large-scale evaluation tasks and custom criteria in consideration (e.g., child-readability), using proprietary LLMs as an evaluator is unreliable due to the closed-source nature, uncontrolled versioning, and prohib…

    Submitted 9 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  14. arXiv:2308.01525  [pdf, other]

    cs.CV

    VisAlign: Dataset for Measuring the Degree of Alignment between AI and Humans in Visual Perception

    Authors: Jiyoung Lee, Seungho Kim, Seunghyun Won, Joonseok Lee, Marzyeh Ghassemi, James Thorne, Jaeseok Choi, O-Kil Kwon, Edward Choi

    Abstract: AI alignment refers to models acting towards human-intended goals, preferences, or ethical principles. Given that most large-scale deep learning models act as black boxes and cannot be manually controlled, analyzing the similarity between models and humans can be a proxy measure for ensuring AI safety. In this paper, we focus on the models' visual perception alignment with humans, further referred…

    Submitted 20 October, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Published as a conference paper at NeurIPS 2023 (Track on Datasets and Benchmarks)

  15. arXiv:2307.10928  [pdf, other]

    cs.CL cs.AI

    FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

    Authors: Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo

    Abstract: Evaluation of Large Language Models (LLMs) is challenging because instruction-following necessitates alignment with human values and the required set of skills varies depending on the instruction. However, previous studies have mainly focused on coarse-grained evaluation (i.e. overall preference-based evaluation), which limits interpretability since it does not consider the nature of user instruct…

    Submitted 14 April, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: ICLR 2024 Spotlight

  16. arXiv:2305.13788  [pdf, other]

    cs.CL cs.AI

    Can Large Language Models Capture Dissenting Human Voices?

    Authors: Noah Lee, Na Min An, James Thorne

    Abstract: Large language models (LLMs) have shown impressive achievements in solving a broad range of tasks. Augmented by instruction fine-tuning, LLMs have also been shown to generalize in zero-shot settings as well. However, whether LLMs closely align with the human disagreement distribution has not been well-studied, especially within the scope of natural language inference (NLI). In this paper, we evalu…

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear at EMNLP 2023

  17. arXiv:2305.06590  [pdf, other]

    cs.CL cs.AI

    FactKG: Fact Verification via Reasoning on Knowledge Graphs

    Authors: Jiho Kim, Sungjin Park, Yeonsu Kwon, Yohan Jo, James Thorne, Edward Choi

    Abstract: In real world applications, knowledge graphs (KG) are widely used in various domains (e.g. medical applications and dialogue agents). However, for fact verification, KGs have not been adequately utilized as a knowledge source. KGs can be a valuable knowledge source in fact verification due to their reliability and broad applicability. A KG consists of nodes and edges which makes it clear how conce…

    Submitted 18 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  18. arXiv:2304.02247  [pdf, other]

    cs.CL cs.LG

    Disentangling Structure and Style: Political Bias Detection in News by Inducing Document Hierarchy

    Authors: Jiwoo Hong, Yejin Cho, Jaemin Jung, Jiyoung Han, James Thorne

    Abstract: We address an important gap in detecting political bias in news articles. Previous works that perform document classification can be influenced by the writing style of each news outlet, leading to overfitting and limited generalizability. Our approach overcomes this limitation by considering both the sentence-level semantics and the document-level rhetorical structure, resulting in a more robust a…

    Submitted 27 October, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: Findings of EMNLP 2023

  19. arXiv:2211.09388  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Data-Efficient Autoregressive Document Retrieval for Fact Verification

    Authors: James Thorne

    Abstract: Document retrieval is a core component of many knowledge-intensive natural language processing task formulations such as fact verification and question answering. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia p…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: To appear at SustaiNLP@EMNLP 2022. Code is available: https://github.com/j6mes/sustainlp2022-deardr

  20. arXiv:2106.05707  [pdf, other]

    cs.CL

    FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

    Authors: Rami Aly, Zhijiang Guo, Michael Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal

    Abstract: Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and thus ignored the wealth of information available in structured formats, such as tables. In this p…

    Submitted 12 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at NeurIPS 2021 Datasets and Benchmarks Track

  21. arXiv:2106.01074  [pdf, other]

    cs.CL cs.AI cs.DB

    Database Reasoning Over Text

    Authors: James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

    Abstract: Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as "List/Count all female athletes who were born in 20th century", which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer mode…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: To appear at ACL2021

  22. arXiv:2106.01072   

    cs.CL cs.AI cs.LG

    Evidence-based Factual Error Correction

    Authors: James Thorne, Andreas Vlachos

    Abstract: This paper introduces the task of factual error correction: performing edits to a claim so that the generated rewrite is better supported by evidence. This extends the well-studied task of fact verification by providing a mechanism to correct written texts that are refuted or only partially supported by evidence. We demonstrate that it is feasible to train factual error correction systems from exi…

    Submitted 17 June, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Uploaded as a new paper in error. Please see the replacement of arxiv paper 2012.15788v2 for this version: arXiv:2012.15788

  23. arXiv:2104.00640  [pdf, other]

    cs.CL cs.AI

    AmbiFC: Fact-Checking Ambiguous Claims with Evidence

    Authors: Max Glockner, Ieva Staliūnaitė, James Thorne, Gisela Vallejo, Andreas Vlachos, Iryna Gurevych

    Abstract: Automated fact-checking systems verify claims against evidence to predict their veracity. In real-world scenarios, the retrieved evidence may not unambiguously support or refute the claim and yield conflicting but valid interpretations. Existing fact-checking datasets assume that the models developed with them predict a single veracity label for each claim, thus discouraging the handling of such a…

    Submitted 14 December, 2023; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted at TACL; pre-MIT Press publication version

  24. arXiv:2012.15788  [pdf, other]

    cs.CL cs.AI

    Evidence-based Factual Error Correction

    Authors: James Thorne, Andreas Vlachos

    Abstract: This paper introduces the task of factual error correction: performing edits to a claim so that the generated rewrite is better supported by evidence. This extends the well-studied task of fact verification by providing a mechanism to correct written texts that are refuted or only partially supported by evidence. We demonstrate that it is feasible to train factual error correction systems from exi…

    Submitted 11 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Accepted at ACL2021

  25. arXiv:2010.06973  [pdf, other]

    cs.CL cs.DB cs.LG

    Neural Databases

    Authors: James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

    Abstract: In recent years, neural networks have shown impressive performance gains on long-standing AI problems, and in particular, answering queries from natural language text. These advances raise the question of whether they can be extended to a point where we can relax the fundamental assumption of database management, namely, that our data is represented as fields of a pre-defined schema. This paper…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: Submitted to PVLDB vol 14

  26. arXiv:2009.02252  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    KILT: a Benchmark for Knowledge Intensive Language Tasks

    Authors: Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel

    Abstract: Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research…

    Submitted 27 May, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

    Comments: accepted at NAACL 2021

  27. arXiv:2004.14366  [pdf, other]

    cs.CL cs.LG

    Elastic weight consolidation for better bias inoculation

    Authors: James Thorne, Andreas Vlachos

    Abstract: The biases present in training datasets have been shown to affect models for sentence pair classification tasks such as natural language inference (NLI) and fact verification. While fine-tuning models on additional data has been used to mitigate them, a common issue is that of catastrophic forgetting of the original training dataset. In this paper, we show that elastic weight consolidation (EWC) a…

    Submitted 4 February, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at EACL 2021. Was previously submitted to arxiv with the title "Avoiding catastrophic forgetting in mitigating model biases in sentence-pair classification with elastic weight consolidation"

  28. arXiv:1904.10717  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Generating Token-Level Explanations for Natural Language Inference

    Authors: James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

    Abstract: The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of work recently on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations of classifiers operating on pairs of sentences. In this paper, we show that it is possible to generate token…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL2019

  29. arXiv:1903.05543  [pdf, ps, other]

    cs.CL cs.AI

    Adversarial attacks against Fact Extraction and VERification

    Authors: James Thorne, Andreas Vlachos

    Abstract: This paper describes a baseline for the second iteration of the Fact Extraction and VERification shared task (FEVER2.0) which explores the resilience of systems through adversarial evaluation. We present a collection of simple adversarial attacks against systems that participated in the first FEVER shared task. FEVER modeled the assessment of truthfulness of written claims as a joint information r…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: Work in progress

  30. arXiv:1811.10971  [pdf, ps, other]

    cs.CL

    The Fact Extraction and VERification (FEVER) Shared Task

    Authors: James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal

    Abstract: We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be Supported or Refuted using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseline. The best performing system achieved a FEVER sc…

    Submitted 30 November, 2018; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Revised from published version in the proceedings of the FEVER workshop at EMNLP 2018

  31. arXiv:1806.07687  [pdf, ps, other]

    cs.CL

    Automated Fact Checking: Task formulations, methods and future directions

    Authors: James Thorne, Andreas Vlachos

    Abstract: The recently increased focus on misinformation has stimulated research in fact checking, the task of assessing the truthfulness of a claim. Research in automating this task has been conducted in a variety of disciplines including natural language processing, machine learning, knowledge representation, databases, and journalism. While there has been substantial progress, relevant papers and article…

    Submitted 5 September, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: Published at the 27th International Conference on Computational Linguistics (COLING 2018)

  32. arXiv:1803.05355  [pdf, other]

    cs.CL

    FEVER: a large-scale dataset for Fact Extraction and VERification

    Authors: James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

    Abstract: In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achievi…

    Submitted 18 December, 2018; v1 submitted 14 March, 2018; originally announced March 2018.

    Comments: Updated version of NAACL2018 paper. Data is released on http://fever.ai