
Showing 1–13 of 13 results for author: Kedzie, C

Searching in archive cs.
  1. arXiv:2312.17249

    cs.CL cs.AI cs.LG

    Do Androids Know They're Only Dreaming of Electric Sheep?

    Authors: Sky CH-Wang, Benjamin Van Durme, Jason Eisner, Chris Kedzie

    Abstract: We design probes trained on the internal representations of a transformer language model to predict its hallucinatory behavior on three grounded generation tasks. To train the probes, we annotate for span-level hallucination on both sampled (organic) and manually edited (synthetic) reference outputs. Our probes are narrowly trained and we find that they are sensitive to their training domain: they…

    Submitted 8 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: ACL 2024 (Findings) Camera-Ready

  2. arXiv:2111.13993

    cs.CL

    An analysis of document graph construction methods for AMR summarization

    Authors: Fei-Tzin Lee, Chris Kedzie, Nakul Verma, Kathleen McKeown

    Abstract: Abstract Meaning Representation (AMR) is a graph-based semantic representation for sentences, composed of collections of concepts linked by semantic relations. AMR-based approaches have found success in a variety of applications, but a challenge to using it in tasks that require document-level context is that it only represents individual sentences. Prior work in AMR-based summarization has automatically m…

    Submitted 27 November, 2021; originally announced November 2021.

  3. arXiv:2110.11850

    cs.CL

    Lightweight Decoding Strategies for Increasing Specificity

    Authors: Katy Ilonka Gero, Chris Kedzie, Savvas Petridis, Lydia Chilton

    Abstract: Language models are known to produce vague and generic outputs. We propose two unsupervised decoding strategies based on either word-frequency or point-wise mutual information to increase the specificity of any model that outputs a probability distribution over its vocabulary at generation time. We test the strategies in a prompt completion task; with human evaluations, we find that both strategie…

    Submitted 22 October, 2021; originally announced October 2021.

  4. arXiv:2106.02293

    cs.CL cs.IR

    Cross-language Sentence Selection via Data Augmentation and Rationale Training

    Authors: Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuščáková, Rui Zhang, Douglas W. Oard, Kathleen McKeown

    Abstract: This paper proposes an approach to cross-language sentence selection in a low-resource setting. It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model. Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval sys…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: ACL 2021 main conference

  5. arXiv:2104.07868

    cs.CL

    Segmenting Subtitles for Correcting ASR Segmentation Errors

    Authors: David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown

    Abstract: Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation. In this work, we propose a model for correcting the acoustic segmentation of ASR models for low-resource languages to improve performance on downstream tasks.…

    Submitted 15 April, 2021; originally announced April 2021.

  6. arXiv:2010.09693

    cs.CL

    Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

    Authors: David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown

    Abstract: In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation. ASR output segmentation is crucial, as ASR systems segment the input audio using purely acoustic information and are not guaranteed to output sentence-like segments. Since most MT systems expect sentences as input, feeding in longer unsegmented passages can lead to sub-op…

    Submitted 19 October, 2020; originally announced October 2020.

    Journal ref: CLSST@LREC 2020 68-73

  7. arXiv:2010.09608

    cs.CL

    Incorporating Terminology Constraints in Automatic Post-Editing

    Authors: David Wan, Chris Kedzie, Faisal Ladhak, Marine Carpuat, Kathleen McKeown

    Abstract: Users of machine translation (MT) may want to ensure the use of specific lexical terminologies. While there exist techniques for incorporating terminology constraints during inference for MT, current APE approaches cannot ensure that they will appear in the final translation. In this paper, we present both autoregressive and non-autoregressive models for lexically constrained APE, demonstrating th…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: To appear in WMT, 2020

  8. arXiv:1911.03385

    cs.CL

    Low-Level Linguistic Controls for Style Transfer and Content Preservation

    Authors: Katy Gero, Chris Kedzie, Jonathan Reeve, Lydia Chilton

    Abstract: Despite the success of style transfer in image processing, it has seen limited progress in natural language generation. Part of the problem is that content is not as easily decoupled from style in the text domain. Curiously, in the field of stylometry, content does not figure prominently in practical methods of discriminating stylistic elements, such as authorship and genre. Rather, syntax and fun…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted as a long paper at INLG 2019

  9. arXiv:1911.03373

    cs.CL

    A Good Sample is Hard to Find: Noise Injection Sampling and Self-Training for Neural Language Generation Models

    Authors: Chris Kedzie, Kathleen McKeown

    Abstract: Deep neural networks (DNN) are quickly becoming the de facto standard modeling method for many natural language generation (NLG) tasks. In order for such models to truly be useful, they must be capable of correctly generating utterances for novel meaning representations (MRs) at test time. In practice, even sophisticated DNNs with various forms of semantic control frequently fail to generate utter…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted as a long paper at INLG 2019

  10. arXiv:1810.12343

    cs.CL

    Content Selection in Deep Learning Models of Summarization

    Authors: Chris Kedzie, Kathleen McKeown, Hal Daume III

    Abstract: We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed. We find that many sophisticated features of state of the art extractive summarizers do not improve performance over simpler models. These results suggest that it is easier to create a summarizer f…

    Submitted 18 February, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: Revised to correct for error in AMI oracle results. Originally published at EMNLP 2018

  11. arXiv:1809.03632

    cs.CL

    Detecting Gang-Involved Escalation on Social Media Using Context

    Authors: Serina Chang, Ruiqi Zhong, Ethan Adams, Fei-Tzin Lee, Siddharth Varia, Desmond Patton, William Frey, Chris Kedzie, Kathleen McKeown

    Abstract: Gang-involved youth in cities such as Chicago have increasingly turned to social media to post about their experiences and intents online. In some situations, when they experience the loss of a loved one, their online expression of emotion may evolve into aggression towards rival gangs and ultimately into real-world violence. In this paper, we present a novel system for detecting Aggression and Lo…

    Submitted 10 September, 2018; originally announced September 2018.

    Comments: 12 pages

    Journal ref: EMNLP 2018

  12. arXiv:1807.08465

    cs.LG cs.CL stat.ML

    Multimodal Social Media Analysis for Gang Violence Prevention

    Authors: Philipp Blandfort, Desmond Patton, William R. Frey, Svebor Karaman, Surabhi Bhargava, Fei-Tzin Lee, Siddharth Varia, Chris Kedzie, Michael B. Gaskell, Rossano Schifanella, Kathleen McKeown, Shih-Fu Chang

    Abstract: Gang violence is a severe issue in major cities across the U.S. and recent studies [Patton et al. 2017] have found evidence of social media communications that can be linked to such violence in communities with high rates of exposure to gang activity. In this paper we partnered computer scientists with social work researchers, who have domain expertise in gang violence, to analyze how public tweet…

    Submitted 23 July, 2018; originally announced July 2018.

  13. arXiv:1605.03664

    cs.CL

    Real-Time Web Scale Event Summarization Using Sequential Decision Making

    Authors: Chris Kedzie, Fernando Diaz, Kathleen McKeown

    Abstract: We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g. "Boston marathon bombing"), our system is able to filter the stream for relevance and produce a series of short text updates describing the event as it unfolds over time. Unlike previous work, our approach is able to join…

    Submitted 11 May, 2016; originally announced May 2016.

    Comments: in Proceedings of the 25th International Joint Conference on Artificial Intelligence 2016