
Showing 1–10 of 10 results for author: Lent, H

  1. arXiv:2402.03137  [pdf, other]

    cs.CL cs.LG

    Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification

    Authors: Kushal Tatariya, Heather Lent, Johannes Bjerva, Miryam de Lhoneux

    Abstract: Emotion classification is a challenging task in NLP due to the inherent idiosyncratic and subjective nature of linguistic expression, especially with code-mixed data. Pre-trained language models (PLMs) have achieved high performance for many tasks and languages, but it remains to be seen whether these models learn and are robust to the differences in emotional expression across languages. Sociolin…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 5 pages, Accepted to SIGTYP 2024 @ EACL

  2. arXiv:2401.12192  [pdf, other]

    cs.CL cs.AI cs.CR

    Text Embedding Inversion Security for Multilingual Language Models

    Authors: Yiyi Chen, Heather Lent, Johannes Bjerva

    Abstract: Textual data is often represented as real-numbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be susceptible to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defence mechani…

    Submitted 5 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: 18 pages, 17 Tables, 6 Figures

  3. arXiv:2310.19567  [pdf, other]

    cs.CL cs.AI

    CreoleVal: Multilingual Multitask Benchmarks for Creoles

    Authors: Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Ruth-Ann Armstrong, Abee Eijansantos, Catriona Malau, Hans Erik Heje, Ernests Lavrinovics, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva

    Abstract: Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning…

    Submitted 6 May, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to TACL

  4. arXiv:2206.04371  [pdf, other]

    cs.CL

    Ancestor-to-Creole Transfer is Not a Walk in the Park

    Authors: Heather Lent, Emanuele Bugliarello, Anders Søgaard

    Abstract: We aim to learn language models for Creole languages for which large volumes of data are not readily available, and therefore explore the potential transfer from ancestor languages (the 'Ancestry Transfer Hypothesis'). We find that standard transfer methods do not facilitate ancestry transfer. Surprisingly, different from other non-Creole languages, a very distinct two-phase pattern emerges for Cr…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Workshop on Insights from Negative Results in NLP 2022

  5. arXiv:2206.00437  [pdf, other]

    cs.CL cs.CY

    What a Creole Wants, What a Creole Needs

    Authors: Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard

    Abstract: In recent years, the natural language processing (NLP) community has given increased attention to the disparity of efforts directed towards high-resource languages over low-resource ones. Efforts to remedy this delta often begin with translations of existing English datasets into other languages. However, this approach ignores that different language communities have different needs. We consider a…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: LREC 2022

  6. arXiv:2203.10020  [pdf, other]

    cs.CL

    Challenges and Strategies in Cross-Cultural NLP

    Authors: Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard

    Abstract: Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers, and the content they produce and require, vary not just by language, but also by culture. Although language and culture are tightly linked, there are important differences. Analogo…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: ACL 2022 - Theme track

  7. arXiv:2109.06074  [pdf, other]

    cs.CL

    On Language Models for Creoles

    Authors: Heather Lent, Emanuele Bugliarello, Miryam de Lhoneux, Chen Qiu, Anders Søgaard

    Abstract: Creole languages such as Nigerian Pidgin English and Haitian Creole are under-resourced and largely ignored in the NLP literature. Creoles typically result from the fusion of a foreign language with multiple local languages, and what grammatical and lexical features are transferred to the creole is a complex process. While creoles are generally stable, the prominence of some features may be much s…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: CoNLL 2021

  8. arXiv:2108.03509  [pdf]

    cs.CL

    Compositional Generalization in Multilingual Semantic Parsing over Wikidata

    Authors: Ruixiang Cui, Rahul Aralikatte, Heather Lent, Daniel Hershcovich

    Abstract: Semantic parsing (SP) allows humans to leverage vast knowledge resources through natural interaction. However, parsers are mostly designed for and evaluated on English resources, such as CFQ (Keysers et al., 2020), the current standard benchmark based on English data generated from grammar rules and oriented towards Freebase, an outdated knowledge base. We propose a method for creating a multiling…

    Submitted 31 May, 2022; v1 submitted 7 August, 2021; originally announced August 2021.

    Comments: Accepted to TACL; Authors' final version, pre-MIT Press publication; Previous title: Multilingual Compositional Wikidata Questions

  9. arXiv:2010.05567  [pdf, other]

    cs.CL

    Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards

    Authors: Rahul Aralikatte, Mostafa Abdou, Heather Lent, Daniel Hershcovich, Anders Søgaard

    Abstract: Coreference resolution and semantic role labeling are NLP tasks that capture different aspects of semantics, indicating respectively, which expressions refer to the same entity, and what semantic roles expressions serve in the sentence. However, they are often closely interdependent, and both generally necessitate natural language understanding. Do they form a coherent abstract representation of d…

    Submitted 12 October, 2020; originally announced October 2020.

  10. arXiv:1909.02392  [pdf, other]

    cs.CL

    Rewarding Coreference Resolvers for Being Consistent with World Knowledge

    Authors: Rahul Aralikatte, Heather Lent, Ana Valeria Gonzalez, Daniel Hershcovich, Chen Qiu, Anders Sandholm, Michael Ringaard, Anders Søgaard

    Abstract: Unresolved coreference is a bottleneck for relation extraction, and high-quality coreference resolvers may produce an output that makes it a lot easier to extract knowledge triples. We show how to improve coreference resolvers by forwarding their input to a relation extraction system and reward the resolvers for producing triples that are found in knowledge bases. Since relation extraction systems…

    Submitted 11 November, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: To appear in EMNLP 2019 (with corrected Fig. 2)