
Showing 1–9 of 9 results for author: Rust, P

  1. arXiv:2407.06177  [pdf, other]

    cs.CV cs.AI cs.CL cs.CY

    Vision-Language Models under Cultural and Inclusive Considerations

    Authors: Antonia Karamolegkou, Phillip Rust, Yong Cao, Ruixiang Cui, Anders Søgaard, Daniel Hershcovich

    Abstract: Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives. Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case. To address this problem, we create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: HuCLLM @ ACL 2024

  2. arXiv:2402.09611  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Towards Privacy-Aware Sign Language Translation at Scale

    Authors: Phillip Rust, Bowen Shi, Skyler Wang, Necati Cihan Camgöz, Jean Maillard

    Abstract: A major impediment to the advancement of sign language translation (SLT) is data scarcity. Much of the sign language data currently available on the web cannot be used for training supervised models due to the lack of aligned captions. Furthermore, scaling SLT using large-scale web-scraped datasets bears privacy risks due to the presence of biometric information, which the responsible development…

    Submitted 14 February, 2024; originally announced February 2024.

  3. arXiv:2311.00522  [pdf, other]

    cs.CL

    Text Rendering Strategies for Pixel Language Models

    Authors: Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott

    Abstract: Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we inve…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  4. arXiv:2310.18343  [pdf, other]

    cs.CL

    PHD: Pixel-Based Language Modeling of Historical Documents

    Authors: Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein

    Abstract: The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional approach to analysing historical documents involves converting them from images to text using OCR, a process that overlooks the potential benefits of treating them as images and introduces high levels of noise. To bridge this gap, we take advantage of recent advancement…

    Submitted 4 November, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to the main conference of EMNLP 2023

  5. arXiv:2308.08774  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models

    Authors: Phillip Rust, Anders Søgaard

    Abstract: Language models such as mBERT, XLM-R, and BLOOM aim to achieve multilingual generalization or compression to facilitate transfer to a large number of (potentially unseen) languages. However, these models should ideally also be private, linguistically fair, and transparent, by relating their predictions to training data. Can these requirements be simultaneously satisfied? We show that multilingual…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: ICML 2023

  6. arXiv:2207.06991  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Language Modelling with Pixels

    Authors: Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott

    Abstract: Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of…

    Submitted 26 April, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: ICLR 2023

  7. arXiv:2203.10020  [pdf, other]

    cs.CL

    Challenges and Strategies in Cross-Cultural NLP

    Authors: Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard

    Abstract: Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers and the content they produce and require, vary not just by language, but also by culture. Although language and culture are tightly linked, there are important differences. Analogo…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: ACL 2022 - Theme track

  8. arXiv:2012.15613  [pdf, other]

    cs.CL

    How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

    Authors: Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych

    Abstract: In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance. We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks. We first aim to establish, v…

    Submitted 1 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: ACL 2021

  9. arXiv:2004.13161  [pdf, other]

    cs.CL cs.AI

    PuzzLing Machines: A Challenge on Learning From Small Data

    Authors: Gözde Gül Şahin, Yova Kementchedjhieva, Phillip Rust, Iryna Gurevych

    Abstract: Deep neural models have repeatedly proved excellent at memorizing surface patterns from large datasets for various ML and NLP benchmarks. They struggle to achieve human-like thinking, however, because they lack the skill of iterative reasoning upon knowledge. To expose this problem in a new light, we introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta St…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020