-
Multitask Multilingual Model Adaptation with Featurized Low-Rank Mixtures
Authors:
Chu-Cheng Lin,
Xinyi Wang,
Jonathan H. Clark,
Han Lu,
Yun Zhu,
Chenxi Whitehouse,
Hongkun Yu
Abstract:
Adapting pretrained large language models (LLMs) to various downstream tasks in tens or hundreds of human languages is computationally expensive. Parameter-efficient fine-tuning (PEFT) significantly reduces the adaptation cost by tuning only a small number of parameters. However, directly applying PEFT methods such as LoRA (Hu et al., 2022) on diverse dataset mixtures could lead to suboptimal performance due to limited parameter capacity and negative interference among different datasets. In this work, we propose Featurized Low-rank Mixtures (FLix), a novel PEFT method designed for effective multitask multilingual tuning. FLix associates each unique dataset feature, such as the dataset's language or task, with its own low-rank weight update parameters. By composing feature-specific parameters for each dataset, FLix can accommodate diverse dataset mixtures and generalize better to unseen datasets. Our experiments show that FLix leads to significant improvements over a variety of tasks for both supervised learning and zero-shot settings using different training data mixtures.
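A minimal sketch of the composition idea, in pure Python so it runs without dependencies. It assumes, as in standard LoRA, that each feature value owns a pair of low-rank factors B_f (d_out x r, zero-initialized) and A_f (r x d_in), and that a dataset's effective weight is the frozen weight W plus the sum of B_f A_f over the features that describe it. The class name `FeaturizedLowRank`, feature strings such as `"lang:sw"`, and the plain summation are illustrative assumptions, not the paper's exact parameterization.

```python
import random


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def random_matrix(rows, cols, rng, scale=0.01):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class FeaturizedLowRank:
    """Hypothetical FLix-style layer: one LoRA pair (B_f, A_f) per dataset feature."""

    def __init__(self, weight, features, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        self.adapters = {
            # As in LoRA, B starts at zero so every initial update is zero.
            f: ([[0.0] * rank for _ in range(d_out)], random_matrix(rank, d_in, rng))
            for f in features
        }

    def effective_weight(self, dataset_features):
        """W + sum of B_f @ A_f over the features that describe this dataset."""
        w = self.weight
        for f in dataset_features:
            if f not in self.adapters:
                continue  # unseen feature value: rely on the remaining features
            b, a = self.adapters[f]
            w = add(w, matmul(b, a))
        return w

    def forward(self, x, dataset_features):
        """Apply the composed weight to a batch of row vectors x (n x d_in)."""
        w = self.effective_weight(dataset_features)
        w_t = [list(col) for col in zip(*w)]  # transpose, so this computes x @ W^T
        return matmul(x, w_t)
```

Under this sketch, a Swahili QA example would pass `["lang:sw", "task:qa"]`, and an unseen pair such as Yoruba QA can still be served by combining adapters trained on other Yoruba tasks and on QA in other languages, which is one way to read the abstract's claim about generalizing to unseen datasets. Because every B_f starts at zero, the layer initially reproduces the frozen model's output.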
Submitted 27 February, 2024;
originally announced February 2024.
-
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
Authors:
Xinyi Wang,
John Wieting,
Jonathan H. Clark
Abstract:
Learning paradigms for large language models (LLMs) currently tend to fall within either in-context learning (ICL) or full fine-tuning. Each of these comes with its own trade-offs based on available data, model size, compute cost, ease of use, and final quality, with neither solution performing well across the board. In this article, we first describe ICL and fine-tuning paradigms in a way that highlights their natural connections. Based on these connections, we propose a new learning paradigm called FIAT that fuses the best of these paradigms together, enabling prompt-engineered instructions and chain-of-thought reasoning with the very largest models while also using similar methods to perform parameter updates on a modestly-sized LLM with parameter-efficient tuning. We evaluate FIAT's effectiveness on a variety of multilingual tasks and observe that FIAT performs better than both ICL and fine-tuning at scales ranging from 100-10,000 training examples. We hope that FIAT provides a practical way of harnessing the full potential of LLMs without needing to make a hard choice between learning paradigms.
Submitted 12 September, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
Authors:
Patrick Fernandes,
Daniel Deutsch,
Mara Finkelstein,
Parker Riley,
André F. T. Martins,
Graham Neubig,
Ankush Garg,
Jonathan H. Clark,
Markus Freitag,
Orhan Firat
Abstract:
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems. While considerable progress has been made on estimating a single scalar quality score, current metrics lack the informativeness of more detailed schemes that annotate individual errors, such as Multidimensional Quality Metrics (MQM). In this paper, we help fill this gap by proposing AutoMQM, a prompting technique which leverages the reasoning and in-context learning capabilities of large language models (LLMs) and asks them to identify and categorize errors in translations. We start by evaluating recent LLMs, such as PaLM and PaLM-2, through simple score prediction prompting, and we study the impact of labeled data through in-context learning and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores (with particularly large gains for larger models) while providing interpretability through error spans that align with human annotations.
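AutoMQM asks the model to list error spans with a category and a severity; to compare against scalar metrics, these must be collapsed into a single score. The sketch below assumes a hypothetical output format of one `severity | category | span` line per error and a simplified version of the common MQM weighting (minor = 1, major = 5, non-translation = 25). The prompt format and exact weights used in the paper may differ.

```python
from dataclasses import dataclass

# Assumed weights, simplified from the common MQM convention. Illustrative only.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}
NON_TRANSLATION_WEIGHT = 25.0


@dataclass
class ErrorSpan:
    span: str      # text of the translation flagged by the model
    category: str  # e.g. "accuracy/mistranslation"
    severity: str  # "minor" or "major"


def parse_errors(llm_output):
    """Parse hypothetical model output lines of the form
    'severity | category | span' into ErrorSpan objects."""
    errors = []
    for line in llm_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # skip lines that do not follow the assumed format
        severity, category, span = parts
        errors.append(ErrorSpan(span=span, category=category, severity=severity.lower()))
    return errors


def mqm_score(errors):
    """Return an MQM-style score: 0 is perfect, more negative is worse."""
    penalty = 0.0
    for e in errors:
        if e.category.lower() == "non-translation":
            penalty += NON_TRANSLATION_WEIGHT
        else:
            penalty += SEVERITY_WEIGHTS.get(e.severity, 0.0)
    return 0.0 - penalty
```

For example, one major mistranslation plus one minor punctuation error scores -6, and a translation with no reported errors scores 0. Full MQM conventions also down-weight minor punctuation errors (commonly to 0.1), which this sketch omits.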
Submitted 14 August, 2023;
originally announced August 2023.
-
Evaluating and Modeling Attribution for Cross-Lingual Question Answering
Authors:
Benjamin Muller,
John Wieting,
Jonathan H. Clark,
Tom Kwiatkowski,
Sebastian Ruder,
Livio Baldini Soares,
Roee Aharoni,
Jonathan Herzig,
Xinyi Wang
Abstract:
Trustworthy answer content is abundant in many high-resource languages and is instantly accessible through question answering systems, yet this content can be hard to access for those who do not speak these languages. The leap forward in cross-lingual modeling quality offered by generative language models holds much promise, yet their raw generations often fall short in factuality. To improve trustworthiness in these systems, a promising direction is to attribute the answer to a retrieved source, possibly in a content-rich language different from the query. Our work is the first to study attribution for cross-lingual question answering. First, we collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system. To our surprise, we find that a substantial portion of the answers is not attributable to any retrieved passages (up to 50% of answers exactly matching a gold reference) despite the system being able to attend directly to the retrieved text. Second, to address this poor attribution level, we experiment with a wide range of attribution detection techniques. We find that Natural Language Inference models and PaLM 2 fine-tuned on a very small amount of attribution data can accurately detect attribution. Based on these models, we improve the attribution level of a cross-lingual question-answering system. Overall, we show that current academic generative cross-lingual QA systems have substantial shortcomings in attribution and we build tooling to mitigate these issues.
Submitted 15 November, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Authors:
Sebastian Ruder,
Jonathan H. Clark,
Alexander Gutkin,
Mihir Kale,
Min Ma,
Massimo Nicosia,
Shruti Rijhwani,
Parker Riley,
Jean-Michel A. Sarr,
Xinyi Wang,
John Wieting,
Nitish Gupta,
Anna Katanova,
Christo Kirov,
Dana L. Dickinson,
Brian Roark,
Bidisha Samanta,
Connie Tao,
David I. Adelani,
Vera Axelrod,
Isaac Caswell,
Colin Cherry,
Dan Garrette,
Reeve Ingle,
Melvin Johnson
, et al. (2 additional authors not shown)
Abstract:
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
Submitted 24 May, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yanping Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yujing Zhang,
Gustavo Hernandez Abrego
, et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages
Authors:
Odunayo Ogundepo,
Tajuddeen R. Gwadabe,
Clara E. Rivera,
Jonathan H. Clark,
Sebastian Ruder,
David Ifeoluwa Adelani,
Bonaventure F. P. Dossou,
Abdou Aziz DIOP,
Claytone Sikasote,
Gilles Hacheme,
Happy Buzaaba,
Ignatius Ezeani,
Rooweither Mabuya,
Salomey Osei,
Chris Emezue,
Albert Njoroge Kahira,
Shamsuddeen H. Muhammad,
Akintunde Oladipo,
Abraham Toluwase Owodunni,
Atnafu Lambebo Tonja,
Iyanuoluwa Shode,
Akari Asai,
Tunde Oluwaseyi Ajayi,
Clemencia Siro,
Steven Arthur
, et al. (27 additional authors not shown)
Abstract:
African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
Submitted 11 May, 2023;
originally announced May 2023.
-
Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval
Authors:
John Wieting,
Jonathan H. Clark,
William W. Cohen,
Graham Neubig,
Taylor Berg-Kirkpatrick
Abstract:
Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning multilingual text embeddings which can be used to retrieve or score sentence pairs. Our model operates on parallel data in $N$ languages and, through an approximation we introduce, efficiently encourages source separation in this multilingual setting, separating semantic information that is shared between translations from stylistic or language-specific variation. We show careful large-scale comparisons between contrastive and generation-based approaches for learning multilingual text embeddings, a comparison that has not been done to the best of our knowledge despite the popularity of these approaches. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval -- the last of which we introduce in this paper. Overall, our Variational Multilingual Source-Separation Transformer (VMSST) model outperforms both a strong contrastive and generative baseline on these tasks.
Submitted 4 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Angular Diameters and Fundamental Parameters of Forty-Four Stars from the Navy Precision Optical Interferometer
Authors:
Ellyn K. Baines,
J. Thomas Armstrong,
James H. Clark III,
Jim Gorney,
Donald J. Hutter,
Anders M. Jorgensen,
Casey Kyte,
David Mozurkewich,
Ishara Nisley,
Jason Sanborn,
Henrique R. Schmitt,
Gerard T. van Belle
Abstract:
We measured the angular diameters of 44 stars with the Navy Precision Optical Interferometer, obtaining uncertainties on the limb-darkened diameter of 2% or less for all but four stars. We then used our diameters with Gaia or Hipparcos parallaxes to calculate each star's physical radius. We gathered information from the literature to determine bolometric flux and luminosity, and combined that with our diameters to produce an effective temperature. Our sample consists mostly of giant stars, and spans a wide range of spectral classes from B to M.
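The two derived quantities in this abstract follow from standard relations: the physical radius is R = d θ_LD / 2 (small-angle approximation, with distance d = 1/parallax), and the effective temperature comes from the Stefan-Boltzmann law, σ T_eff^4 = 4 F_bol / θ_LD^2 with θ_LD in radians. A minimal sketch, assuming IAU nominal constants; the input values used below are illustrative, not measurements from the paper.

```python
import math

MAS_TO_RAD = math.radians(1.0 / 3.6e6)  # 1 milliarcsecond in radians
AU_PER_PARSEC = 648000.0 / math.pi       # 1 pc = 648000/pi AU
AU_IN_CM = 1.495978707e13                # astronomical unit
R_SUN_IN_CM = 6.957e10                   # nominal solar radius
SIGMA_SB = 5.670374419e-5                # Stefan-Boltzmann constant, erg s^-1 cm^-2 K^-4


def radius_rsun(theta_ld_mas, parallax_mas):
    """Physical radius (solar radii) from the limb-darkened angular diameter
    and the parallax, both in mas, using R = d * theta / 2."""
    distance_au = (1000.0 / parallax_mas) * AU_PER_PARSEC
    radius_au = distance_au * 0.5 * theta_ld_mas * MAS_TO_RAD
    return radius_au * AU_IN_CM / R_SUN_IN_CM


def teff_kelvin(fbol_cgs, theta_ld_mas):
    """Effective temperature (K) from bolometric flux (erg s^-1 cm^-2) and
    limb-darkened angular diameter (mas): sigma * T^4 = 4 * F_bol / theta^2."""
    theta_rad = theta_ld_mas * MAS_TO_RAD
    return (4.0 * fbol_cgs / (SIGMA_SB * theta_rad ** 2)) ** 0.25
```

As a sanity check, the Sun seen from 1 AU (angular diameter about 1.9184 × 10^6 mas, bolometric flux about 1.361 × 10^6 erg s^-1 cm^-2) gives roughly 5,770 K. The constant in the frequently quoted form T_eff = 2341 (F_bol / θ_LD^2)^{1/4} is this same relation with F_bol in units of 10^-8 erg s^-1 cm^-2 and θ_LD in mas.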
Submitted 16 November, 2022;
originally announced November 2022.
-
Detecting Topological phase transitions in a double kicked quantum rotor
Authors:
Nikolai Bolik,
Caspar Groiseau,
Jerry H. Clark,
Gil S. Summy,
Yingmei Liu,
Sandro Wimberger
Abstract:
We present a concrete theoretical proposal for detecting topological phase transitions in double kicked atom-optics kicked rotors with internal spin-1/2 degree of freedom. The implementation utilizes a kicked Bose-Einstein condensate evolving in one-dimensional momentum space. To reduce the influence of atom loss and phase decoherence, we aim to keep experimental durations short while maintaining a resonant experimental protocol. Experimental limitations induced by phase noise, quasimomentum distributions, symmetries, and the AC-Stark shift are considered. Our results thus suggest a feasible and optimized procedure for observing topological phase transitions in quantum kicked rotors.
Submitted 21 October, 2022;
originally announced October 2022.
-
MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages
Authors:
Akari Asai,
Shayne Longpre,
Jungo Kasai,
Chia-Hsuan Lee,
Rui Zhang,
Junjie Hu,
Ikuya Yamada,
Jonathan H. Clark,
Eunsol Choi
Abstract:
We present the results of the Workshop on Multilingual Information Access (MIA) 2022 Shared Task, evaluating cross-lingual open-retrieval question answering (QA) systems in 16 typologically diverse languages. In this task, we adapted two large-scale cross-lingual open-retrieval QA datasets in 14 typologically diverse languages, and newly annotated open-retrieval QA data in 2 underrepresented languages: Tagalog and Tamil. Four teams submitted their systems. The best system, which leverages iteratively mined diverse negative examples and larger pretrained models, achieves 32.2 F1, outperforming our baseline by 4.5 points. The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
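Open-retrieval QA scores such as "32.2 F1" are usually token-overlap F1 between the predicted and gold answer strings, averaged over questions. A minimal sketch of SQuAD-style token F1, assuming lowercasing and whitespace tokenization only; the official shared-task scorer's normalization (punctuation, articles, language-specific tokenization) is not reproduced here and may differ.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer string."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise no overlap is possible.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def max_f1_over_references(prediction, references):
    """Score against several gold answers and keep the best match."""
    return max(token_f1(prediction, r) for r in references)
```

Whitespace tokenization is adequate for space-delimited languages such as Tagalog and Tamil, but scripts written without spaces (for example Japanese or Thai) would need a character-level or language-specific tokenizer.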
Submitted 2 July, 2022;
originally announced July 2022.
-
Light-shift induced behaviors observed in momentum-space quantum walks
Authors:
Nikolai Bolik,
Caspar Groiseau,
Jerry H. Clark,
Alexander Gresch,
Siamak Dadras,
Gil S. Summy,
Yingmei Liu,
Sandro Wimberger
Abstract:
Over the last decade there have been many advances in studies of quantum walks (QWs), including a momentum-space QW recently realized in our spinor Bose-Einstein condensate system. This QW possessed behaviors that generally agreed with theoretical predictions; however, it also showed momentum distributions that were not adequately explained by the theory. We present a theoretical model which proves that the coherent dynamics of the spinor condensate is sufficient to explain the experimental data without invoking the presence of a thermal cloud of atoms as in the original theory. Our numerical findings are supported by an analytical prediction for the momentum distributions in the limit of zero-temperature condensates. This current model provides more complete explanations of the momentum-space QWs that can be applied to study quantum search algorithms and topological phases in Floquet-driven systems.
Submitted 26 September, 2022; v1 submitted 16 May, 2022;
originally announced May 2022.
-
Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$
Authors:
Adam Roberts,
Hyung Won Chung,
Anselm Levskaya,
Gaurav Mishra,
James Bradbury,
Daniel Andor,
Sharan Narang,
Brian Lester,
Colin Gaffney,
Afroz Mohiuddin,
Curtis Hawthorne,
Aitor Lewkowycz,
Alex Salcianu,
Marc van Zee,
Jacob Austin,
Sebastian Goodman,
Livio Baldini Soares,
Haitang Hu,
Sasha Tsvyashchenko,
Aakanksha Chowdhery,
Jasmijn Bastings,
Jannis Bulian,
Xavier Garcia,
Jianmo Ni,
Andrew Chen
, et al. (18 additional authors not shown)
Abstract:
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures.
$\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
Submitted 31 March, 2022;
originally announced March 2022.
-
XTREME-S: Evaluating Cross-lingual Speech Representations
Authors:
Alexis Conneau,
Ankur Bapna,
Yu Zhang,
Min Ma,
Patrick von Platen,
Anton Lozhkov,
Colin Cherry,
Ye Jia,
Clara Rivera,
Mihir Kale,
Daan Van Esch,
Vera Axelrod,
Simran Khanuja,
Jonathan H. Clark,
Orhan Firat,
Michael Auli,
Sebastian Ruder,
Jason Riesa,
Melvin Johnson
Abstract:
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.
Submitted 13 April, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Pediatric Otoscopy Video Screening with Shift Contrastive Anomaly Detection
Authors:
Weiyao Wang,
Aniruddha Tamhane,
Christine Santos,
John R. Rzasa,
James H. Clark,
Therese L. Canares,
Mathias Unberath
Abstract:
Ear-related concerns and symptoms represent the leading indication for seeking pediatric healthcare. Despite the high incidence of such encounters, diagnosing commonly encountered diseases of the middle and external ear presents a significant challenge. Much of this challenge stems from the lack of cost-effective diagnostic testing, which means that the presence or absence of ear pathology must be determined clinically. Research has, however, demonstrated considerable variation among clinicians in their ability to accurately diagnose and consequently manage ear pathology. With recent advances in computer vision and machine learning, there is increasing interest in helping clinicians accurately diagnose middle and external ear pathology with computer-aided systems. It has been shown that AI can analyse a single clinical image captured during examination of the ear canal and eardrum and determine from it the likelihood that a pathognomonic pattern for a specific diagnosis is present. Capturing such an image can, however, be challenging, especially for inexperienced clinicians. To help mitigate this technical challenge, we have developed and tested a method using video sequences. We present a two-stage method that first identifies valid frames by detecting and extracting eardrum patches from the video sequence, and second performs the proposed shift contrastive anomaly detection to flag otoscopy video sequences as normal or abnormal. Our method achieves an AUROC of 88.0% at the patient level and also outperforms the average of a group of 25 clinicians in a comparative study, the largest such study published to date. We conclude that the presented method is a promising first step towards automated analysis of otoscopy video.
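The reported 88.0% AUROC is a patient-level figure, so per-video anomaly scores must first be reduced to one score per patient. A minimal sketch, assuming a max-over-videos aggregation (a hypothetical choice, not necessarily the paper's rule) and computing AUROC as the probability that a randomly chosen abnormal patient outscores a randomly chosen normal one, with ties counted as half.

```python
from collections import defaultdict


def patient_scores(video_scores, reduce=max):
    """Aggregate per-video anomaly scores into one score per patient.
    `video_scores` maps (patient_id, video_id) -> score. Using max is an
    illustrative assumption."""
    grouped = defaultdict(list)
    for (patient_id, _video_id), score in video_scores.items():
        grouped[patient_id].append(score)
    return {p: reduce(s) for p, s in grouped.items()}


def auroc(scores, labels):
    """AUROC via the pairwise (Mann-Whitney) formulation.
    labels: 1 = abnormal (positive), 0 = normal (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(P x N), which is fine for patient-level counts; a rank-based computation scales better for large test sets.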
Submitted 25 October, 2021;
originally announced October 2021.
-
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Authors:
Ankur Bapna,
Yu-an Chung,
Nan Wu,
Anmol Gulati,
Ye Jia,
Jonathan H. Clark,
Melvin Johnson,
Jason Riesa,
Alexis Conneau,
Yu Zhang
Abstract:
Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across modalities, we leverage alignment losses, specifically Translation Language Modeling (TLM) and Speech Text Matching (STM) that make use of supervised speech-text recognition data. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation, by around 1 BLEU compared to single-modality pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. On four GLUE tasks and text normalization, we observe evidence of capacity limitations and interference between the two modalities, leading to degraded performance compared to an equivalent text-only model, while still being competitive with BERT. Through extensive empirical analysis, we also demonstrate the importance of the choice of objective function for speech pre-training, and the beneficial effect of adding further supervised signals on the quality of the learned representations.
Submitted 19 October, 2021;
originally announced October 2021.
-
Quantum to Classical Walk Transitions Tuned by Spontaneous Emissions
Authors:
J. H. Clark,
C. Groiseau,
Z. N. Shaw,
S. Dadras,
C. Binegar,
S. Wimberger,
G. S. Summy,
Y. Liu
Abstract:
We have realized a quantum walk in momentum space with a rubidium spinor Bose-Einstein condensate by applying a periodic kicking potential as a walk operator and a resonant microwave pulse as a coin toss operator. The generated quantum walks appear to be stable for up to ten steps and then quickly transition to classical walks due to spontaneous emissions induced by the laser beams of the walk operator. We investigate these quantum to classical walk transitions by introducing well-controlled spontaneous emissions with an external light source during quantum walks. Our findings demonstrate a scheme to control the robustness of the quantum walks and can also be applied to other cold atom experiments involving spontaneous emissions.
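The walk described here alternates a coin toss on the internal two-level state with a state-dependent shift. A generic discrete-time Hadamard walk on a line shows the signature being tested: a quantum walk spreads linearly in the number of steps, while a classical random walk spreads as its square root. This sketch is not a model of the rubidium experiment. The position index stands in for momentum, and the per-step collapse probability `p_collapse` is a hypothetical, crude stand-in for spontaneous emission, implemented as a random projection onto a single (position, coin) state in each Monte Carlo trajectory.

```python
import math
import random


def hadamard_walk(steps, p_collapse=0.0, trajectories=1, seed=0):
    """Averaged position distribution after `steps` steps of a Hadamard walk.
    With probability p_collapse per step, a trajectory is projected onto one
    (position, coin) state, revealing which path was taken."""
    rng = random.Random(seed)
    size = 2 * steps + 1  # positions -steps..steps
    origin = steps
    h = 1 / math.sqrt(2)
    avg = [0.0] * size
    for _ in range(trajectories):
        # Symmetric initial coin state (|up> + i|down>)/sqrt(2) at the origin.
        up, down = [0j] * size, [0j] * size
        up[origin], down[origin] = h, 1j * h
        for _ in range(steps):
            # Coin toss: Hadamard on the internal state at every site.
            new_up = [h * (u + d) for u, d in zip(up, down)]
            new_down = [h * (u - d) for u, d in zip(up, down)]
            # Walk operator: "up" moves +1, "down" moves -1.
            up = [0j] + new_up[:-1]
            down = new_down[1:] + [0j]
            if rng.random() < p_collapse:
                probs = [abs(a) ** 2 for a in up + down]
                k = rng.choices(range(2 * size), weights=probs)[0]
                up, down = [0j] * size, [0j] * size
                (up if k < size else down)[k % size] = 1.0 + 0j
        for i in range(size):
            avg[i] += (abs(up[i]) ** 2 + abs(down[i]) ** 2) / trajectories
    return avg


def spread(dist):
    """Standard deviation of a position distribution centered on index steps."""
    steps = (len(dist) - 1) // 2
    mean = sum((i - steps) * p for i, p in enumerate(dist))
    return math.sqrt(sum((i - steps - mean) ** 2 * p for i, p in enumerate(dist)))
```

With `p_collapse=0` the standard deviation after 50 steps is roughly 0.54 x 50, about 27 sites; with `p_collapse=1.0` every step is projected and the walk reduces to a classical +/-1 random walk with a standard deviation near sqrt(50), about 7. Intermediate values move the walk between the two regimes.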
Submitted 20 August, 2021;
originally announced August 2021.
-
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Authors:
Jonathan H. Clark,
Dan Garrette,
Iulia Turc,
John Wieting
Abstract:
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE, a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by 2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.
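The length-reduction idea above can be sketched in a few lines. This is an illustration under stated assumptions, not CANINE's architecture: non-overlapping mean pooling stands in for the model's own learned downsampler, and the sequence length of 2048 characters and downsampling rate of 4 are hypothetical values. The point it shows is that because full self-attention scores every query-key pair, shortening the sequence by a factor `rate` cuts that cost by `rate` squared for every layer in the deep stack.

```python
from typing import List

Vector = List[float]

def downsample_mean(chars: List[Vector], rate: int) -> List[Vector]:
    """Mean-pool non-overlapping windows of `rate` character embeddings.

    Hypothetical stand-in for a learned downsampler: length n becomes ceil(n / rate).
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    pooled = []
    for start in range(0, len(chars), rate):
        window = chars[start:start + rate]  # the last window may be shorter
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled

def attention_pairs(length: int) -> int:
    """Query-key pairs scored by one full self-attention layer (quadratic in length)."""
    return length * length

# Hypothetical sizes: 2048 characters, downsampling rate 4.
n_chars, rate = 2048, 4
chars = [[float(i), 1.0] for i in range(n_chars)]
pooled = downsample_mean(chars, rate)
print(len(pooled))  # 512
print(attention_pairs(n_chars) // attention_pairs(len(pooled)))  # 16
```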
Submitted 18 May, 2022; v1 submitted 11 March, 2021;
originally announced March 2021.
-
CapWAP: Captioning with a Purpose
Authors:
Adam Fisch,
Kenton Lee,
Ming-Wei Chang,
Jonathan H. Clark,
Regina Barzilay
Abstract:
The traditional image captioning task uses generic reference captions to provide textual information about images. Different user populations, however, will care about different visual aspects of images. In this paper, we propose a new task, Captioning with a Purpose (CapWAP). Our goal is to develop systems that can be tailored to be useful for the information needs of an intended population, rather than merely provide generic information about an image. In this task, we use question-answer (QA) pairs---a natural expression of information need---from users, instead of reference captions, for both training and post-inference evaluation. We show that it is possible to use reinforcement learning to directly optimize for the intended information need, by rewarding outputs that allow a question answering model to provide correct answers to sampled user questions. We convert several visual question answering datasets into CapWAP datasets, and demonstrate that under a variety of scenarios our purposeful captioning system learns to anticipate and fulfill specific information needs better than its generic counterparts, as measured by QA performance on user questions from unseen images, when using the caption alone as context.
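The reward signal described above (a caption is good if a QA model can answer the user's questions from it) can be sketched as follows. This is a minimal illustration, not the CapWAP implementation: `caption_reward`, the lowercase exact-match scoring, and `toy_qa` (a hypothetical stand-in for a trained QA model that ignores the question and returns the first colour word in the context) are all assumptions, and the reinforcement-learning update that would consume this reward is not shown.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]

def caption_reward(caption: str, qa_pairs: List[QAPair],
                   qa_fn: Callable[[str, str], str]) -> float:
    """Fraction of sampled user questions that qa_fn answers correctly
    (lowercased exact match) using only the caption as context."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        qa_fn(question, caption).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)

def toy_qa(question: str, context: str) -> str:
    # Hypothetical QA stand-in: ignores the question and returns the first
    # colour word found in the context, or "unknown" if there is none.
    colours = ("red", "blue", "green", "yellow")
    for word in context.lower().replace(".", " ").split():
        if word in colours:
            return word
    return "unknown"

pairs = [("what color is the bus?", "red"), ("what color is the sky?", "blue")]
print(caption_reward("A red bus parked on a street.", pairs, toy_qa))  # 0.5
print(caption_reward("A bus on a street.", pairs, toy_qa))             # 0.0
```

Under this kind of reward, a generic caption that omits the attribute users ask about scores lower than a purposeful one that mentions it, which is the pressure the abstract relies on.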
Submitted 9 November, 2020;
originally announced November 2020.
-
Learning to Recognize Dialect Features
Authors:
Dorottya Demszky,
Devyani Sharma,
Jonathan H. Clark,
Vinodkumar Prabhakaran,
Jacob Eisenstein
Abstract:
Building NLP systems that serve everyone requires accounting for dialect differences. But dialects are not monolithic entities: rather, distinctions between and within dialects are captured by the presence, absence, and frequency of dozens of dialect features in speech and text, such as the deletion of the copula in "He {} running". In this paper, we introduce the task of dialect feature detection, and present two multitask learning approaches, both based on pretrained transformers. For most dialects, large-scale annotated corpora for these features are unavailable, making it difficult to train recognizers. We train our models on a small number of minimal pairs, building on how linguists typically define dialect features. Evaluation on a test set of 22 dialect features of Indian English demonstrates that these models learn to recognize many features with high accuracy, and that a few minimal pairs can be as effective for training as thousands of labeled examples. We also demonstrate the downstream applicability of dialect feature detection both as a measure of dialect density and as a dialect classifier.
Submitted 6 May, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
XOR QA: Cross-lingual Open-Retrieval Question Answering
Authors:
Akari Asai,
Jungo Kasai,
Jonathan H. Clark,
Kenton Lee,
Eunsol Choi,
Hannaneh Hajishirzi
Abstract:
Multilingual question answering tasks typically assume answers exist in the same language as the question. Yet in practice, many languages face both information scarcity -- where languages have few reference articles -- and information asymmetry -- where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on questions from TyDi QA lacking same-language answers. Our task formulation, called Cross-lingual Open Retrieval Question Answering (XOR QA), includes 40k information-seeking questions from across 7 diverse non-English languages. Based on this dataset, we introduce three new tasks that involve cross-lingual document retrieval using multi-lingual and English resources. We establish baselines with state-of-the-art machine translation systems and cross-lingual pretrained models. Experimental results suggest that XOR QA is a challenging task that will facilitate the development of novel techniques for multilingual question answering. Our data and code are available at https://nlp.cs.washington.edu/xorqa.
Submitted 13 April, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Authors:
Jonathan H. Clark,
Eunsol Choi,
Michael Collins,
Dan Garrette,
Tom Kwiatkowski,
Vitaly Nikolaev,
Jennimaria Palomaki
Abstract:
Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA---a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology---the set of linguistic features each language expresses---such that we expect models performing well on this set to generalize across a large number of the world's languages. We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer, but don't know the answer yet, and the data is collected directly in each language without the use of translation.
Submitted 10 March, 2020;
originally announced March 2020.
-
VISION: A Six-Telescope Fiber-Fed Visible Light Beam Combiner for the Navy Precision Optical Interferometer
Authors:
Eugenio V. Garcia,
Matthew W. Muterspaugh,
Gerard van Belle,
John D. Monnier,
Keivan G. Stassun,
Askari Ghasempour,
James H. Clark,
R. T. Zavala,
James A. Benson,
Donald J. Hutter,
Henrique R. Schmitt,
Ellyn K. Baines,
Anders M. Jorgensen,
Susan G. Strosahl,
Jason Sanborn,
Stephen J. Zawicki,
Michael F. Sakosky,
Samuel Swihart
Abstract:
Visible-light long baseline interferometry holds the promise of advancing a number of important applications in fundamental astronomy, including the direct measurement of the angular diameters and oblateness of stars, and the direct measurement of the orbits of binary and multiple star systems. To advance, the field of visible-light interferometry requires development of instruments capable of combining light from 15 baselines (6 telescopes) simultaneously. The Visible Imaging System for Interferometric Observations at NPOI (VISION) is a new visible-light beam combiner for the Navy Precision Optical Interferometer (NPOI) that uses single-mode fibers to coherently combine light from up to six telescopes simultaneously with an image-plane combination scheme. It features a photometric camera for calibrations and spatial filtering from single-mode fibers with two Andor iXon electron-multiplying CCDs. This paper presents the VISION system, results of laboratory tests, and results of commissioning on-sky observations. A new set of corrections has been determined for the power spectrum and bispectrum by taking into account non-Gaussian statistics and read noise present in electron-multiplying CCDs to enable measurement of visibilities and closure phases in the VISION post-processing pipeline. The post-processing pipeline has been verified via new on-sky observations of the O-type supergiant binary $ζ$ Orionis A, obtaining a flux ratio of $2.18\pm0.13$ mag with a position angle of $223.9\pm1.0^{\circ}$ and separation $40.6\pm1.8$ mas over 570-750 nm, in good agreement with expectations from the previously published orbit.
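The counts and units quoted above follow from standard relations, sketched here for reference (these are general interferometry and photometry formulas, not code from the VISION pipeline; the helper names are hypothetical). With $n$ telescopes there are $n(n-1)/2$ baselines, so 6 telescopes give 15; closure phases are formed on telescope triangles, of which there are $\binom{n}{3}$, with $\binom{n-1}{2}$ of them independent. Reading the quoted flux ratio of 2.18 mag as a magnitude difference $\Delta m$, the linear brightness ratio is $10^{0.4\,\Delta m}$.

```python
import math

def interferometer_counts(n_telescopes: int) -> tuple:
    """(baselines, closure triangles, independent closure phases) for n telescopes."""
    baselines = math.comb(n_telescopes, 2)             # telescope pairs
    triangles = math.comb(n_telescopes, 3)             # telescope triples
    independent_closures = math.comb(n_telescopes - 1, 2)
    return baselines, triangles, independent_closures

def mag_diff_to_flux_ratio(delta_mag: float) -> float:
    """Linear brightness ratio implied by a magnitude difference (Pogson's relation)."""
    return 10 ** (0.4 * delta_mag)

print(interferometer_counts(6))  # (15, 20, 10)
ratio = mag_diff_to_flux_ratio(2.18)
print(f"primary/secondary brightness ratio: {ratio:.2f}")
```

On this reading, the primary of $ζ$ Orionis A is roughly 7.4 to 7.5 times brighter than the secondary over the 570-750 nm band, using the central value only.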
Submitted 31 December, 2015;
originally announced January 2016.
-
Temperature-dependent Raman scattering of natural and isotopically substituted PbS
Authors:
P. G. Etchegoin,
M. Cardona,
R. Lauck,
R. J. H. Clark,
J. Serrano,
A. H. Romero
Abstract:
Lead sulfide is an important semiconductor that has found technological applications for over a century. Raman spectroscopy, a standard tool for the investigation and characterization of semiconductors, has limited application to this material because of the forbidden nature of its first-order scattering and its opacity to visible lasers. Nevertheless, useful vibrational spectra from two-phonon processes are obtained with red lasers, probably because of a resonance in the concomitant electronic transitions. Here we report temperature-dependent spectra, covering the 10-300 K range, for two samples with different sulfur isotopic compositions. The results are analyzed by comparison with ab initio calculations of the lattice dynamics of PbS and the corresponding densities of one- and two-phonon states. Emphasis is placed on the analysis of the two-phonon band centered at ~430 cm$^{-1}$.
Submitted 2 September, 2007;
originally announced September 2007.
-
Double radiative pion capture on hydrogen and deuterium and the nucleon's pion cloud
Authors:
S. Tripathi,
D. S. Armstrong,
M. E. Christy,
J. H. D. Clark,
T. P. Gorringe,
M. D. Hasinoff,
M. A. Kovash,
D. H. Wright,
P. A. Zolnierczuk
Abstract:
We report measurements of double radiative capture in pionic hydrogen and pionic deuterium. The measurements were performed with the RMC spectrometer at the TRIUMF cyclotron by recording photon pairs from pion stops in liquid hydrogen and deuterium targets. We obtained absolute branching ratios of $(3.02 \pm 0.27 (stat.) \pm 0.31 (syst.)) \times 10^{-5}$ for hydrogen and $(1.42 \pm ^{0.09}_{0.12} (stat.) \pm 0.11 (syst.)) \times 10^{-5}$ for deuterium, and relative branching ratios of double radiative capture to single radiative capture of $(7.68 \pm 0.69(stat.) \pm 0.79(syst.)) \times 10^{-5}$ for hydrogen and $(5.44 \pm^{0.34}_{0.46}(stat.) \pm 0.42(syst.)) \times 10^{-5}$ for deuterium. For hydrogen, the measured branching ratio and photon energy-angle distributions are in fair agreement with a reaction mechanism involving the annihilation of the incident $π^-$ on the $π^+$ cloud of the target proton. For deuterium, the measured branching ratio and energy-angle distributions are qualitatively consistent with simple arguments for the expected role of the spectator neutron. A comparison between our hydrogen and deuterium data and earlier beryllium and carbon data reveals substantial changes in the relative branching ratios and the energy-angle distributions and is in agreement with the expected evolution of the reaction dynamics from an annihilation process in S-state capture to a bremsstrahlung process in P-state capture. Lastly, we comment on the relevance of the double radiative process to the investigation of the charged pion polarizability and the in-medium pion field.
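Because the abstract quotes both an absolute branching ratio for double radiative capture and its ratio to single radiative capture, the two numbers together imply the single radiative capture fraction for each target. A quick consistency check, using central values only and ignoring the quoted uncertainties (the helper name is hypothetical):

```python
def implied_single_capture_fraction(abs_drc: float, rel_drc_to_rmc: float) -> float:
    """Single radiative capture fraction implied by R_DRC and R_DRC / R_RMC."""
    # R_RMC = R_DRC / (R_DRC / R_RMC); central values only, no error propagation.
    return abs_drc / rel_drc_to_rmc

hydrogen = implied_single_capture_fraction(3.02e-5, 7.68e-5)
deuterium = implied_single_capture_fraction(1.42e-5, 5.44e-5)
print(f"{hydrogen:.3f} {deuterium:.3f}")  # 0.393 0.261
```

So roughly 39% of pion stops in hydrogen and 26% in deuterium end in single radiative capture, according to the quoted central values.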
Submitted 2 January, 2007;
originally announced January 2007.
-
Ortho-para transition rate in $μ$-molecular hydrogen and the proton's induced pseudoscalar coupling $g_p$
Authors:
J. H. D. Clark,
D. S. Armstrong,
T. P. Gorringe,
M. D. Hasinoff,
P. M. King,
T. J. Stocki,
S. Tripathi,
D. H. Wright,
P. A. Zolnierczuk
Abstract:
We report a measurement of the ortho-para transition rate in the p$μ$p molecule. The experiment was conducted at TRIUMF via the measurement of the time dependence of the 5.2 MeV neutrons from muon capture in liquid hydrogen. The measurement yielded an ortho-para rate $Λ_{op} = (11.1 \pm 1.7 \pm^{0.9}_{0.6}) \times 10^4$ s$^{-1}$ that is substantially larger than the earlier result of Bardin et al. We discuss the striking implications for the proton's induced pseudoscalar coupling $g_p$.
Submitted 19 September, 2005;
originally announced September 2005.
-
Search for exotic baryons in double radiative capture on pionic hydrogen
Authors:
P. A. Zolnierczuk,
D. S. Armstrong,
E. Christy,
J. H. D. Clark,
T. P. Gorringe,
M. D. Hasinoff,
M. A. Kovash,
S. Tripathi,
D. H. Wright
Abstract:
We report a search for low-lying exotic baryons via double radiative capture on pionic hydrogen. The data were collected at the TRIUMF cyclotron using the RMC spectrometer by detecting gamma-ray pairs from pion stops in liquid hydrogen. No evidence was found to support an earlier claim for exotic baryons of masses 1004 and 1044 MeV/$c^2$. We obtain upper limits on the branching ratios for double radiative capture via these exotic states of $< 3 \times 10^{-6}$ and $< 4 \times 10^{-6}$ respectively.
Submitted 23 March, 2004;
originally announced March 2004.
-
Observation of double radiative capture on pionic hydrogen
Authors:
S. Tripathi,
D. S. Armstrong,
M. E. Christy,
J. H. D. Clark,
T. P. Gorringe,
M. D. Hasinoff,
M. A. Kovash,
D. H. Wright,
P. A. Zolnierczuk
Abstract:
We report the first observation of double radiative capture on pionic hydrogen. The experiment was conducted at the TRIUMF cyclotron using the RMC spectrometer, and detected $γ$--ray coincidences following $π^-$ stops in liquid hydrogen. We found the branching ratio for double radiative capture to be $(3.05 \pm 0.27(stat.) \pm 0.31(syst.)) \times 10^{-5}$. The measured branching ratio and angle-energy distributions support the theoretical prediction of a dominant contribution from the $ππ\to γγ$ annihilation mechanism.
Submitted 25 October, 2002; v1 submitted 25 April, 2002;
originally announced April 2002.