
Showing 1–14 of 14 results for author: Andor, D

  1. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2301.09044  [pdf, other]

    cs.LG

    Learning to Reject with a Fixed Predictor: Application to Decontextualization

    Authors: Christopher Mohri, Daniel Andor, Eunsol Choi, Michael Collins

    Abstract: We study the problem of classification with a reject option for a fixed predictor, applicable in natural language processing. We introduce a new problem formulation for this scenario, and an algorithm minimizing a new surrogate loss function. We provide a complete theoretical analysis of the surrogate loss function with a strong $H$-consistency guarantee. For evaluation, we choose the decontextual…

    Submitted 31 January, 2023; v1 submitted 21 January, 2023; originally announced January 2023.

  3. arXiv:2212.08037  [pdf, other]

    cs.CL

    Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

    Authors: Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster

    Abstract: Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of…

    Submitted 10 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  4. arXiv:2210.02498  [pdf, other]

    cs.CL cs.LG

    Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

    Authors: Jacob Eisenstein, Daniel Andor, Bernd Bohnet, Michael Collins, David Mimno

    Abstract: Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and allow humans to check their work. But what sorts of rationales are useful and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called \emph{markup-and-mask}, which combines aspects of extractive and free-…

    Submitted 24 April, 2024; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: added details about a human evaluation

  5. arXiv:2203.17189  [pdf, other]

    cs.LG cs.CL

    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen…

    Submitted 31 March, 2022; originally announced March 2022.

  6. arXiv:2009.06354  [pdf, other]

    cs.CL cs.AI

    QED: A Framework and Dataset for Explanations in Question Answering

    Authors: Matthew Lamm, Jennimaria Palomaki, Chris Alberti, Daniel Andor, Eunsol Choi, Livio Baldini Soares, Michael Collins

    Abstract: A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and ans…

    Submitted 8 September, 2020; originally announced September 2020.

  7. arXiv:1909.09704  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring Domain Portability and Error Propagation in Biomedical QA

    Authors: Stefan Hosein, Daniel Andor, Ryan McDonald

    Abstract: In this work we present Google's submission to the BioASQ 7 biomedical question answering (QA) task (specifically Task 7b, Phase B). The core of our systems are based on BERT QA models, specifically the model of \cite{alberti2019bert}. In this report, and via our submissions, we aimed to investigate two research questions. We start by studying how domain portable are QA systems that have been pre-…

    Submitted 24 September, 2019; v1 submitted 12 September, 2019; originally announced September 2019.

  8. arXiv:1909.00109  [pdf, ps, other]

    cs.CL cs.LG

    Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

    Authors: Daniel Andor, Luheng He, Kenton Lee, Emily Pitler

    Abstract: Reading comprehension models have been successfully applied to extractive text answers, but it is unclear how best to generalize these models to abstractive numerical answers. We enable a BERT-based reading comprehension model to perform lightweight numerical reasoning. We augment the model with a predefined set of executable 'programs' which encompass simple arithmetic as well as extraction. Rath…

    Submitted 12 September, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

  9. arXiv:1906.05416  [pdf, other]

    cs.CL

    Synthetic QA Corpora Generation with Roundtrip Consistency

    Authors: Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins

    Abstract: We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. By pretraining on the resulting corpora we obtain significant improvements on SQuAD2 and NQ, establishing a new state-of-the-art on the latter. Our synthetic data generation models, for both qu…

    Submitted 12 June, 2019; originally announced June 2019.

  10. arXiv:1805.08237  [pdf, other]

    cs.CL

    Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings

    Authors: Bernd Bohnet, Ryan McDonald, Goncalo Simoes, Daniel Andor, Emily Pitler, Joshua Maynez

    Abstract: The rise of neural networks, and particularly recurrent neural networks, has produced significant advances in part-of-speech tagging accuracy. One characteristic common among these models is the presence of rich initial word encodings. These encodings typically are composed of a recurrent character-based representation with learned and pre-trained word embeddings. However, these encodings do not c…

    Submitted 21 May, 2018; originally announced May 2018.

    Journal ref: ACL 2018

  11. arXiv:1804.08199  [pdf, other]

    cs.CL

    Linguistically-Informed Self-Attention for Semantic Role Labeling

    Authors: Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum

    Abstract: Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combin…

    Submitted 12 November, 2018; v1 submitted 22 April, 2018; originally announced April 2018.

    Comments: In Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels, Belgium. October 2018

  12. arXiv:1703.04929  [pdf, ps, other]

    cs.CL

    SyntaxNet Models for the CoNLL 2017 Shared Task

    Authors: Chris Alberti, Daniel Andor, Ivan Bogatyy, Michael Collins, Dan Gillick, Lingpeng Kong, Terry Koo, Ji Ma, Mark Omernick, Slav Petrov, Chayut Thanapirom, Zora Tung, David Weiss

    Abstract: We describe a baseline dependency parsing system for the CoNLL2017 Shared Task. This system, which we call "ParseySaurus," uses the DRAGNN framework [Kong et al, 2017] to combine transition-based recurrent parsing and tagging with character-based word representations. On the v1.3 Universal Dependencies Treebanks, the new system outperforms the publicly available, state-of-the-art "Parsey's Cousins"…

    Submitted 15 March, 2017; originally announced March 2017.

    Comments: Tech report

  13. arXiv:1703.04474  [pdf, other]

    cs.CL

    DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks

    Authors: Lingpeng Kong, Chris Alberti, Daniel Andor, Ivan Bogatyy, David Weiss

    Abstract: In this work, we present a compact, modular framework for constructing novel recurrent neural architectures. Our basic module is a new generic unit, the Transition Based Recurrent Unit (TBRU). In addition to hidden layer activations, TBRUs have discrete state dynamics that allow network connections to be built dynamically as a function of intermediate activations. By connecting multiple TBRUs, we…

    Submitted 13 March, 2017; originally announced March 2017.

    Comments: 10 pages; Submitted for review to ACL2017

  14. arXiv:1603.06042  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Globally Normalized Transition-Based Neural Networks

    Authors: Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

    Abstract: We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to…

    Submitted 8 June, 2016; v1 submitted 18 March, 2016; originally announced March 2016.