
Showing 1–25 of 25 results for author: McCoy, R T

Searching in archive cs.
  1. arXiv:2407.01687

    cs.CL cs.AI

    Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

    Authors: Akshara Prabhakar, Thomas L. Griffiths, R. Thomas McCoy

    Abstract: Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs). However, debates persist about whether LLMs exhibit abstract generalization or rely on shallow heuristics when given CoT prompts. To understand the factors influencing CoT reasoning we provide a detailed case study of the symbolic reasoning task of decoding shift cipher…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages plus references and appendices

  2. arXiv:2406.18501

    cs.CL

    Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming

    Authors: Zhenghao Zhou, Robert Frank, R. Thomas McCoy

    Abstract: Large language models (LLMs) have shown the emergent capability of in-context learning (ICL). One line of research has explained ICL as functionally performing gradient descent. In this paper, we introduce a new way of diagnosing whether ICL is functionally equivalent to gradient-based learning. Our approach is based on the inverse frequency effect (IFE) -- a phenomenon in which an error-driven le…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17038

    cs.CL

    modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models

    Authors: Nathan A. Chi, Teodor Malchev, Riley Kong, Ryan A. Chi, Lucas Huang, Ethan A. Chi, R. Thomas McCoy, Dragomir Radev

    Abstract: We introduce modeLing, a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems. Solving these puzzles necessitates inferring aspects of a language's grammatical structure from a small number of examples. Such puzzles provide a natural testbed for language models, as they require compositional generalization and few-shot inductive reasoning. Consisting s…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2402.07035

    cs.LG cs.AI

    Distilling Symbolic Priors for Concept Learning into Neural Networks

    Authors: Ioana Marinescu, R. Thomas McCoy, Thomas L. Griffiths

    Abstract: Humans can learn new concepts from a small number of examples by drawing on their inductive biases. These inductive biases have previously been captured by using Bayesian models defined over symbolic hypothesis spaces. Is it possible to create a neural network that displays the same inductive biases? We show that inductive biases that enable rapid concept learning can be instantiated in artificial…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 8 pages, 6 figures, 4 tables

  5. arXiv:2312.14226

    cs.CL cs.AI cs.LG stat.ML

    Deep de Finetti: Recovering Topic Distributions from Large Language Models

    Authors: Liyi Zhang, R. Thomas McCoy, Theodore R. Sumers, Jian-Qiao Zhu, Thomas L. Griffiths

    Abstract: Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one aspect of latent structure, namely syntax; here we investigate a complementary aspect, namely the document's topic structure.…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 4 figures

    ACM Class: I.2.6; I.2.7

  6. arXiv:2311.10206

    cs.LG cs.AI

    Bayes in the age of intelligent machines

    Authors: Thomas L. Griffiths, Jian-Qiao Zhu, Erin Grant, R. Thomas McCoy

    Abstract: The success of methods based on artificial neural networks in creating intelligent machines seems like it might pose a challenge to explanations of human cognition in terms of Bayesian inference. We argue that this is not the case, and that in fact these systems offer new opportunities for Bayesian modeling. Specifically, we argue that Bayesian models of cognition and artificial neural networks li…

    Submitted 16 November, 2023; originally announced November 2023.

  7. arXiv:2309.13638

    cs.CL cs.AI

    Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

    Authors: R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L. Griffiths

    Abstract: The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies t…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 50 pages plus 11 pages of references and 23 pages of appendices

  8. arXiv:2305.14701

    cs.CL cs.AI

    Modeling rapid language learning by distilling Bayesian priors into artificial neural networks

    Authors: R. Thomas McCoy, Thomas L. Griffiths

    Abstract: Humans can learn languages from remarkably little experience. Developing computational models that explain this ability has been a major challenge in cognitive science. Bayesian models that build in strong inductive biases - factors that guide generalization - have been successful at explaining how humans might generalize from few examples in controlled settings but are usually too restrictive to…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 21 pages plus references; 4 figures

  9. arXiv:2301.11462

    cs.CL

    How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech

    Authors: Aditya Yedetore, Tal Linzen, Robert Frank, R. Thomas McCoy

    Abstract: When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers - two types of neural networks without a hierar…

    Submitted 6 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 10 pages plus references and appendices; accepted to ACL

    ACM Class: J.4; I.2.7

  10. arXiv:2208.06061

    cs.CL

    Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages

    Authors: Paul Soulos, Sudha Rao, Caitlin Smith, Eric Rosen, Asli Celikyilmaz, R. Thomas McCoy, Yichen Jiang, Coleman Haley, Roland Fernandez, Hamid Palangi, Jianfeng Gao, Paul Smolensky

    Abstract: Machine translation has seen rapid progress with the advent of Transformer-based models. These models have no explicit linguistic structure built into them, yet they may still implicitly learn structured relationships by attending to relevant tokens. We hypothesize that this structural learning could be made more robust by explicitly endowing Transformers with a structural bias, and we investigate…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: Revised edition to 4th Workshop on Technologies for MT of Low Resource Languages

    Journal ref: Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

  11. arXiv:2205.01128

    cs.AI cs.NE cs.SC

    Neurocompositional computing: From the Central Paradox of Cognition to a new generation of AI systems

    Authors: Paul Smolensky, R. Thomas McCoy, Roland Fernandez, Matthew Goldrick, Jianfeng Gao

    Abstract: What explains the dramatic progress from 20th-century to 21st-century AI, and how can the remaining limitations of current AI be overcome? The widely accepted narrative attributes this progress to massive increases in the quantity of computational and data resources available to support statistical learning in deep artificial neural networks. We show that an additional crucial factor is the develo…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: 21 pages, 6 figures. For a general AI audience: to appear in AI Magazine. A more extensive presentation of this work is "Neurocompositional computing in human and machine intelligence: A tutorial", Microsoft Technical Report MSR-TR-2022-5; see https://www.microsoft.com/en-us/research/publication/neurocompositional-computing-in-human-and-machine-intelligence-a-tutorial/

  12. arXiv:2111.09509

    cs.CL

    How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN

    Authors: R. Thomas McCoy, Paul Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz

    Abstract: Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To tease apart these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural lang…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 10 pages, plus 39 pages of appendices

  13. arXiv:2011.12073

    cs.CL

    Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis

    Authors: Michael A. Lepori, R. Thomas McCoy

    Abstract: As the name implies, contextualized representations of language are typically motivated by their ability to encode context. Which aspects of context are captured by such representations? We introduce an approach to address this question using Representational Similarity Analysis (RSA). As case studies, we investigate the degree to which a verb embedding encodes the verb's subject, a pronoun embedd…

    Submitted 24 November, 2020; originally announced November 2020.

  14. arXiv:2006.16324

    cs.CL cs.LG

    Universal linguistic inductive biases via meta-learning

    Authors: R. Thomas McCoy, Erin Grant, Paul Smolensky, Thomas L. Griffiths, Tal Linzen

    Abstract: How do learners acquire languages from the limited data available to them? This process must involve some inductive biases - factors that affect how a learner generalizes - but it is unclear which inductive biases can explain observed patterns in language acquisition. To facilitate computational modeling aimed at addressing this question, we introduce a framework for giving particular linguistic i…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: To appear in the Proceedings of the 42nd Annual Conference of the Cognitive Science Society

  15. arXiv:2005.00019

    cs.CL

    Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs

    Authors: Michael A. Lepori, Tal Linzen, R. Thomas McCoy

    Abstract: Sequence-based neural networks show significant sensitivity to syntactic structure, but they still perform less well on syntactic tasks than tree-based networks. Such tree-based networks can be provided with a constituency parse, a dependency parse, or both. We evaluate which of these two representational schemes more effectively introduces biases for syntactic structure that increase performance…

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: To appear in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020)

  16. arXiv:2004.11999

    cs.CL

    Syntactic Data Augmentation Increases Robustness to Inference Heuristics

    Authors: Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen

    Abstract: Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model's limitations, but rather by the paucity of crowdsourced NLI examples that might conv…

    Submitted 24 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  17. arXiv:2001.03632

    cs.CL

    Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks

    Authors: R. Thomas McCoy, Robert Frank, Tal Linzen

    Abstract: Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation a…

    Submitted 10 January, 2020; originally announced January 2020.

    Comments: 12 pages, 10 figures; accepted to TACL

  18. arXiv:1911.02969

    cs.CL

    BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance

    Authors: R. Thomas McCoy, Junghyun Min, Tal Linzen

    Abstract: If the same neural network architecture is trained multiple times on the same dataset, will it make similar linguistic generalizations across runs? To study this question, we fine-tuned 100 instances of BERT on the Multi-genre Natural Language Inference (MNLI) dataset and evaluated them on the HANS dataset, which evaluates syntactic generalization in natural language inference. On the MNLI develop…

    Submitted 16 November, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: 11 pages, 7 figures; accepted to the 2020 BlackboxNLP workshop

  19. arXiv:1905.06316

    cs.CL

    What do you learn from context? Probing for sentence structure in contextualized word representations

    Authors: Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick

    Abstract: Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: ICLR 2019 camera-ready version, 17 pages including appendices

  20. arXiv:1904.11544

    cs.CL

    Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

    Authors: Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

    Abstract: We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeli…

    Submitted 7 August, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Accepted to *SEM 2019 (revised submission). Corresponding authors: Najoung Kim (n.kim@jhu.edu), Ellie Pavlick (ellie_pavlick@brown.edu)

  21. arXiv:1902.01007

    cs.CL

    Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

    Authors: R. Thomas McCoy, Ellie Pavlick, Tal Linzen

    Abstract: A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical o…

    Submitted 24 June, 2019; v1 submitted 3 February, 2019; originally announced February 2019.

    Comments: Camera-ready for ACL 2019

  22. arXiv:1812.10860

    cs.CL

    Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

    Authors: Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

    Abstract: Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our prim…

    Submitted 22 July, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

    Comments: ACL 2019. This paper supersedes "Looking for ELMo's Friends: Sentence-Level Pretraining Beyond Language Modeling", an earlier version of this work by the same authors

  23. arXiv:1812.08718

    cs.CL

    RNNs Implicitly Implement Tensor Product Representations

    Authors: R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

    Abstract: Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively c…

    Submitted 5 March, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: Accepted to ICLR 2019

  24. arXiv:1811.12112

    cs.CL

    Non-entailed subsequences as a challenge for natural language inference

    Authors: R. Thomas McCoy, Tal Linzen

    Abstract: Neural network models have shown great success at natural language inference (NLI), the task of determining whether a premise entails a hypothesis. However, recent studies suggest that these models may rely on fallible heuristics rather than deep language understanding. We introduce a challenge set to test whether NLI systems adopt one such heuristic: assuming that a sentence entails all of its su…

    Submitted 30 November, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Accepted as an abstract for SCiL 2019; added acknowledgments

  25. arXiv:1802.09091

    cs.CL

    Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks

    Authors: R. Thomas McCoy, Robert Frank, Tal Linzen

    Abstract: Syntactic rules in natural language typically need to make reference to hierarchical sentence structure. However, the simple examples that language learners receive are often equally compatible with linear rules. Children consistently ignore these linear explanations and settle instead on the correct hierarchical one. This fact has motivated the proposal that the learner's hypothesis space is cons…

    Submitted 8 June, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

    Comments: Proceedings of the 40th Annual Conference of the Cognitive Science Society; 10 pages