Search | arXiv e-print repository

Explaining black box text modules in natural language with language models

Authors: Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao

Abstract: Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A "text module" is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. "Black box" indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals. Finally, we show that SASC can generate explanations for the response of individual fMRI voxels to language stimuli, with potential applications to fine-grained brain mapping. All code for using SASC and reproducing results is made available on Github.

Submitted 15 November, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2010.00504

doi 10.1103/PhysRevE.102.062404

Time cells might be optimized for predictive capacity, not redundancy reduction or memory capacity

Authors: Alexander Hsu, Sarah Marzen

Abstract: Recently, researchers have found time cells in the hippocampus that appear to contain information about the timing of past events. Some researchers have argued that time cells are taking a Laplace transform of their input in order to reconstruct the past stimulus. We argue that stimulus prediction, not stimulus reconstruction or redundancy reduction, is in better agreement with observed responses of time cells. In the process, we introduce new analyses of nonlinear, continuous-time reservoirs that model these time cells.

Submitted 1 October, 2020; originally announced October 2020.

arXiv:1006.3271

The probabilistic analysis of language acquisition: Theoretical, computational, and experimental analysis

Authors: Anne S. Hsu, Nick Chater, Paul M. B. Vitanyi

Abstract: There is much debate over the degree to which language learning is governed by innate language-specific biases, or acquired through cognition-general principles. Here we examine the probabilistic language acquisition hypothesis on three levels: We outline a novel theoretical result showing that it is possible to learn the exact generative model underlying a wide class of languages, purely from observing samples of the language. We then describe a recently proposed practical framework, which quantifies natural language learnability, allowing specific learnability predictions to be made for the first time. In previous work, this framework was used to make learnability predictions for a wide variety of linguistic constructions, for which learnability has been much debated. Here, we present a new experiment which tests these learnability predictions. We find that our experimental results support the possibility that these linguistic constructions are acquired probabilistically from cognition-general principles.

Submitted 16 June, 2010; originally announced June 2010.

Comments: 26 pages, pdf, 4 figures, Submitted to "Cognition"

MSC Class: 91E10; 97C30; 68T50

Showing 1–3 of 3 results for author: Hsu, A