-
Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Authors:
Alexandre Variengien,
Eric Winsor
Abstract:
When solving challenging problems, language models (LMs) are able to identify relevant information from long and complicated contexts. To study how LMs solve retrieval tasks in diverse situations, we introduce ORION, a collection of structured retrieval tasks spanning six domains, from text understanding to coding. Each task in ORION can be represented abstractly by a request (e.g., a question) that retrieves an attribute (e.g., the character name) from a context (e.g., a story). We apply causal analysis to 18 open-source language models with sizes ranging from 125 million to 70 billion parameters. We find that LMs internally decompose retrieval tasks in a modular way: middle layers at the last token position process the request, while late layers retrieve the correct entity from the context. After causally enforcing this decomposition, models are still able to solve the original task, preserving 70% of the original correct token probability in 98 of the 106 studied model-task pairs. We connect our macroscopic decomposition with a microscopic description by performing a fine-grained case study of a question-answering task on Pythia-2.8b. Building on our high-level understanding, we demonstrate a proof-of-concept application for scalable internal oversight of LMs that mitigates prompt injection while requiring human supervision on only a single input. Our solution improves accuracy drastically (from 15.5% to 97.5% on Pythia-12b). This work presents evidence of a universal, emergent, modular processing of tasks across varied domains and models and is a pioneering effort in applying interpretability to scalable internal oversight of LMs.
Submitted 13 December, 2023;
originally announced December 2023.
-
Interpreting Neural Networks through the Polytope Lens
Authors:
Sid Black,
Lee Sharkey,
Leo Grinsztajn,
Eric Winsor,
Dan Braun,
Jacob Merizian,
Kip Parker,
Carlos Ramón Guevara,
Beren Millidge,
Gabriel Alfour,
Connor Leahy
Abstract:
Mechanistic interpretability aims to explain what a neural network has learned at a nuts-and-bolts level. What are the fundamental primitives of neural network representations? Previous mechanistic descriptions have used individual neurons or their linear combinations to understand the representations a network has learned. But there are clues that neurons and their linear combinations are not the correct fundamental units of description: directions cannot describe how neural networks use nonlinearities to structure their representations. Moreover, many instances of individual neurons and their combinations are polysemantic (i.e., they have multiple unrelated meanings). Polysemanticity makes interpreting the network in terms of neurons or directions challenging since we can no longer assign a specific feature to a neural unit. In order to find a basic unit of description that does not suffer from these problems, we zoom in beyond just directions to study the way that piecewise linear activation functions (such as ReLU) partition the activation space into numerous discrete polytopes. We call this perspective the polytope lens. The polytope lens makes concrete predictions about the behavior of neural networks, which we evaluate through experiments on both convolutional image classifiers and language models. Specifically, we show that polytopes can be used to identify monosemantic regions of activation space (while directions are not in general monosemantic) and that the density of polytope boundaries reflects semantic boundaries. We also outline a vision for what mechanistic interpretability might look like through the polytope lens.
Submitted 22 November, 2022;
originally announced November 2022.
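The polytope-lens abstract above rests on one mechanical fact: the on/off pattern of every ReLU identifies which polytope an input falls in, and inside a single polytope the whole network is one affine map. The sketch below illustrates that fact on a small multilayer perceptron. It is not the authors' code: `random_mlp` is a hypothetical toy network with Gaussian weights, and `codes_along_segment` is an illustrative helper that counts polytope changes along a straight line by sampling, so very thin polytopes can be missed.

```python
import random


def relu(x):
    return [max(0.0, v) for v in x]


def affine(weights, bias, x):
    # Computes W @ x + b with W given as a list of rows.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


def random_mlp(layer_sizes, seed=0):
    # Hypothetical toy network: Gaussian weights and biases for each layer.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        bias = [rng.gauss(0, 1) for _ in range(n_out)]
        layers.append((weights, bias))
    return layers


def polytope_code(layers, x):
    """On/off pattern of every hidden ReLU for input x.

    Inputs that share a code lie in the same polytope, where the
    network acts as a single affine map."""
    code = []
    h = x
    for weights, bias in layers[:-1]:  # hidden layers only
        pre = affine(weights, bias, h)
        code.extend(v > 0 for v in pre)
        h = relu(pre)
    return tuple(code)


def forward(layers, x):
    h = x
    for weights, bias in layers[:-1]:
        h = relu(affine(weights, bias, h))
    weights, bias = layers[-1]
    return affine(weights, bias, h)  # linear read-out, no ReLU


def codes_along_segment(layers, a, b, steps=200):
    # Sample points on the segment from a to b and record each new polytope visited.
    visited = []
    for s in range(steps + 1):
        t = s / steps
        x = [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]
        code = polytope_code(layers, x)
        if not visited or visited[-1] != code:
            visited.append(code)
    return visited
```

Because polytopes are convex, two inputs with the same code have a midpoint with that code too, so `forward` at the midpoint equals the average of the two outputs. The number of codes returned by `codes_along_segment` gives a rough, sampling-based proxy for the density of polytope boundaries along a path.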
-
Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Authors:
Beidi Chen,
Tri Dao,
Eric Winsor,
Zhao Song,
Atri Rudra,
Christopher Ré
Abstract:
Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences. However, it is still challenging to balance the trade-off between model quality and efficiency to perform a one-size-fits-all approximation for different tasks. To better understand this trade-off, we observe that sparse and low-rank approximations excel in different regimes, determined by the softmax temperature in attention, and sparse + low-rank can outperform each individually. Inspired by the classical robust-PCA algorithm for sparse and low-rank decomposition, we propose Scatterbrain, a novel way to unify sparse (via locality sensitive hashing) and low-rank (via kernel feature map) attention for accurate and efficient approximation. The estimation is unbiased with provably low error. We empirically show that Scatterbrain can achieve 2.1x lower error than baselines when serving as a drop-in replacement in BigGAN image generation and pre-trained T2T-ViT. On a pre-trained T2T Vision transformer, even without fine-tuning, Scatterbrain can reduce attention memory by 98% at the cost of only a 1% drop in accuracy. We demonstrate Scatterbrain for end-to-end training with up to 4 points better perplexity and 5 points better average accuracy than sparse or low-rank efficient transformers on language modeling and long-range-arena tasks.
Submitted 28 October, 2021;
originally announced October 2021.
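The sparse + low-rank idea in the Scatterbrain abstract can be sketched in a few lines of NumPy: a random-feature map gives a low-rank, unbiased estimate of every attention score, and on a chosen set of entries the sketch replaces that estimate with the exact score. This is a minimal illustration under stated assumptions, not the authors' implementation. Performer-style positive random features stand in for the kernel feature map, and the caller supplies the sparse index pairs directly instead of finding them with locality sensitive hashing. The helper names are hypothetical.

```python
import numpy as np


def positive_random_features(x, w):
    """Positive random features with E[phi(q) . phi(k)] = exp(q . k) when rows of w ~ N(0, I)."""
    m = w.shape[0]
    return np.exp(x @ w.T - 0.5 * np.sum(x ** 2, axis=1, keepdims=True)) / np.sqrt(m)


def sparse_plus_lowrank_attention(q, k, v, pairs, num_features=64, seed=0):
    """Approximate softmax(q k^T / sqrt(d)) v as low-rank + sparse attention.

    pairs: iterable of (i, j) entries that get an exact correction.
    Assumption: supplied by the caller here; Scatterbrain selects them with LSH."""
    d = q.shape[1]
    qs, ks = q / d ** 0.25, k / d ** 0.25  # so that qs . ks = q . k / sqrt(d)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_features, d))
    phi_q = positive_random_features(qs, w)
    phi_k = positive_random_features(ks, w)

    # Low-rank part, computed without materializing the n x n score matrix.
    numer = phi_q @ (phi_k.T @ v)
    denom = phi_q @ phi_k.sum(axis=0)

    # Sparse part: swap the low-rank estimate for the exact score on chosen entries.
    for i, j in pairs:
        correction = np.exp(qs[i] @ ks[j]) - phi_q[i] @ phi_k[j]
        numer[i] += correction * v[j]
        denom[i] += correction
    return numer / denom[:, None]


def exact_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (weights / weights.sum(axis=1, keepdims=True)) @ v
```

With every pair in `pairs` the corrections cancel the low-rank estimate exactly and the result matches `exact_attention`; with no pairs it reduces to a pure random-feature approximation. Entries between the two extremes trade memory for accuracy, which is the regime the abstract targets.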
-
Generalized Lyndon Factorizations of Infinite Words
Authors:
Amanda Burcroff,
Eric Winsor
Abstract:
A generalized lexicographic order on words is a lexicographic order where the total order of the alphabet depends on the position of the comparison. A generalized Lyndon word is a finite word which is strictly smallest among its class of rotations with respect to a generalized lexicographic order. This notion can be extended to infinite words: an infinite generalized Lyndon word is an infinite word which is strictly smallest among its class of suffixes. We prove a conjecture of Dolce, Restivo, and Reutenauer: every infinite word has a unique nonincreasing factorization into finite and infinite generalized Lyndon words. When this factorization has finitely many terms, we characterize the last term of the factorization. Our methods also show that the infinite generalized Lyndon words are precisely the words with infinitely many generalized Lyndon prefixes.
Submitted 20 June, 2019; v1 submitted 12 May, 2019;
originally announced May 2019.
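The finite-word definitions in the abstract above translate directly into a check: compare two words at their first differing position using the total order assigned to that position, and call a word generalized Lyndon if it is strictly smaller than each of its other rotations. The sketch below covers only finite words. The `rank(i, letter)` interface for the position-dependent order is an assumption made for illustration, not notation from the paper.

```python
def compare(u, v, rank):
    """Generalized lexicographic comparison of two equal-length words.

    rank(i, letter) gives the letter's place in the total order used at
    position i (0-based). Returns -1, 0 or 1."""
    for i, (a, b) in enumerate(zip(u, v)):
        if a != b:
            return -1 if rank(i, a) < rank(i, b) else 1
    return 0


def is_generalized_lyndon(word, rank):
    """True if word is strictly smaller than each of its other rotations."""
    rotations = [word[s:] + word[:s] for s in range(1, len(word))]
    return len(word) > 0 and all(compare(word, r, rank) < 0 for r in rotations)


# The usual lexicographic order: a < b at every position.
standard = lambda i, c: "ab".index(c)
# A generalized order: a < b at even positions, b < a at odd positions.
alternating = lambda i, c: "ab".index(c) if i % 2 == 0 else "ba".index(c)
```

Under `standard` this recovers ordinary Lyndon words (`"aab"` qualifies, `"aba"` does not), while under `alternating` the roles flip: `"aba"` qualifies and `"aab"` does not, because its rotation `"aba"` wins the comparison at position 1. A periodic word such as `"abab"` never qualifies, since one of its rotations equals the word itself.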
-
A Refined Conjecture for the Variance of Gaussian Primes Across Sectors
Authors:
Ryan C. Chen,
Yujin H. Kim,
Jared D. Lichtman,
Steven J. Miller,
Alina Shubina,
Shannon Sweitzer,
Ezra Waxman,
Eric Winsor,
Jianing Yang
Abstract:
We derive a refined conjecture for the variance of Gaussian primes across sectors, with a power saving error term, by applying the L-functions Ratios Conjecture. We observe a bifurcation point in the main term, consistent with the Random Matrix Theory (RMT) heuristic previously proposed by Rudnick and Waxman. Our model also identifies a second bifurcation point, undetected by the RMT model, that emerges upon taking into account lower order terms. For sufficiently small sectors, we moreover prove an unconditional result that is consistent with our conjecture down to lower order terms.
Submitted 22 February, 2021; v1 submitted 22 January, 2019;
originally announced January 2019.
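To make the object of study in the abstract above concrete, the sketch below enumerates Gaussian primes up to a norm bound, takes one representative per associate class (a > 0, b >= 0, so arguments lie in [0, pi/2)), and counts them in equal angular sectors. It computes only a naive empirical variance of those counts across a partition of the quadrant. This is an assumption-laden illustration, not the normalized variance statistic, sector model, or conjecture from the paper. Inert primes (rational primes p = 3 mod 4) all have argument 0, so the `include_inert` flag, which is hypothetical, lets one exclude them when looking at angular spread.

```python
import math


def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def gaussian_primes(x, include_inert=True):
    """Gaussian primes a + bi with a > 0, b >= 0 and norm a^2 + b^2 <= x."""
    primes = []
    r = math.isqrt(x)
    for a in range(1, r + 1):
        for b in range(0, r + 1):
            norm = a * a + b * b
            if norm > x:
                break  # the norm only grows with b
            if b == 0:
                # A rational integer is a Gaussian prime iff it is a prime = 3 mod 4.
                if include_inert and is_prime(a) and a % 4 == 3:
                    primes.append((a, b))
            elif is_prime(norm):
                # Off the axes, a + bi is prime iff its norm is a rational prime.
                primes.append((a, b))
    return primes


def sector_counts(x, num_sectors, include_inert=True):
    """Count primes by argument in num_sectors equal sectors of [0, pi/2)."""
    counts = [0] * num_sectors
    width = (math.pi / 2) / num_sectors
    for a, b in gaussian_primes(x, include_inert):
        angle = math.atan2(b, a)  # in [0, pi/2) because a > 0 and b >= 0
        counts[min(int(angle // width), num_sectors - 1)] += 1
    return counts


def sector_variance(x, num_sectors, include_inert=True):
    # Naive population variance of the sector counts (illustrative only).
    counts = sector_counts(x, num_sectors, include_inert)
    mean = sum(counts) / num_sectors
    return sum((c - mean) ** 2 for c in counts) / num_sectors
```

For example, `gaussian_primes(10)` returns the four representatives 1 + i, 1 + 2i, 2 + i and 3.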
-
Limiting Distributions in Generalized Zeckendorf Decompositions
Authors:
Alexandre Gueganic,
Granger Carty,
Yujin H. Kim,
Steven J. Miller,
Alina Shubina,
Shannon Sweitzer,
Eric Winsor,
Jianing Yang
Abstract:
An equivalent definition of the Fibonacci numbers is that they are the unique sequence such that every positive integer can be written uniquely as a sum of non-adjacent terms. We can view this as having bins of length 1: we can take at most one element from a bin, and if we choose an element from a bin we cannot take one from a neighboring bin. We generalize this by allowing bins of varying length and restrictions on how many elements may be used in a decomposition. We derive conditions under which the resulting sequences have uniqueness of decomposition, and (similar to the Fibonacci case) under which the number of summands converges to a Gaussian; the main tool in the proofs here is the Lyapunov Central Limit Theorem.
Submitted 6 October, 2018;
originally announced October 2018.
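The bins-of-length-1 picture in the abstract above can be checked by brute force in the classical Fibonacci case. The sketch uses the indexing 1, 2, 3, 5, 8, ... (the one under which Zeckendorf decompositions are unique), computes the greedy decomposition, and separately enumerates every choice of non-adjacent terms to confirm that each total is hit exactly once. It covers only the Fibonacci case; the generalized bins, restrictions, and the Gaussian limit studied in the paper are not implemented here, and all function names are hypothetical.

```python
def fibonacci_terms(limit):
    """Terms 1, 2, 3, 5, 8, ... that do not exceed limit."""
    terms = [1, 2]
    while terms[-1] + terms[-2] <= limit:
        terms.append(terms[-1] + terms[-2])
    return [t for t in terms if t <= limit]


def zeckendorf(n):
    """Greedy decomposition of n >= 1 into non-adjacent Fibonacci terms."""
    summands = []
    for t in reversed(fibonacci_terms(n)):
        if t <= n:
            summands.append(t)
            n -= t  # the remainder is smaller than the next term down
    return summands


def count_representations(num_terms):
    """For each total, count the ways to pick at most one element per bin
    (bins of length 1) from the first num_terms terms, never using two
    neighbouring bins."""
    terms = [1, 2]
    while len(terms) < num_terms:
        terms.append(terms[-1] + terms[-2])
    terms = terms[:num_terms]
    counts = {}
    for mask in range(1 << num_terms):
        if mask & (mask >> 1):  # two adjacent bins used
            continue
        total = sum(t for i, t in enumerate(terms) if mask >> i & 1)
        counts[total] = counts.get(total, 0) + 1
    return counts
```

With the first ten terms (1 through 89) there are exactly 144 admissible choices, and their sums cover 0, 1, ..., 143 once each, which is the uniqueness property the abstract uses as its starting definition. For instance, `zeckendorf(100)` gives `[89, 8, 3]`.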
-
Lower-Order Biases Second Moments of Dirichlet Coefficients in Families of $L$-Functions
Authors:
Megumi Asada,
Ryan Chen,
Eva Fourakis,
Yujin Kim,
Andrew Kwon,
Jared D. Lichtman,
Blake Mackall,
Steven J. Miller,
Eric Winsor,
Karl Winsor,
Jianing Yang,
Kevin Yang
Abstract:
Let $\mathcal E: y^2 = x^3 + A(T)x + B(T)$ be a nontrivial one-parameter family of elliptic curves over $\mathbb{Q}(T)$, with $A(T), B(T) \in \mathbb Z(T)$, and consider the $k$th moments $A_{k,\mathcal{E}}(p) := \sum_{t \bmod p} a_{\mathcal{E}_t}(p)^k$ of the Dirichlet coefficients $a_{\mathcal{E}_t}(p) := p + 1 - |\mathcal{E}_t (\mathbb{F}_p)|$. Rosen and Silverman proved a conjecture of Nagao relating the first moment $A_{1,\mathcal{E}}(p)$ to the rank of the family over $\mathbb{Q}(T)$, and Michel proved that if $j(T)$ is not constant then the second moment is equal to $A_{2,\mathcal{E}}(p) = p^2 + O(p^{3/2})$. Cohomological arguments show that the lower order terms are of sizes $p^{3/2}, p, p^{1/2}$, and $1$. In every case we are able to analyze in closed form, the largest lower order term in the second moment expansion that does not average to zero is on average negative, though numerics suggest this may fail for families of moderate rank. We prove this Bias Conjecture for several large classes of families, including families with rank, complex multiplication, and constant $j(T)$-invariant. We also study the analogous Bias Conjecture for families of Dirichlet characters, holomorphic forms on GL$(2)/\mathbb{Q}$, and their symmetric powers and Rankin-Selberg convolutions. We identify all lower order terms in large classes of families, shedding light on the arithmetic objects controlling these terms. The negative bias in these lower order terms has implications toward the excess rank conjecture and the behavior of zeros near the central point in families of $L$-functions.
Submitted 7 February, 2021; v1 submitted 18 August, 2018;
originally announced August 2018.
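The moments in the abstract above can be computed by brute force for small primes. Counting projective points on $y^2 = x^3 + Ax + B$ over $\mathbb{F}_p$ gives $|\mathcal{E}(\mathbb{F}_p)| = 1 + \sum_{x} \left(1 + \left(\frac{x^3 + Ax + B}{p}\right)\right)$, so the Dirichlet coefficient is $a(p) = -\sum_{x \bmod p} \left(\frac{x^3 + Ax + B}{p}\right)$. The sketch assumes an odd prime $p$, evaluates the Legendre symbol by Euler's criterion, and, as a simplifying convention of this sketch, applies the same Legendre-sum formula to every fiber $t$, including fibers that are singular mod $p$. The function names are hypothetical.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def dirichlet_coefficient(A, B, p):
    """a(p) = p + 1 - #E(F_p) for y^2 = x^3 + A x + B over F_p.

    Each x contributes 1 + legendre(x^3 + A x + B) affine points, plus the
    point at infinity, so a(p) is minus the Legendre-symbol sum."""
    return -sum(legendre(x ** 3 + A * x + B, p) for x in range(p))


def second_moment(A_of_t, B_of_t, p):
    """A_2(p) = sum over t mod p of a_t(p)^2 for y^2 = x^3 + A(t) x + B(t).

    Convention (assumption): singular fibers use the same Legendre sum."""
    return sum(dirichlet_coefficient(A_of_t(t), B_of_t(t), p) ** 2
               for t in range(p))
```

As a sanity check, $y^2 = x^3 + 1$ has $a(5) = 0$ (it is supersingular for $p \equiv 2 \bmod 3$) and $a(7) = -4$, that is, 12 points over $\mathbb{F}_7$. For the family $y^2 = x^3 + t$ at $p = 5$, cubing permutes $\mathbb{F}_5$, so every $a_t(5)$ vanishes and the second moment is 0. Tabulating `second_moment(...) - p**2` over many primes is one way to look numerically for the lower-order bias the abstract describes.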
-
Spectral Statistics of Non-Hermitian Random Matrix Ensembles
Authors:
Ryan C. Chen,
Yujin H. Kim,
Jared D. Lichtman,
Steven J. Miller,
Shannon Sweitzer,
Eric Winsor
Abstract:
Recently Burkhardt et al. introduced the $k$-checkerboard random matrix ensembles, which have a split limiting behavior of the eigenvalues (in the limit, all but $k$ of the eigenvalues are on the order of $\sqrt{N}$ and converge to semi-circular behavior, with the remaining $k$ of size $N$ and converging to hollow Gaussian ensembles). We generalize their work to consider non-Hermitian ensembles with complex eigenvalues; instead of a blip, new behavior is seen, ranging from multiple satellites to annular rings. These results are based on moment method techniques adapted to the complex plane as well as analysis of singular values, and we further isolate the singular value joint density formula for the Complex Symmetric Gaussian Ensemble.
Submitted 10 April, 2018; v1 submitted 21 March, 2018;
originally announced March 2018.
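The split eigenvalue behavior described in the abstract above is easy to see numerically. The sketch builds a $k$-checkerboard matrix as we read the definition: entry $(i, j)$ is a fixed constant $w$ when $i \equiv j \bmod k$ and an independent standard normal variable otherwise, symmetrized in the Hermitian case. The choice of Gaussian entries and all function names are assumptions of this sketch. The constant part is, after permuting indices, $k$ all-$w$ blocks of size $N/k$, so it contributes $k$ eigenvalues near $wN/k$, while the random part has norm on the order of $\sqrt{N}$.

```python
import numpy as np


def checkerboard_matrix(n, k, w=1.0, hermitian=True, seed=0):
    """k-checkerboard matrix: w where i = j (mod k), standard normal elsewhere."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    if hermitian:
        g = np.triu(g) + np.triu(g, 1).T  # real symmetric random part
    idx = np.arange(n)
    on_board = (np.subtract.outer(idx, idx) % k) == 0
    return np.where(on_board, w, g)


def eigenvalues(m, hermitian):
    # eigvalsh assumes a symmetric input; eigvals handles the general, complex case.
    return np.linalg.eigvalsh(m) if hermitian else np.linalg.eigvals(m)


def count_large(eigs, threshold):
    return int(np.sum(np.abs(eigs) > threshold))
```

For $N = 300$ and $k = 3$ in the Hermitian case, exactly three eigenvalues exceed $N/(2k) = 50$ in absolute value (they sit near 100), and the rest stay within roughly $2\sqrt{200} \approx 28$ of zero. Passing `hermitian=False` produces complex eigenvalues, which can be scattered in the plane to look at the non-Hermitian behavior the paper studies.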