NeuroSurgeon: A Toolkit for Subnetwork Analysis

Michael A. Lepori
Department of Computer Science
Brown University
michael_lepori@brown.edu

Ellie Pavlick
Department of Computer Science
Brown University
ellie_pavlick@brown.edu

Thomas Serre
Carney Institute for Brain Science
Brown University

arXiv:2309.00244v1 [cs.LG] 1 Sep 2023

Abstract
Despite recent advances in the field of explainability, much remains unknown about the algorithms that neural networks learn to represent. Recent work has attempted to understand trained models by decomposing them into functional circuits (Csordás et al., 2020; Lepori et al., 2023). To advance this research, we developed NeuroSurgeon, a Python library that can be used to discover and manipulate subnetworks within models in the Huggingface Transformers library (Wolf et al., 2019). NeuroSurgeon is freely available at https:

1 Introduction
Neural networks – particularly transformers (Vaswani et al., 2017) – are the de facto solution to machine learning problems in both industry and academia. Despite their ubiquity, these models are largely inscrutable. Recent work in mechanistic interpretability has manually reverse-engineered specialized circuits¹ in small models, but scaling up this approach poses a daunting challenge (Nanda et al., 2022; Merullo et al., 2023; Wang et al., 2022; Olsson et al., 2022).

Another line of work employs subnetwork analysis to understand the internal structure of trained models. This approach seeks to automatically uncover circuits within a trained model and locate them in particular subnetworks. This approach borrows techniques from model pruning to uncover subnetworks that might implement such high-level computations. We developed a Python library – NeuroSurgeon – to simplify the process of subnetwork analysis, allowing researchers to more quickly uncover the internal structure that lies within trained models.

¹ In this work, we define a “circuit” as a portion of a model that performs some high-level functions, and a “subnetwork” as any subset of weights or neurons within a model. A circuit can thus be localized to a subnetwork, and a subnetwork can comprise a circuit if it performs a high-level function.

2 Overview
NeuroSurgeon supports several popular models within the Huggingface Transformers repository (Wolf et al., 2019), including ViT (Dosovitskiy et al., 2020), ResNet (He et al., 2016), GPT2 (Radford et al., 2019), BERT (Devlin et al., 2018), and more. With NeuroSurgeon, one discovers functional subnetworks by optimizing a binary mask over weights (or neurons) within model layers, ablating everything except the units necessary for a particular computation. We have implemented two optimization-based techniques from model pruning (as well as a simple baseline technique) for generating these binary masks.

Hard-Concrete Masking: Hard-Concrete masking was introduced to provide an approximation to the l0 penalty, providing a bias towards sparse solutions during model training (Louizos et al., 2017). This technique produces masks by stochastically sampling mask values from a parameterized hard-concrete distribution.

Continuous Sparsification: Continuous Sparsification was introduced to provide a deterministic approximation to the l0 penalty (Savarese et al., 2020). This technique produces masks by annealing a parameterized soft mask into a hard mask over the course of training.

Magnitude Pruning: Magnitude pruning simply ablates some fraction of the lowest magnitude weights (Han et al., 2015). Though simple, this approach has been used in several important works on pruning and subnetworks, notably the Lottery Ticket Hypothesis (Frankle and Carbin, 2018). This method should be used as a baseline to compare against the optimization-based methods described above.

When performing subnetwork analysis, we freeze the underlying model weights and optimize the parameters introduced by Continuous Sparsification or Hard-Concrete Masking. We typically include an l0 regularization term on the mask to encourage parsimonious subnetworks. Both optimization-based techniques can be used to discover subnetworks at the weight or neuron level.
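
The three mask-generation techniques described in this section can be illustrated with a minimal, framework-free sketch. This is not NeuroSurgeon's API: the function names, the flat-list representation of weights, and the default hyperparameters (beta, gamma, zeta, taken from the values commonly used with the hard-concrete distribution of Louizos et al., 2017) are assumptions made for illustration only. In each case the underlying weights are treated as frozen and only the mask, or the parameters that generate it, would be optimized.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def magnitude_mask(weights, prune_fraction):
    """Baseline: ablate the given fraction of lowest-magnitude weights."""
    n_prune = int(len(weights) * prune_fraction)
    # Indices sorted by ascending |w|; the first n_prune are ablated.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else 1.0 for i in range(len(weights))]


def continuous_sparsification_mask(scores, temperature, train=True):
    """Deterministic mask from learnable scores.

    During training the soft mask is sigmoid(temperature * score); raising
    the temperature over training anneals it toward a step function.
    At evaluation time the hard mask keeps units with positive scores.
    """
    if train:
        return [sigmoid(temperature * s) for s in scores]
    return [1.0 if s > 0 else 0.0 for s in scores]


def hard_concrete_sample(log_alpha, rng, beta=2 / 3, gamma=-0.1, zeta=1.1):
    """Draw one stochastic mask value in [0, 1] (stretch-and-clip form)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / beta)
    stretched = s * (zeta - gamma) + gamma
    return min(1.0, max(0.0, stretched))


def hard_concrete_l0(log_alphas, beta=2 / 3, gamma=-0.1, zeta=1.1):
    """Expected number of nonzero mask entries, usable as an l0 penalty."""
    shift = beta * math.log(-gamma / zeta)
    return sum(sigmoid(la - shift) for la in log_alphas)


def apply_mask(weights, mask):
    """Masked forward weights; the frozen weights themselves are unchanged."""
    return [w * m for w, m in zip(weights, mask)]


# Hypothetical frozen weights and mask parameters for a single layer.
frozen = [0.8, -0.05, 0.3, -1.2, 0.01, 0.4]
rng = random.Random(0)
baseline = apply_mask(frozen, magnitude_mask(frozen, 0.5))
soft = continuous_sparsification_mask([1.5, -0.7, 0.2, 2.0, -3.0, 0.1], 4.0)
sampled = [hard_concrete_sample(0.5, rng) for _ in frozen]
penalty = hard_concrete_l0([0.5] * len(frozen))
```

In a real training loop the scores or log_alpha values would be tensors updated by gradient descent on a task loss plus the l0 term, while the model weights stay fixed.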

3 Visualization
In order to visualize the results of subnetwork analysis, we have implemented a visualizer that can be used to understand how subnetworks are distributed throughout the layers of a model. It can be used to display one or two subnetworks within the same model. See Figure 1 for an example visualization of two subnetworks in a 2-layer GPT2-style transformer.
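
The visualizer itself is not reproduced here. As a rough sketch of the kind of per-layer summary such a plot conveys, the snippet below computes how dense each of two binary masks is in every layer and how much they overlap. The function name, the dictionary layout, and the layer names are hypothetical and are not part of NeuroSurgeon.

```python
def layer_overlap_stats(mask_a, mask_b):
    """Per-layer density of two binary masks and their overlap (IoU).

    Both arguments map a layer name to a flat list of 0/1 mask entries
    and are assumed to share the same keys and per-layer lengths.
    """
    stats = {}
    for name in mask_a:
        a, b = mask_a[name], mask_b[name]
        both = sum(1 for x, y in zip(a, b) if x and y)
        either = sum(1 for x, y in zip(a, b) if x or y)
        stats[name] = {
            "density_a": sum(a) / len(a),
            "density_b": sum(b) / len(b),
            # Intersection over union; defined as 0 when both masks are empty.
            "iou": both / either if either else 0.0,
        }
    return stats


# Hypothetical masks for two subnetworks in a two-layer model.
addition = {"layer0.mlp": [1, 1, 0, 0], "layer1.mlp": [1, 0, 0, 0]}
multiplication = {"layer0.mlp": [1, 1, 0, 0], "layer1.mlp": [0, 1, 0, 0]}
summary = layer_overlap_stats(addition, multiplication)
```

A high IoU in one layer and a low IoU in another would correspond to subnetworks that share early computation and diverge later.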

4 Related Work
Subnetwork analysis has been used in a wide variety of contexts in recent deep learning research. Some studies have used subnetwork analysis to uncover how linguistic information is distributed throughout a model (De Cao et al., 2022, 2020). One notable approach to this is subnetwork probing (Cao et al., 2021), which NeuroSurgeon implements. Others have sought to understand how particular computations are structured within model weights (Csordás et al., 2020; Lepori et al., 2023; Conmy et al., 2023). Still others have used subnetwork analysis to better understand generalization and transfer learning (Zhang et al., 2021; Panigrahi et al., 2023; Zheng et al., 2023; Guo et al., 2021), or to control model behavior (Li et al., 2023).

Figure 1: Visualization of two subnetworks within a 2-layer GPT2-style transformer. This transformer was trained in a multitask fashion on addition and multiplication tasks, similar to the Addition/Multiplication setting in Csordás et al. (2020). One subnetwork was optimized to solve addition problems and the other was optimized to solve multiplication problems. Both were trained with l0 regularization. Notably, we see that the subnetworks are sparse – the majority of each block was pruned. Additionally, we see more subnetwork overlap in Layer 0 than in Layer 1. For instance, the subnetworks are almost entirely overlapping in Layer 0’s MLP.

5 Discussion
We present NeuroSurgeon, a Python library designed to enable researchers to easily identify functional subnetworks within trained models. We hope that NeuroSurgeon lowers the barrier to entry for researchers interested in performing subnetwork analysis for mechanistic interpretability.

References
