NeuroSurgeon: A Toolkit for Subnetwork Analysis
Thomas Serre
Carney Institute for Brain Science
Brown University
thomas_serre@brown.edu
Abstract

Despite recent advances in the field of explainability, much remains unknown about the algorithms that neural networks learn to represent.

2 Overview

NeuroSurgeon supports several popular models within the Huggingface Transformers repository (Wolf et al., 2019), including ViT (Dosovitskiy et al., 2020).
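As a concrete, hedged illustration of the kinds of models NeuroSurgeon targets, the snippet below loads a GPT-2 and a ViT checkpoint using only standard Huggingface Transformers calls; any NeuroSurgeon-specific wrapping of these models is omitted here, since the toolkit's exact API is not reproduced in this excerpt.

# Minimal sketch: loading Huggingface models of the kind NeuroSurgeon supports.
# Only standard Transformers calls are used; NeuroSurgeon-specific wrapping of
# these models is omitted, as its API is not shown in this excerpt.
from transformers import GPT2Model, ViTModel

# A GPT2-style transformer (decoder-only language model, Radford et al., 2019).
gpt2 = GPT2Model.from_pretrained("gpt2")

# A Vision Transformer (Dosovitskiy et al., 2020).
vit = ViTModel.from_pretrained("google/vit-base-patch16-224")

print(gpt2.config.n_layer, vit.config.num_hidden_layers)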
3 Visualization

To visualize the results of subnetwork analysis, we have implemented a visualizer that shows how subnetworks are distributed throughout the layers of a model. It can display one or two subnetworks within the same model. See Figure 1 for an example visualization of two subnetworks in a 2-layer GPT2-style transformer.
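To give a sense of what such a figure conveys, the following is a minimal matplotlib sketch of a layer-wise comparison of two subnetworks; it uses made-up mask densities and is an illustrative stand-in, not NeuroSurgeon's actual visualizer.

# Illustrative sketch only: plot the fraction of surviving weights per module for
# two synthetic subnetworks in a 2-layer model. Not NeuroSurgeon's built-in
# visualizer, just an example of the kind of summary such a plot conveys.
import numpy as np
import matplotlib.pyplot as plt

layers = ["layer_0.attn", "layer_0.mlp", "layer_1.attn", "layer_1.mlp"]
# Hypothetical per-module densities (fraction of unmasked weights).
subnet_a = np.array([0.12, 0.05, 0.20, 0.08])
subnet_b = np.array([0.03, 0.15, 0.06, 0.18])

x = np.arange(len(layers))
width = 0.35
plt.bar(x - width / 2, subnet_a, width, label="Subnetwork A")
plt.bar(x + width / 2, subnet_b, width, label="Subnetwork B")
plt.xticks(x, layers, rotation=45, ha="right")
plt.ylabel("Fraction of weights retained")
plt.title("Two subnetworks across a 2-layer GPT2-style model")
plt.legend()
plt.tight_layout()
plt.show()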
4 Related Work

Subnetwork analysis has been used in a wide variety of contexts in recent deep learning research. Some studies have used subnetwork analysis to uncover how linguistic information is distributed throughout a model (De Cao et al., 2022, 2020). One notable approach to this is subnetwork probing (Cao et al., 2021), which NeuroSurgeon implements. Others have sought to understand how particular computations are structured within model weights (Csordás et al., 2020; Lepori et al., 2023; Conmy et al., 2023). Still others have used subnetwork analysis to better understand generalization and transfer learning (Zhang et al., 2021; Panigrahi et al., 2023; Zheng et al., 2023; Guo et al., 2021), or to control model behavior (Li et al., 2023).
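To make the masking-based approaches concrete, below is a minimal PyTorch sketch of differentiable weight masking in the spirit of subnetwork probing (Cao et al., 2021) and continuous sparsification (Savarese et al., 2020). It is a simplified illustration under assumed hyperparameters, not the implementation used by NeuroSurgeon or by those papers.

# Simplified sketch of differentiable weight masking: a frozen linear layer whose
# weights are gated by a learnable, sigmoid-relaxed binary mask. Training the mask
# logits with a sparsity penalty recovers a subnetwork. Illustration only; not
# NeuroSurgeon's implementation.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, linear: nn.Linear, temperature: float = 0.1):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)  # underlying weights stay frozen
        self.mask_logits = nn.Parameter(torch.zeros_like(linear.weight))
        self.temperature = temperature

    def forward(self, x):
        # Soft mask during training; hard binary mask (threshold at 0) at eval time.
        mask = torch.sigmoid(self.mask_logits / self.temperature)
        if not self.training:
            mask = (self.mask_logits > 0).float()
        return nn.functional.linear(x, self.linear.weight * mask, self.linear.bias)

    def sparsity_penalty(self):
        # Relaxed L0-style penalty pushing most mask entries toward zero.
        return torch.sigmoid(self.mask_logits / self.temperature).sum()

# Usage: wrap a layer, then optimize only the mask logits on a probing task.
layer = MaskedLinear(nn.Linear(16, 16))
opt = torch.optim.Adam([layer.mask_logits], lr=1e-2)
x, target = torch.randn(8, 16), torch.randn(8, 16)
loss = nn.functional.mse_loss(layer(x), target) + 1e-3 * layer.sparsity_penalty()
loss.backward()
opt.step()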
References

Nicola De Cao, Leon Schmid, Dieuwke Hupkes, and Ivan Titov. 2022. Sparse interventions in language models with differentiable masking. In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 16–27.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations.

Jonathan Frankle and Michael Carbin. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.

Demi Guo, Alexander M Rush, and Yoon Kim. 2021. Parameter-efficient transfer learning with diff pruning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4884–4896.

Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems, 28.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

Michael A Lepori, Thomas Serre, and Ellie Pavlick. 2023. Break it down: Evidence for structural compositionality in neural networks. arXiv preprint arXiv:2301.10884.

Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. 2022. In-context learning and induction heads. arXiv preprint arXiv:2209.11895.

Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, and Sanjeev Arora. 2023. Task-specific skill localization in fine-tuned language models. arXiv preprint arXiv:2302.06600.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

Pedro Savarese, Hugo Silva, and Michael Maire. 2020. Winning the lottery with continuous sparsification. Advances in Neural Information Processing Systems, 33:11380–11390.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. 2022. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. In The Eleventh International Conference on Learning Representations.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.

Dinghuai Zhang, Kartik Ahuja, Yilun Xu, Yisen Wang, and Aaron Courville. 2021. Can subnetwork structure be the key to out-of-distribution generalization? In International Conference on Machine Learning, pages 12356–12367. PMLR.

Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, and Yujun Shen. 2023. Regularized mask tuning: Uncovering hidden knowledge in pre-trained vision-language models. arXiv preprint arXiv:2307.15049.