NeuroSurgeon: A Toolkit for Subnetwork Analysis
Thomas Serre
Carney Institute for Brain Science
Brown University
thomas_serre@brown.edu
Abstract

Despite recent advances in the field of explainability, much remains unknown about the algorithms that neural networks learn to represent.

2 Overview

NeuroSurgeon supports several popular models within the Huggingface Transformers repository (Wolf et al., 2019), including ViT (Dosovitskiy et al., 2020).
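As a concrete, hedged illustration of the kinds of models NeuroSurgeon targets, the snippet below loads a GPT-2 and a ViT checkpoint using only standard Huggingface Transformers calls; any NeuroSurgeon-specific wrapping of these models is omitted here, since the toolkit's exact API is not reproduced in this excerpt.

# Minimal sketch: loading Huggingface models of the kind NeuroSurgeon supports.
# Only standard Transformers calls are used; NeuroSurgeon-specific wrapping of
# these models is omitted, as its API is not shown in this excerpt.
from transformers import GPT2Model, ViTModel

# A GPT2-style transformer (decoder-only language model, Radford et al., 2019).
gpt2 = GPT2Model.from_pretrained("gpt2")

# A Vision Transformer (Dosovitskiy et al., 2020).
vit = ViTModel.from_pretrained("google/vit-base-patch16-224")

print(gpt2.config.n_layer, vit.config.num_hidden_layers)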
3 Visualization

To visualize the results of subnetwork analysis, we have implemented a visualizer that shows how subnetworks are distributed throughout the layers of a model. It can display one or two subnetworks within the same model. See Figure 1 for an example visualization of two subnetworks in a 2-layer GPT2-style transformer.
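To give a sense of what such a figure conveys, the following is a minimal matplotlib sketch of a layer-wise comparison of two subnetworks; it uses made-up mask densities and is an illustrative stand-in, not NeuroSurgeon's actual visualizer.

# Illustrative sketch only: plot the fraction of surviving weights per module for
# two synthetic subnetworks in a 2-layer model. Not NeuroSurgeon's built-in
# visualizer, just an example of the kind of summary such a plot conveys.
import numpy as np
import matplotlib.pyplot as plt

layers = ["layer_0.attn", "layer_0.mlp", "layer_1.attn", "layer_1.mlp"]
# Hypothetical per-module densities (fraction of unmasked weights).
subnet_a = np.array([0.12, 0.05, 0.20, 0.08])
subnet_b = np.array([0.03, 0.15, 0.06, 0.18])

x = np.arange(len(layers))
width = 0.35
plt.bar(x - width / 2, subnet_a, width, label="Subnetwork A")
plt.bar(x + width / 2, subnet_b, width, label="Subnetwork B")
plt.xticks(x, layers, rotation=45, ha="right")
plt.ylabel("Fraction of weights retained")
plt.title("Two subnetworks across a 2-layer GPT2-style model")
plt.legend()
plt.tight_layout()
plt.show()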
4 Related Work

Subnetwork analysis has been used in a wide variety of contexts in recent deep learning research. Some studies have used subnetwork analysis to uncover how linguistic information is distributed throughout a model (De Cao et al., 2022, 2020). One notable approach to this is subnetwork probing (Cao et al., 2021), which NeuroSurgeon implements. Others have sought to understand how particular computations are structured within model weights (Csordás et al., 2020; Lepori et al., 2023; Conmy et al., 2023). Still others have used subnetwork analysis to better understand generalization and transfer learning (Zhang et al., 2021; Panigrahi et al., 2023; Zheng et al., 2023; Guo et al., 2021), or to control model behavior (Li et al., 2023).
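To make the masking-based approaches concrete, below is a minimal PyTorch sketch of differentiable weight masking in the spirit of subnetwork probing (Cao et al., 2021) and continuous sparsification (Savarese et al., 2020). It is a simplified illustration under assumed hyperparameters, not the implementation used by NeuroSurgeon or by those papers.

# Simplified sketch of differentiable weight masking: a frozen linear layer whose
# weights are gated by a learnable, sigmoid-relaxed binary mask. Training the mask
# logits with a sparsity penalty recovers a subnetwork. Illustration only; not
# NeuroSurgeon's implementation.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, linear: nn.Linear, temperature: float = 0.1):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)  # underlying weights stay frozen
        self.mask_logits = nn.Parameter(torch.zeros_like(linear.weight))
        self.temperature = temperature

    def forward(self, x):
        # Soft mask during training; hard binary mask (threshold at 0) at eval time.
        mask = torch.sigmoid(self.mask_logits / self.temperature)
        if not self.training:
            mask = (self.mask_logits > 0).float()
        return nn.functional.linear(x, self.linear.weight * mask, self.linear.bias)

    def sparsity_penalty(self):
        # Relaxed L0-style penalty pushing most mask entries toward zero.
        return torch.sigmoid(self.mask_logits / self.temperature).sum()

# Usage: wrap a layer, then optimize only the mask logits on a probing task.
layer = MaskedLinear(nn.Linear(16, 16))
opt = torch.optim.Adam([layer.mask_logits], lr=1e-2)
x, target = torch.randn(8, 16), torch.randn(8, 16)
loss = nn.functional.mse_loss(layer(x), target) + 1e-3 * layer.sparsity_penalty()
loss.backward()
opt.step()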
References

Nicola De Cao, Leon Schmid, Dieuwke Hupkes, and Ivan Titov. 2022. Sparse interventions in language models with differentiable masking. In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 16–27.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations.

Jonathan Frankle and Michael Carbin. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.

Demi Guo, Alexander M Rush, and Yoon Kim. 2021. Parameter-efficient transfer learning with diff pruning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4884–4896.

Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems, 28.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

Michael A Lepori, Thomas Serre, and Ellie Pavlick. 2023. Break it down: Evidence for structural compositionality in neural networks. arXiv preprint arXiv:2301.10884.

Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. 2022. In-context learning and induction heads. arXiv preprint arXiv:2209.11895.

Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, and Sanjeev Arora. 2023. Task-specific skill localization in fine-tuned language models. arXiv preprint arXiv:2302.06600.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

Pedro Savarese, Hugo Silva, and Michael Maire. 2020. Winning the lottery with continuous sparsification. Advances in Neural Information Processing Systems, 33:11380–11390.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. 2022. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. In The Eleventh International Conference on Learning Representations.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.

Dinghuai Zhang, Kartik Ahuja, Yilun Xu, Yisen Wang, and Aaron Courville. 2021. Can subnetwork structure be the key to out-of-distribution generalization? In International Conference on Machine Learning, pages 12356–12367. PMLR.

Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, and Yujun Shen. 2023. Regularized mask tuning: Uncovering hidden knowledge in pre-trained vision-language models. arXiv preprint arXiv:2307.15049.