
Cyclic Sparse Training: Is it Enough?

Authors: Advait Gadhikar, Sree Harsha Nelaturu, Rebekka Burkholz

Abstract: The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted by repeated cyclic training, even outperforming standard iterative pruning methods. We conjecture that the dominant mechanism behind this improvement is a better exploration of the loss landscape, leading to a lower training loss. However, at high sparsity, repeated cyclic training alone is not enough for competitive performance. A strong coupling between learnt parameter initialization and mask seems to be required. Standard methods obtain this coupling via expensive pruning-training iterations, starting from a dense network. To achieve this with sparse training instead, we propose SCULPT-ing, i.e., repeated cyclic training of any sparse mask followed by a single pruning step to couple the parameters and the mask, which is able to match the performance of state-of-the-art iterative pruning methods in the high sparsity regime at reduced computational cost.

Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.
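The two ingredients of the recipe in this abstract, repeated cyclic training of a fixed sparse mask and a single final pruning step, can be illustrated with a minimal NumPy sketch. It assumes a cosine learning-rate schedule that restarts every cycle and global magnitude pruning; the schedule shape, the function names, and all parameters are illustrative assumptions, the training loop itself is omitted, and this is not the authors' implementation.

```python
import numpy as np

def cyclic_lr(step, steps_per_cycle, lr_max=0.1, lr_min=0.0):
    """Cosine learning rate that restarts at lr_max at the start of every cycle (assumed shape)."""
    t = (step % steps_per_cycle) / steps_per_cycle   # position within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + np.cos(np.pi * t))

def magnitude_prune(weights, mask, target_density):
    """One global magnitude-pruning step restricted to the weights the current mask keeps."""
    n_keep = int(round(target_density * weights.size))
    if n_keep > mask.sum():
        raise ValueError("cannot keep more weights than the current mask allows")
    new_mask = np.zeros(weights.size, dtype=bool)
    if n_keep > 0:
        scores = np.where(mask, np.abs(weights), -np.inf).ravel()  # masked-out weights never win
        new_mask[np.argsort(scores)[-n_keep:]] = True              # keep the n_keep largest |w|
    return new_mask.reshape(weights.shape)

# Usage: three learning-rate cycles (training omitted), then a single pruning step.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))
mask = np.arange(20).reshape(4, 5) % 2 == 0     # hypothetical fixed sparse mask, 50% density
lrs = [cyclic_lr(s, steps_per_cycle=10) for s in range(30)]
new_mask = magnitude_prune(w, mask, target_density=0.2)
```

In a real run, each cycle would train the masked network with the restarted schedule before the next cycle begins; only the schedule and the final pruning rule are shown here.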

arXiv:2406.00418

GATE: How to Keep Out Intrusive Neighbors

Authors: Nimrah Mustafa, Rebekka Burkholz

Abstract: Graph Attention Networks (GATs) are designed to provide flexible neighborhood aggregation that assigns weights to neighbors according to their importance. In practice, however, GATs are often unable to switch off task-irrelevant neighborhood aggregation, as we show experimentally and analytically. To address this challenge, we propose GATE, a GAT extension that offers three major advantages: i) It alleviates over-smoothing by addressing its root cause, unnecessary neighborhood aggregation. ii) Like perceptrons, it benefits from higher depth, as it can still utilize additional layers for (non-)linear feature transformations in the case of (nearly) switched-off neighborhood aggregation. iii) By down-weighting connections to unrelated neighbors, it often outperforms GATs on real-world heterophilic datasets. To further validate our claims, we construct a synthetic test bed to analyze a model's ability to utilize the appropriate amount of neighborhood aggregation, which could be of independent interest.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 26 pages. To be published at the International Conference on Machine Learning (ICML), 2024

arXiv:2405.18655

CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data

Authors: Ping-Han Hsieh, Ru-Xiu Hsiao, Katalin Ferenc, Anthony Mathelier, Rebekka Burkholz, Chien-Yu Chen, Geir Kjetil Sandve, Tatiana Belova, Marieke Lydia Kuijjer

Abstract: Paired single-cell sequencing technologies enable the simultaneous measurement of complementary modalities of molecular data at single-cell resolution. Along with the advances in these technologies, many methods based on variational autoencoders have been developed to integrate these data. However, these methods do not explicitly incorporate prior biological relationships between the data modalities, which could significantly enhance modeling and interpretation. We propose a novel probabilistic learning framework that explicitly incorporates conditional independence relationships between multi-modal data as a directed acyclic graph using a generalized hierarchical variational autoencoder. We demonstrate the versatility of our framework across various applications pertinent to single-cell multi-omics data integration. These include the isolation of common and distinct information from different modalities, modality-specific differential analysis, and integrated cell clustering. We anticipate that the proposed framework can facilitate the construction of highly flexible graphical models that can capture the complexities of biological hypotheses and unravel the connections between different biological data types, such as different modalities of paired single-cell multi-omics data. The implementation of the proposed framework can be found in the repository https://github.com/kuijjerlab/CAVACHON.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.04612

Spectral Graph Pruning Against Over-Squashing and Over-Smoothing

Authors: Adarsh Jamadandi, Celia Rubio-Madrigal, Rebekka Burkholz

Abstract: Message Passing Graph Neural Networks are known to suffer from two problems that are sometimes believed to be diametrically opposed: over-squashing and over-smoothing. The former results from topological bottlenecks that hamper the information flow from distant nodes and is mitigated by spectral gap maximization, primarily by means of edge additions. However, such additions often promote over-smoothing, which renders nodes of different classes less distinguishable. Inspired by the Braess phenomenon, we argue that deleting edges can address over-squashing and over-smoothing simultaneously. This insight explains how edge deletions can improve generalization, thus connecting spectral gap optimization to a seemingly disconnected objective of reducing computational resources by pruning graphs for lottery tickets. To this end, we propose a more effective spectral gap optimization framework to add or delete edges and demonstrate its effectiveness on large heterophilic datasets.

Submitted 6 April, 2024; originally announced April 2024.
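The spectral gap discussed in this abstract can be computed directly from an adjacency matrix. The NumPy sketch below assumes an unweighted, undirected graph and the symmetric normalized Laplacian, and uses a brute-force greedy loop that deletes whichever single edge most increases the gap. It is an illustrative baseline, not the more effective optimization framework proposed in the paper; whether any deletion helps at all depends on the graph, and a Braess-type effect is exactly the case where one does.

```python
import numpy as np

def spectral_gap(adj):
    """Second-smallest eigenvalue of the normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))           # assumes no isolated nodes
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)[1]                     # eigvalsh returns ascending eigenvalues

def greedy_delete(adj, n_deletions):
    """Brute-force baseline: repeatedly delete the single edge that most increases the gap."""
    adj = adj.copy()
    for _ in range(n_deletions):
        best_gap, best_edge = spectral_gap(adj), None
        for i, j in zip(*np.triu_indices_from(adj, k=1)):
            if adj[i, j] == 0:
                continue
            adj[i, j] = adj[j, i] = 0.0                    # tentatively delete edge (i, j)
            if adj[i].sum() > 0 and adj[j].sum() > 0:      # never isolate a node
                gap = spectral_gap(adj)
                if gap > best_gap:
                    best_gap, best_edge = gap, (i, j)
            adj[i, j] = adj[j, i] = 1.0                    # restore it (unweighted graph assumed)
        if best_edge is None:
            break                                          # no single deletion increases the gap
        i, j = best_edge
        adj[i, j] = adj[j, i] = 0.0
    return adj

# Usage on a small hypothetical graph: two triangles joined by the edge (2, 3).
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
pruned = greedy_delete(adj, n_deletions=2)
print(spectral_gap(adj), spectral_gap(pruned))
```

Each candidate deletion costs a full eigendecomposition, which is why this brute-force loop only suits toy graphs.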

arXiv:2403.04805

Not all tickets are equal and we know it: Guiding pruning with domain-specific knowledge

Authors: Intekhab Hossain, Jonas Fischer, Rebekka Burkholz, John Quackenbush

Abstract: Neural structure learning is of paramount importance for scientific discovery and interpretability. Yet, contemporary pruning algorithms that focus on computational resource efficiency face algorithmic barriers in selecting a meaningful model that aligns with domain expertise. To mitigate this challenge, we propose DASH, which guides pruning by available domain-specific structural information. In the context of learning dynamic gene regulatory network models, we show that DASH combined with existing general knowledge on interaction partners provides data-specific insights aligned with biology. For this task, we demonstrate the effectiveness of DASH on synthetic data with ground truth information and on two real-world applications, where it outperforms competing methods by a large margin and provides more meaningful biological insights. Our work shows that domain-specific structural information bears the potential to improve model-derived scientific insights.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.19262

Masks, Signs, And Learning Rate Rewinding

Authors: Advait Gadhikar, Rebekka Burkholz

Abstract: Learning Rate Rewinding (LRR) has been established as a strong variant of Iterative Magnitude Pruning (IMP) to find lottery tickets in deep overparameterized neural networks. While both iterative pruning schemes couple structure and parameter learning, understanding how LRR excels in both aspects can bring us closer to the design of more flexible deep learning algorithms that can optimize diverse sets of sparse architectures. To this end, we conduct experiments that disentangle the effect of mask learning and parameter optimization and how both benefit from overparameterization. The ability of LRR to flip parameter signs early and stay robust to sign perturbations seems to make it more effective not only in mask identification but also in optimizing diverse sets of masks, including random ones. In support of this hypothesis, we prove in a simplified single hidden neuron setting that LRR succeeds in more cases than IMP, as it can escape initially problematic sign configurations.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted for publication at ICLR 2024
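The difference between the two schemes compared in this abstract lies in what is restored after each pruning step: IMP resets the surviving weights to their initial values, while LRR keeps the trained weights and only restarts the learning-rate schedule. A minimal NumPy sketch on a toy least-squares problem makes this concrete; the linear model, the linearly decaying schedule, the pruning fraction, and all function names are assumptions chosen for illustration and stand in for the deep networks studied in the paper.

```python
import numpy as np

def train(w, mask, X, y, steps=200, lr_max=0.1):
    """Masked gradient descent on least squares; the linearly decaying LR restarts on every call."""
    w = w * mask
    for s in range(steps):
        lr = lr_max * (1 - s / steps)
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask                     # pruned weights stay exactly zero
    return w

def prune(w, mask, frac):
    """Remove the fraction `frac` of surviving weights with the smallest magnitude."""
    alive = np.flatnonzero(mask)
    drop = alive[np.argsort(np.abs(w[alive]))[: int(frac * alive.size)]]
    new_mask = mask.copy()
    new_mask[drop] = False
    return new_mask

def iterative_pruning(X, y, rounds=2, frac=0.5, rewind="lr", seed=0):
    """rewind='weights' mimics IMP (reset to w_init); rewind='lr' mimics LRR (keep w, restart LR)."""
    rng = np.random.default_rng(seed)
    w_init = rng.normal(scale=0.1, size=X.shape[1])
    mask = np.ones(X.shape[1], dtype=bool)
    w = train(w_init, mask, X, y)
    for _ in range(rounds):
        mask = prune(w, mask, frac)
        w = train(w_init if rewind == "weights" else w, mask, X, y)
    return w, mask

# Usage on a hypothetical sparse regression problem: 4 relevant features out of 16.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 16))
w_true = np.zeros(16)
w_true[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ w_true
results = {mode: iterative_pruning(X, y, rewind=mode) for mode in ("weights", "lr")}
for mode, (w, mask) in results.items():
    print(mode, int(mask.sum()), float(np.mean((X @ w - y) ** 2)))
```

A convex toy problem like this cannot reproduce the sign-flipping effects analyzed in the paper; it only shows where the two loops diverge.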

arXiv:2310.07235

Are GATs Out of Balance?

Authors: Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz

Abstract: While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study focuses on the Graph Attention Network (GAT), a popular GNN architecture in which a node's neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a large portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a considerable speedup in training and convergence time in comparison to the standard initialization. Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.

Submitted 25 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 25 pages. To be published in Advances in Neural Information Processing Systems (NeurIPS), 2023
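For reference, the parameterized attention coefficients mentioned in this abstract follow the standard single-head GAT formulation, e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalized by a softmax over each node's neighborhood. The NumPy sketch below assumes self-loops, a LeakyReLU slope of 0.2, and arbitrary Gaussian parameters; it does not implement the balanced initialization proposed in the paper.

```python
import numpy as np

def gat_layer(H, adj, W, a, negative_slope=0.2):
    """Single-head GAT layer: attention-weighted aggregation over each node's neighbors."""
    Z = H @ W                                          # transformed features, shape (n, d_out)
    d = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), computed for all node pairs at once
    e = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    neighbors = (adj + np.eye(len(adj))) > 0           # include a self-loop for every node
    e = np.where(neighbors, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)               # stability; self-loops keep each row finite
    alpha = np.exp(e)                                  # exp(-inf) = 0 for non-neighbors
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over each neighborhood
    return alpha @ Z, alpha                            # output nonlinearity omitted

# Usage on a hypothetical 3-node path graph 0 - 1 - 2.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)
out, alpha = gat_layer(H, adj, W, a)
```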

arXiv:2301.13732

Preserving local densities in low-dimensional embeddings

Authors: Jonas Fischer, Rebekka Burkholz, Jilles Vreeken

Abstract: Low-dimensional embeddings and visualizations are an indispensable tool for the analysis of high-dimensional data. State-of-the-art methods, such as tSNE and UMAP, excel in unveiling local structures hidden in high-dimensional data and are therefore routinely applied in standard analysis pipelines in biology. We show, however, that these methods fail to reconstruct local properties, such as relative differences in densities (Fig. 1), and that apparent differences in cluster size can arise from computational artifacts caused by differing sample sizes (Fig. 2). Providing a theoretical analysis of this issue, we then propose dtSNE, which approximately conserves local densities. In an extensive study on synthetic benchmark and real-world data comparing against five state-of-the-art methods, we empirically show that dtSNE provides similar global reconstruction, but yields much more accurate depictions of local distances and relative densities.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2210.02412

Why Random Pruning Is All We Need to Start Sparse

Authors: Advait Gadhikar, Sohom Mukherjee, Rebekka Burkholz

Abstract: Random masks define surprisingly effective sparse neural network models, as has been shown empirically. The resulting sparse networks can often compete with dense architectures and state-of-the-art lottery ticket pruning algorithms, even though they do not rely on computationally expensive prune-train iterations and can be drawn initially without significant computational overhead. We offer a theoretical explanation of how random masks can approximate arbitrary target networks if they are wider by a logarithmic factor in the inverse sparsity $1 / \log(1/\text{sparsity})$. This overparameterization factor is necessary at least for 3-layer random networks, which elucidates the observed degrading performance of random networks at higher sparsity. At moderate to high sparsity levels, however, our results imply that sparser networks are contained within random source networks so that any dense-to-sparse training scheme can be turned into a computationally more efficient sparse-to-sparse one by constraining the search to a fixed random mask. We demonstrate the feasibility of this approach in experiments for different pruning methods and propose particularly effective choices of initial layer-wise sparsity ratios of the random source network. As a special case, we show theoretically and experimentally that random source networks also contain strong lottery tickets.

Submitted 31 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted for publication at ICML, 2023
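The paper proposes its own layer-wise sparsity ratios for the random source network; those are not reproduced here. As a stand-in, the NumPy sketch below uses the Erdős-Rényi-style scaling common in sparse-training work, which makes small layers denser, rescales it to a target overall density, and draws one fixed random mask per layer. The layer shapes, the target density, and the function names are hypothetical.

```python
import numpy as np

def er_densities(shapes, target_density):
    """Erdos-Renyi-style densities proportional to (n_in + n_out) / (n_in * n_out),
    rescaled so that the overall density equals target_density (a common heuristic)."""
    sizes = np.array([n_in * n_out for n_in, n_out in shapes], dtype=float)
    raw = np.array([(n_in + n_out) / (n_in * n_out) for n_in, n_out in shapes])
    dens = raw * target_density * sizes.sum() / (raw * sizes).sum()
    if np.any(dens > 1):
        raise ValueError("target density too high for this heuristic without redistribution")
    return dens

def random_masks(shapes, densities, seed=0):
    """Draw one fixed random mask per layer with exactly round(density * size) nonzeros."""
    rng = np.random.default_rng(seed)
    masks = []
    for (n_in, n_out), p in zip(shapes, densities):
        flat = np.zeros(n_in * n_out, dtype=bool)
        flat[rng.choice(flat.size, size=int(round(p * flat.size)), replace=False)] = True
        masks.append(flat.reshape(n_in, n_out))
    return masks

# Usage: a hypothetical 784-300-100-10 MLP at 5% overall density.
shapes = [(784, 300), (300, 100), (100, 10)]
dens = er_densities(shapes, target_density=0.05)
masks = random_masks(shapes, dens)
overall = sum(m.sum() for m in masks) / sum(m.size for m in masks)
```

Once drawn, such masks stay fixed, so any subsequent training or pruning only searches within the random source network, which is the sparse-to-sparse setting the abstract describes.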

arXiv:2210.02411

Dynamical Isometry for Residual Networks

Authors: Advait Gadhikar, Rebekka Burkholz

Abstract: The training success, training speed and generalization ability of neural networks rely crucially on the choice of random parameter initialization. It has been shown for multiple architectures that initial dynamical isometry is particularly advantageous. Known initialization schemes for residual blocks, however, miss this property and suffer from degrading separability of different inputs with increasing depth and from instability without Batch Normalization, or they lack feature diversity. We propose a random initialization scheme, RISOTTO, that achieves perfect dynamical isometry for residual networks with ReLU activation functions even for finite depth and width. Unlike other schemes, which initially bias towards the skip connections, it balances the contributions of the residual and skip branches. In experiments, we demonstrate that in most cases our approach outperforms initialization schemes proposed to make Batch Normalization obsolete, including Fixup and SkipInit, and facilitates stable training. In combination with Batch Normalization as well, we find that RISOTTO often achieves the overall best result.

Submitted 5 October, 2022; originally announced October 2022.

Comments: 22 pages, 5 figures

arXiv:2205.02343

Convolutional and Residual Networks Provably Contain Lottery Tickets

Authors: Rebekka Burkholz

Abstract: The Lottery Ticket Hypothesis continues to have a profound practical impact on the quest for small-scale deep neural networks that solve modern deep learning tasks at competitive performance. These lottery tickets are identified by pruning large randomly initialized neural networks with architectures that are as diverse as their applications. Yet, theoretical insights that attest to their existence have mostly focused on deep fully-connected feed-forward networks with ReLU activation functions. We prove that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, can also contain lottery tickets with high probability.

Submitted 4 May, 2022; originally announced May 2022.

arXiv:2205.02321

Most Activation Functions Can Win the Lottery Without Excessive Depth

Authors: Rebekka Burkholz

Abstract: The strong lottery ticket hypothesis has highlighted the potential for training deep neural networks by pruning, which has inspired interesting practical and theoretical insights into how neural networks can represent functions. For networks with ReLU activation functions, it has been proven that a target network with depth $L$ can be approximated by the subnetwork of a randomly initialized neural network that has double the target's depth $2L$ and is wider by a logarithmic factor. We show that a depth $L+1$ network is sufficient. This result indicates that we can expect to find lottery tickets at realistic, commonly used depths while only requiring logarithmic overparametrization. Our novel construction approach applies to a large class of activation functions and is not limited to ReLUs.

Submitted 8 January, 2023; v1 submitted 4 May, 2022; originally announced May 2022.

Comments: Accepted for publication at NeurIPS 2022

arXiv:2111.11153

Plant 'n' Seek: Can You Find the Winning Ticket?

Authors: Jonas Fischer, Rebekka Burkholz

Abstract: The lottery ticket hypothesis has sparked the rapid development of pruning algorithms that aim to reduce the computational costs associated with deep learning during training and model deployment. Currently, such algorithms are primarily evaluated on imaging data, for which we lack ground truth information and thus an understanding of how sparse lottery tickets could be. To fill this gap, we develop a framework that allows us to plant and hide winning tickets with desirable properties in randomly initialized neural networks. To analyze the ability of state-of-the-art pruning to identify tickets of extreme sparsity, we design and hide such tickets solving four challenging tasks. In extensive experiments, we observe trends similar to those in imaging studies, indicating that our framework can provide transferable insights into realistic problems. Additionally, we can now see beyond such relative trends and highlight limitations of current pruning methods. Based on our results, we conclude that the current limitations in ticket sparsity are likely of an algorithmic rather than a fundamental nature. We anticipate that comparisons to planted tickets will facilitate future developments of efficient pruning algorithms.

Submitted 7 June, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

arXiv:2111.11146

On the Existence of Universal Lottery Tickets

Authors: Rebekka Burkholz, Nilanjana Laha, Rajarshi Mukherjee, Alkis Gotovos

Abstract: The lottery ticket hypothesis conjectures the existence of sparse subnetworks of large randomly initialized deep neural networks that can be successfully trained in isolation. Recent work has experimentally observed that some of these tickets can be practically reused across a variety of tasks, hinting at some form of universality. We formalize this concept and theoretically prove that not only do such universal tickets exist but they also do not require further training. Our proofs introduce a couple of technical innovations related to pruning for strong lottery tickets, including extensions of subset sum results and a strategy to leverage higher amounts of depth. Our explicit sparse constructions of universal function families might be of independent interest, as they highlight representational benefits induced by univariate convolutional architectures.

Submitted 16 March, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted for publication at The Tenth International Conference on Learning Representations (ICLR 2022)
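The subset sum results mentioned in this abstract build on a classical probabilistic fact: with high probability, a value in a bounded interval can be approximated to precision epsilon by summing a subset of random values whose number grows only logarithmically in 1/epsilon. Choosing the subset plays the role of pruning. The NumPy sketch below illustrates this by brute force over all subsets of 16 uniform random "weights"; the target value and sizes are hypothetical, and exhaustive search is only feasible for such small n.

```python
import numpy as np

def best_subset_sum(values, target):
    """Brute force over all 2^n subsets: return the subset whose sum is closest to target."""
    n = len(values)
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n)) & 1   # row k encodes the k-th subset
    sums = bits @ values
    k = int(np.argmin(np.abs(sums - target)))
    return np.flatnonzero(bits[k]), abs(sums[k] - target)

# Usage: approximate one hypothetical target weight with a subset of 16 random weights.
rng = np.random.default_rng(0)
values = rng.uniform(-1, 1, size=16)
subset, err = best_subset_sum(values, target=0.37)
print(subset, err)
```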

arXiv:2110.11150

Lottery Tickets with Nonzero Biases

Authors: Jonas Fischer, Advait Gadhikar, Rebekka Burkholz

Abstract: The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus forgoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to nonzero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data, we further highlight the practical benefits of nonzero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.

Submitted 7 June, 2022; v1 submitted 21 October, 2021; originally announced October 2021.
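The 'looks-linear' idea mentioned in this abstract rests on the identity relu(z) - relu(-z) = z: if weights and biases are mirrored with opposite signs, a ReLU network initially computes a linear map, here with orthogonal blocks and a nonzero bias. The two-layer NumPy sketch below assumes orthogonal blocks drawn via QR decomposition and a Gaussian bias; it illustrates the general construction, not the specific initialization schemes of the paper.

```python
import numpy as np

def orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))                    # fix column signs

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
n = 4
W0, V0 = orthogonal(n, rng), orthogonal(n, rng)
b = rng.normal(size=n)                                 # nonzero first-layer bias

# Looks-linear parameter sharing: mirror weights and bias with opposite signs.
W = np.vstack([W0, -W0])                               # shape (2n, n)
b_ll = np.concatenate([b, -b])                         # shape (2n,)
V = np.hstack([V0, -V0])                               # shape (n, 2n)

x = rng.normal(size=n)
out = V @ relu(W @ x + b_ll)   # = V0 relu(W0 x + b) - V0 relu(-(W0 x + b)) = V0 (W0 x + b)
```

Because the initial network is exactly linear, orthogonality of W0 and V0 carries over to the input-output map, which is the property the mirrored construction is designed to preserve.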

arXiv:2107.02911

Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification

Authors: Alkis Gotovos, Rebekka Burkholz, John Quackenbush, Stefanie Jegelka

Abstract: Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2104.01690

DRAGON: Determining Regulatory Associations using Graphical models on multi-Omic Networks

Authors: Katherine H. Shutta, Deborah Weighill, Rebekka Burkholz, Marouen Ben Guebila, Dawn L. DeMeo, Helena U. Zacharias, John Quackenbush, Michael Altenbuchinger

Abstract: The increasing quantity of multi-omics data, such as methylomic and transcriptomic profiles, collected on the same specimen, or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics "layers." In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome-methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).

Submitted 21 September, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

Comments: 24 pages, 8 figures
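In a Gaussian graphical model, as used in this abstract, edges correspond to nonzero partial correlations, which are read off the precision (inverse covariance) matrix as rho_ij = -theta_ij / sqrt(theta_ii theta_jj). The generic NumPy sketch below assumes a single shrinkage intensity toward the diagonal of the sample covariance so that the inverse is well defined. DRAGON itself calibrates its parameters per omics layer, and its netZooPy implementation is not reproduced here; the data and feature split are hypothetical.

```python
import numpy as np

def partial_correlations(X, shrinkage=0.1):
    """Partial correlations of a Gaussian graphical model from a shrunk sample covariance.

    A single shrinkage intensity toward the diagonal is an assumption made for brevity.
    """
    S = np.cov(X, rowvar=False)                        # features in columns
    S_shrunk = (1 - shrinkage) * S + shrinkage * np.diag(np.diag(S))
    theta = np.linalg.inv(S_shrunk)                    # precision matrix
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)                     # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Usage on hypothetical paired data: 3 "methylation" and 3 "expression" features, 50 samples.
rng = np.random.default_rng(0)
methylation = rng.normal(size=(50, 3))
expression = -0.8 * methylation + 0.5 * rng.normal(size=(50, 3))   # toy negative association
pcor = partial_correlations(np.hstack([methylation, expression]))
```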

arXiv:1909.05416

Cascade Size Distributions: Why They Matter and How to Compute Them Efficiently

Authors: Rebekka Burkholz, John Quackenbush

Abstract: Cascade models are central to understanding, predicting, and controlling epidemic spreading and information propagation. Related optimization, including influence maximization, model parameter inference, or the development of vaccination strategies, relies heavily on sampling from a model. This is either inefficient or inaccurate. As an alternative, we present an efficient message passing algorithm that computes the probability distribution of the cascade size for the Independent Cascade Model on weighted directed networks and generalizations. Our approach is exact on trees but can be applied to any network topology. It approximates locally tree-like networks well, scales to large networks, and can lead to surprisingly good performance on denser networks, as we also exemplify on real-world data.

Submitted 16 December, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: Accepted at AAAI 2021
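Why exactness on trees is possible can be seen from a generating-function recursion: on a tree, the subtrees below an active node evolve independently, so their cascade-size distributions multiply as polynomials. The NumPy sketch below covers only the basic Independent Cascade Model with a single seed at the root of a directed tree. It is a simplified recursion written for illustration, not the authors' message passing algorithm, which also handles general topologies and generalizations of the model. Because it recurses once per tree level, very deep trees can exceed Python's default recursion limit.

```python
import numpy as np

def cascade_size_distribution(children, probs, root=0):
    """Exact cascade-size distribution of the Independent Cascade Model on a rooted tree.

    children[v] lists the children of v; probs[(v, c)] is the probability that an active v
    activates c. Only the root is initially active. Returns dist with dist[k] = P(size == k).
    """
    def subtree_poly(v):
        poly = np.array([0.0, 1.0])                    # generating function z: v itself is active
        for c in children.get(v, []):
            p = probs[(v, c)]
            factor = p * subtree_poly(c)               # c activated: add its subtree's cascade
            factor[0] += 1 - p                         # c not activated: contributes z^0
            poly = np.convolve(poly, factor)           # independent subtrees multiply
        return poly
    return subtree_poly(root)

# Usage on a hypothetical tree 0 -> {1, 2}, 1 -> {3}, every edge active with probability 0.5.
children = {0: [1, 2], 1: [3]}
probs = {(0, 1): 0.5, (0, 2): 0.5, (1, 3): 0.5}
dist = cascade_size_distribution(children, probs)
# dist[k] = P(size == k) for k = 0..4: 0, 0.25, 0.375, 0.25, 0.125
```

The full distribution exposes tail probabilities, such as P(size == 4) = 0.125 here, that the mean size of 2.25 alone would hide.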

arXiv:1901.05872

doi 10.1088/1748-9326/ab4864

International crop trade networks: The impact of shocks and cascades

Authors: Rebekka Burkholz, Frank Schweitzer

Abstract: Analyzing available FAO data from 176 countries over 21 years, we observe an increase in complexity in the international trade of maize, rice, soy, and wheat. A larger number of countries play a role as producers or intermediaries, either for trade or food processing. In consequence, we find that the trade networks become more prone to failure cascades caused by exogenous shocks. In our model, countries compensate for demand deficits by imposing export restrictions. To capture these, we construct higher-order trade dependency networks for the different crops and years. These networks reveal hidden dependencies between countries and allow us to discuss policy implications.

Submitted 17 January, 2019; originally announced January 2019.

arXiv:1811.06872

Efficient message passing for cascade size distributions on finite trees

Authors: Rebekka Burkholz

Abstract: How big is the risk that a few initial failures of networked nodes amplify into large cascades that endanger the functioning of the system? Common answers refer to the average final cascade size. Two analytic approaches allow its computation: a) (heterogeneous) mean field approximation and b) belief propagation. The former applies to (infinitely) large locally tree-like networks, while the latter is exact on finite trees. Yet, cascade sizes can have broad and multi-modal distributions that are not well represented by their average. Full distribution information is essential to identify likely events and to estimate the tail risk, i.e., the probability of extreme events. Here, we lay the basis for a general theory to calculate the cascade size distribution in finite networks. We present an efficient message passing algorithm that is exact on finite trees and a large class of cascade processes. An approximate version performs well on locally tree-like networks.

Submitted 14 November, 2018; originally announced November 2018.

arXiv:1806.06362

Initialization of ReLUs for Dynamical Isometry

Authors: Rebekka Burkholz, Alina Dubatovka

Abstract: Deep learning relies on good initialization schemes and hyperparameter choices prior to training a neural network. Random weight initializations induce random network ensembles, which shape the trainability, training speed, and sometimes also the generalization ability of an instance. In addition, such ensembles provide theoretical insights into the space of candidate models from which one is selected during training. The results obtained so far rely on mean field approximations that assume infinite layer width and that study average squared signals. We derive the joint signal output distribution exactly, without mean field assumptions, for fully-connected networks with Gaussian weights and biases, and analyze deviations from the mean field results. For rectified linear units, we further discuss limitations of the standard initialization scheme, such as its lack of dynamical isometry, and propose a simple alternative that overcomes these by initial parameter sharing.

Submitted 24 October, 2019; v1 submitted 17 June, 2018; originally announced June 2018.

Comments: NeurIPS 2019
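
The abstract above only says that the proposed alternative uses initial parameter sharing. As an illustration, here is a sketch of one construction of that kind (often called a "looks-linear" initialization): mirrored weight blocks make a ReLU network compute a linear map at initialization, so orthogonal blocks give an input-output Jacobian with all singular values equal to 1. Whether this matches the paper's exact scheme is an assumption; the matrix names `W0`, `W1`, `A`, `B` are hypothetical.

```python
import numpy as np

def random_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix makes the distribution uniform

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
n = 8
W0 = random_orthogonal(n, rng)
W1 = random_orthogonal(n, rng)

# Layer 1 (2n hidden units): the same weights W0, once with each sign.
A = np.vstack([W0, -W0])
# Layer 2: the same weights W1 on both halves, with opposite signs.
B = np.hstack([W1, -W1])

x = rng.standard_normal(n)
y = B @ relu(A @ x)  # = W1 @ (relu(W0 x) - relu(-W0 x)) = W1 @ W0 @ x

# Input-output Jacobian of the ReLU network at x: B @ diag(relu'(A x)) @ A.
mask = (A @ x > 0).astype(float)
J = B @ (mask[:, None] * A)
singular_values = np.linalg.svd(J, compute_uv=False)
```

The key identity is relu(z) - relu(-z) = z: the two mirrored halves cancel the nonlinearity, so `J` equals `W1 @ W0`, an orthogonal matrix, at initialization.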

arXiv:1802.03286

Explicit size distributions of failure cascades redefine systemic risk on finite networks

Authors: Rebekka Burkholz, Hans J. Herrmann, Frank Schweitzer

Abstract: How big is the risk that a few initial failures of nodes in a network amplify into large cascades that span a substantial share of all nodes? Predicting the final cascade size is critical to ensure the functioning of a system as a whole. Yet, this task is hampered by uncertain or changing parameters and missing information. In infinitely large networks, the average cascade size can often be well estimated by established approaches building on local tree approximations and mean field approximations. Yet, as we demonstrate, in finite networks, this average does not even need to be a likely outcome. Instead, we find broad and even bimodal cascade size distributions. This phenomenon persists for system sizes up to $10^{7}$ and different cascade models, i.e., it is relevant for most real systems. To show this, we derive explicit closed-form solutions for the full probability distribution of the final cascade size. We focus on two topological limit cases, the complete network representing a dense network with a very narrow degree distribution, and the star network representing a sparse network with an inhomogeneous degree distribution. Those topologies are of great interest, as they either minimize or maximize the average cascade size and are common motifs in many real-world networks.

Submitted 8 February, 2018; originally announced February 2018.

Comments: systemic risk, finite size effects, cascades, networks
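
To make the claim that the average need not be a likely outcome concrete, the following sketch computes an exact cascade size distribution on a star network. As an assumption for illustration, it uses the Independent Cascade Model with one transmission probability `p` on every edge and a single seed chosen uniformly at random; the paper's closed-form solutions concern its own cascade models and are not reproduced here. All function names are hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P[Binomial(n, p) = k], zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def star_cascade_distribution(n_leaves: int, p: float) -> list[float]:
    """dist[s] = P(final cascade size = s) on a star with one hub and n_leaves leaves.

    Assumed model: one seed chosen uniformly among the n_leaves + 1 nodes; every newly
    activated node gets a single chance to activate each inactive neighbor with probability p.
    """
    n = n_leaves
    dist = [0.0] * (n + 2)  # sizes 0 .. n + 1
    for s in range(1, n + 2):
        # Hub seeded: each of the n leaves joins independently -> size 1 + Binomial(n, p).
        hub_seed = binom_pmf(s - 1, n, p)
        # Leaf seeded: the hub is missed with prob 1 - p (size 1); otherwise the hub
        # fails and tries the other n - 1 leaves -> size 2 + Binomial(n - 1, p).
        leaf_seed = (1 - p) if s == 1 else 0.0
        leaf_seed += p * binom_pmf(s - 2, n - 1, p)
        dist[s] = (hub_seed + n * leaf_seed) / (n + 1)
    return dist

def mean_size(dist: list[float]) -> float:
    return sum(s * q for s, q in enumerate(dist))

def tail_probability(dist: list[float], k: int) -> float:
    """P(S >= k): the tail risk of a cascade with at least k failed nodes."""
    return sum(dist[k:])

d = star_cascade_distribution(10, 0.5)
```

For `n_leaves=10` and `p=0.5`, the mean size is 43.5/11 ≈ 3.95, yet P(S=4) is only about 0.04: the distribution peaks at S=1 (about 0.45) and again at S=6 (about 0.13), so the average falls between two modes.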

arXiv:1712.01755

Modeling the formation of R&D alliances: An agent-based model with empirical validation

Authors: Mario V. Tomasello, Rebekka Burkholz, Frank Schweitzer

Abstract: We develop an agent-based model to reproduce the size distribution of R&D alliances of firms. Agents are uniformly selected to initiate an alliance and to invite collaboration partners. Invitees decide on acceptance based on an individual threshold that is compared with the utility expected from joining the current alliance. The benefit of alliances results from the fitness of the agents involved. Fitness is obtained from an empirical distribution of agents' activities. The cost of an alliance reflects its coordination effort. Two free parameters $a_{c}$ and $a_{l}$ scale the costs and the individual threshold. If initiators receive $R$ rejections of invitations, the alliance formation stops and another initiator is selected. The three free parameters $(a_{c},a_{l},R)$ are calibrated against a large-scale data set of about 15,000 firms engaging in about 15,000 R&D alliances over 26 years. To validate the model, we compare the empirical size distribution with the theoretical one, using confidence bands, and find very good agreement. As an asset of our agent-based model, we provide an analytical solution that allows us to reduce the simulation effort considerably. The analytical solution applies to general forms of the utility of alliances. Hence, the model can be extended to other cases of alliance formation. While no information about the initiators of an alliance is available, our results indicate that mostly firms with high fitness are able to attract newcomers and to establish larger alliances.

Submitted 5 December, 2017; originally announced December 2017.
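
The formation process in the abstract above (uniform initiator, invitations, threshold-based acceptance, stop after `R` rejections) can be sketched as a simulation. The abstract does not give the functional forms of benefit, cost, or threshold, so the ones below are hypothetical placeholders, as are the lognormal fitness values standing in for the empirical distribution and the random invitation order. This does not reproduce the paper's calibrated model or its analytical solution.

```python
import random
from collections import Counter

def form_alliance(fitness, a_c, a_l, R, rng):
    """Simulate one alliance under a HYPOTHETICAL utility; returns member indices."""
    n = len(fitness)
    initiator = rng.randrange(n)  # initiators are selected uniformly at random
    members = [initiator]
    rejections = 0
    candidates = [j for j in range(n) if j != initiator]
    rng.shuffle(candidates)  # hypothetical: invitees are approached in random order
    for j in candidates:
        if rejections >= R:  # R rejections end the formation process
            break
        size_after = len(members) + 1
        # Hypothetical benefit: total fitness of the alliance if j joins.
        benefit = sum(fitness[m] for m in members) + fitness[j]
        # Hypothetical coordination cost: a_c times the number of member pairs.
        cost = a_c * size_after * (size_after - 1) / 2
        # Hypothetical individual threshold of invitee j, scaled by a_l.
        if benefit - cost >= a_l * fitness[j]:
            members.append(j)
        else:
            rejections += 1
    return members

rng = random.Random(42)
# Hypothetical stand-in for the empirical fitness distribution.
fitness = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]
sizes = Counter(
    len(form_alliance(fitness, a_c=0.5, a_l=1.0, R=3, rng=rng)) for _ in range(1000)
)
```

`sizes` counts how often each alliance size occurs across 1,000 simulated alliances; in the paper, this kind of simulated distribution is compared with the empirical one after calibrating $(a_{c},a_{l},R)$.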

arXiv:1706.04451

doi 10.1103/PhysRevE.98.022306

Correlations between thresholds and degrees: An analytic approach to model attacks and failure cascades

Authors: Rebekka Burkholz, Frank Schweitzer

Abstract: Two node variables determine the evolution of cascades in random networks: a node's degree and its threshold. Correlations between the two fundamentally change the robustness of a network, yet they are disregarded in standard analytic methods such as local tree or heterogeneous mean field approximations because of the poor tractability of order statistics. We show how they become tractable in the thermodynamic limit of infinite network size. This enables the analytic description of node attacks that are characterized by threshold allocations based on node degree. Using two examples, we discuss possible implications of irregular phase transitions and different speeds of cascade evolution for the control of cascades.

Submitted 16 June, 2017; v1 submitted 14 June, 2017; originally announced June 2017.

MSC Class: 60K35

Journal ref: Phys. Rev. E 98, 022306 (2018)
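
The paper above treats degree-threshold correlations analytically in the limit of infinite network size. For intuition on a small finite graph, the sketch below simulates a fractional threshold cascade (a node fails once the fraction of its failed neighbors reaches its threshold, an assumed rule in the style of the Watts model) together with an assumed "attack" that lowers the thresholds of high-degree nodes. The function names, graph, and parameter values are hypothetical.

```python
def degree_based_thresholds(adj, k_min, low, high):
    """Hypothetical attack: nodes with degree >= k_min get the lower threshold."""
    return {v: (low if len(nbrs) >= k_min else high) for v, nbrs in adj.items()}

def threshold_cascade(adj, thresholds, seeds):
    """Final failed set of a fractional threshold cascade started from `seeds`."""
    failed = set(seeds)
    changed = True
    while changed:  # repeat sweeps until no further node fails
        changed = False
        for v, nbrs in adj.items():
            if v in failed or not nbrs:
                continue
            frac = sum(u in failed for u in nbrs) / len(nbrs)
            if frac >= thresholds[v]:
                failed.add(v)
                changed = True
    return failed

# Hub 0 with four neighbors; nodes 3 and 4 are also linked to each other.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0, 4], 4: [0, 3]}
uniform = {v: 0.5 for v in adj}
attacked = degree_based_thresholds(adj, k_min=4, low=0.2, high=0.5)
```

With uniform thresholds of 0.5, a failure of node 1 stays local, because the hub sees only a quarter of its neighbors failed. Lowering only the hub's threshold to 0.2 lets the same seed take down the whole graph. Since failures are irreversible and the rule is monotone, the final failed set does not depend on the order in which nodes are checked.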

arXiv:1701.06970

doi 10.1103/PhysRevE.97.042312

A framework for cascade size calculations on random networks

Authors: Rebekka Burkholz, Frank Schweitzer

Abstract: We present a framework to calculate the cascade size evolution for a large class of cascade models on random network ensembles in the limit of infinite network size. Our method is exact and applies to network ensembles with almost arbitrary degree distribution, degree-degree correlations and, in the case of threshold models, with arbitrary threshold distribution. With our approach, we shift the perspective from the known branching process approximations to the iterative update of suitable probability distributions. Such distributions are key to capturing cascade dynamics that involve possibly continuous quantities and that depend on the cascade history, e.g., if load is accumulated over time. These distributions respect the Markovian nature of the studied random processes. Random variables capture the impact of nodes that have failed at any point in the past on their neighborhood. As a proof of concept, we provide two examples: (a) constant load models that cover many of the analytically tractable cascade models, and, as a highlight, (b) a fiber bundle model that was not tractable by branching process approximations before. Our derivations cover the whole cascade dynamics, not only their steady state. This allows us to include interventions in time or further model complexity in the analysis.

Submitted 18 January, 2017; originally announced January 2017.

Journal ref: Phys. Rev. E 97, 042312 (2018)
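
The abstract above highlights a fiber bundle model in which load accumulates over the cascade history. As a finite-network illustration only, here is a sketch of a local load-sharing variant under assumed rules: a node fails when its load exceeds its capacity, and its load is then split equally among its surviving neighbors (and lost if none survive). The paper's framework is analytic and works in the limit of infinite network size; this simulation does not implement it, and the function name and inputs are hypothetical.

```python
def network_fiber_bundle(adj, capacity, load):
    """Local load sharing on a network (hypothetical rules): returns failed set and final loads."""
    load = dict(load)  # copy so the caller's dict is not modified
    failed = set()
    frontier = [v for v in adj if load[v] > capacity[v]]
    while frontier:
        # All overloaded nodes of this round fail simultaneously.
        failed.update(frontier)
        for v in frontier:
            alive = [u for u in adj[v] if u not in failed]
            if alive:  # otherwise the load of v leaves the system
                share = load[v] / len(alive)
                for u in alive:
                    load[u] += share  # load accumulates over the cascade history
            load[v] = 0.0  # a failed node no longer carries load
        frontier = [u for u in adj if u not in failed and load[u] > capacity[u]]
    return failed, load

# Chain 0 - 1 - 2: node 0 starts overloaded and passes its load along the chain.
adj = {0: [1], 1: [0, 2], 2: [1]}
initial = {0: 1.2, 1: 1.0, 2: 1.0}
failed, final = network_fiber_bundle(adj, {0: 1.0, 1: 1.5, 2: 2.0}, initial)
```

In the chain example, node 1 receives the load of node 0 (total 2.2 > 1.5) and fails, after which node 2 carries 3.2 and fails as well. Raising the capacity of node 2 to 4.0 stops the cascade after two failures. Both outcomes depend on the accumulated load, not only on the current number of failed neighbors.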

arXiv:1506.06664

doi 10.1016/j.physd.2015.10.004

Systemic risk in multiplex networks with asymmetric coupling and threshold feedback

Authors: Rebekka Burkholz, Matt V. Leduc, Antonios Garas, Frank Schweitzer

Abstract: We study cascades on a two-layer multiplex network, with asymmetric feedback that depends on the coupling strength between the layers. Based on an analytical branching process approximation, we calculate the systemic risk measured by the final fraction of failed nodes on a reference layer. The results are compared with the case of a single layer network that is an aggregated representation of the two layers. We find that systemic risk in the two-layer network is smaller than in the aggregated one only if the coupling strength between the two layers is small. Above a critical coupling strength, systemic risk is increased because of the mutual amplification of cascades in the two layers. We even observe sharp phase transitions in the cascade size that are less pronounced on the aggregated layer. Our insights can be applied to a scenario where firms decide whether they want to split their business into a less risky core business and a more risky subsidiary business. In most cases, this may lead to a drastic increase of systemic risk, which is underestimated in an aggregated approach.

Submitted 22 June, 2015; originally announced June 2015.

Comments: 18 pages, 5 figures

Journal ref: Physica D: Nonlinear Phenomena, Vol. 323–324, 64–72 (2016)

arXiv:1503.00925

doi 10.1103/PhysRevE.93.042313

How Damage Diversification Can Reduce Systemic Risk

Authors: Rebekka Burkholz, Antonios Garas, Frank Schweitzer

Abstract: We consider the problem of risk diversification in complex networks. Nodes represent, e.g., financial actors, whereas weighted links represent, e.g., financial obligations (credits/debts). Each node has a risk of failing because of losses resulting from defaulting neighbors, which may lead to large failure cascades. Classical risk diversification strategies usually neglect network effects and therefore suggest that risk can be reduced if possible losses (i.e., exposures) are split among many neighbors (exposure diversification, ED). But from a complex networks perspective, diversification implies higher connectivity of the system as a whole, which can also lead to an increased failure risk of a node. To cope with this, we propose a different strategy (damage diversification, DD), i.e., the diversification of losses that are imposed on neighboring nodes as opposed to losses incurred by the node itself. Here, we quantify the potential of DD to reduce systemic risk in comparison to ED. For this, we develop a branching process approximation that we generalize to weighted networks with (almost) arbitrary degree and weight distributions. This allows us to identify systemically relevant nodes in a network even if their directed weights differ strongly. On the macro level, we provide an analytical expression for the average cascade size to quantify systemic risk. Furthermore, on the meso level, we calculate failure probabilities of nodes conditional on their system relevance.

Submitted 3 March, 2015; originally announced March 2015.

Journal ref: Phys. Rev. E 93, 042313 (2016)

Showing 1–27 of 27 results for author: Burkholz, R