Observing network dynamics through sentinel nodes

Neil G. MacLaren¹, Baruch Barzel²,³,⁴, Naoki Masuda¹,⁵,⁶,*

¹Department of Mathematics, State University of New York at Buffalo, NY 14260-2900, USA
²Department of Mathematics, Bar-Ilan University, Ramat-Gan, 5290002, Israel
³The Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, 5290002, Israel
⁴Network Science Institute, Northeastern University, Boston, MA 02115, USA
⁵Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, USA
⁶Center for Computational Social Science, Kobe University, Kobe, 657-8501, Japan
*naokimas@buffalo.edu

(July 31, 2024)

Abstract

A fundamental premise of statistical physics is that the particles in a physical system are interchangeable, and hence the state of each specific component is representative of the system as a whole. This assumption breaks down for complex networks, in which nodes may be extremely diverse, and no single component can truly represent the state of the entire system. It seems, therefore, that to observe the dynamics of social, biological or technological networks, one must extract the dynamic states of a large number of nodes—a task that is often practically prohibitive. To overcome this challenge, we use machine learning techniques to detect the network’s sentinel nodes, a set of network components whose combined states can help approximate the average dynamics of the entire network. The method allows us to assess the state of a large complex system by tracking just a small number of carefully selected nodes. The resulting sentinel node set offers a natural probe by which to practically observe complex network dynamics.

I Introduction

The dynamic state of a complex networked system is given in terms of the microscopic states $x_{i}$ of all its components (nodes) [1, 2, 3]. For example, the functionality of a cell can be captured by the expression levels of all individual genes [4, 5, 6]. Similarly, the state of an ecosystem is often characterized by the abundance of all species [7, 8]. Therefore, to observe the state of the system, one must simultaneously track the individual states of its many nodes—a level of empirical control that we seldom possess.

We therefore seek efficient reduction methods that help track the state of the system in a more compact fashion. In most applications, such reduction is achieved by focusing on a weighted average of the activity of all nodes, i.e., $\langle x\rangle=\sum_{i=1}^{N}a_{i}x_{i}$, where the weights $\{a_{1},\ldots,a_{N}\}$ characterize the importance of each node in assessing the system’s dynamic state [9, 10, 11, 12, 13, 14, 15]. For example, selecting $a_{i}=k_{i}$, a node’s weighted degree, leads to the Gao-Barzel-Barabási (GBB) reduction [9], which helps predict the critical points of transition between dynamic states. Alternatively, under the dynamics approximate reduction technique (DART), the weight $a_{i}$ represents the $i$th entry of the leading eigenvector of the network’s adjacency matrix [10, 11].
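To make the two weighting schemes concrete, the following minimal sketch computes a degree-weighted and an eigenvector-weighted observable on a toy star graph. The normalization of the weights to sum to one, the toy graph, the node states, and all function names are assumptions made for illustration; they are not taken from the GBB or DART papers.

```python
def degree_weights(adj):
    """GBB-style weights: a_i proportional to the degree k_i (normalized to sum to 1)."""
    degrees = [sum(row) for row in adj]
    total = sum(degrees)
    return [k / total for k in degrees]


def leading_eigenvector_weights(adj, iterations=500):
    """DART-style weights: leading eigenvector of adj, found by power iteration.

    Iterating on A + I (same eigenvectors, shifted eigenvalues) avoids the
    oscillation that plain power iteration can show on bipartite graphs.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(w)
        v = [wi / norm for wi in w]
    return v


def weighted_average(weights, x):
    """Weighted observable <x> = sum_i a_i x_i."""
    return sum(a * xi for a, xi in zip(weights, x))


# Toy star graph: node 0 is the hub, nodes 1-3 are leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
x = [6.0, 1.0, 1.0, 1.0]  # hypothetical node states

a_gbb = degree_weights(star)
a_dart = leading_eigenvector_weights(star)
print(weighted_average(a_gbb, x))   # degree-weighted observable
print(weighted_average(a_dart, x))  # eigenvector-weighted observable
print(sum(x) / len(x))              # unweighted mean, for comparison
```

Both weighted observables depend on every entry of `x`, which illustrates why evaluating them still requires access to all $N$ node states.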

While these reduction methods provide crucial theoretical insight, allowing us to analytically predict the system’s critical points of transition, they offer limited advances on observability. This is because, to measure $\langle x\rangle$ via GBB or DART, one still needs to access the states of all nodes $x_{1},\dots,x_{N}$, which remains beyond the practical bounds of empirical accessibility. More selective probes have been introduced in the context of early warning signals, where one follows just a small number of nodes that show signs of looming transitions early on, prior to the rest of the network [16, 17, 18, 19, 20]. The problem is that such early warning nodes are, by design, selected for their divergent behavior around the transition points. This renders them unlikely to be a representative sample of the entire networked system.

How then can we gain the predictive power of GBB or DART, but with a node set that is comparable to the few early warning signifiers? To solve this, here we use machine learning to impose a sparsity condition on $\{a_{1},\ldots,a_{N}\}$ , requiring that $a_{i}\neq 0$ only for a small number of nodes. The result is a set of sentinel nodes, whose average allows us to approximate the state of the entire system. We find that, with only a few strategically selected nodes, we can achieve network-wide observability.

Our analysis shows that the ideal sentinel node sets tend to sample the network heterogeneity, for example, comprising a range of low, average and high degrees. This indicates that the optimal combination of sentinel nodes is mainly determined by the network topology, largely independent of the system’s specific dynamics. Consequently, one can extract the sentinels even without a priori knowledge about the system’s hidden dynamics. This allows practical observability even under extremely restrictive conditions.

II Results

II.1 Seeking the sentinel nodes

Consider coupled nonlinear dynamics on a network, such as mutualistic or epidemic dynamics. (See Methods for the dynamics models and networks used.) In many cases, we want to know the mere average of the network activity, i.e., $\overline{x}=\sum_{i=1}^{N}x_{i}/N$ , where $x_{i}$ is the activity of the $i$ th node, and $N$ is the number of nodes in the network. For example, in contagion processes, if we assign $x_{i}=0$ for susceptible and $x_{i}=1$ for infectious, then the average $\overline{x}$ represents the fraction of infectious individuals in the population. Similarly, $\overline{x}$ may represent the average political opinion in a population, the fraction of people aware of news, average biomass in an ecosystem, or the extent of cancerous cells in the body. Quantity $\overline{x}$ is simple, intuitive, and practical, providing a first-hand summary of dynamics on networks.

Denoting by $x_{i}^{*}$ the equilibrium of $x_{i}$ , our goal is to approximate the average activity of all nodes, i.e.,

$$\overline{x}=\frac{1}{N}\sum_{i=1}^{N}x_{i}^{*}, \tag{1}$$

using a limited set $S$ of sentinel nodes, namely,

$$\overline{x}^{\prime}=\frac{1}{n}\sum_{i\in S}x_{i}^{*}. \tag{2}$$

Here $n\ll N$ is the number of sentinel nodes in $S$. To obtain $S$, we simulate the system under a range of conditions, e.g., by varying the average edge weight $D$. Parameter $D$ is a natural control parameter; as $D$ varies, $\overline{x}$ in Eq. (1) may change or undergo critical transitions (Fig. 1). We then seek the optimal node set $S$ that minimizes

$$\varepsilon=\frac{\sum_{\ell=1}^{L}(\overline{x}^{\prime}_{\ell}-\overline{x}_{\ell})^{2}}{L\sum_{\ell=1}^{L}\overline{x}_{\ell}}, \tag{3}$$

where $\overline{x}_{\ell}$ and $\overline{x}^{\prime}_{\ell}$ are $\overline{x}$ and $\overline{x}^{\prime}$ , respectively, at the $\ell$ th value of $D$ . Equation (3) captures the total difference between the approximation $\overline{x}^{\prime}(D)$ and the exact $\overline{x}(D)$ across the $L$ different conditions that together sample a range of $D$ values. It is minimal when $\overline{x}^{\prime}$ successfully recovers the system average in Eq. (1), including its observed points of transition.
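As an illustration of Eqs. (1)–(3), the sketch below evaluates the approximation error for a given node set from a table of equilibria. The layout `x_eq[l][i]` (row $\ell$ holds $x_{i}^{*}$ at the $\ell$th value of $D$), the function names, and the toy numbers are assumptions made for this example.

```python
def sentinel_average(row, sentinel_set):
    """Eq. (2): unweighted mean of the equilibria of the sentinel nodes."""
    return sum(row[i] for i in sentinel_set) / len(sentinel_set)


def approximation_error(x_eq, sentinel_set):
    """Eq. (3): squared mismatch summed over the L conditions,
    divided by L times the sum of the exact averages."""
    num_conditions = len(x_eq)
    exact = [sum(row) / len(row) for row in x_eq]                  # Eq. (1)
    approx = [sentinel_average(row, sentinel_set) for row in x_eq]
    squared = sum((a - e) ** 2 for a, e in zip(approx, exact))
    return squared / (num_conditions * sum(exact))


# Hypothetical equilibria for N = 3 nodes at L = 2 values of D.
x_eq = [
    [1.0, 1.0, 1.0],  # small D: every node in its lower state
    [1.0, 3.0, 5.0],  # larger D: nodes have spread out
]
print(approximation_error(x_eq, [1]))     # node 1 tracks the mean exactly
print(approximation_error(x_eq, [2]))     # node 2 overshoots at larger D
print(approximation_error(x_eq, [0, 2]))  # a low and a high node balance out
```

In the last call, a node below the mean and a node above it cancel each other's deviations, so a two-node set can match the exact average even though neither node does so alone.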

As an example, we consider the coupled double-well dynamics (Fig. 1a) on the dolphin social network (Fig. 1b) under different coupling strengths $D$. When uncoupled, each node has two stable equilibria, i.e., a lower and an upper state. We initialize the simulation for each $D$ value by placing all the nodes in their lower state. We show the equilibrium state values, $x_{i}^{*}$, for all nodes as a function of $D$ by the light gray lines in Fig. 1c–h. The black line represents the average activity of the network, $\overline{x}$. As $D$ increases, $\overline{x}$ increases in stages, with a prolonged intermediate range in which some $x_{i}^{*}$ values remain in their lower states (i.e., $x_{i}^{*}<2$) and the others have transitioned to their upper states (i.e., $x_{i}^{*}>5$). This range of $D$ values captures a multistage transition [21, 22, 19].
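The simulation protocol just described can be sketched as follows. The model itself is specified in the Methods, which are not reproduced here, so the functional form below is an assumption: a cubic double-well term $-(x_{i}-r_{1})(x_{i}-r_{2})(x_{i}-r_{3})$ with $(r_{1},r_{2},r_{3})=(1,3,5)$ plus linear coupling $D\sum_{j}A_{ij}x_{j}$, integrated with forward Euler. The three-node path graph and the step size are likewise illustrative choices.

```python
def double_well_equilibrium(adj, D, r=(1.0, 3.0, 5.0), dt=0.01, steps=5000):
    """Integrate the assumed coupled double-well dynamics to (near) equilibrium.

    Every node starts in its lower state r[0], as in the protocol in the text.
    """
    n = len(adj)
    x = [r[0]] * n
    for _ in range(steps):
        # Coupling term D * sum_j A_ij x_j, computed from the current state.
        coupling = [D * sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [
            xi + dt * (-(xi - r[0]) * (xi - r[1]) * (xi - r[2]) + ci)
            for xi, ci in zip(x, coupling)
        ]
    return x


# Toy path graph 0 - 1 - 2 (node 1 has degree 2, the end nodes have degree 1).
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

for D in (0.0, 1.0, 2.0):
    x_star = double_well_equilibrium(path, D)
    mean = sum(x_star) / len(x_star)
    print(D, [round(v, 3) for v in x_star], round(mean, 3))
```

Sweeping `D` over a finer grid and stacking the resulting `x_star` vectors row by row yields an equilibrium table of the kind plotted as gray lines in Fig. 1c–h.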

Figure 1: Approximations of the average activity, $\overline{x}$ , of the coupled double-well dynamics on the dolphin network. (a) Coupled double-well dynamics model. (b) Dolphin network with $N=62$ nodes. Larger circles marked with color represent the nodes used in sentinel node approximations shown in (c)–(f): orange for $n=1$ , yellow for $n=2$ , green for $n=3$ , and blue for $n=4$ . (c)–(h) Equilibrium states in the dolphin network. The gray lines indicate $x_{i}^{*}$ at each value of the control parameter. The black lines indicate $\overline{x}$ . Note that the gray and black lines are the same in panels (c)–(h). The purple lines in (c)–(f) show the sentinel node approximation corresponding to the selected sentinel nodes, whose $x_{i}^{*}$ values are shown in orange (c), yellow (d), green (e), and blue (f). We verified by exhaustive search of every possible combination of nodes that the obtained node sets for $n=1$ (shown in (c)) and $n=2$ (shown in (d)) are exact minimizers of $\varepsilon$ . Panels (g) and (h) show the GBB and DART approximations in red and pink, respectively. The brown and dark yellow lines in each plot show the observables (i.e., a particular weighted average of $\{x_{1}^{*},\ldots,x_{N}^{*}\}$ ) that the GBB and DART schemes intend to approximate.

The purple lines in Fig. 1c, 1d, 1e, and 1f show the sentinel node approximation, $\overline{x}^{\prime}$, of $\overline{x}$ with $n=1$, $2$, $3$, and $4$, respectively. The sentinel node approximation correctly identifies the onset of the bifurcation at $D\approx 0.2$ for any $n\in\{1,\ldots,4\}$ but over- or underestimates $\overline{x}$, depending on $D$, in the range of the multistage transition. In general, larger optimized node sets do not include smaller ones as a subset. In this example, there is no overlap between any of the optimized node sets with $n=1,2,3$, and $4$; we show the selected sentinel nodes in color in Fig. 1b. Nevertheless, the approximation accuracy progressively improves as one includes more sentinel nodes. Concretely, the approximation errors for $n=1$, $2$, $3$, and $4$ are $\varepsilon=0.035$, $\varepsilon=0.020$, $\varepsilon=0.016$, and $\varepsilon=0.004$, respectively. Figure 1f demonstrates that, with a very small number of nodes (i.e., $n=4$), the sentinel node approximation captures major components of the multistage transition, resulting in a small approximation error. As expected, further increasing $n$ further reduces the approximation error (see SI section S2); however, in this example $n=4$ is already sufficient for practical purposes.
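For small $n$, the exact minimizer of $\varepsilon$ can be found by brute force, as in the exhaustive check mentioned in the caption of Fig. 1. The sketch below does this on a hypothetical five-node equilibrium table. The table and the function names are assumptions for illustration; the numbers are chosen so that the best pair does not contain the best single node.

```python
from itertools import combinations


def approximation_error(x_eq, sentinel_set):
    """Eq. (3) for the node set sentinel_set; x_eq[l][i] is x_i^* at the l-th D."""
    exact = [sum(row) / len(row) for row in x_eq]
    approx = [sum(row[i] for i in sentinel_set) / len(sentinel_set) for row in x_eq]
    squared = sum((a - e) ** 2 for a, e in zip(approx, exact))
    return squared / (len(x_eq) * sum(exact))


def exhaustive_best_set(x_eq, n):
    """Check every n-node combination and return the one with the smallest error."""
    num_nodes = len(x_eq[0])
    return min(
        combinations(range(num_nodes), n),
        key=lambda s: approximation_error(x_eq, s),
    )


# Hypothetical equilibria for N = 5 nodes at L = 2 values of D.
x_eq = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [3.2, 1.0, 5.0, 2.0, 3.3],
]

best_single = exhaustive_best_set(x_eq, 1)
best_pair = exhaustive_best_set(x_eq, 2)
print(best_single, approximation_error(x_eq, best_single))
print(best_pair, approximation_error(x_eq, best_pair))
```

The number of combinations grows as $\binom{N}{n}$, so this check is only feasible for very small $n$, which is consistent with the exhaustive verification being reported only for $n=1$ and $n=2$.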

For comparison, we show the results obtained from the GBB and DART approximations in Fig. 1g and 1h, respectively. Even with just $n=1$ sentinel node, our method performs better than the GBB and DART, reducing $\varepsilon$ by more than 75% (GBB: $\varepsilon=0.171$ ; DART: $\varepsilon=0.142$ ). This is mainly owing to the fact that, while GBB and DART track the collective state of the system (i.e., $\langle x\rangle$ in the Introduction section) and thus predict a single transition point, our method can discern the multistage nature of the actual observed transition (black solid line). Most crucially, thanks to its machine learning component, our framework achieves the observed predictability with a much smaller set of observations—here just with $n=4$ nodes (Fig. 1f). In sum, our method realizes a dramatic reduction in complexity along with a comparable, and in the present case even improved, accuracy.

Because our algorithm is stochastic, we ran it 100 times, starting with independent initial node sets, on the same network to assess generality of the results shown in Fig. 1. We set $n=4$ . We show the approximation error, $\varepsilon$ , for the 100 optimized node sets by the blue circles in Fig. 2a. Each circle corresponds to one run. The optimized node set is different in each run in general. However, the $\varepsilon$ values of these optimized node sets are similar to each other, and most importantly they are small. This result supports the robustness of our algorithm with respect to the initial condition and stochasticity of the algorithm.
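The repeated-run protocol can be illustrated with a simple stand-in optimizer. The sketch below is not the authors' algorithm, which is described in the Methods. It is a plain random-swap hill climber, started from independent random node sets, on a hypothetical equilibrium table. All names and numbers are assumptions.

```python
import random


def approximation_error(x_eq, sentinel_set):
    """Eq. (3) for the node set sentinel_set; x_eq[l][i] is x_i^* at the l-th D."""
    exact = [sum(row) / len(row) for row in x_eq]
    approx = [sum(row[i] for i in sentinel_set) / len(sentinel_set) for row in x_eq]
    squared = sum((a - e) ** 2 for a, e in zip(approx, exact))
    return squared / (len(x_eq) * sum(exact))


def toy_equilibria(num_nodes=20, num_conditions=10):
    """Hypothetical table: nodes with a larger made-up degree switch up at smaller D."""
    table = []
    for l in range(num_conditions):
        D = 0.1 * l
        row = []
        for i in range(num_nodes):
            k = 1 + (i % 5)  # made-up degree in {1, ..., 5}
            row.append(5.5 if D * k > 1.0 else 1.0 + D)
        table.append(row)
    return table


def swap_search(x_eq, n, rng, max_iters=500):
    """Stand-in stochastic search: swap one member for an outside node,
    keep the swap only if the error decreases."""
    num_nodes = len(x_eq[0])
    current = rng.sample(range(num_nodes), n)
    best_err = approximation_error(x_eq, current)
    for _ in range(max_iters):
        outside = [i for i in range(num_nodes) if i not in current]
        candidate = list(current)
        candidate[rng.randrange(n)] = rng.choice(outside)
        err = approximation_error(x_eq, candidate)
        if err < best_err:
            current, best_err = candidate, err
    return sorted(current), best_err


x_eq = toy_equilibria()
runs = [swap_search(x_eq, n=4, rng=random.Random(seed)) for seed in range(10)]
for nodes, err in runs:
    print(nodes, err)
```

Different seeds may return different node sets. Comparing the spread of the returned errors across runs is the kind of robustness check described above.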

Next, we select two groups of random node sets for comparison. First, we take the best optimized node set among the 100 runs in terms of $\varepsilon$ , denoted by $\tilde{S}$ . We then make 100 node sets by randomly selecting $n=4$ nodes having the same degree sequence as $\tilde{S}$ . For example, if the selected optimized node set is composed of four nodes with degrees 1, 3, 3, and 8, then we sample one node with degree 1 uniformly at random from the network and similarly sample two nodes with degree 3 and one node with degree 8. We call this node set the degree-preserving random node set. We show $\varepsilon$ for each degree-preserving random node set by the orange triangles in Fig. 2a. Second, we make 100 node sets with $n=4$ by selecting nodes uniformly at random from the entire network. We call this node set the completely random node set. We calculate $\varepsilon$ for each of the 100 completely random node sets, shown by the green squares in Fig. 2a. As expected, the degree-preserving node sets outperform the completely random ones, exhibiting an approximation error that is concentrated on smaller values in general. The crucial point is, however, that our optimized node sets have consistently smaller errors than both alternatives, better than 100% of the random sets, and 89% of the degree-preserving sets. Therefore, while selecting $n$ nodes with appropriate degrees improves the accuracy of approximation, one can substantially reduce $\varepsilon$ by further optimization beyond only using a good degree sequence.
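The two baseline constructions can be sketched as follows. The function names and the toy degree list are assumptions, as is the choice to sample without replacement within each degree class, so that a set never contains the same node twice.

```python
import random
from collections import defaultdict


def degree_preserving_set(degrees, reference_set, rng):
    """Sample nodes uniformly at random so that the sampled set has the same
    degree sequence as reference_set (nodes of reference_set are eligible)."""
    by_degree = defaultdict(list)
    for node, k in enumerate(degrees):
        by_degree[k].append(node)
    needed = defaultdict(int)
    for node in reference_set:
        needed[degrees[node]] += 1
    sample = []
    for k, count in needed.items():
        sample.extend(rng.sample(by_degree[k], count))
    return sample


def completely_random_set(num_nodes, n, rng):
    """Sample n distinct nodes uniformly at random from the whole network."""
    return rng.sample(range(num_nodes), n)


# Hypothetical degrees for a 10-node network.
degrees = [1, 1, 1, 3, 3, 3, 3, 8, 8, 2]
reference = [0, 3, 4, 7]  # degrees 1, 3, 3, 8, as in the example in the text

rng = random.Random(0)
dp_set = degree_preserving_set(degrees, reference, rng)
cr_set = completely_random_set(len(degrees), 4, rng)
print(sorted(degrees[i] for i in dp_set))  # same degree sequence as the reference
print(sorted(degrees[i] for i in cr_set))  # unconstrained degree sequence
```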

These results remain similar for other networks; see Fig. 2b for results for a Barabási-Albert (BA) network and SI section S3 for results obtained from eight more networks. In all networks, the optimized node sets yield relatively small $\varepsilon$ .

The patterns described for the coupled double-well dynamics across the various networks also hold true for the mutualistic species, gene-regulatory, and SIS dynamics (see SI section S3). Across every combination of one of the four dynamics and one of the ten networks, the worst optimized node set still has a smaller $\varepsilon$ than 100% of completely random node sets and 95.2% of degree-preserving node sets.

We statistically verified these observations by constructing an analysis of variance (ANOVA) model of the approximation error, $\varepsilon$ . The dependent variable is $\ln\varepsilon$ . There are three qualitative independent variables: dynamics (reference: coupled double-well dynamics), network (reference: BA network), and node set type (reference: completely random node sets). We exclude the Erdős-Rényi (ER) network because it yields small $\varepsilon$ regardless of node set type (see SI section S3), presumably because of the small spread of its degree distribution and the relative similarity across all nodes. We show the details of our analysis in SI section S4. Across all dynamics and networks, $\varepsilon$ of the optimized node set is on average more than 200 times smaller than $\varepsilon$ of completely random node sets ( $p<10^{-7}$ ) and more than 25 times smaller than $\varepsilon$ of degree-preserving node sets ( $p<10^{-7}$ ).
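Because the dependent variable is $\ln\varepsilon$, an additive coefficient in the model corresponds to a multiplicative factor in $\varepsilon$. As a rough illustration, assume a main-effects-only model (the actual specification is in SI section S4, so this form, the residual $\eta$, and the coefficient names are assumptions):

```latex
\ln\varepsilon
  = \beta_{0} + \beta_{\text{dynamics}} + \beta_{\text{network}}
    + \beta_{\text{node set}} + \eta
\quad\Longrightarrow\quad
\frac{\varepsilon_{\text{optimized}}}{\varepsilon_{\text{completely random}}}
  = e^{\beta_{\text{optimized}}},
\qquad
e^{\beta_{\text{optimized}}} < \frac{1}{200}
\iff
\beta_{\text{optimized}} < -\ln 200 \approx -5.30 .
```

Under this assumed form, with the dynamics and network held fixed and completely random node sets as the reference level, "more than 200 times smaller" corresponds to a node-set coefficient below about $-5.30$, and "more than 25 times smaller" to a difference between two coefficients exceeding $\ln 25\approx 3.22$ in magnitude.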

While we have used undirected and unweighted networks, we have verified that these results, including the ANOVA results, remain similar for various weighted networks (see SI section S9) and directed networks (see SI section S10).

II.2 Characteristics of good node sets

We have shown that finding nodes with an appropriate degree sequence is an important but not sufficient condition for finding a good set of sentinel nodes, $S$ . In this section, we explore the characteristics of the nodes that our algorithm selects for inclusion in $S$ .

Our first observation is that nodes in $S$ tend to have a degree sequence that is balanced across the average degree in the entire network. We show this pattern in Fig. 3a for the dolphin network. The top panel of Fig. 3a shows the degree histogram of all nodes in the network. The degree distribution is bimodal, with a predominance of nodes with degree 1 and a secondary group of nodes with degree near the mean degree $\overline{k}=5.13$. The lower panels of Fig. 3a show the aggregate degree histograms realized by the nodes in $S$ obtained from 100 independent runs of the algorithm with $n=1$, $2$, $3$, and $4$, with one panel for each $n$ value. The degrees of the chosen nodes are sorted such that the blue bars represent the smallest-degree node in the optimized node set in the 100 runs, orange the second smallest, red the third smallest, and light blue the largest-degree node. We find that the algorithm balances the degrees of the selected nodes such that both small-degree and large-degree nodes tend to exist in a node set when $n\in\{3,4\}$ (and to a lesser extent when $n=2$). For example, in Fig. 1f, the $n=4$ nodes selected for $S$ have degrees 1, 6, 7, and 8. The $x_{i}^{*}$ values for these four nodes, shown by the four blue lines in Fig. 1f, appear above or below the target values, $\overline{x}$, depending on whether the degree of the node is large or small, respectively. As a result, the average of these four lines provides an accurate approximation to $\overline{x}$. Qualitatively the same phenomenon also occurs for $n=2$ and $n=3$, as shown in Fig. 1d and e, respectively.

We show in Fig. 3b the results of the same analysis for the BA network used in Fig. 2b. The network has hub nodes, i.e., nodes with large degrees. The top hub nodes, with $k_{i}=$ 38, 40, 66, 76, and 81, are indicated by discrete peaks in the inset of the top panel of Fig. 3b. The bottom panels of Fig. 3b indicate that our optimization algorithm does not select these hub nodes in any of the 100 runs for each $n\in\{1,2,3,4\}$. Instead, the algorithm selects nodes whose degree is close to the mean; note that the mean degree is close to the minimum degree in BA networks. More specifically, the algorithm selects a mixture of nodes with small and moderately large degrees while avoiding nodes with larger degrees.

The degree histograms of the optimized node sets shown in Fig. 3 appear to markedly differ from that of the entire network. To quantify the difference, we computed the Kullback-Leibler divergence, $D_{\rm KL}$ , between the optimized node sets and the original network in terms of the degree distribution (see SI section S5 for the methods).
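A minimal sketch of such a comparison is given below. The exact procedure is in SI section S5, which is not reproduced here, so the direction of the divergence (sentinel distribution relative to the network distribution), the absence of smoothing, and the toy degree lists are assumptions. This direction is always well defined because every sentinel node is also a network node.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical probability of each degree value in a list of degrees."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q) = sum_k p_k ln(p_k / q_k), summed over degrees with p_k > 0.

    Requires q_k > 0 wherever p_k > 0, which holds when p is built from a
    subset of the nodes used to build q.
    """
    return sum(pk * math.log(pk / q[k]) for k, pk in p.items() if pk > 0)


# Hypothetical degrees of all nodes, and degrees pooled over optimized node sets.
network_degrees = [1, 1, 1, 1, 2, 2, 3, 3]
sentinel_degrees = [2, 2, 3, 3]

q = degree_distribution(network_degrees)
p = degree_distribution(sentinel_degrees)
print(kl_divergence(p, q))  # positive: sentinels over-represent degrees 2 and 3
print(kl_divergence(q, q))  # zero: identical distributions
```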

We show the $D_{\rm KL}$ values for the dolphin network by the blue circles in Fig. 4a. For comparison, the green squares represent the average $D_{\rm KL}$ between the original degree distribution and completely random degree distributions constructed with the same number of nodes (i.e., $100n$ nodes) as that used for constructing the degree distribution from the $100$ optimized node sets. Figure 4a indicates that the discrepancy between the optimized node sets and the original networks in terms of the degree distribution is substantially large when $n$ is small; note the logarithmic scale on the vertical axis. As $n$ becomes larger, both the optimized and the random node sets tend to become more similar to the original network in terms of the degree distribution. However, the degree distribution of the optimized node sets is significantly different from the original degree distribution at all $n\in\{1,\ldots,12\}$ . Therefore, although our method tends to select a mix of large- and small- degree nodes, it does not sample the degree values uniformly at random even for relatively large node sets (i.e., up to $n=12$ ). We have verified that the optimized node sets have smaller approximation errors than completely random node sets of equivalent size for the same range of $n$ (see SI section S2), extending the results for the dolphin network shown in Fig. 2a for $n=4$ . It should be noted that the approximation error for both optimized and completely random node sets decreases as $n$ increases (see SI section S2), which is expected.

We have also found that the average degree of the nearest neighbors of the nodes in the optimized sentinel node set, sentinel nodes’ local clustering coefficients, and their community membership systematically deviate from these quantities for uniformly randomly selected nodes. However, heuristically constructed node sets exploiting the observed network properties, such as the avoidance of hub nodes (informed by Fig. 3) or multiple nodes from the same community, only marginally, though significantly, decrease the approximation error, $\varepsilon$ , compared to completely random node sets. See SI section S6 for these results. Therefore, we conclude that running our optimization algorithm is a necessary step to realize a high accuracy in approximating $\overline{x}$ .

II.3 Transfer learning: Effectiveness of optimized node sets across different dynamics

Our optimization algorithm uses not only information on the network structure but also, implicitly, the dynamical model running on the network. The challenge is that, in practice, we often do not know the actual dynamics that produce the observed data. Therefore, here we ask whether and to what extent our optimization algorithm enables transfer learning, that is, the application of node sets optimized with one dynamics, called the training dynamics, to another dynamics, called the test dynamics.

We again use the dolphin network as an example. In each panel of Fig. 5a, we plot the approximation error, $\varepsilon$ , of each optimized (blue circles), degree-preserving (orange triangles), and completely random (green squares) node set. For each node set, we show the approximation error for the training dynamics on the horizontal axis and the test dynamics on the vertical axis. We also show the average result (large symbols), as obtained for 100 node sets of each type.

As expected, node sets optimized on the training dynamics have consistently smaller $\varepsilon$ when evaluated on the same dynamics than on an alternative test dynamics. However, although $\varepsilon$ has more variance when evaluated on the test dynamics, the node sets optimized under a different training dynamics still outperform the completely random node sets on average (see Fig. 5a). See Fig. 5b–d for an example. In nine out of the twelve pairs of training and test dynamics, the optimized node sets also outperform the degree-preserving node sets (see Fig. 5a). Therefore, the optimized node sets are transferrable between different dynamics to some extent. This result is striking in particular because different dynamics may have different bifurcation structures. For example, the coupled double-well dynamics have saddle-node bifurcations, and the SIS dynamics have transcritical bifurcations. Additionally, the completely random node sets in each panel show an approximately linear relationship. Therefore, the random node sets that happen by chance to be good sentinels for a specific training dynamics continue to be good sentinels for the test dynamics on average. This result further supports our claim that the performance of node sets is transferrable across different dynamics.

Figure 5: Transfer learnability of the sentinel node approximation. (a) Approximation error for optimized node sets evaluated on training and test dynamics. Using the dolphin network, we computed 100 optimized node sets on one (training) dynamics. For the optimized node sets, we show the approximation error when evaluated on the same training dynamics, shown by the blue circles on the horizontal axis, and on an alternative (i.e., test) dynamics, shown on the vertical axis. We also show the approximation errors of 100 degree-preserving node sets (orange triangles) and 100 completely random node sets (green squares) evaluated on the training and test dynamics. The large markers show the average approximation error over the 100 node sets for each type of node set. (b) Dolphin network. The large blue nodes indicate the members of the optimized node set with $n=4$ nodes that attains the lowest approximation error on the coupled double-well training dynamics. (c) Sentinel node approximation for the coupled double-well dynamics on the dolphin network. The states of the nodes marked in (b) are shown in blue. The states of the remaining nodes are shown in gray. The black line represents the unweighted average state of all nodes, $\overline{x}$. The purple line represents the sentinel node approximation of the node set marked in blue. (d) Sentinel node approximation for the SIS dynamics on the same network when the node set is optimized for the coupled double-well dynamics. The blue lines show $x_{i}^{*}$ for the nodes marked in (b).

As before, we assess the generalizability of these observations with an ANOVA. The dependent variable is $\ln\varepsilon$ evaluated on the given test dynamics. The independent variables are the training dynamics, test dynamics, network, and node set type (see SI section S7 for details). Despite the substantial increase in $\varepsilon$ due to the lack of knowledge about the test dynamics, optimized node sets still achieve $\varepsilon$ that is 5.74 times smaller than completely random node sets ( $p<10^{-7}$ ) and 1.55 times smaller than degree-preserving node sets ( $p<10^{-7}$ ). We obtained similar results on the feasibility of transfer learning in the case of weighted networks (see SI section S9) and directed networks (see SI section S10). In sum, these results suggest that our optimization algorithm can select good sentinel node sets when we do not know the test dynamics, or even the type of bifurcation.
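The transfer evaluation used in this section amounts to selecting a node set on one equilibrium table and scoring it on another. The sketch below shows that workflow with two hypothetical tables standing in for a training and a test dynamics on the same five-node network. The numbers, the use of exhaustive search as the optimizer, and the all-pairs baseline are assumptions for illustration.

```python
from itertools import combinations


def approximation_error(x_eq, sentinel_set):
    """Eq. (3) for the node set sentinel_set; x_eq[l][i] is x_i^* at the l-th D."""
    exact = [sum(row) / len(row) for row in x_eq]
    approx = [sum(row[i] for i in sentinel_set) / len(sentinel_set) for row in x_eq]
    squared = sum((a - e) ** 2 for a, e in zip(approx, exact))
    return squared / (len(x_eq) * sum(exact))


def best_set(x_eq, n):
    """Exhaustive minimizer of Eq. (3); feasible only for tiny examples."""
    return min(
        combinations(range(len(x_eq[0])), n),
        key=lambda s: approximation_error(x_eq, s),
    )


# Hypothetical equilibrium tables for the same 5 nodes under two dynamics.
train = [[1.0, 1.0, 1.0, 1.0, 1.0], [3.2, 1.0, 5.0, 2.0, 3.3]]
test = [[0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.1, 0.8, 0.3, 0.5]]

S = best_set(train, 2)                        # optimized on the training table only
train_error = approximation_error(train, S)
transfer_error = approximation_error(test, S)  # scored on the unseen test table

# Baseline: average test error over all two-node sets.
all_pairs = list(combinations(range(5), 2))
baseline = sum(approximation_error(test, s) for s in all_pairs) / len(all_pairs)
print(S, train_error, transfer_error, baseline)
```

In this toy case the two tables rank the nodes in a similar order, which is why the set chosen on `train` also does well on `test`. Whether that holds for real dynamics is the empirical question addressed by Fig. 5 and the ANOVA.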

II.4 Weighted averaging

So far, we have restricted ourselves to unweighted averages of $x_{i}^{*}$ as our approximators. Previous work has considered the case $n=N$ and chosen the weight vector $(a_{1},\ldots,a_{N})$ to be either the degree vector (i.e., $a_{i}$ is proportional to the degree of the $i$th node) [9, 23, 15] or a particular eigenvector [10, 11, 13, 24, 14]. In this section, we keep $n$ small and additionally optimize the weight of each node, allowing a weighted average of $x_{i}^{*}$ given by $\overline{x}^{\prime}_{\ell}=\sum_{i\in S}a_{i}x_{i,\ell}^{*}$, where $x_{i,\ell}^{*}$ is $x_{i}^{*}$ at the $\ell$th value of $D$, subject to $\sum_{i\in S}a_{i}=1$ and $a_{i}\geq 0$ $\forall i\in S$. We find these weights using a quadratic programming step (see the Methods section for details) inserted into our original algorithm.
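The weight optimization for a fixed node set is a least-squares problem on the probability simplex. The sketch below solves it with projected gradient descent in place of a quadratic programming solver, so it is a substitute technique and not the step described in the Methods. The function names, the step-size rule, and the toy table are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum_i a_i = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # threshold for the largest j that keeps u_j positive
    return [max(vi - theta, 0.0) for vi in v]


def optimize_weights(x_eq, sentinel_set, steps=2000):
    """Minimize sum_l (sum_{i in S} a_i x_{i,l}^* - xbar_l)^2 over the simplex."""
    targets = [sum(row) / len(row) for row in x_eq]
    cols = [[row[i] for i in sentinel_set] for row in x_eq]
    n = len(sentinel_set)
    # Step size 1 / (upper bound on the Lipschitz constant of the gradient).
    step = 1.0 / (2.0 * sum(v * v for r in cols for v in r))
    a = [1.0 / n] * n  # start from the unweighted average
    for _ in range(steps):
        residuals = [sum(ai * v for ai, v in zip(a, r)) - t
                     for r, t in zip(cols, targets)]
        grad = [2.0 * sum(res * r[k] for res, r in zip(residuals, cols))
                for k in range(n)]
        a = project_to_simplex([ai - step * g for ai, g in zip(a, grad)])
    return a


def weighted_error(x_eq, sentinel_set, weights):
    """Eq. (3) with the weighted sentinel average in place of Eq. (2)."""
    exact = [sum(row) / len(row) for row in x_eq]
    approx = [sum(w * row[i] for w, i in zip(weights, sentinel_set)) for row in x_eq]
    squared = sum((p - e) ** 2 for p, e in zip(approx, exact))
    return squared / (len(x_eq) * sum(exact))


# Hypothetical equilibria for N = 5 nodes at L = 2 values of D.
x_eq = [[1.0, 1.0, 1.0, 1.0, 1.0], [3.2, 1.0, 5.0, 2.0, 3.3]]
S = [1, 2]
a = optimize_weights(x_eq, S)
print(a)
print(weighted_error(x_eq, S, [0.5, 0.5]), weighted_error(x_eq, S, a))
```

The constraint that the weights are nonnegative and sum to one keeps the result a convex combination of the sentinel states, so the approximation always lies between the smallest and largest sentinel value.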

We find that the additional optimization of node weights leads to substantially smaller approximation error, $\varepsilon$ , than the unweighted average examined in the previous sections. Across all dynamics and networks, $\varepsilon$ is on average an additional 46 times smaller when we optimize both node set and node weight than when we only optimize the node set (see SI section S8 for the statistical results).

However, there is no extra advantage to optimizing node weights when one evaluates $\varepsilon$ on alternative dynamics. As in section II.3, we generated, for each dynamics and network, 100 node sets with their node selection and weights optimized on a training dynamics. We then evaluated $\varepsilon$ of the optimized node set on each other test dynamics. We found that, when evaluated on the test dynamics, $\varepsilon$ was not significantly different between the optimized node sets with additional weight optimization and those without ( $p=0.591$ ). Therefore, although optimizing node weights drastically improves the approximation to $\overline{x}$ if we know the actual dynamics, this advantage is lost if we do not know it.

III Discussion

As our understanding of complex system dynamics advances [3, 1, 25, 26, 27, 28, 29], a crucial bottleneck that we continue to face is our limited empirical access to the states of their multitude of relevant parameters. This bottleneck hinders the development of new theoretical analyses and empirical validation of existing theories on dynamics. A similar challenge in biology and medicine has been that it is often difficult to track every single metabolite, protein, or cell. In that context, the identification of biomarkers [30, 31] can offer dramatic breakthroughs, potentially allowing us to identify the state of the entire system just by tracking a small set of indicative parameters. Our work here envisions an equivalent opportunity in the context of complex networks.

A crucial aspect of our formulation is that the detected sentinels are largely agnostic to the dynamic model. This indicates that one can transfer the learned set of nodes from one dynamics to infer the state of the system under another dynamics. Such transfer learning is especially useful for systems with unknown dynamics, capturing a rather common challenge in the context of complex networked systems [32, 33, 34, 35, 36]. To further capitalize on this potential advantage, future work should aim to extract a master model, i.e., a nonlinear system of equations whose sentinel nodes optimally transfer to other types of dynamics.

Our work differs from other approaches for approximating the population activity of a network, or for observing it efficiently, in that we aim to accurately estimate $\overline{x}$. First, there are methods to select a small number of sentinel nodes for constructing early warning signals for anticipating regime shifts [16, 17, 18, 19, 20], but there is no a priori reason to consider that these sentinel nodes are also good at representing $\overline{x}$. Second, the theory of network observability allows us to determine a node set whose observation enables one to reconstruct the system’s complete activity [37]. The method has also been applied for selecting sentinel nodes with which to construct early warning signals [18]. However, this observability concerns inference of the initial condition, i.e., the activity of each node at time $0$. In applications, by contrast, $\overline{x}$ at equilibrium or its time course, rather than its initial value, is typically of practical interest.

Our work also differs from prior work in the importance it attributes to hub nodes in heterogeneous networks. Both the GBB [9] and DART [10, 11] reduce dynamical systems on networks into a similar dynamical system of small dimensions. A key idea behind these methods is to theoretically find an appropriate linear combination $\langle x\rangle=\sum_{i=1}^{N}a_{i}x_{i}^{*}$ of $x_{1}^{*}$ , $\ldots$ , $x_{N}^{*}$ and the dynamical system that $\langle x\rangle$ approximately obeys. To do so, GBB and DART use the degree vector and the dominant eigenvector of the adjacency matrix as $\bm{a}=(a_{1},\ldots,a_{N})$ , respectively, as the first $\langle x\rangle$ . In fact, many large empirical networks are scale-free networks, i.e., those in which the degree obeys an approximate power-law distribution [38, 39, 40], so that a small fraction of hub nodes have very large degrees. In a related vein, many empirical and model networks show eigenvector localization phenomena, with which the most entries of the dominant eigenvector of the adjacency matrix are close to $0$ [41, 42]. Eigenvector localization implies that, at least in large degree-heterogeneous but otherwise uniformly random networks, most of the entries of $\bm{a}$ are close to $0$ and do not contribute to $\langle x\rangle$ . Therefore, $\langle x\rangle$ in the GBB and DART is approximately a weighted sum of a small number of $x_{i}$ . This situation apparently resembles selecting $n$ ( $\ll N$ ) nodes for $S$ in our framework. However, these methods and our sentinel node approximation select very different nodes. The nodes effectively used by GBB and DART (i.e., those whose $a_{i}$ values are far from $0$ ) tend to be hub nodes. Therefore, GBB and DART deliberately neglect the behavior of a majority of nodes, which have a small degree. In contrast, our optimized node set tends to avoid hub nodes, which are outliers, and instead mixes nodes with different degrees that are not extremely large. 
This discrepancy arises because our goal is to observe the unweighted average, $\overline{x}=\sum_{i=1}^{N}x_{i}^{*}/N$ , while the goal of GBB and DART is to derive a dynamical system approximately closed in terms of $\langle x\rangle$ . Although hubs play important roles in dynamics on networks, such as affecting the epidemic threshold [43, 44] and responding to perturbations differently from low-degree nodes [25], our optimized node sets do not include them when approximating $\overline{x}$ because their behavior is in general different from that of the majority of nodes.

Our framework accommodates directed and weighted networks, noisy dynamics, and, as mentioned above, the case in which dynamical system equations are unknown. Furthermore, without essential changes, our method allows any weighted sum of $\{x_{1}^{*},\ldots,x_{N}^{*}\}$ as the target quantity to be approximated. Together, these features make our method versatile enough for potential applications. For example, in ecosystem monitoring, our algorithm could help focus monitoring efforts on a small number of key species [45], which may not be the most numerous or well-connected species [46, 47]. Assuming that an estimated network of species interactions is available, which is often the case [48], one can derive a small number of sentinel species (i.e., nodes) by simulating a “ground truth” using a mutualistic species dynamics model on the measured network. Then, the populations of the chosen sentinel species are expected to provide a good approximation of population dynamics of the entire ecosystem.

From a methodological perspective, our drastic reduction from $N$ to just a small number of nodes is rooted in a combined toolbox that mixes theoretical modeling with machine learning techniques. The theoretical models, captured within the low-dimensional dynamical systems derived by GBB, DART, or other methods, represent our analytical insights into the system’s internal mechanisms. Such insights, however, are limited by the prohibitive complexity of the system. This is precisely where the machine learning component helps complement the theoretical approach, lending its computational power to help simplify the scale of the problem. This synergy between theoretical insights and the predictive power of modern computational learning technology represents, in our view, one of the main avenues towards predicting, observing, and influencing complex network behavior.

IV Methods

IV.1 Node Selection

Consider an undirected and unweighted network with $N$ nodes, $M$ edges, and an adjacency matrix $A=(A_{ij})$ with $A_{ij}\in\{0,1\}$ , $A_{ii}=0$ , and $A_{ij}=A_{ji}$ . Each node is characterized by a continuous variable $x_{i}$ representing, for example, the abundance of a species or the fraction of a population that is infected with a pathogen. A straightforward estimate of the network’s aggregate state is the unweighted average of all node states at equilibrium, $\overline{x}$ , given by Eq. (1). Our goal is to approximate $\overline{x}$ by a linear combination of a small number of equilibrium node states, $x_{i}^{*}$ , given by Eq. (2), with a small error and across a range of a control parameter that represents, e.g., an environmental state. Examples of the control parameter are the strength of mutualistic interactions between species and the infection rate of a contagion process. The choice of the control parameter, its range, and its effect on $x_{i}^{*}$ vary by model and are further described in section IV.2.

We seek a set $S$ of sentinel nodes with $|S|=n$ , $n\ll N$ . We assume that each node in $S$ has equal weight in determining $\overline{x}^{\prime}$ , as in Eq. (2). We relax this assumption in section II.4. We quantify the discrepancy between $\overline{x}$ and $\overline{x}^{\prime}$ by a normalized mean squared error over a given range of the control parameter, i.e., $\varepsilon=\frac{\sum_{\ell=1}^{L}(\overline{x}^{\prime}_{\ell}-\overline{x}_{\ell})^{2}}{L\sum_{\ell=1}^{L}\overline{x}_{\ell}}$ , which we call the approximation error. The subscript $\ell\in\{1,\ldots,L\}$ indexes the evenly spaced values of the control parameter, and $x_{i,\ell}^{*}$ represents the equilibrium value of $x_{i}$ for the $\ell$ th value of the control parameter; $\ell$ is omitted in the following text when there should be no confusion. We choose a control parameter which, when varied between the first and last (i.e., $L$ th) values, causes bifurcations in $x_{i}^{*}$ . We set $L=100$ in our numerical experiments.
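As an illustration only, the approximation error defined above can be evaluated as in the following Python sketch. The function name `approximation_error` and the layout of `x_star` (one row per control parameter value, one column per node) are assumptions made for this example and are not taken from the released code.

```python
def approximation_error(x_star, sentinel):
    """Normalized mean squared error between the sentinel average and the
    network average.

    x_star[l][i] is the equilibrium state of node i at the l-th control
    parameter value (hypothetical layout); sentinel is a list of node indices.
    """
    L = len(x_star)
    squared_error = 0.0
    sum_of_means = 0.0
    for row in x_star:
        xbar = sum(row) / len(row)                                # network average
        xbar_s = sum(row[i] for i in sentinel) / len(sentinel)    # sentinel average
        squared_error += (xbar_s - xbar) ** 2
        sum_of_means += xbar
    return squared_error / (L * sum_of_means)


# Toy data: L = 2 parameter values, N = 3 nodes.
toy = [[1.0, 2.0, 3.0],
       [2.0, 4.0, 6.0]]
err_middle = approximation_error(toy, [1])   # node 1 equals the mean at both values
err_first = approximation_error(toy, [0])    # node 0 underestimates the mean
```

In this toy case, node 1 reproduces the average exactly, whereas node 0 gives squared errors of 1 and 4 against a denominator of 2 × 6 = 12.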

Note that two other known methods for dynamics reduction on networks, GBB [9, 23, 15] and DART [10, 11, 13, 24, 14], target a weighted sum of $x_{1}$ , $\ldots$ , $x_{N}$ that differs from $\overline{x}$ . Specifically, the observables of the GBB and DART are of the form $\sum_{i=1}^{N}a_{i}x_{i}$ , where $a_{i}$ is proportional to the degree of the $i$ th node in the case of the GBB and to the $i$ th entry of the leading eigenvector of the adjacency matrix in the case of the DART.

Given all $x_{i,\ell}^{*}$ values (with $i\in\{1,\ldots,N\}$ and $\ell\in\{1,\ldots,L\}$ ) computed, we determine $S$ by combinatorial simulated annealing. Specifically, we first initialize $S$ by selecting $n$ nodes uniformly at random from the network and calculate the approximation error for $S$ , which we denote by $\varepsilon(S)$ . We arbitrarily set $n=\lfloor\ln N\rfloor$ , where $\lfloor\cdot\rfloor$ is the floor operation, unless we state otherwise. Then, in the $h$ th iteration of the algorithm, we select a node $i$ uniformly at random from $S$ and replace it with a node $j$ that is chosen uniformly at random from the nodes not belonging to $S$ . The tentative new node set is given by $S^{\prime}=(S\setminus\{i\})\cup\{j\}$ . We compute $\varepsilon$ for $S^{\prime}$ , which we denote by $\varepsilon(S^{\prime})$ . We accept $S^{\prime}$ as the new $S$ according to the Metropolis criterion, i.e., with probability $1$ if $\varepsilon(S^{\prime})<\varepsilon(S)$ and with probability $q=\exp\{-[\varepsilon(S^{\prime})-\varepsilon(S)]/t\}$ otherwise. The normalization factor $t=t_{0}/\ln(h+e-1)$ decreases as $h\in\{1,\ldots,h_{\max}\}$ increases. Therefore, an $S^{\prime}$ with a larger approximation error than $S$ is less likely to be accepted as $h$ increases, encouraging convergence to a local minimum of $\varepsilon$ . We set $t_{0}=10$ . We also set $h_{\max}=50N$ and confirmed that this number of iterations was sufficient for reaching a local minimum of $\varepsilon$ .
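The annealing procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function names, the toy data, and the fixed random seed are hypothetical, and the sketch additionally keeps track of the best node set visited, a convenience that the text does not mention.

```python
import math
import random


def approximation_error(x_star, sentinel):
    """Normalized mean squared error; x_star[l][i] is node i at parameter l."""
    sq, tot = 0.0, 0.0
    for row in x_star:
        xbar = sum(row) / len(row)
        sq += (sum(row[i] for i in sentinel) / len(sentinel) - xbar) ** 2
        tot += xbar
    return sq / (len(x_star) * tot)


def anneal(x_star, t0=10.0, iters_per_node=50, seed=0):
    rng = random.Random(seed)            # fixed seed: an assumption, for repeatability
    N = len(x_star[0])
    n = int(math.floor(math.log(N)))     # n = floor(ln N), as in the text
    S = rng.sample(range(N), n)          # random initial sentinel set
    err = approximation_error(x_star, S)
    best_S, best_err = list(S), err      # bookkeeping added for this sketch only
    for h in range(1, iters_per_node * N + 1):
        t = t0 / math.log(h + math.e - 1)          # t decreases with h
        members = set(S)
        outside = [j for j in range(N) if j not in members]
        candidate = list(S)
        candidate[rng.randrange(n)] = rng.choice(outside)   # swap one node
        cand_err = approximation_error(x_star, candidate)
        delta = cand_err - err
        # Metropolis criterion: always accept improvements, otherwise
        # accept with probability exp(-delta / t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            S, err = candidate, cand_err
            if err < best_err:
                best_S, best_err = list(S), err
    return sorted(best_S), best_err


# Toy data (hypothetical): N = 20 nodes, L = 5 parameter values.
data_rng = random.Random(1)
base = [data_rng.uniform(0.5, 1.5) for _ in range(20)]
toy = [[(ell + 1) * b for b in base] for ell in range(5)]
S_best, err_best = anneal(toy)
```

With $N=20$ the sketch selects $n=\lfloor\ln 20\rfloor=2$ nodes and runs $50N=1{,}000$ iterations.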

IV.2 Dynamical system models

We test our method on the following four models of dynamical systems on networks.

Many systems have alternative stable states at a single value of a control parameter, the transition between which displays hysteresis. For example, a certain species may be present or extinct in an ecosystem under a given environmental condition, depending on the system’s state in the past [49]. Recovery from the extinct state may be difficult without substantial improvement of the environment. Similarly, tropical ecosystems may show sudden transitions from rainforest to grassland in response to just small changes in local water availability [50]. A coupled double-well dynamics model on networks [51, 52, 53, 54, 55] has been employed for modeling interacting climate regions [50] and biological species [21] and is given by

\frac{dx_{i}}{dt}=-(x_{i}-r_{1})(x_{i}-r_{2})(x_{i}-r_{3})+D\sum_{j=1}^{N}A_{ij}x_{j},

(4)

where $r_{1}$ , $r_{2}$ , and $r_{3}$ determine the location of the equilibria and satisfy $r_{1}<r_{2}<r_{3}$ ; $D$ is the coupling strength parameter. We set $r_{1}=1$ , $r_{2}=3$ , and $r_{3}=5$ . In the absence of coupling, this model has stable equilibria at $x=r_{1}$ and $x=r_{3}$ , and an unstable equilibrium at $x=r_{2}$ . In the presence of coupling, all $x_{i}$ values tend to be near $r_{1}$ when $D$ is sufficiently small, and they tend to be near $r_{3}$ when $D$ is large. We use $D$ as the control parameter in the range $[0,1]$ .
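To make the two regimes of the coupled double-well model concrete, the following Python sketch integrates Eq. (4) on a four-node ring for a small and a large value of $D$ . It is an illustration under stated assumptions: the ring network, the function names, and the fixed-step fourth-order Runge-Kutta integrator (a stand-in for an adaptive ODE solver) are choices made for this example only.

```python
def double_well_rhs(x, neighbors, D, r1=1.0, r2=3.0, r3=5.0):
    """Right-hand side of Eq. (4); neighbors[i] lists the nodes adjacent to i."""
    return [
        -(xi - r1) * (xi - r2) * (xi - r3) + D * sum(x[j] for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def integrate(rhs, x0, T=15.0, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta integration from time 0 to T."""
    x = list(x0)
    for _ in range(int(round(T / dt))):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical test network: a ring of four nodes (each node has degree 2).
ring = [[3, 1], [0, 2], [1, 3], [2, 0]]
x_low = integrate(lambda x: double_well_rhs(x, ring, 0.05), [1.0] * 4)
x_high = integrate(lambda x: double_well_rhs(x, ring, 1.0), [1.0] * 4)
```

Starting from $x_{i}=1$ , the states stay close to $r_{1}$ for $D=0.05$ and move above $r_{3}$ for $D=1$ on this small ring.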

A similar model proposed for mutualistic species interactions is given by

\frac{dx_{i}}{dt}=B+x_{i}\left(1-\frac{x_{i}}{K}\right)\left(\frac{x_{i}}{C}-1\right)+D\sum_{j=1}^{N}A_{ij}\frac{x_{i}x_{j}}{\tilde{D}+Ex_{i}+Hx_{j}},

(5)

where $x_{i}$ represents the abundance of the $i$ th species and $B$ , $K$ , $C$ , $D$ , $\tilde{D}$ , $E$ , and $H$ are constants [9]. The constant $B$ represents migration rate and $K$ the carrying capacity. The Allee constant, $C$ , represents the ease with which a species can become established in the environment. In the absence of migration and species interactions, $x_{i}>0$ at equilibrium if and only if the initial $x_{i}>C$ ; otherwise $x_{i}=0$ is the only stable equilibrium. We set $B=1$ , $K=5$ , $\tilde{D}=5$ , $E=0.9$ , and $H=0.1$ following [9]. We use $D$ as the control parameter in the range $[0,3]$ ; to observe the transition of all nodes, we use a more extended range of $D$ for this model than for the other models described in this section.

The deterministic susceptible-infectious-susceptible (SIS) model on networks, also called the individual-based approximation of the stochastic SIS model, is given by

\frac{dx_{i}}{dt}=-\mu x_{i}+\lambda\sum_{j=1}^{N}A_{ij}(1-x_{i})x_{j},

(6)

where $x_{i}$ represents the probability that the $i$ th node is infectious, $\lambda$ is the infection rate, and $\mu$ is the recovery rate [44]. The second term on the right-hand side is the total rate at which the $i$ th node is infected by its neighbors; each summand $\lambda A_{ij}(1-x_{i})x_{j}$ is the rate at which the $j$ th node infects the $i$ th node. We set $\mu=1$ without loss of generality; changing $(\lambda,\mu)$ to $(c\lambda,c\mu)$ with a constant $c$ is equivalent to scaling the time by a factor of $c$ , so it does not affect the equilibrium. We use $\lambda$ as the control parameter in the range $[0,1]$ . One obtains $x_{i}^{*}=0$ for all $i$ when $\lambda$ is below a value called the epidemic threshold. Above the epidemic threshold, all $x_{i}^{*}$ ( $<1$ ) values are positive given a positive initial value. The parameter $\lambda$ has the same role as $D$ in the other dynamics models.
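The threshold behavior of Eq. (6) can be checked on a case that is solvable by hand. On a complete graph with $N$ nodes, setting $x_{i}=x$ for all $i$ in Eq. (6) gives $0=-\mu x+\lambda(N-1)(1-x)x$ , so that $x^{*}=1-\mu/[\lambda(N-1)]$ when $\lambda(N-1)>\mu$ and $x^{*}=0$ otherwise. The Python sketch below compares this value with a numerical integration. The complete graph, the function names, and the fixed-step Runge-Kutta integrator are assumptions made for this example.

```python
def sis_rhs(x, neighbors, lam, mu=1.0):
    """Right-hand side of Eq. (6); neighbors[i] lists the nodes adjacent to i."""
    return [
        -mu * xi + lam * (1.0 - xi) * sum(x[j] for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def integrate(rhs, x0, T=15.0, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta integration from time 0 to T."""
    x = list(x0)
    for _ in range(int(round(T / dt))):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical test network: complete graph on N = 5 nodes,
# for which the threshold derived above is lambda = 1 / (N - 1) = 0.25.
N = 5
complete = [[j for j in range(N) if j != i] for i in range(N)]
x_below = integrate(lambda x: sis_rhs(x, complete, 0.1), [0.01] * N)
x_above = integrate(lambda x: sis_rhs(x, complete, 1.0), [0.01] * N)
predicted = 1.0 - 1.0 / (1.0 * (N - 1))   # x* = 1 - mu / (lambda (N - 1)) = 0.75
```

For $\lambda=0.1$ the infection dies out, whereas for $\lambda=1$ every node relaxes to the predicted value $0.75$ .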

A model of gene regulatory dynamics is given by

\frac{dx_{i}}{dt}=-Bx_{i}^{f}+D\sum_{j=1}^{N}A_{ij}\frac{x_{j}^{h}}{1+x_{j}^{h}},

(7)

where $x_{i}$ represents the expression level of the $i$ th gene [9]. We set $B=1$ , $f=1$ , and $h=2$ following [9]. We use $D$ as the control parameter in the range $[0,1]$ . Given sufficiently large initial values, all $x_{i}^{*}$ will remain large when $D$ is above a threshold value, a situation which represents a living cell. When $D$ is small, all $x_{i}$ s approach zero, representing cell death.

To compute equilibrium values of $x_{i}$ at each control parameter value, we initially set each $x_{i}$ to a model-dependent standard value (coupled double-well: $x_{i}=1$ , SIS: $x_{i}=0.01$ , mutualistic species: $x_{i}=0.001$ , gene-regulatory: $x_{i}=2$ ). Then, we solved the ODEs using the LSODA algorithm [56] provided by the deSolve package for R [57]. We used a total simulation time of $T=15$ , which we found was sufficient to allow each of the above dynamics on each network described below to relax to a point where no further change was noticeable. We used the value of each $x_{i}$ at $T=15$ as $x_{i,\ell}^{*}$ .
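The sweep over the control parameter can be sketched as follows for the gene regulatory model, Eq. (7). This is an illustration only: it uses Python with a fixed-step Runge-Kutta integrator rather than LSODA from deSolve, a complete graph on five nodes as a hypothetical test network, and $L=11$ instead of $L=100$ to keep the example fast. On this graph, a uniform state $x$ satisfies $x=4Dx^{2}/(1+x^{2})$ , whose largest root at $D=1$ is $x^{*}=2+\sqrt{3}$ .

```python
import math


def gene_rhs(x, neighbors, D, B=1.0, f=1, h=2):
    """Right-hand side of Eq. (7); neighbors[i] lists the nodes adjacent to i."""
    return [
        -B * xi ** f + D * sum(x[j] ** h / (1.0 + x[j] ** h) for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def integrate(rhs, x0, T=15.0, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta integration from time 0 to T."""
    x = list(x0)
    for _ in range(int(round(T / dt))):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


def sweep(neighbors, d_min=0.0, d_max=1.0, L=11, x0=2.0):
    """Return x_star[l][i]: state of node i at time T for the l-th value of D."""
    N = len(neighbors)
    x_star = []
    for ell in range(L):
        D = d_min + (d_max - d_min) * ell / (L - 1)   # evenly spaced values
        # Each run restarts from the same initial condition x0.
        x_star.append(integrate(lambda x, D=D: gene_rhs(x, neighbors, D), [x0] * N))
    return x_star


complete = [[j for j in range(5) if j != i] for i in range(5)]
x_star = sweep(complete)
xbar = [sum(row) / len(row) for row in x_star]   # network average for each D
```

The resulting matrix has one row per control parameter value and one column per node, which is the input assumed by the node selection step.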

IV.3 Networks

We chose five empirical and five model networks to test our method. All networks were coerced to be undirected, simple (i.e., no self- or multi-edges), and unweighted. We used the largest connected component.

In the social network of wild dolphins [58], which we refer to as the dolphin network, each node is a dolphin individual. Two nodes are defined to be adjacent if two individuals were observed together more often than expected by chance. This network has $N=62$ nodes, $M=159$ edges, and an average degree $\overline{k}=5.13$ . The coefficient of variation (CV), defined as the standard deviation divided by the mean, of the degree is 0.58.

The BA model generates networks with power-law degree distributions [38]. We initialized a BA network with a complete graph of three nodes and three edges, and added a single node with $m=2$ edges in each time step. The final network had $N=1,000$ , $M=1,996$ , $\overline{k}=3.99$ , and a CV of the degree equal to $1.33$ .
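The summary statistics reported for each network follow directly from the edge list. The Python sketch below computes the average degree, $\overline{k}=2M/N$ , and the CV of the degree. The function name and the use of the population (rather than sample) standard deviation are assumptions made for this example; the text does not specify which convention was used.

```python
import math


def degree_stats(n_nodes, edges):
    """Return (mean degree, CV of the degree) of an undirected simple graph."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    mean = sum(degree) / n_nodes                     # equals 2M / N
    # Population variance (divide by N): an assumption of this sketch.
    variance = sum((d - mean) ** 2 for d in degree) / n_nodes
    return mean, math.sqrt(variance) / mean


# Hypothetical example: a star with one hub and four leaves.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
mean_k, cv_k = degree_stats(5, star_edges)

# Consistency check of the average degrees quoted in the text.
k_dolphin = 2 * 159 / 62
k_ba = 2 * 1996 / 1000
```

For the star, the degrees are $(4,1,1,1,1)$ , which gives $\overline{k}=1.6$ and a CV of $0.75$ . The reported values $\overline{k}=5.13$ and $\overline{k}=3.99$ agree with $2M/N$ for the dolphin and BA networks, respectively.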

We used eight other networks as well as the dolphin and BA networks in the ANOVA analyses. See SI section S1 for the description of these eight networks.

IV.4 Optimization of node weights

In section II.4, we approximate $\overline{x}_{\ell}$ with a weighted average of the node states in $S$ , $\overline{x}^{\prime}_{\ell}=\sum_{i=1;i\in S}^{N}a_{i}x_{i,\ell}^{*}$ . The GBB and DART assign weights to all the nodes in the network. In contrast, here we determine weights only for the nodes in $S$ by a quadratic programming optimization.

We insert the quadratic programming step into our node selection method as follows. Assume that we have a set of nodes $S$ , $|S|=n$ . This $S$ can be random, as in the initial iteration of the algorithm, or a candidate set $S^{\prime}$ from any subsequent iteration. Let $\bm{a}=\{a_{i}\}_{i\in S}$ be an $n$ -dimensional vector of weights. We require that these weights are non-negative and sum to $1$ . The mean squared error over the $L$ values of the control parameter is given by

\epsilon=\frac{1}{L}\sum_{\ell=1}^{L}\left(\sum_{i=1;i\in S}^{N}a_{i}x_{i,\ell}^{*}-\overline{x}_{\ell}\right)^{2}=\frac{1}{L}\left(\bm{a}^{\top}X^{\top}X\bm{a}-2\bm{\overline{x}}^{\top}X\bm{a}+\bm{\overline{x}}^{\top}\bm{\overline{x}}\right),

(8)

where $X$ is the $L\times n$ matrix of $x_{i,\ell}^{*}$ (with $\ell\in\{1,\ldots,L\}$ , $i\in S$ ), $\bm{\overline{x}}=(\overline{x}_{1},\ldots,\overline{x}_{L})^{\top}$ , and $^{\top}$ represents the transposition. Therefore, we determine $\bm{a}$ by

\begin{aligned}\min\quad&\frac{1}{2}\bm{a}^{\top}X^{\top}X\bm{a}-\bm{\overline{x}}^{\top}X\bm{a},\\ \text{s.t.}\quad&\bm{1}^{\top}\bm{a}=1,\\ &\bm{a}\geq\bm{0},\end{aligned}

(9)

where $\bm{1}=(1,\ldots,1)^{\top}$ . We use the qpOASES algorithm [59] to solve for $\bm{a}$ via the ROI package [60] in R. In an iteration of our algorithm, we first tentatively update $S$ by carrying out one step of the algorithm for the unweighted averaging, optimize the node weights, $\bm{a}$ , given the updated $S$ by solving Eq. (9), and calculate $\varepsilon$ for the updated $S$ and optimized $\bm{a}$ . We then compare the obtained $\varepsilon$ with the value of $\varepsilon$ before updating $S$ and optimizing $\bm{a}$ , and apply the Metropolis criterion to determine whether or not we adopt the tentatively updated $S$ and $\bm{a}$ . We iterate this process as described in Sec. IV.1.
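For readers who wish to experiment with Eq. (9) without a quadratic programming solver, the following Python sketch solves the same problem by projected gradient descent onto the probability simplex. This is a swapped-in technique chosen for a dependency-free illustration; it is not the qpOASES algorithm used in our analysis, and the function names, step size, and iteration count are assumptions of this example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        shift = (cumulative - 1.0) / k
        if uk - shift > 0:
            theta = shift          # keep the shift for the largest valid k
    return [max(vi - theta, 0.0) for vi in v]


def optimize_weights(X, xbar, iters=2000):
    """Minimize (1/2) a'X'Xa - xbar'Xa subject to a >= 0 and sum(a) = 1.

    X[l][i] is the state of the i-th sentinel node at the l-th parameter value.
    """
    L, n = len(X), len(X[0])
    # Step size 1 / trace(X'X), a conservative bound on 1 / lambda_max(X'X).
    step = 1.0 / sum(v * v for row in X for v in row)
    a = [1.0 / n] * n              # start from equal weights
    for _ in range(iters):
        resid = [sum(X[l][i] * a[i] for i in range(n)) - xbar[l] for l in range(L)]
        grad = [sum(X[l][i] * resid[l] for l in range(L)) for i in range(n)]
        a = project_to_simplex([a[i] - step * grad[i] for i in range(n)])
    return a


# Toy data (hypothetical): xbar is exactly 0.25 * column 0 + 0.75 * column 1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
xbar = [0.25, 0.75, 1.0]
weights = optimize_weights(X, xbar)

# A case in which the non-negativity constraint is active.
weights_active = optimize_weights([[1.0, 0.0], [0.0, 1.0]], [2.0, -1.0])
```

In the first toy case the optimal weights are recovered as $(0.25,0.75)$ ; in the second, the unconstrained optimum has a negative entry and the solution lies on the boundary of the simplex, $(1,0)$ .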

Note that this additional optimization step substantially increases computation time. However, in practice, local minima are reached after fewer iterations of the simulated annealing algorithm. Therefore, we use $25N$ iterations of the algorithm when we optimize the node weights, instead of the $50N$ iterations used when we do not optimize the node weights.

Data and code availability

Data and analysis code are available at https://github.com/ngmaclaren/reduction.

Acknowledgements

This work was performed in part at the Center for Computational Research, the State University of New York at Buffalo.

Author contributions

B.B. and N. Masuda conceived the study. N.G. MacLaren and N. Masuda developed the method and analyzed the data. N.G. MacLaren wrote the code and ran simulations. All the authors wrote the manuscript.

Funding

B.B. was supported by the Israel Science Foundation (grant no. 499/19), the Israel-China ISF-NSFC joint research program (grant no. 3552/21), and by the VATAT grant for data science research. N. Masuda was supported by the Japan Science and Technology Agency (JST) Moonshot R&D (under grant no. JPMJMS2021), the National Science Foundation (under grant no. 2052720), and JSPS KAKENHI (under grant nos. JP 21H04595, 23H03414, and 24K14840).

References

[1] M. A. Porter and J. P. Gleeson. Dynamical systems on networks: A tutorial. Springer, London, UK, 2010.
[2] M. E. J. Newman. Networks. Oxford University Press, Oxford, UK, 2nd edition, 2018.
[3] A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical processes on complex networks. Cambridge University Press, Cambridge, UK, 2008.
[4] G. Karlebach and R. Shamir. Modelling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology, 9:770–780, 2008.
[5] C. Trapnell. Defining cell types and states with single-cell genomics. Genome Research, 25:1491–1498, 2015.
[6] A. Wagner, A. Regev, and N. Yosef. Revealing the vectors of cellular identity with single-cell genomics. Nature Biotechnology, 34(11):1145–1160, 2016.
[7] M. S. Wisz, J. Pottier, W. D. Kissling, L. Pellissier, J. Lenoir, C. F. Damgaard, C. F. Dormann, M. C. Forchhammer, J.-A. Grytnes, A. Guisan, R. K. Heikkinen, T. T. Høye, I. Kühn, M. Luoto, L. Maiorano, M.-C. Nilsson, S. Normand, E. Öckinger, N. M. Schmidt, M. Termansen, A. Timmermann, D. A. Wardle, P. Aastrup, and J.-C. Svenning. The role of biotic interactions in shaping distributions and realised assemblages of species: Implications for species distribution modelling. Biological Reviews, 88:15–30, 2013.
[8] J. Bascompte and M. Scheffer. The resilience of plant–pollinator networks. Annual Review of Entomology, 68:363–380, 2023.
[9] J. Gao, B. Barzel, and A.-L. Barabási. Universal resilience patterns in complex networks. Nature, 530:307–312, 2016.
[10] E. Laurence, N. Doyon, L. J. Dubé, and P. Desrosiers. Spectral dimension reduction of complex dynamical networks. Physical Review X, 9:011042, 2019.
[11] V. Thibeault, G. St-Onge, L. J. Dubé, and P. Desrosiers. Threefold way to the dimension reduction of dynamics on networks: An application to synchronization. Physical Review Research, 2(4):043215, 2020.
[12] C. Tu, P. D’Odorico, and S. Suweis. Dimensionality reduction of complex dynamical systems. iScience, 24:101912, 2021.
[13] N. Masuda and P. Kundu. Dimension reduction of dynamical systems on networks with leading and non-leading eigenvectors of adjacency matrices. Physical Review Research, 4(2):023257, 2022.
[14] M. Vegué, V. Thibeault, P. Desrosiers, and A. Allard. Dimension reduction of dynamics on modular and heterogeneous directed networks. PNAS Nexus, 2(5):pgad150, 2023.
[15] C. Ma, G. Korniss, B. K. Szymanski, and J. Gao. Generalized dimension reduction approach for heterogeneous networked systems with time-delay. arXiv, page 2308.11666, 2023.
[16] L. Chen, R. Liu, Z.-P. Liu, M. Li, and K. Aihara. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Scientific Reports, 2:342, 2012.
[17] F. Vafaee. Using multi-objective optimization to identify dynamical network biomarkers as early-warning signals of complex diseases. Scientific Reports, 6:22023, 2016.
[18] A. Aparicio, J. X. Velasco-Hernández, C. H. Moog, Y.-Y. Liu, and M. T. Angulo. Structure-based identification of sensor species for anticipating critical transitions. Proceedings of the National Academy of Sciences of the United States of America, 118(51):e2104732118, 2021.
[19] N. G. MacLaren, P. Kundu, and N. Masuda. Early warnings for multi-stage transitions in dynamics on networks. Journal of the Royal Society Interface, 20:20220743, 2023.
[20] N. Masuda, K. Aihara, and N. G. MacLaren. Anticipating regime shifts by mixing early warning signals from different nodes. Nature Communications, 15:1086, 2024.
[21] J. J. Lever, I. A. van de Leemput, E. Weinans, R. Quax, V. Dakos, E. H. van Nes, J. Bascompte, and M. Scheffer. Foreseeing the future of mutualistic communities beyond collapse. Ecology Letters, 23:2–15, 2020.
[22] N. Wunderling, M. Gelbrecht, R. Winkelmann, J. Kurths, and J. F. Donges. Basin stability and limit cycles in a conceptual model for climate tipping cascades. New Journal of Physics, 22:123031, 2020.
[23] H. Zhang, Q. Wang, W. Zhang, S. Havlin, and J. Gao. Estimating comparable distances to tipping points across mutualistic systems by scaled recovery rates. Nature Ecology & Evolution, 6:1524–1536, 2022.
[24] V. Thibeault, A. Allard, and P. Desrosiers. The low-rank hypothesis of complex systems. Nature Physics, 20:294–302, 2024.
[25] B. Barzel and A.-L. Barabási. Universality in network dynamics. Nature Physics, 9:673–681, 2013.
[26] Z. Wang, M. A. Andrews, Z.-X. Wu, L. Wang, and C. T. Bauch. Coupled disease–behavior dynamics on complex networks: A review. Physics of Life Reviews, 15:1–29, 2015.
[27] U. Harush and B. Barzel. Dynamic patterns of information flow in complex networks. Nature Communications, 8:2181, 2017.
[28] C. Hens, U. Harush, S. Haber, R. Cohen, and B. Barzel. Spatiotemporal signal propagation in complex networks. Nature Physics, 15:403–412, 2019.
[29] R. M. D’Souza, M. di Bernardo, and Y.-Y. Liu. Controlling complex networks with complex nodes. Nature Reviews Physics, 5:250–262, 2023.
[30] J. P. B. O’Connor, E. O. Aboagye, J. E. Adams, H. J. W. L. Aerts, S. F. Barrington, A. J. Beer, R. Boellaard, S. E. Bohndiek, M. Brady, G. Brown, D. L. Buckley, T. L. Chenevert, L. P. Clarke, S. Collette, G. J. Cook, N. M. deSouza, J. C. Dickson, C. Dive, J. L. Evelhoch, C. Faivre-Finn, F. A. Gallagher, F. J. Gilbert, R. J. Gillies, V. Goh, J. R. Griffiths, A. M. Groves, S. Halligan, A. L. Harris, D. J. Hawkes, O. S. Hoekstra, E. P. Huang, B. F. Hutton, E. F. Jackson, G. C. Jayson, A. Jones, D.-M. Koh, D. Lacombe, P. Lambin, N. Lassau, M. O. Leach, T.-Y. Lee, E. L. Leen, J. S. Lewis, Y. Liu, M. F. Lythgoe, P. Manoharan, R. J. Maxwell, K. A. Miles, B. Morgan, S. Morris, T. Ng, A. R. Padhani, G. J. M. Parker, M. Partridge, A. P. Pathak, A. C. Peet, S. Punwani, A. R. Reynolds, S. P. Robinson, L. K. Shankar, R. A. Sharma, D. Soloviev, S. Stroobants, D. C. Sullivan, S. A. Taylor, P. S. Tofts, G. M. Tozer, M. van Herk, S. Walker-Samuel, J. Wason, K. J. Williams, P. Workman, T. E. Yankeelov, K. M. Brindle, L. M. McShane, A. Jackson, and J. C. Waterton. Imaging biomarker roadmap for cancer studies. Nature Reviews Clinical Oncology, 14:169–186, 2017.
[31] R. M. Califf. Biomarker definitions and their applications. Experimental Biology and Medicine, 243:213–221, 2018.
[32] B. Barzel, Y.-Y. Liu, and A.-L. Barabási. Constructing minimal models for complex system dynamics. Nature Communications, 6:7186, 2015.
[33] H.-T. Wai, A. Scaglione, B. Barzel, and A. Leshem. Joint network topology and dynamics recovery from perturbed stationary points. IEEE Transactions on Signal Processing, 67(17):4582–4596, 2019.
[34] Y.-C. Lai. Finding nonlinear system equations and complex network structures from data: A sparse optimization approach. Chaos, 31:082101, 2021.
[35] T.-T. Gao and G. Yan. Autonomous inference of complex network dynamics from incomplete and noisy data. Nature Computational Science, 2:160–168, 2022.
[36] J. Koch, Z. Chen, A. Tur, J. Drgona, and D. Vrabie. Structural inference of networked dynamical systems with universal differential equations. Chaos, 33:023103, 2023.
[37] Y.-Y. Liu, J.-J. Slotine, and A.-L. Barabási. Observability of complex systems. Proceedings of the National Academy of Sciences of the United States of America, 110(7):2460–2465, 2013.
[38] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[39] M. E. J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5):323–351, 2005.
[40] I. Voitalov, P. Van Der Hoorn, R. Van Der Hofstad, and D. Krioukov. Scale-free networks well done. Physical Review Research, 1:033034, 2019.
[41] A. V. Goltsev, S. N. Dorogovtsev, J. G. Oliveira, and J. F. F. Mendes. Localization and spreading of diseases in complex networks. Physical Review Letters, 109(12):128702, 2012.
[42] R. Pastor-Satorras and C. Castellano. Distinct types of eigenvector localization in networks. Scientific Reports, 6:18847, 2016.
[43] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200–3203, 2001.
[44] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, 87(3):925–979, 2015.
[45] S. L. Hale and J. L. Koprowski. Ecosystem-level effects of keystone species reintroduction: A literature review. Restoration Ecology, 26(3):439–445, 2018.
[46] L. S. Mills, M. E. Soulé, and D. F. Doak. The keystone-species concept in ecology and conservation: Management and policy must explicitly consider the complexity of interactions in natural systems. BioScience, 43(4):219–224, 1993.
[47] H. E. W. Cottee-Jones and R. J. Whittaker. Perspective: The keystone species concept: A critical appraisal. Frontiers of Biogeography, 4(3), 2012.
[48] E. Delmas, M. Besson, M.-H. Brice, L. A. Burkle, G. V. Dalla Riva, M.-J. Fortin, D. Gravel, P. R. Guimarães Jr., D. H. Hembry, E. A. Newman, J. M. Olesen, M. M. Pires, J. D. Yeakel, and T. Poisot. Analysing ecological networks of species interactions. Biological Reviews, 94:16–36, 2019.
[49] M. Scheffer, J. Bascompte, W. A. Brock, V. Brovkin, S. R. Carpenter, V. Dakos, H. Held, E. H. Van Nes, M. Rietkerk, and G. Sugihara. Early-warning signals for critical transitions. Nature, 461:53–59, 2009.
[50] N. Wunderling, A. Staal, B. Sakschewski, M. Hirota, O. A. Tuinenburg, J. F. Donges, H. M. J. Barbosa, and R. Winkelmann. Recurrent droughts increase risk of cascading tipping events by outpacing adaptive capacities in the Amazon rainforest. Proceedings of the National Academy of Sciences of the United States of America, 119(32):e2120777119, 2022.
[51] N. E. Kouvaris, H. Kori, and A. S. Mikhailov. Traveling and pinned fronts in bistable reaction-diffusion systems on networks. PLOS ONE, 7(9):e45029, 2012.
[52] C. D. Brummitt, G. Barnett, and R. M. D’Souza. Coupled catastrophes: Sudden shifts cascade and hop among interdependent systems. Journal of The Royal Society Interface, 12:20150712, 2015.
[53] J. Krönke, N. Wunderling, R. Winkelmann, A. Staal, B. Stumpf, O. A. Tuinenburg, and J. F. Donges. Dynamics of tipping cascades on complex networks. Physical Review E, 101(4):042311, 2020.
[54] P. Kundu, H. Kori, and N. Masuda. Accuracy of a one-dimensional reduction of dynamical systems on networks. Physical Review E, 105(2):024305, 2022.
[55] P. Kundu, N. G. MacLaren, H. Kori, and N. Masuda. Mean-field theory for double-well systems on degree-heterogeneous networks. Proceedings of the Royal Society A, 478:20220350, 2022.
[56] L. Petzold. Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations. SIAM Journal on Scientific and Statistical Computing, 4(1):136–148, 1983.
[57] K. Soetaert, T. Petzoldt, and R. W. Setzer. Solving differential equations in R: Package deSolve. Journal of Statistical Software, 33(9):1–25, 2010.
[58] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54:396–405, 2003.
[59] H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: A parametric active-set algorithm for quadratic programming. Mathematical Programming Computation, 6:327–363, 2014.
[60] S. Theußl, F. Schwendinger, and K. Hornik. ROI: An extensible R optimization infrastructure. Journal of Statistical Software, 94(15):1–64, 2020.
[61] J. Kunegis. KONECT – The Koblenz Network Collection. In Proceedings of the 22nd International Conference on the World Wide Web, pages 1343–1350, 2013.
[62] L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J.-F. Pinton, and W. Van den Broeck. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology, 271:166–180, 2011.
[63] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 407:651–654, 2000.
[64] L. Šubelj and M. Bajec. Robust network community detection using balanced propagation. European Physical Journal B, 81:353–362, 2011.
[65] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas. Self-similar community structure in a network of human interactions. Physical Review E, 68(6):065103, 2003.
[66] A. A. Hagberg, D. A. Schult, and P. J. Swart. Exploring network structure, dynamics, and function using NetworkX. In G. Varoquaux, T. Vaught, and J. Millman, editors, Proceedings of the 7th Python in Science Conference, pages 11–15, Pasadena, CA USA, 2008.
[67] G. Csárdi, T. Nepusz, V. Traag, Sz. Horvát, F. Zanini, D. Noom, and K. Müller. igraph: Network Analysis and Visualization in R, 2024. R package version 2.0.3.
[68] P. Holme and B. J. Kim. Growing scale-free networks with tunable clustering. Physical Review E, 65(2):026107, 2002.
[69] K.-I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free networks. Physical Review Letters, 87(27):278701, 2001.
[70] F. Chung and L. Lu. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 6:125–145, 2002.
[71] Y. S. Cho, J. S. Kim, J. Park, B. Kahng, and D. Kim. Percolation transitions in scale-free networks under the Achlioptas process. Physical Review Letters, 103(13):135702, 2009.
[72] A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4):046110, 2008.
[73] A. Arenas, A. Díaz-Guilera, and C. J. Pérez-Vicente. Synchronization reveals topological scales in complex networks. Physical Review Letters, 96(11):114102, 2006.
[74] M. T. Schaub, J.-C. Delvenne, R. Lambiotte, and M. Barahona. Structured networks and coarse-grained descriptions: A dynamical perspective. In P. Doreian, V. Batagelj, and A. Ferligoj, editors, Advances in Network Clustering and Blockmodeling, pages 333–361. John Wiley & Sons, Ltd, 2020.
[75] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics, 2008:P10008, 2008.
[76] Y. Takahata. Diachronic changes in the dominance relations of adult female Japanese monkeys of the Arashiyama B group. In L. M. Fedigan and P. J. Asquith, editors, The Monkeys of Arashiyama: Thirty-five Years of Research in Japan and the West, pages 123–139. State University of New York Press, Albany, NY, 1991.
[77] J. S. Coleman. Introduction to Mathematical Sociology. Collier-Macmillan, London, 1964.
[78] L. C. Freeman, C. M. Webster, and D. M. Kirke. Exploring social structure using dynamic three-dimensional color images. Social Networks, 20(2):109–118, 1998.
[79] T. P. Peixoto. The Netzschleuder network catalogue and repository, 2020. https://networks.skewed.de/, accessed 18 April 2024.
[80] L. C. Freeman, S. C. Freeman, and A. G. Michaelson. On human social intelligence. Journal of Social and Biological Structures, 11(4):415–425, 1988.
[81] B. Hayes. Connecting the dots. American Scientist, 94(5):400–404, 2006.
[82] R. B. Correia, L. P. de Araújo Kohler, M. M. Mattos, and L. M. Rocha. City-wide electronic health records reveal gender and age biases in administration of known drug–drug interactions. NPJ Digital Medicine, 2:74, 2019.
[83] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[84] S. J. Cook, T. A. Jarrell, C. A. Brittin, Y. Wang, A. E. Bloniarz, M. A. Yakovlev, K. C. Q. Nguyen, L. T.-H. Tang, E. A. Bayer, J. S. Duerr, H. E. Bülow, O. Hobert, D. H. Hall, and S. W. Emmons. Whole-animal connectomes of both Caenorhabditis elegans sexes. Nature, 571:63–71, 2019.
[85] R. Hausmann, C. A. Hidalgo, S. Bustos, M. Coscia, A. Simoes, and M. A. Yildirim. The Atlas of Economic Complexity: Mapping Paths to Prosperity. MIT Press, 2013.
[86] R. M. Thompson and C. R. Townsend. Impacts on stream food webs of native and exotic forest: An intercontinental comparison. Ecology, 84(1):145–161, 2003.
[87] J. Coleman, E. Katz, and H. Menzel. The diffusion of an innovation among physicians. Sociometry, 20(4):253–270, 1957.
[88] R. Michalski, S. Palus, and P. Kazienko. Matching organizational structure and social network extracted from email communication. In W. Abramowicz, editor, 14th International Conference on Business Information Systems, Lecture Notes in Business Information Processing, pages 197–206, 2011.
[89] L. Šubelj and M. Bajec. Community structure of complex software systems: Analysis and applications. Physica A, 390(16):2968–2975, 2011.
[90] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31:64–68, 2002.
[91] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
[92] Bureau of Transportation Statistics. T-100 domestic market, 2020. https://networks.skewed.de/net/us_air_traffic, accessed 18 April 2024.
[93] United States Federal Aviation Administration. Air traffic control system command center, 2010. https://networks.skewed.de/net/faa_routes, accessed 18 April 2024.

Supplementary Materials for:
Approximating nonlinear dynamics on networks by sentinel nodes

Neil G. MacLaren, Baruch Barzel, and Naoki Masuda

Appendix S1 Undirected and unweighted networks used in the analysis

In the main text, we have shown results from two undirected and unweighted networks, i.e., the dolphin network and a network generated by the Barabási-Albert (BA) model. We used eight other undirected and unweighted networks, of which four are empirical networks and the other four are model networks, in our statistical analyses.

We downloaded the five empirical networks, including the dolphin network described in the main text, from the KONECT repository [61] at http://konect.cc. The other four empirical networks are as follows:

Proximity: A network of visitors at a museum [62]. Each visitor is a node. Two nodes are adjacent (i.e., directly connected by an edge) if any face-to-face contact of 20 seconds or more was recorded between them. There are 69 days of recording [62]. We use the day with the largest number of contacts, which is the network provided in the KONECT repository. The network has $N=410$ , $M=2,765$ , $\overline{k}=13.49$ , and CV of the degree equal to $0.62$ .
Metabolic: A metabolic network of the nematode Caenorhabditis elegans [63]. In this network, nodes are metabolic compounds. Two nodes are adjacent if one metabolite is a product of the other. Note that the original study treated this network as directed, but we use the undirected version here. This network has $N=453$ , $M=2,025$ , $\overline{k}=8.94$ , and CV of the degree equal to $1.87$ .
Road: A network of major European roads [64]. Each node represents a city. Two cities are adjacent if there is a road connection between them. Our version of this network has $N=1,039$ , $M=1,305$ , $\overline{k}=2.51$ , and CV of the degree equal to $0.48$ .
Email: A network of email exchanges at the University of Rovira i Virgili [65]. Nodes in this network are email accounts. Two nodes are adjacent if at least one email was sent between them. Our version of this network has $N=1,133$ , $M=5,451$ , $\overline{k}=9.62$ , and CV of the degree equal to $0.97$ .

We used instances of five undirected random network models, including the BA model, with $N=1,000$ nodes. We removed any self-loops or multi-edges and retained the largest connected component. We generated the Erdős-Rényi (ER), BA, Holme-Kim (HK), and Lancichinetti-Fortunato-Radicchi (LFR) models using NetworkX [66], and the Goh-Kahng-Kim (GKK) model using igraph [67]. The BA network is described in the main text. The remaining four networks are as follows:

ER: We generated an undirected ER network with the probability of connecting each pair of nodes set to $p=0.05$ . The resulting network was connected and had $N=1,000$ , $M=25,132$ , $\overline{k}=50.26$ , and CV of the degree equal to $0.14$ .
HK: The HK model is a variation of the BA model that aims to produce high clustering (i.e., large density of triangles) [68]. We set $m=2$ and a target local clustering coefficient of $0.1$ . The final network had $N=1,000$ , $M=1,996$ , $\overline{k}=3.99$ , CV of the degree equal to $1.40$ , and an average local clustering coefficient of $0.12$ .
GKK: The GKK model is a node fitness model [69]. A node is assigned a fitness $f_{i}=(i+i_{0}-1)^{-\alpha}$ , where $i_{0}=N^{1-\frac{1}{\alpha}}\left[10\sqrt{2}(1-\alpha)\right]^{\frac{1}{\alpha}}$ constrains the maximum degree [70, 71]. For each $i,j\in\{1,\ldots,N\}$ , edge $(i,j)$ is present with probability $\frac{f_{i}f_{j}}{(\sum_{\ell=1}^{N}f_{\ell})^{2}}$ . We set $\alpha=1.25$ , $N=1,000$ , and $M=2,500$ . The largest connected component of the generated network had $N=949$ , $M=2,496$ , $\overline{k}=5.26$ , and CV of the degree equal to $0.84$ .
LFR: The LFR benchmark model produces networks that have both heterogeneous degree distributions and community structure with heterogeneous community sizes [72]. We set the expected power-law exponents to $-3$ and $-1.5$ for the degree distribution and the distribution of community sizes, respectively. We set the probability of connecting nodes between communities to 0.1, the expected average degree to 4, and the minimum community size to 20. The final network had $N=998$ , $M=1,988$ , $\overline{k}=3.98$ , and CV of the degree equal to $0.66$ .
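The preprocessing and summary statistics quoted above (removing self-loops and multi-edges, keeping the largest connected component, then reporting $N$, $M$, $\overline{k}$, and the CV of the degree) can be sketched in plain Python as follows. This is a minimal illustration rather than the code used for the analysis: the function names and the toy edge list are hypothetical, and using the population standard deviation for the CV is an assumption.

```python
from collections import deque
from statistics import mean, pstdev

def simplify(edges):
    """Build an undirected simple graph: drop self-loops and duplicate edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component(adj):
    """Return the subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {u: adj[u] & best for u in best}

def summary(adj):
    """Return N, M, the mean degree, and the CV of the degree."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n_nodes = len(degrees)
    n_edges = sum(degrees) // 2          # each edge is counted twice
    k_mean = mean(degrees)
    cv = pstdev(degrees) / k_mean        # population SD: an assumption
    return n_nodes, n_edges, k_mean, cv

# Hypothetical toy edge list with a duplicate edge, a self-loop,
# and a second small component.
edges = [(1, 2), (2, 1), (2, 3), (3, 3), (3, 1), (3, 4), (5, 6)]
N, M, k_bar, cv = summary(largest_component(simplify(edges)))
```

For any simple undirected graph, $\overline{k}=2M/N$; for example, the ER network above has $2\times 25,132/1,000\approx 50.26$.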

Appendix S2 Approximation error as a function of $n$

In the main text (section II.2), we showed that the degree distributions of optimized node sets remain distinct from those of completely random node sets up to $n=12$ for the dolphin and BA networks. Figure S6 indicates that the approximation errors of optimized node sets also remain distinct from and lower than those of completely random node sets. We note that the approximation error of both optimized and completely random node sets decreases as $n$ increases, which is expected.

Appendix S3 Distribution of approximation errors for various networks

In Figs. 2(a) and (b) in the main text, we demonstrated that the state averaged over the nodes in the optimized node set can closely approximate the state averaged over all the nodes in the coupled double-well dynamics on the dolphin and BA networks, respectively. We show the corresponding results for all ten networks in Fig. S7, as well as for the mutualistic interaction, SIS, and gene-regulatory dynamics in Figs. S8, S9, and S10, respectively. Figures S7(a) and (g) are identical to Figs. 2(a) and (b), respectively.

Across all dynamics and networks, the optimized node sets have relatively small approximation error. However, the relationship between degree-preserving and completely random node sets differs across networks. In some networks, such as the dolphin network (Fig. S7a), the difference between the degree-preserving and completely random node sets is large. This is not the case for the road network (Fig. S7d) and the LFR network (Fig. S7j), possibly because the degree sequence does not contain sufficient information to characterize these networks: the road network has a complicated community structure imposed by geography [64], and the LFR network has a marked community structure by design [72]. Either of these cases may make particular nodes more suitable or less so for inclusion in $S$ for reasons other than the node’s degree. In addition, the road network has a narrow degree distribution, probably making the degree a less important indicator of a node’s importance in representing the dynamics of the entire network. Even in these cases, our algorithm finds node sets with relatively low approximation error.

Appendix S4 Statistical results for the effect of node set type on approximation error

To systematically assess the performance of our optimization algorithm across different dynamics and networks, we generated 100 optimized node sets by running the optimization algorithm 100 times, for each combination of dynamics and network. For comparison, we also generated 100 degree-preserving node sets, one per optimized node set, 100 completely random node sets for each network, and 100 node sets from each heuristic algorithm introduced in section S6 ( $k$ -constrained, $k$ -quantiled, $k_{\rm nn}$ -constrained, and community-based) for each network. Note that neither the completely random nor heuristic algorithm node sets depend on the dynamics.

We computed the approximation error for each node set obtained by each of the seven algorithms, each of the four dynamics, and each of the ten networks. To compare the approximation error, we conducted a multi-way analysis of variance (ANOVA) with three independent variables, i.e., dynamics (reference: coupled double-well dynamics), network (reference: BA network), and node set type (reference: completely random node sets). We exclude the ER network because all node sets achieve small approximation errors on this network, probably due to limited heterogeneity in the degree distribution. The dependent, or response, variable in the model is the approximation error, $\varepsilon$ . We use $\ln\varepsilon$ because tiny (i.e., near zero) error values are present; the error values obey skewed distributions for most dynamics, networks, and node set types; and the variance of $\varepsilon$ tends to be large when the mean of $\varepsilon$ is large.

Overall, our ANOVA model predicts $\ln\varepsilon$ well ( $R^{2}=0.70$ ). In part due to the large sample size (i.e., 25,200 observations), all three independent variables are highly significant (dynamics: $df=3$ , $F=7017.89$ , $p<10^{-7}$ , where the $F$ statistic is the mean-square error of the factor divided by the mean-square error of the residuals; network: $df=8$ , $F=175.08$ , $p<10^{-7}$ ; node set type: $df=6$ , $F=6091.11$ , $p<10^{-7}$ ).

We show the differences between the average error for each pair of node set types, as computed by a Tukey’s honestly significant difference test, in Table S1. Each row of Table S1 specifies the estimated difference in average $\ln\varepsilon$ between the two node set types, the 95% confidence interval of the estimated mean difference, and a $p$ -value for the difference adjusted for multiple comparisons. For example, the first row states that, across dynamics and networks, optimized node sets have an average $\ln\varepsilon$ value that is 5.625 smaller than that of completely random node sets. On the natural scale, the average $\varepsilon$ for optimized node sets is $1/e^{-5.625}=277.3$ times smaller than the average $\varepsilon$ for completely random node sets.
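The conversion from a difference in average $\ln\varepsilon$ to a fold change on the natural scale is a one-line computation, sketched here in Python. The function name is hypothetical. Strictly speaking, exponentiating a difference of mean logarithms gives a ratio of geometric means of $\varepsilon$; the sketch simply reproduces the arithmetic used in the text.

```python
import math

def fold_reduction(diff_ln_eps):
    """Factor by which the error shrinks, given a difference in mean ln(eps).

    A negative difference (first type minus second type) means the first
    node set type has the smaller error; the factor is exp(-difference).
    """
    return math.exp(-diff_ln_eps)

# First row of Table S1: Optimized - Random = -5.625
ratio_optimized = fold_reduction(-5.625)
```

The same function applies to any row of the table: a difference of zero corresponds to a factor of one, and a positive difference corresponds to a factor below one.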

Table S1: Differences in average $\ln\varepsilon$ between different node set types when we use the test dynamics for optimization. The differences are computed by a Tukey’s honestly significant difference test. The 95% confidence intervals (CI) of the differences and associated $p$ -values, adjusted for multiple comparisons, are also shown.

	Difference	CI	$p$
Optimized $-$ Random	$-5.625$	$[-5.737,-5.512]$	$<10^{-7}$
Degree-preserving $-$ Random	$-2.345$	$[-2.457,-2.233]$	$<10^{-7}$
$k$ -constrained $-$ Random	$-0.170$	$[-0.282,-0.058]$	$1.62\times 10^{-4}$
$k$ -quantiled $-$ Random	$-0.138$	$[-0.250,-0.025]$	$5.59\times 10^{-3}$
$k_{\rm nn}$ -constrained $-$ Random	$-0.262$	$[-0.375,-0.150]$	$<10^{-7}$
Community-based $-$ Random	$-0.157$	$[-0.269,-0.045]$	$7.40\times 10^{-4}$
Degree-preserving $-$ Optimized	$3.280$	$[3.168,3.392]$	$<10^{-7}$
$k$ -constrained $-$ Optimized	$5.455$	$[5.343,5.567]$	$<10^{-7}$
$k$ -quantiled $-$ Optimized	$5.487$	$[5.375,5.599]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Optimized	$5.362$	$[5.250,5.474]$	$<10^{-7}$
Community-based $-$ Optimized	$5.468$	$[5.356,5.580]$	$<10^{-7}$
$k$ -constrained $-$ Degree-preserving	$2.175$	$[2.063,2.287]$	$<10^{-7}$
$k$ -quantiled $-$ Degree-preserving	$2.207$	$[2.095,2.320]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Degree-preserving	$2.082$	$[1.970,2.195]$	$<10^{-7}$
Community-based $-$ Degree-preserving	$2.188$	$[2.076,2.300]$	$<10^{-7}$
$k$ -quantiled $-$ $k$ -constrained	$0.032$	$[-0.080,0.145]$	$0.979$
$k_{\rm nn}$ -constrained $-$ $k$ -constrained	$-0.092$	$[-0.205,0.020]$	$0.186$
Community-based $-$ $k$ -constrained	$0.013$	$[-0.099,0.125]$	$0.999$
$k_{\rm nn}$ -constrained $-$ $k$ -quantiled	$-0.125$	$[-0.237,-0.013]$	$0.018$
Community-based $-$ $k$ -quantiled	$-0.019$	$[-0.132,0.093]$	$0.999$
Community-based $-$ $k_{\rm nn}$ -constrained	$0.106$	$[-0.007,0.218]$	$0.081$

The average $\ln\varepsilon$ of every other node set type is smaller than that of completely random node sets (1st–6th rows of Table S1). Optimized node sets have the smallest error (1st and 7th–11th rows), followed by the degree-preserving node sets (2nd, 7th, and 12th–15th rows), followed by the node sets generated by the heuristic algorithms (remaining rows). The node sets generated by the four heuristic algorithms are, in general, poorly differentiated from one another (last six rows of Table S1).

Appendix S5 Kullback-Leibler divergence

We measure how different a sampled degree distribution is from the original degree distribution of the network with the Kullback-Leibler divergence, $D_{\rm KL}$ . The Kullback-Leibler divergence is defined as

D_{\rm KL}(Q\|P)=\sum_{k\in\mathcal{K}}Q(k)\log\left[\frac{Q(k)}{P(k)}\right],

(S10)

where $P(k)$ is the degree distribution of the original network (i.e., the proportion of nodes having degree $k$ ), $Q(k)$ is the degree distribution based on the set of sampled nodes, and $\mathcal{K}$ is the set of unique degree values in the original network. To compute $Q(k)$ , we first generated 100 optimized node sets with a given $n$ . Then, we calculated the proportion of nodes, across the 100 node sets, which had each degree in $\mathcal{K}$ .
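As an illustration of Eq. (S10), the following Python sketch pools the degrees of the nodes across several node sets to form $Q(k)$ and compares it with $P(k)$. The degree list and node sets are hypothetical toy data, the function names are ours, and two conventions are assumed: the natural logarithm, and $0\log 0=0$ for degree values that no sampled node has.

```python
import math
from collections import Counter

def degree_distribution(degrees, support):
    """Proportion of entries in `degrees` equal to each k in `support`."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: counts.get(k, 0) / total for k in support}

def kl_divergence(q, p):
    """D_KL(Q || P) = sum_k Q(k) log[Q(k) / P(k)], with 0 log 0 := 0."""
    return sum(q[k] * math.log(q[k] / p[k]) for k in p if q[k] > 0)

# Hypothetical toy data: degrees of the 8 nodes of a network (indexed 0..7)
# and three node sets with n = 3 nodes each.
network_degrees = [1, 1, 2, 2, 2, 3, 3, 4]
node_sets = [[0, 2, 5], [1, 3, 7], [2, 4, 6]]

support = sorted(set(network_degrees))            # the set K of unique degrees
p = degree_distribution(network_degrees, support)
pooled = [network_degrees[i] for s in node_sets for i in s]
q = degree_distribution(pooled, support)          # pooled across node sets
d_kl = kl_divergence(q, p)
```

Because $\mathcal{K}$ contains only degree values present in the original network, $P(k)>0$ for every term and the ratio inside the logarithm is always defined.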

Appendix S6 Effects of structural features on the inclusion of nodes in optimized node sets

In the main text, we analyzed the degree distributions of optimized sentinel node sets in comparison with the degree distribution of completely random node sets. Other structural features of the network may also play roles in making nodes suitable for inclusion in optimized node sets. In this section, we investigate three such features.

S6.1 Effects of three structural features on the approximation error

First, we examined the effect of the average nearest neighbor degrees of the nodes in the optimized node sets. The average nearest neighbor degree of an $i$ th node is given by

k_{{\rm nn},i}=\frac{1}{k_{i}}\sum_{j=1;j\in\mathcal{N}_{i}}^{N}k_{j}

(S11)

where $\mathcal{N}_{i}$ is the set of the $k_{i}$ neighbors of the $i$ th node. We compute the average of $k_{{\rm nn},i}$ over the $n$ nodes in the node set $S$ by

k_{\rm nn}=\frac{1}{n}\sum_{i=1;i\in S}^{N}k_{{\rm nn},i}

(S12)

for each optimized and completely random node set. We show in Fig. S11 the distribution of $k_{\rm nn}$ in the optimized and completely random node sets for the coupled double-well dynamics on the ten networks used in Fig. S7. The figure suggests that nodes in optimized node sets tend to have larger, or at least not smaller, $k_{\rm nn}$ than completely random node sets on average. Qualitatively, this observation suggests that our algorithm prefers nodes that “see” more of the network, i.e., nodes with neighbors receiving input from a larger portion of the network.
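Equations (S11) and (S12) translate directly into code. The sketch below assumes the network is stored as a dictionary mapping each node to the set of its neighbors; that representation, the function names, and the toy graph are illustrative assumptions.

```python
def knn_of_node(adj, i):
    """Eq. (S11): average degree of the neighbors of node i."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])

def knn_of_set(adj, node_set):
    """Eq. (S12): average of k_nn,i over the n nodes in the node set S."""
    return sum(knn_of_node(adj, i) for i in node_set) / len(node_set)

# Hypothetical toy graph: node 0 is adjacent to 1, 2, and 3; nodes 1 and 2
# are also adjacent to each other.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

knn_hub = knn_of_node(adj, 0)       # neighbors have degrees 2, 2, 1
knn_leaf = knn_of_node(adj, 3)      # the only neighbor has degree 3
knn_set = knn_of_set(adj, [1, 3])   # average over S = {1, 3}
```

In this toy graph the low-degree node 3 has a larger $k_{{\rm nn},i}$ than the hub, which illustrates how a node with few neighbors can still "see" a well-connected part of the network.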

Second, we examined the effect of the local clustering coefficient. The local clustering coefficient of an $i$ th node is given by [2]

C_{i}=\frac{\text{number of }i\text{'s neighbors that are adjacent to each other}}{\frac{1}{2}k_{i}(k_{i}-1)}.

(S13)

We computed the average of $C_{i}$ over the $n$ nodes in node set $S$ by

C_{\text{local}}=\frac{1}{n}\sum_{i=1;i\in S}^{N}C_{i}

(S14)

for each optimized and completely random node set. We show the distribution of $C_{\text{local}}$ in the optimized and completely random node sets for the coupled double-well dynamics on various networks in Fig. S12. The figure indicates that $C_{\text{local}}$ does not differ much between optimized and completely random node sets.
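A minimal Python sketch of Eqs. (S13) and (S14) follows, again assuming a dictionary-of-neighbor-sets representation. The numerator counts pairs of neighbors of node $i$ that are adjacent to each other. Equation (S13) is undefined for $k_{i}<2$; setting $C_{i}=0$ in that case is an assumption of this sketch, and the function names and toy graph are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Eq. (S13): fraction of pairs of i's neighbors that are adjacent."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # assumption: C_i = 0 when fewer than two neighbors
    linked = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def clustering_of_set(adj, node_set):
    """Eq. (S14): average of C_i over the n nodes in the node set S."""
    return sum(local_clustering(adj, i) for i in node_set) / len(node_set)

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

c_hub = local_clustering(adj, 0)          # one linked pair out of three
c_set = clustering_of_set(adj, [0, 1])    # average over S = {0, 1}
```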

Third, we hypothesized that nodes in optimized node sets are more likely to come from different communities than those in completely random node sets. This hypothesis is intuitive because nodes in the same community are expected to show relatively similar dynamics [73, 10, 74, 14], and therefore it is probably redundant to devote multiple sentinel nodes to a single community when one can only use a small number of sentinel nodes. To test the hypothesis, we confined ourselves to six networks, i.e., the five empirical networks and the LFR network, which have relatively strong community structure. We obtained a partition of each network into communities using the Louvain algorithm [75]. Then, for each node set, we counted the number of nodes belonging to each community and recorded the maximum of these counts over all the communities, which we denote by $K$ . In Fig. S13, we show the distribution of $K$ for the optimized and completely random node sets. We find that the optimized node sets seem to have a somewhat smaller number of sentinel nodes coming from the same community (thus, smaller $K$ ) than the completely random node sets, which supports our hypothesis, albeit weakly.
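Given a community assignment for every node, $K$ is simply the largest number of nodes in the node set that share a community. The sketch below assumes the partition is already available as a dictionary from node to community label (for example, as produced by a Louvain implementation); the function name and toy partition are hypothetical.

```python
from collections import Counter

def max_same_community(node_set, community):
    """K: the largest number of nodes in the node set sharing one community."""
    return max(Counter(community[v] for v in node_set).values())

# Hypothetical partition of five nodes into three communities.
community = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}

K_clumped = max_same_community([0, 1, 2], community)  # two nodes from "a"
K_spread = max_same_community([0, 2, 4], community)   # one node per community
```

By construction $1\leq K\leq n$, and $K=1$ means that every sentinel node comes from a different community.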

To formalize these observations, we attempted to classify each node set as either optimized or completely random based only on structural features of the node set. A successful classification would imply that one can likely construct a high-performance node set by relying on these node quantities and avoid running our optimization algorithm. We used logistic regression for this classification task. Specifically, we use a generalized linear model with binomial errors and a logit link function in which the dependent variable is the binary outcome of a trial (i.e., either a node set is completely random, which is the reference, or it is optimized). The independent variables are the simulation conditions (i.e., dynamics and network); $k$ , the average degree of the $n$ nodes in the node set; $k_{\rm nn}$ , the average nearest neighbor degree (i.e., $k_{{\rm nn},i}$ ) averaged over the $n$ nodes (see Eqs. (S11) and (S12)); $C_{\text{local}}$ , the average local clustering coefficient of the $n$ nodes (see Eqs. (S13) and (S14)); and $K$ , the maximum number of nodes in a node set assigned to the same community. We analyzed the same number of node sets of each type: 100 optimized node sets for each combination of dynamics and network, and an equal number of completely random node sets (i.e., 400 random node sets for each network). We again exclude the ER network because of its relatively homogeneous degree distribution.

We show the results of this analysis in Table S2. We find that the classification is unsuccessful; our model explains less than 1% of the variance in classification of a node set as random or optimized (McFadden’s pseudo- $R^{2}=0.008$ ). However, the coefficients on each node quantity are statistically significant and consistent with our analysis above: optimized node sets are associated with smaller $k$ (model coefficient $b=-0.055$ , $p=3.62\times 10^{-6}$ ), larger $k_{\rm nn}$ ( $b=0.023$ , $p=1.19\times 10^{-4}$ ), smaller $C_{\text{local}}$ ( $b=-1.10$ , $p=1.73\times 10^{-4}$ ), and smaller $K$ ( $b=-0.228$ , $p<10^{-7}$ ). Note that the coefficient on $C_{\text{local}}$ is the least extreme among the four ( $|z|=3.76$ ), consistent with our analysis above, while the $p$ -value is small due to a large sample size. Additionally, the controls for dynamics in this logistic regression are not significant, suggesting that the composition of optimized node sets may be more strongly related to network structure than dynamics. In sum, our model supports that, across dynamics and networks, optimized node sets avoid large-degree nodes, include nodes with better connected neighbors which do not in turn connect with each other, and avoid nodes that come from the same community. However, all these effects are weak.

Table S2: Regression results in search for features of the optimized node set. The dependent variable is whether the node set is completely random (the reference) or optimized. The model is a logistic regression, i.e., a generalized linear model with binomial errors and a logit link function. SE: standard error, $z$ : coefficient estimate divided by the standard error.

	Estimate	SE	$z$	$p$
Intercept	$0.865$	$0.159$	$5.44$	$5.21\times 10^{-8}$
Dynamics
Mutualistic species	$0.005$	$0.067$	$0.07$	$0.943$
SIS	$-0.018$	$0.067$	$-0.27$	$0.787$
Gene-regulatory	$0.004$	$0.067$	$0.06$	$0.950$
Network
Proximity	$0.516$	$0.162$	$3.18$	$0.001$
Metabolic	$-0.391$	$0.294$	$-1.33$	$0.183$
Road	$-0.429$	$0.137$	$-3.13$	$0.002$
Email	$0.017$	$0.133$	$0.13$	$0.897$
BA	$-0.529$	$0.141$	$-3.75$	$1.76\times 10^{-4}$
HK	$-0.450$	$0.127$	$-3.54$	$4.07\times 10^{-4}$
GKK	$-0.363$	$0.134$	$-2.72$	$0.007$
LFR	$-0.294$	$0.111$	$-2.65$	$0.008$
$k$	$-0.055$	$0.012$	$-4.63$	$3.62\times 10^{-6}$
$k_{\rm nn}$	$0.023$	$0.006$	$3.85$	$1.19\times 10^{-4}$
$C_{\text{local}}$	$-1.100$	$0.293$	$-3.76$	$1.73\times 10^{-4}$
$K$	$-0.228$	$0.040$	$-5.73$	$<10^{-7}$
Null deviance	9981	$df$ : 7199
Residual deviance	9903	$df$ : 7184
Pseudo- $R^{2}$	0.008
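The pseudo- $R^{2}$ and the $z$ column of Table S2 can be recovered from the other entries. The sketch below assumes that the reported deviances come from ungrouped binary outcomes, in which case the deviance equals $-2$ times the log-likelihood and McFadden's pseudo- $R^{2}$ reduces to one minus the ratio of residual to null deviance. The variable and function names are ours.

```python
# Values copied from Table S2.
null_deviance = 9981.0
residual_deviance = 9903.0

# McFadden's pseudo-R^2, assuming deviance = -2 * log-likelihood
# (true for ungrouped binary data).
pseudo_r2 = 1.0 - residual_deviance / null_deviance

def z_score(estimate, standard_error):
    """z statistic: coefficient estimate divided by its standard error."""
    return estimate / standard_error

z_intercept = z_score(0.865, 0.159)
```

Small discrepancies between recomputed and tabulated $z$ values are expected because the table reports rounded estimates and standard errors.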

S6.2 Heuristic algorithms for node set selection

To explore the possibility of using the information obtained from these analyses to eliminate the need for our optimization algorithm, we designed four heuristic algorithms for node set selection based on our observations. These algorithms do not depend on the dynamics and only use the information on the network structure to different extents. We avoided algorithms using $C_{\text{local}}$ because, as we showed with Fig. S12, its distribution did not apparently differ between the optimized and completely random node sets for at least some networks.

For the first algorithm, which we call “ $k$ -constrained,” we reject the largest 5% of nodes in terms of degree and select $n$ nodes uniformly at random without replacement from the remaining 95% of nodes. We also reject the 5% largest-degree nodes before selecting $n$ nodes without replacement in each of the following algorithms.

In the “ $k$ -quantiled” algorithm, we select one node uniformly at random from each of $n$ divisions of the degree distribution. For example, if $n=4$ , then we select one node from the smallest 25% of nodes in terms of degree, one node from the 25th–50th percentile of nodes, and so on, among the 95% of nodes with the smallest degrees.

In the “ $k_{\rm nn}$ -constrained” algorithm, we reject the bottom 5% of nodes in terms of $k_{\rm nn}$ and select nodes uniformly at random and without replacement from the remaining nodes.

Finally, the “community-based” algorithm runs as follows. First, we partition the network into communities using the Louvain algorithm [75]. Then, we select nodes softly avoiding multiple nodes from the same community. Specifically, we select the $i$ th community with probability $\sqrt{c_{i}}/\sum_{j=1}^{n_{\text{c}}}\sqrt{c_{j}}$ , where $c_{i}$ is the number of nodes in the $i$ th community and $n_{\text{c}}$ is the number of communities in the network, then select one node uniformly at random without replacement from the $i$ th community. We repeat this procedure $n$ times. Note that the unbiased sampling of nodes would select each community with probability $c_{i}/\sum_{j=1}^{n_{\text{c}}}c_{j}$ . Therefore, nodes in large communities are underrepresented in our community-based node set such that selection of multiple nodes from the same community is discouraged.
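The four heuristic algorithms described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names and toy data are hypothetical; the inputs are dictionaries giving each node's degree, $k_{{\rm nn},i}$, and community label. Assumptions that the text does not settle: how the 5% cutoffs are rounded, that the bottom 5% in terms of $k_{\rm nn}$ is taken among the nodes remaining after the largest-degree nodes are rejected, that $c_{i}$ counts all nodes of the $i$ th community, and that communities with no remaining eligible node are skipped.

```python
import math
import random

def drop_top_degree(degree, frac=0.05):
    """Nodes sorted by increasing degree, with the top `frac` removed."""
    ranked = sorted(degree, key=degree.get)
    n_drop = math.ceil(frac * len(ranked))  # rounding rule is an assumption
    return ranked[:len(ranked) - n_drop]

def k_constrained(degree, n, rng):
    """Sample n nodes uniformly without replacement from the remaining 95%."""
    return rng.sample(drop_top_degree(degree), n)

def k_quantiled(degree, n, rng):
    """Pick one node from each of n contiguous divisions of the degree ranking."""
    pool = drop_top_degree(degree)
    size = len(pool)  # requires size >= n so that no division is empty
    bounds = [q * size // n for q in range(n + 1)]
    return [rng.choice(pool[bounds[q]:bounds[q + 1]]) for q in range(n)]

def knn_constrained(degree, knn, n, rng):
    """Reject the bottom 5% in k_nn, then sample n nodes without replacement."""
    pool = sorted(drop_top_degree(degree), key=knn.get)
    n_drop = math.ceil(0.05 * len(pool))  # among remaining nodes (assumption)
    return rng.sample(pool[n_drop:], n)

def community_based(degree, community, n, rng):
    """Pick a community with probability proportional to sqrt(c_i), then a node."""
    allowed = set(drop_top_degree(degree))
    size, avail = {}, {}
    for node, c in community.items():
        size[c] = size.get(c, 0) + 1  # c_i counts all nodes (assumption)
        if node in allowed:
            avail.setdefault(c, []).append(node)
    chosen = []
    for _ in range(n):
        labels = [c for c in avail if avail[c]]  # skip exhausted communities
        weights = [math.sqrt(size[c]) for c in labels]
        c = rng.choices(labels, weights=weights)[0]
        node = rng.choice(avail[c])
        avail[c].remove(node)  # without replacement
        chosen.append(node)
    return chosen

# Hypothetical toy inputs for 40 nodes.
degree = {v: v + 1 for v in range(40)}
knn = {v: 40.0 - v for v in range(40)}
community = {v: v % 4 for v in range(40)}
rng = random.Random(0)

set_k = k_constrained(degree, 5, rng)
set_q = k_quantiled(degree, 4, rng)
set_knn = knn_constrained(degree, knn, 5, rng)
set_c = community_based(degree, community, 5, rng)
```

Because the community weights are proportional to $\sqrt{c_{i}}$ rather than $c_{i}$, a community four times larger than another is only twice as likely to be picked in each draw.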

We found that selecting node sets according to these algorithms yielded smaller approximation error than with completely random node sets. Specifically, the $k$ -constrained, $k$ -quantiled, $k_{\rm nn}$ -constrained, and community-based node sets have 1.19, 1.15, 1.30, and 1.17 times smaller approximation error, respectively, than the completely random node sets (see Table S1). This result verifies that the structural properties that our optimization algorithm emphasizes, i.e., degree, average nearest-neighbor degree, and community membership, characterize good sentinel nodes. However, their contribution to suppressing the approximation error is modest. In contrast, one can achieve much smaller approximation error by actually running our optimization algorithm, suggesting that our optimization algorithm determines efficient sentinel node sets by exploiting more complex information on the given network than commonly known properties such as the node degrees and community structure. These observations also hold true when we do not know the test dynamics (Table S3) and for weighted (Tables S6 and S7) and directed (Tables S8 and S9) networks.

Appendix S7 Evaluating approximation error on an alternative dynamics

We have assumed that we can use the equations of the actual dynamical system in order to optimize node set selection. This condition is unlikely to hold in practice. In this section, we assess the consequences of optimizing with a proposed, or training, dynamics that is not the actual, or test, dynamics. Our procedure works as follows. For a given network, we pretend that we do not know the test dynamics (e.g., SIS dynamics) and therefore run a training dynamics (e.g., coupled double-well dynamics). Then, we analyze the approximation error obtained by the node sets we optimized with the training dynamics when the actual dynamics are the test dynamics (see Figure 5a in the main text).

We followed this procedure for each combination of training dynamics, test dynamics, network, and node set type. Specifically, for each network, we used the 100 node sets optimized on the training dynamics (e.g., coupled double-well dynamics) to calculate approximation errors on each test dynamics (i.e., SIS dynamics, gene-regulatory dynamics, and mutualistic species dynamics if the training dynamics is the coupled double-well dynamics). We carried out this procedure for each pair of different training and test dynamics, each network, and each node set type. Note that only the optimized and degree-preserving node sets use information on the dynamics. Therefore, there are 100 optimized and degree-preserving node sets for each pair of training dynamics and network, but 100 node sets for each network for each of the other five types of node set. As in section S4, we used $\ln\varepsilon$ as the dependent variable and built a generalized linear model with Gaussian errors and an identity link function. We excluded data from the ER network for the same reason as that stated in section S4.

We analyzed the obtained $\ln\varepsilon$ values by a multi-way ANOVA as in section S4. Even without knowledge of the test dynamics, our model explains a substantial portion of the variance in $\ln\varepsilon$ ( $R^{2}=0.58$ ), although it is less than when we do know the test dynamics (see section S4). Each of the four independent variables is highly significant (training dynamics: $df=3$ , $F=2980.2$ , $p<10^{-7}$ ; test dynamics: $df=3$ , $F=24920.9$ , $p<10^{-7}$ ; network: $df=8$ , $F=643.8$ , $p<10^{-7}$ ; node set type: $df=6$ , $F=2643.2$ , $p<10^{-7}$ ). Note that the large number (i.e., $75,600$ ) of observations in part explains the small $p$ -values.
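The number of observations can be checked by enumerating the design. The sketch below assumes that every node set type, including the dynamics-independent ones, contributes 100 error values for each ordered pair of distinct training and test dynamics on each of the nine networks (all networks except ER); this bookkeeping is our reading of the procedure and reproduces the stated count. The label strings are illustrative.

```python
from itertools import product

dynamics = ["double-well", "mutualistic", "SIS", "gene-regulatory"]
networks = ["dolphin", "proximity", "metabolic", "road", "email",
            "BA", "HK", "GKK", "LFR"]          # ER network excluded
node_set_types = ["optimized", "degree-preserving", "random",
                  "k-constrained", "k-quantiled", "knn-constrained",
                  "community-based"]
sets_per_condition = 100

# Ordered (training, test) pairs with training different from test.
pairs = [(tr, te) for tr, te in product(dynamics, dynamics) if tr != te]

n_obs_transfer = (len(pairs) * len(networks)
                  * len(node_set_types) * sets_per_condition)

# When the test dynamics are known (section S4), there is one condition
# per dynamics instead of one per (training, test) pair.
n_obs_known = (len(dynamics) * len(networks)
               * len(node_set_types) * sets_per_condition)
```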

Table S3 shows the differences in average $\ln\varepsilon$ , estimated by Tukey’s honestly significant difference test. Despite not knowing the test dynamics, optimized node sets are still associated with the smallest $\varepsilon$ , followed by the degree-preserving node sets, the node sets from the heuristic algorithms, and the completely random node sets. Although the difference between several pairs of the four heuristic algorithms in terms of $\ln\varepsilon$ is statistically significant, the magnitude of the differences among them is small.

Table S3: Differences in average $\ln\varepsilon$ between different node set types when we do not know the test dynamics. The differences are computed by a Tukey’s honestly significant difference test. The 95% confidence intervals (CI) of the differences and associated $p$ -values, adjusted for multiple comparisons, are also shown.

	Difference	CI	$p$
Optimized $-$ Random	$-1.748$	$[-1.804,-1.692]$	$<10^{-7}$
Degree-preserving $-$ Random	$-1.310$	$[-1.366,-1.254]$	$<10^{-7}$
$k$ -constrained $-$ Random	$-0.170$	$[-0.226,-0.114]$	$<10^{-7}$
$k$ -quantiled $-$ Random	$-0.138$	$[-0.194,-0.082]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Random	$-0.262$	$[-0.319,-0.206]$	$<10^{-7}$
Community-based $-$ Random	$-0.157$	$[-0.213,-0.101]$	$<10^{-7}$
Degree-preserving $-$ Optimized	$0.438$	$[0.381,0.494]$	$<10^{-7}$
$k$ -constrained $-$ Optimized	$1.578$	$[1.522,1.634]$	$<10^{-7}$
$k$ -quantiled $-$ Optimized	$1.610$	$[1.554,1.666]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Optimized	$1.485$	$[1.429,1.541]$	$<10^{-7}$
Community-based $-$ Optimized	$1.591$	$[1.535,1.647]$	$<10^{-7}$
$k$ -constrained $-$ Degree-preserving	$1.140$	$[1.084,1.196]$	$<10^{-7}$
$k$ -quantiled $-$ Degree-preserving	$1.173$	$[1.117,1.229]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Degree-preserving	$1.048$	$[0.992,1.104]$	$<10^{-7}$
Community-based $-$ Degree-preserving	$1.153$	$[1.097,1.209]$	$<10^{-7}$
$k$ -quantiled $-$ $k$ -constrained	$0.032$	$[-0.024,0.088]$	$0.613$
$k_{\rm nn}$ -constrained $-$ $k$ -constrained	$-0.092$	$[-0.149,-0.036]$	$2.37\times 10^{-5}$
Community-based $-$ $k$ -constrained	$0.013$	$[-0.043,0.069]$	$0.993$
$k_{\rm nn}$ -constrained $-$ $k$ -quantiled	$-0.125$	$[-0.181,-0.069]$	$<10^{-7}$
Community-based $-$ $k$ -quantiled	$-0.019$	$[-0.075,0.037]$	$0.950$
Community-based $-$ $k_{\rm nn}$ -constrained	$0.106$	$[0.049,0.162]$	$6\times 10^{-7}$

Appendix S8 Optimizing node weights

In this section, we analyze the effect of optimizing node weights in addition to optimizing node selection.

S8.1 An example

In Fig. 1 in the main text, we demonstrate our algorithm on sentinel node sets of size $n\in\{1,2,3,4\}$ for the coupled double-well dynamics on the dolphin network. In Fig. S14, we show a similar example when we also optimize node weights. When we use $n\in\{2,3,4\}$ , the additional optimization step (that is, optimizing node weights in addition to the combinatorial optimization) allows us to approximate the averaged network activity, $\overline{x}$ , even more closely.

S8.2 Effect of node set type on approximation error

To quantify the performance of the optimized node sets in which we also optimize the node weights, we built an ANOVA model using $\ln\varepsilon$ as the dependent variable, as we did in section S4. We generated 100 weight-optimized node sets for each pair of dynamics and network. For comparison, we use the same completely random and combinatorially optimized node sets (i.e., without optimizing node weights) from section S4. The three independent variables were dynamics, network, and node set type (random, optimized, and weight-optimized). All three independent variables were significant (dynamics: $df=3$ , $F=5436.06$ , $p<10^{-7}$ ; network: $df=8$ , $F=155.76$ , $p<10^{-7}$ ; node set type: $df=2$ , $F=12362.80$ , $p<10^{-7}$ ).

We computed the differences between the three node set types with Tukey’s honestly significant difference test. We show the results in Table S4. As described in the main text, the average $\ln\varepsilon$ for the weight-optimized node sets is substantially smaller than that for the optimized node sets without the additional weight optimization step; on the natural scale, the average $\varepsilon$ is $1/e^{-3.832}=46.16$ times smaller.

Table S4: Differences between node set types in terms of average $\ln\varepsilon$ when we optimize node weights and know the test dynamics. CI: confidence interval.

	Difference	CI	$p$
Optimized $-$ Random	$-5.625$	$[-5.767,-5.483]$	$<10^{-7}$
Weight-optimized $-$ Random	$-9.456$	$[-9.598,-9.315]$	$<10^{-7}$
Weight-optimized $-$ Optimized	$-3.832$	$[-3.973,-3.690]$	$<10^{-7}$

S8.3 Transfer learning

In this section, we investigate the performance of weight-optimized node sets when one does not know the test dynamics. As in section S7, we generated 100 weight-optimized node sets optimized with a training dynamics and evaluated the approximation error of those node sets on a test dynamics. We did this for each combination of training dynamics, test dynamics, and network. For comparison, we include the completely random and optimized node sets generated in the analysis in section S7.

All of the independent variables of the constructed ANOVA model are significant (training dynamics: $df=3$ , $F=1353.61$ , $p<10^{-7}$ ; test dynamics: $df=3$ , $F=11278.18$ , $p<10^{-7}$ ; network: $df=8$ , $F=285.22$ , $p<10^{-7}$ ; node set type: $df=2$ , $F=5171.33$ , $p<10^{-7}$ ), and the ANOVA model fits the data well ( $R^{2}=0.61$ ). As reported in the main text, there is no significant difference between weight-optimized node sets and our standard optimized node sets (i.e., without node-weight optimization) when we optimize on a dynamics that is not the test dynamics ( $p=0.591$ ).

Table S5: Differences between node set types in terms of average $\ln\varepsilon$ when we optimize node weights and do not know the test dynamics. CI: confidence interval.

	Difference	CI	$p$
Optimized $-$ Random	$-1.748$	$[-1.794,-1.701]$	$<10^{-7}$
Weight-optimized $-$ Random	$-1.728$	$[-1.775,-1.682]$	$<10^{-7}$
Weight-optimized $-$ Optimized	$0.019$	$[-0.027,0.066]$	$0.591$
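
One way to read Table S5 is to check which confidence intervals contain zero. The sketch below does this for the three rows of the table. It assumes the intervals are simultaneous intervals at the level matching the significance threshold (the level is not stated here, so this is an assumption), in which case an interval that contains zero corresponds to a non-significant difference. The helper name `ci_excludes_zero` is hypothetical.

```python
# (difference, CI lower bound, CI upper bound), copied from Table S5.
TABLE_S5 = {
    ("Optimized", "Random"): (-1.748, -1.794, -1.701),
    ("Weight-optimized", "Random"): (-1.728, -1.775, -1.682),
    ("Weight-optimized", "Optimized"): (0.019, -0.027, 0.066),
}


def ci_excludes_zero(lower: float, upper: float) -> bool:
    """Return True if the interval [lower, upper] lies entirely on one side of zero."""
    return lower > 0.0 or upper < 0.0


for (first, second), (diff, lower, upper) in TABLE_S5.items():
    verdict = "excludes zero" if ci_excludes_zero(lower, upper) else "contains zero"
    print(f"{first} - {second}: {diff:+.3f}, CI {verdict}")
```

Only the last row, which compares weight-optimized and optimized node sets, has an interval that contains zero, in line with the reported $p=0.591$.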

Appendix S9 Sentinel node set optimization on weighted networks

In the main text, we focused on unweighted networks (i.e., networks in which all the edges have the same weight equal to $1$ ). In this section, we show that our sentinel node approximation also performs well for weighted networks.

We collected ten weighted networks from online repositories. From the KONECT project [61], we used a dominance network of Japanese macaques [76] ( $N=62$ , $M=1167$ ), a friendship network of high school students in Illinois, USA [77] ( $N=70$ , $M=274$ ), a friendship network of Australian university students [78] ( $N=217$ , $M=1839$ ), and the proximity network from the main text but with edge weights retained [62] ( $N=410$ , $M=2765$ ). From the Netzschleuder repository [79], we used a network of interpersonal contacts between windsurfers [80] ( $N=43$ , $M=336$ ), a network of contacts between individuals involved in the train bombing in 2004 in Madrid, Spain [81] ( $N=64$ , $M=243$ ), a network of drug interactions gathered from health records in Blumenau, Brazil [82] ( $N=75$ , $M=181$ ), a coauthorship network [83] ( $N=379$ , $M=914$ ), a neuronal network of Caenorhabditis elegans [84] ( $N=460$ , $M=1432$ ), and a product export network [85] ( $N=774$ , $M=1779$ ). As we did for unweighted networks in the main text, we coerced each network to be undirected, discarded any self-edges, and analyzed only the largest connected component.
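
The preprocessing steps named above (coerce to undirected, discard self-edges, keep the largest connected component) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used for the analysis: the function name `preprocess_weighted` is hypothetical, and summing the weights of reciprocal edges when symmetrizing is an assumption, since the rule for merging them is not specified here.

```python
from collections import defaultdict, deque


def preprocess_weighted(edges):
    """Return {frozenset({u, v}): weight} for the largest connected component.

    `edges` is an iterable of (u, v, weight) triples, possibly directed.
    """
    weights = defaultdict(float)
    for u, v, w in edges:
        if u == v:
            continue  # discard self-edges
        # ASSUMPTION: reciprocal edges are merged by summing their weights.
        weights[frozenset((u, v))] += w

    # Build an undirected adjacency structure.
    adjacency = defaultdict(set)
    for pair in weights:
        u, v = tuple(pair)
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Breadth-first search to find the largest connected component.
    visited, largest = set(), set()
    for start in adjacency:
        if start in visited:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        visited |= component
        if len(component) > len(largest):
            largest = component

    # Keep only edges whose endpoints both lie in the largest component.
    return {pair: w for pair, w in weights.items() if pair <= largest}


example = [("a", "b", 1.0), ("b", "a", 2.0), ("b", "b", 5.0),
           ("b", "c", 1.0), ("x", "y", 1.0)]
cleaned = preprocess_weighted(example)
```

In the toy input, the self-edge on `b` is dropped, the two edges between `a` and `b` are merged, and the smaller component containing `x` and `y` is discarded.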

As before, we simulated each dynamics on each network, using the parameters described in the main text, obtaining $x_{i,\ell}^{*}$ for all $\ell\in\{1,\ldots,L\}$ . We then selected 100 node sets of size $n=\lfloor\ln N\rfloor$ of each type.
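
To give a sense of scale, the node set size $n=\lfloor\ln N\rfloor$ can be evaluated for the ten weighted networks. The sketch below takes the values of $N$ listed above at face value; the function name `sentinel_set_size` is hypothetical.

```python
import math

# Numbers of nodes N of the ten weighted networks, in the order listed above.
WEIGHTED_NETWORK_SIZES = [62, 70, 217, 410, 43, 64, 75, 379, 460, 774]


def sentinel_set_size(num_nodes: int) -> int:
    """Return n = floor(ln N), the node set size used in this section."""
    return math.floor(math.log(num_nodes))


sizes = {N: sentinel_set_size(N) for N in WEIGHTED_NETWORK_SIZES}
for N, n in sizes.items():
    print(f"N = {N:4d}  ->  n = {n}")
```

For these networks, $n$ ranges from 3 to 6, so each node set contains only a handful of nodes.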

We show the approximation error for the coupled double-well dynamics on the ten weighted networks in Fig. S15. As was the case for unweighted networks, our optimization algorithm consistently (i.e., over 100 independent algorithm runs and for each network) finds sentinel node sets with lower approximation error than degree-preserving and completely random node sets.

To systematically verify that the sentinel node approximation performs better than the other node set selection methods, we ran the same statistical analysis as the one we used for unweighted networks (see section S4). The ANOVA results were similar to those for unweighted networks (model $R^{2}=0.73$ ; dynamics: $df=3$ , $F=10300$ , $p<10^{-7}$ ; network: $df=9$ , $F=966.3$ , $p<10^{-7}$ ; node set type: $df=6$ , $F=6243$ , $p<10^{-7}$ ). The optimized node sets had an average approximation error that was 960 times smaller than completely random node sets ( $b=-6.867$ , $e^{b}=0.001$ , $p<10^{-7}$ ) and 262 times smaller than degree-preserving node sets (difference in coefficients: $-5.569$ , $e^{b}=0.004$ , $p<10^{-7}$ ); see Table S6. We note that the ordering of node set types in terms of approximation error is similar to that for unweighted networks, except that there is less difference between the heuristic node sets and the completely random node sets in the case of weighted networks. For example, neither the $k$ -constrained nor the community-based node sets are significantly different from the completely random node sets.

Table S6: Differences between node set types in terms of average $\ln\varepsilon$ when the network has edge weights and we know the test dynamics. CI: confidence interval.

	Difference	CI	$p$
Optimized $-$ Random	$-6.867$	$[-6.999,-6.735]$	$<10^{-7}$
Degree-preserving $-$ Random	$-1.298$	$[-1.430,-1.166]$	$<10^{-7}$
$k$ -constrained $-$ Random	$-0.093$	$[-0.225,0.039]$	$0.370$
$k$ -quantiled $-$ Random	$-0.407$	$[-0.539,-0.275]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Random	$-0.196$	$[-0.328,-0.064]$	$2.36\times 10^{-4}$
Community-based $-$ Random	$-0.100$	$[-0.232,0.032]$	$0.283$
Degree-preserving $-$ Optimized	$5.569$	$[5.437,5.701]$	$<10^{-7}$
$k$ -constrained $-$ Optimized	$6.774$	$[6.642,6.906]$	$<10^{-7}$
$k$ -quantiled $-$ Optimized	$6.460$	$[6.328,6.592]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Optimized	$6.670$	$[6.538,6.802]$	$<10^{-7}$
Community-based $-$ Optimized	$6.767$	$[6.635,6.899]$	$<10^{-7}$
$k$ -constrained $-$ Degree-preserving	$1.205$	$[1.073,1.337]$	$<10^{-7}$
$k$ -quantiled $-$ Degree-preserving	$0.891$	$[0.759,1.023]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Degree-preserving	$1.101$	$[0.969,1.233]$	$<10^{-7}$
Community-based $-$ Degree-preserving	$1.198$	$[1.066,1.330]$	$<10^{-7}$
$k$ -quantiled $-$ $k$ -constrained	$-0.314$	$[-0.446,-0.182]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ $k$ -constrained	$-0.104$	$[-0.236,0.028]$	$0.238$
Community-based $-$ $k$ -constrained	$-0.007$	$[-0.139,0.125]$	$>0.999$
$k_{\rm nn}$ -constrained $-$ $k$ -quantiled	$0.210$	$[0.078,0.343]$	$5.41\times 10^{-5}$
Community-based $-$ $k$ -quantiled	$0.307$	$[0.175,0.439]$	$<10^{-7}$
Community-based $-$ $k_{\rm nn}$ -constrained	$0.097$	$[-0.035,0.229]$	$0.318$
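
The rows of Table S6 are not independent of one another: if every difference is computed from the same per-type average of $\ln\varepsilon$ (an assumption about how the comparisons were constructed), then each pairwise difference equals the difference of the two corresponding "$-$ Random" rows. The sketch below checks this against the printed values, allowing for rounding to three decimals. The dictionary and function names are hypothetical.

```python
# Differences relative to completely random node sets, copied from Table S6.
VS_RANDOM = {
    "Optimized": -6.867,
    "Degree-preserving": -1.298,
    "k-constrained": -0.093,
    "k-quantiled": -0.407,
    "k_nn-constrained": -0.196,
    "Community-based": -0.100,
}

# Remaining pairwise differences (first minus second), copied from Table S6.
REPORTED = {
    ("Degree-preserving", "Optimized"): 5.569,
    ("k-constrained", "Optimized"): 6.774,
    ("k-quantiled", "Optimized"): 6.460,
    ("k_nn-constrained", "Optimized"): 6.670,
    ("Community-based", "Optimized"): 6.767,
    ("k-constrained", "Degree-preserving"): 1.205,
    ("k-quantiled", "Degree-preserving"): 0.891,
    ("k_nn-constrained", "Degree-preserving"): 1.101,
    ("Community-based", "Degree-preserving"): 1.198,
    ("k-quantiled", "k-constrained"): -0.314,
    ("k_nn-constrained", "k-constrained"): -0.104,
    ("Community-based", "k-constrained"): -0.007,
    ("k_nn-constrained", "k-quantiled"): 0.210,
    ("Community-based", "k-quantiled"): 0.307,
    ("Community-based", "k_nn-constrained"): 0.097,
}


def implied_difference(first: str, second: str) -> float:
    """Difference implied by the two rows that compare each type with Random."""
    return VS_RANDOM[first] - VS_RANDOM[second]


# Largest gap between an implied and a reported difference.
max_gap = max(
    abs(implied_difference(first, second) - reported)
    for (first, second), reported in REPORTED.items()
)
print(f"largest discrepancy: {max_gap:.4f}")
```

Under this reading, the five "$-$ Random" rows and the Optimized row carry all the information in the table, and the remaining rows mainly add the confidence intervals and $p$ values for each specific pair.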

The performance of our algorithm on weighted networks is also similar to that on unweighted networks when we do not know the test dynamics. In fact, by running the same analysis as that for the unweighted networks (see section S7), we find similar ANOVA results ( $R^{2}=0.68$ ; training dynamics: $df=3$ , $F=4440$ , $p<10^{-7}$ ; test dynamics: $df=3$ , $F=38634$ , $p<10^{-7}$ ; network: $df=9$ , $F=3490$ , $p<10^{-7}$ ; node set type: $df=6$ , $F=2391$ , $p<10^{-7}$ ). The average approximation error for the optimized node sets is 9.67 times smaller than for completely random node sets ( $b=-2.269$ , $e^{b}=0.096$ , $p<10^{-7}$ ; Table S7) and 3.66 times smaller than degree-preserving node sets (difference in coefficients: $-1.297$ , $e^{b}=0.273$ , $p<10^{-7}$ ).

Table S7: Differences between node set types in terms of average $\ln\varepsilon$ when the network has edge weights and we do not know the test dynamics. CI: confidence interval.

	Difference	CI	$p$
Optimized $-$ Random	$-2.269$	$[-2.339,-2.199]$	$<10^{-7}$
Degree-preserving $-$ Random	$-0.972$	$[-1.042,-0.903]$	$<10^{-7}$
$k$ -constrained $-$ Random	$-0.093$	$[-0.162,-0.023]$	$1.63\times 10^{-3}$
$k$ -quantiled $-$ Random	$-0.407$	$[-0.476,-0.337]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Random	$-0.196$	$[-0.266,-0.127]$	$<10^{-7}$
Community-based $-$ Random	$-0.100$	$[-0.169,-0.030]$	$4.79\times 10^{-4}$
Degree-preserving $-$ Optimized	$1.297$	$[1.227,1.366]$	$<10^{-7}$
$k$ -constrained $-$ Optimized	$2.176$	$[2.107,2.246]$	$<10^{-7}$
$k$ -quantiled $-$ Optimized	$1.862$	$[1.793,1.932]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Optimized	$2.073$	$[2.003,2.142]$	$<10^{-7}$
Community-based $-$ Optimized	$2.169$	$[2.100,2.239]$	$<10^{-7}$
$k$ -constrained $-$ Degree-preserving	$0.879$	$[0.810,0.949]$	$<10^{-7}$
$k$ -quantiled $-$ Degree-preserving	$0.565$	$[0.496,0.635]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Degree-preserving	$0.776$	$[0.706,0.845]$	$<10^{-7}$
Community-based $-$ Degree-preserving	$0.873$	$[0.803,0.942]$	$<10^{-7}$
$k$ -quantiled $-$ $k$ -constrained	$-0.314$	$[-0.384,-0.245]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ $k$ -constrained	$-0.104$	$[-0.173,-0.034]$	$2.24\times 10^{-4}$
Community-based $-$ $k$ -constrained	$-0.007$	$[-0.076,0.063]$	$>0.999$
$k_{\rm nn}$ -constrained $-$ $k$ -quantiled	$0.210$	$[0.141,0.280]$	$<10^{-7}$
Community-based $-$ $k$ -quantiled	$0.307$	$[0.238,0.377]$	$<10^{-7}$
Community-based $-$ $k_{\rm nn}$ -constrained	$0.097$	$[0.027,0.166]$	$8.04\times 10^{-4}$

Appendix S10 Sentinel node set optimization on directed networks

In this section, we show that our sentinel node approximation also performs well for directed networks. For simplicity, we consider the largest weakly connected component of each network and do not consider networks with edge weights.

We collected ten directed networks from the Netzschleuder repository [79]: a freshwater trophic network [86] ( $N=109$ , $M=717$ ), a social network of physicians [87] ( $N=117$ , $M=542$ ), an email network from a manufacturing company [88] ( $N=167$ , $M=5783$ ), a dependency network of the Flamingo software [89] ( $N=228$ , $M=497$ ), a transcription network of the bacterium Escherichia coli [90] ( $N=328$ , $M=456$ ), a transcription network of the yeast Saccharomyces cerevisiae [91] ( $N=664$ , $M=1066$ ), a network of US air carrier flights in 2020 [92] ( $N=806$ , $M=11924$ ), a dependency network of the JUNG software [89] ( $N=879$ , $M=2051$ ), the directed version of the university email network from the main text [65] ( $N=1133$ , $M=10902$ ), and a network of air routes preferred by the US Federal Aviation Administration [93] ( $N=1226$ , $M=2613$ ). We discarded any self-loops and ignored the multiplicity of multi-edges, keeping a single directed edge for each.
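
For directed networks, the corresponding preprocessing keeps edge directions but determines connectivity while ignoring them. A minimal sketch using a union-find structure is given below; the function name `largest_weak_component` is hypothetical and the code is an illustration rather than the procedure used for the analysis.

```python
def largest_weak_component(arcs):
    """Return the set of directed edges in the largest weakly connected component.

    `arcs` is an iterable of (source, target) pairs that may contain
    self-loops and repeated edges.
    """
    # A set of pairs drops multi-edge multiplicity; the filter drops self-loops.
    simple = {(u, v) for u, v in arcs if u != v}

    parent = {}

    def find(x):
        """Return the representative of x, compressing the path as we go."""
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every edge, ignoring its direction.
    for u, v in simple:
        parent[find(u)] = find(v)

    # Group nodes by representative and pick the largest group.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    if not groups:
        return set()
    keep = max(groups.values(), key=len)

    # Both endpoints of an edge are in the same group, so testing one suffices.
    return {(u, v) for u, v in simple if u in keep}


example = [(1, 2), (1, 2), (2, 2), (3, 2), (4, 5)]
component_edges = largest_weak_component(example)
```

In the toy input, nodes 1 and 3 both point to node 2, so they belong to one weakly connected component even though no directed path joins them.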

We show the approximation error for the coupled double-well dynamics on the ten directed networks in Fig. S16. The figure indicates that our optimization algorithm reliably identifies node sets that obtain relatively small approximation error in each of the ten directed networks.

We verified our observations with an ANOVA (model $R^{2}=0.60$ ; dynamics: $df=3$ , $F=5048$ , $p<10^{-7}$ ; network: $df=9$ , $F=1925$ , $p<10^{-7}$ ; node set type: $df=6$ , $F=1541$ , $p<10^{-7}$ ). On average, the approximation error for optimized node sets was 211 times smaller than completely random node sets ( $b=-5.351$ , $e^{b}=0.005$ , $p<10^{-7}$ ) and 62.4 times smaller than degree-preserving node sets (difference in coefficients: $-4.134$ , $e^{b}=0.016$ , $p<10^{-7}$ ); see Table S8.

Table S8: Differences between node set types in terms of average $\ln\varepsilon$ when the edges are directed and we know the test dynamics. CI: confidence interval.

	Difference	CI	$p$
Optimized $-$ Random	$-5.351$	$[-5.563,-5.139]$	$<10^{-7}$
Degree-preserving $-$ Random	$-1.217$	$[-1.429,-1.005]$	$<10^{-7}$
$k$ -constrained $-$ Random	$-0.081$	$[-0.293,0.131]$	0.921
$k$ -quantiled $-$ Random	$-0.362$	$[-0.574,-0.150]$	$1.00\times 10^{-5}$
$k_{\rm nn}$ -constrained $-$ Random	$-0.045$	$[-0.257,0.167]$	0.996
Community-based $-$ Random	$0.347$	$[0.134,0.561]$	$3.19\times 10^{-5}$
Degree-preserving $-$ Optimized	$4.134$	$[3.922,4.346]$	$<10^{-7}$
$k$ -constrained $-$ Optimized	$5.270$	$[5.058,5.482]$	$<10^{-7}$
$k$ -quantiled $-$ Optimized	$4.989$	$[4.777,5.201]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Optimized	$5.306$	$[5.094,5.518]$	$<10^{-7}$
Community-based $-$ Optimized	$5.699$	$[5.486,5.912]$	$<10^{-7}$
$k$ -constrained $-$ Degree-preserving	$1.136$	$[0.924,1.348]$	$<10^{-7}$
$k$ -quantiled $-$ Degree-preserving	$0.855$	$[0.643,1.067]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Degree-preserving	$1.172$	$[0.960,1.384]$	$<10^{-7}$
Community-based $-$ Degree-preserving	$1.564$	$[1.351,1.778]$	$<10^{-7}$
$k$ -quantiled $-$ $k$ -constrained	$-0.281$	$[-0.493,-0.069]$	$1.80\times 10^{-3}$
$k_{\rm nn}$ -constrained $-$ $k$ -constrained	$0.036$	$[-0.176,0.248]$	0.999
Community-based $-$ $k$ -constrained	$0.428$	$[0.215,0.641]$	$10^{-7}$
$k_{\rm nn}$ -constrained $-$ $k$ -quantiled	$0.317$	$[0.105,0.529]$	$2.15\times 10^{-4}$
Community-based $-$ $k$ -quantiled	$0.709$	$[0.496,0.922]$	$<10^{-7}$
Community-based $-$ $k_{\rm nn}$ -constrained	$0.393$	$[0.179,0.606]$	$1.2\times 10^{-6}$

The ANOVA for directed networks in the case when we do not know the test dynamics fits the data reasonably well (model $R^{2}=0.53$ ; training dynamics: $df=3$ , $F=1384$ , $p<10^{-7}$ ; test dynamics: $df=3$ , $F=10888$ , $p<10^{-7}$ ; network: $df=9$ , $F=6397$ , $p<10^{-7}$ ; node set type: $df=6$ , $F=261$ , $p<10^{-7}$ ). As is the case for undirected networks, the performance of optimized node sets is worse when the test dynamics is different from the training dynamics, but optimized node sets still outperform all other node sets. Optimized node sets had on average 3.02 times lower approximation error than completely random node sets ( $b=-1.104$ , $e^{b}=0.332$ , $p<10^{-7}$ ) and 1.81 times smaller error than degree-preserving node sets (difference in coefficients: $-0.595$ , $e^{b}=0.552$ , $p<10^{-7}$ ); see Table S9.

Table S9: Differences between node set types in terms of average $\ln\varepsilon$ when the edges are directed and we do not know the test dynamics. CI: confidence interval.

	Difference	CI	$p$
Optimized $-$ Random	$-1.104$	$[-1.224,-0.984]$	$<10^{-7}$
Degree-preserving $-$ Random	$-0.508$	$[-0.628,-0.388]$	$<10^{-7}$
$k$ -constrained $-$ Random	$-0.081$	$[-0.201,0.039]$	0.422
$k$ -quantiled $-$ Random	$-0.362$	$[-0.482,-0.242]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Random	$-0.045$	$[-0.165,0.075]$	0.925
Community-based $-$ Random	$0.351$	$[0.231,0.472]$	$<10^{-7}$
Degree-preserving $-$ Optimized	$0.595$	$[0.475,0.715]$	$<10^{-7}$
$k$ -constrained $-$ Optimized	$1.023$	$[0.903,1.143]$	$<10^{-7}$
$k$ -quantiled $-$ Optimized	$0.742$	$[0.622,0.862]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ Optimized	$1.058$	$[0.939,1.178]$	$<10^{-7}$
Community-based $-$ Optimized	$1.455$	$[1.334,1.575]$	$<10^{-7}$
$k$ -constrained $-$ Degree-preserving	$0.427$	$[0.308,0.547]$	$<10^{-7}$
$k$ -quantiled $-$ Degree-preserving	$0.146$	$[0.027,0.266]$	$5.86\times 10^{-3}$
$k_{\rm nn}$ -constrained $-$ Degree-preserving	$0.463$	$[0.343,0.583]$	$<10^{-7}$
Community-based $-$ Degree-preserving	$0.859$	$[0.739,0.980]$	$<10^{-7}$
$k$ -quantiled $-$ $k$ -constrained	$-0.281$	$[-0.401,-0.161]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ $k$ -constrained	$0.036$	$[-0.084,0.156]$	0.976
Community-based $-$ $k$ -constrained	$0.432$	$[0.311,0.552]$	$<10^{-7}$
$k_{\rm nn}$ -constrained $-$ $k$ -quantiled	$0.317$	$[0.197,0.437]$	$<10^{-7}$
Community-based $-$ $k$ -quantiled	$0.713$	$[0.592,0.834]$	$<10^{-7}$
Community-based $-$ $k_{\rm nn}$ -constrained	$0.396$	$[0.276,0.517]$	$<10^{-7}$