Exploring the space of graphs with fixed discrete curvatures
Abstract
Discrete curvatures are quantities associated to the nodes and edges of a graph that reflect the local geometry around them. These curvatures have a rich mathematical theory and they have recently found success as a tool to analyze networks across a wide range of domains. In this work, we consider the problem of constructing graphs with a prescribed set of discrete edge curvatures, and explore the space of such graphs. We address this problem in two ways: first, we develop an evolutionary algorithm to sample graphs with discrete curvatures close to a given set. We use this algorithm to explore how other network statistics vary when constrained by the discrete curvatures in the network. Second, we solve the exact reconstruction problem for the specific case of Forman–Ricci curvature. By leveraging the theory of Markov bases, we obtain a finite set of rewiring moves that connects the space of all graphs with a fixed discrete curvature.
Keywords: network geometry, discrete curvature, Markov bases, Markov chain Monte Carlo methods, evolutionary algorithm, edge-based network measures
1 Introduction
Networks are used ubiquitously as a tool to model, analyze and design complex systems in a wide range of domains of science and technology such as neuroscience [24], epidemiology [48], economics [28, 6], urban systems [4], energy infrastructure [47] and many more. Across these different disciplines, networks, graphs and their higher-order generalizations [35, 5, 8] provide a unifying mathematical language that allows concepts and tools to be translated from one setting to another. Moreover, a series of observations at the turn of the 21st century found that the networks extracted from real-world systems often exhibit common features: heavy-tailed degree distributions [3], a “small world” structure [60] and rich mesoscale connectivity patterns such as communities [26]. This initiated an ongoing pursuit to find models and theories that explain the presence of these features, and to develop new mathematical methods to detect, describe and quantify them. We point to [11, 58, 12] for a critical perspective on these observations.
Among the many approaches to study complex networks, this article is concerned with geometry and more specifically with discrete curvature as a tool to describe the structure of graphs; see [10, 62] for an overview of different notions of geometry associated with complex systems and their applications. We now explain in more detail the main topic of the article: discrete curvature.
Curvature is a concept from Riemannian geometry, the study of smooth objects such as surfaces equipped with a metric, and quantifies how much these objects differ locally from being flat. While classical definitions of curvature need the structure of a Riemannian manifold, more recent works have generalized these definitions so they can be applied to other metric spaces such as graphs; this results in discrete notions of curvature — see Section 2 for a concrete example.
Definitions of discrete curvature on graphs were given for instance by Forman based on Laplacian operators [23, 53, 32], by Ollivier and later Lin, Lu and Yau based on neighborhood overlap [44, 45, 36], by Bakry and Émery based on the Bochner identity [2], by Erbar and Maas based on a certain entropy functional [21], by Steinerberger based on solutions to an equilibrium problem [55], and by Devriendt and Lambiotte based on the effective resistance [14, 15]. We note that these definitions cover many types of discrete curvature, such as scalar curvature and Ricci curvature; furthermore, in many cases these discrete curvatures are defined for more general objects, with graphs arising as a special case. Importantly, many of these discrete curvatures satisfy properties and results analogous to the powerful theorems from differential geometry; see for instance [37, 36, 7, 31, 39, 23, 38, 15, 55]. Of particular relevance to the central question of emergent structure and organization in complex systems, we note that many notions of discrete curvature satisfy powerful ‘local-to-global’ theorems, where local constraints on the curvature lead to statements on the global structure of the graph.
Aside from a rich mathematical theory, discrete curvature has also found successful applications as a tool to analyze and characterize the structure of complex real-world networks. It has been used for instance in the context of community detection [43, 27, 22], in the study of biological networks [51, 19], in characterizing connectivity patterns in brain networks [20], in studying market instabilities in financial networks [49], and to detect and address the problem of oversquashing in graph neural networks [57, 9].
In this article, we consider the following question: How to construct graphs with prescribed discrete curvatures? To our knowledge, this inverse problem has not yet been addressed for any notion of discrete curvature, and we consider it a basic methodological gap in trying to understand how the geometry of a graph or network influences its other features. As a first step in addressing this methodological gap, this article provides two different graph reconstruction approaches for specific choices of discrete curvatures. We furthermore explore, both experimentally and theoretically, the resulting space of graphs with fixed discrete curvatures.
In the first part, Section 3, we consider the problem of approximate reconstruction and develop a methodology that works for any notion of discrete curvature. We use a Markov chain Monte Carlo-type algorithm (MCMC) to iteratively construct graphs with discrete curvatures close to a target set of curvatures. We then use this algorithm to explore the ensemble of graphs with a given discrete curvature sequence in a number of experiments. For computational reasons, our discussion focuses on two simple notions of discrete curvature, Forman–Ricci and augmented Forman–Ricci curvature, but we have verified the methodology for other curvatures as well.
In the second part, Section 4, we take a more theoretical approach and consider the problem of exact reconstruction of graphs from a given target Forman–Ricci curvature sequence. In this specific case, the problem reduces to a combinatorial question involving the so-called joint degrees in the graph; see for instance [54]. Using known characterizations of possible joint degrees in combination with the theory of Markov bases [17], we show that there exists a finite set of graph rewiring moves that connect any two graphs with the same Forman–Ricci curvatures and node degrees; this is the content of Theorem 4.12. The set of rewiring moves can be computed using existing computer algebra packages, and we give explicit examples for the case of graphs with a small maximum degree. Our result naturally leads to an ergodic Markov chain over the set of all graphs with equal Forman–Ricci curvatures and degrees.
Organization: The rest of this article is organized as follows. Section 2 introduces the relevant definitions from graph theory and discrete curvature and formalizes the reconstruction question. In Section 3, we propose an evolutionary MCMC algorithm for approximate reconstruction from a target curvature sequence. We then use the algorithm in a number of experiments and describe network statistics of graphs sampled using the algorithm. In Section 4, we consider the exact reconstruction problem. We formalize the problem and solve it using the theory of joint degree matrices and Markov bases. Section 5 concludes the article with a summary of the results and an outlook towards future applications and extensions.
2 Background and Forman–Ricci curvatures
We consider simple graphs $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges, which are pairs of distinct nodes that are connected in the graph. As mentioned in the introduction, we will characterize the structural properties of graphs by considering their discrete curvature. Our focus will be on two definitions of discrete Ricci curvature on edges.
The Forman–Ricci curvature for graphs is defined with respect to a choice of positive weights $w$ for the nodes and edges. The curvature of an edge $e = \{u, v\}$ is equal to [53]
$$F(e) = w_e \left( \frac{w_u}{w_e} + \frac{w_v}{w_e} - \sum_{e_u \sim e} \frac{w_u}{\sqrt{w_e w_{e_u}}} - \sum_{e_v \sim e} \frac{w_v}{\sqrt{w_e w_{e_v}}} \right),$$
where $e_u$ (resp. $e_v$) runs over all edges that contain $u$ (resp. $v$) and are distinct from $e$. Setting all weights to $1$ results in the combinatorial formula for the Forman–Ricci curvature of an edge that will be used in this article:
$$F(u, v) = 4 - d_u - d_v \tag{1}$$
for $\{u, v\} \in E$ and where $d_u$ denotes the degree of node $u$, i.e., its number of neighbors in $G$. Clearly, Forman–Ricci curvature is a very simple notion of discrete curvature that only depends on the local degree information of the graph. For this reason, we will also consider an extended notion of discrete Ricci curvature based on Forman’s original curvature for CW complexes [23]. The augmented Forman–Ricci curvature for unweighted graphs is defined in [50] as
$$F^{\#}(u, v) = 4 - d_u - d_v + 3\, t_{uv} \tag{2}$$
where $t_{uv}$ is the number of triangles in the graph that contain the edge $\{u, v\}$. The augmented Forman–Ricci curvature is widely used because it can be computed efficiently while still capturing the local geometry of a graph well; for instance, it correlates highly with other notions of discrete curvature such as Ollivier–Ricci curvature [50] and Bakry–Émery curvature [40]. One can further augment by considering cycles of length four and more, but this involves a tradeoff between expressivity and computational complexity of the discrete curvature; see for instance [32, 56, 22] for more on this. Figure 1 shows a small graph with the Forman–Ricci and augmented Forman–Ricci curvatures computed for all edges.
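As an illustration, both curvatures can be computed directly from an edge list. The following is a minimal sketch, assuming the unweighted formulas $F(u,v) = 4 - d_u - d_v$ and $F^{\#}(u,v) = F(u,v) + 3\,t_{uv}$ with $t_{uv}$ the number of triangles containing the edge; the function name `forman_curvatures` and the small example graph are hypothetical and not taken from the article.

```python
def forman_curvatures(edges):
    """Return {edge: (F, F_aug)} for a simple undirected graph given as an edge list.

    Assumes F(u, v) = 4 - d_u - d_v and F_aug(u, v) = F(u, v) + 3 * (#triangles on {u, v}).
    """
    # Build adjacency sets so that degrees and common neighbors are cheap to query.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = {}
    for u, v in edges:
        forman = 4 - len(adj[u]) - len(adj[v])
        # Every common neighbor of u and v closes exactly one triangle on the edge.
        triangles = len(adj[u] & adj[v])
        result[(u, v)] = (forman, forman + 3 * triangles)
    return result


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
curvatures = forman_curvatures(example_edges)
forman_sequence = sorted(f for f, _ in curvatures.values())
```

Sorting the first components gives the Forman–Ricci curvature sequence of the example as a multiset; the second components give the augmented sequence in the same way.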
For a given graph $G$, the curvature sequence is the multiset (set with possible repeats) of all discrete curvatures of its edges. For instance, the Forman–Ricci curvature sequence is the multiset $\{F(e) : e \in E\}$; the edge curvatures of the example in Figure 1 illustrate such a sequence. We can now make the main methodological question of this article more precise:
Given a multiset of numbers, construct a graph whose curvature sequence equals this multiset. (Q1)
The answer to this question can then be used to address a second goal of the article: to explore the space of all graphs that have a given discrete curvature.
Section 3 addresses question (Q1) algorithmically by proposing an algorithm that samples graphs whose curvature sequence is approximately equal to the target multiset, for any notion of discrete curvature. Section 4 addresses question (Q1) theoretically for the case of Forman–Ricci curvature. We show that there exists a finite set of graph operations, such that any two graphs with the same Forman–Ricci curvature sequence and degree sequence can be transformed into each other by using these graph operations. Throughout the article, we limit our scope to target multisets for which there exists at least one graph with that curvature sequence. Characterizing the multisets that are realizable as discrete curvature sequences of a graph is an interesting problem in itself, and it is still unexplored for most notions of discrete curvature.
3 Approximate graph reconstruction from curvatures
The main methodological question of this article (Q1) asks to construct graphs with predefined discrete curvatures. Similar to reconstructing graphs from statistics such as motif counts or eigenvalues [29], this is a very difficult task in general and it highly depends on the choice of discrete curvature. Therefore, this section proposes an algorithm for approximate reconstruction that works for any choice of curvature. For a given target multiset, the algorithm samples graphs whose curvature sequences are close to it. We use this algorithm to explore the ensemble of graphs with curvatures close to a given target sequence. In particular, we study how a number of standard network summary statistics vary over this ensemble.
3.1 Evolutionary algorithm for sampling graphs with given curvatures
We follow the methodology of [29] and use an evolutionary Markov chain Monte Carlo-type algorithm (MCMC) that consists of three steps: initialization, mutation and selection. The algorithm starts with an initial graph which is then evolved iteratively to make its curvature sequence gradually closer to the target sequence. After choosing a notion of discrete curvature of interest, the algorithm takes three inputs: (i) the number of nodes and the target sequence, which is a multiset of numbers, (ii) a parameter which introduces variability to prevent the algorithm from getting stuck and (iii) a stopping parameter, which is the number of steps after which the algorithm halts. In practice we choose the parameters in (ii) and (iii) for a given sequence using grid search. We now describe the algorithm steps in more detail.
Initialization: Construct a random graph with the given number of nodes and as many edges as there are entries in the target sequence, starting from a random tree (to guarantee connectedness) and with the remaining edges randomly added while avoiding multi-edges. The iteration counter is set to zero.
Mutation: Select a random edge $\{u, v\}$ in the current graph and construct a mutated graph in which this edge is rewired to $\{u, w\}$, where the node $w$ is chosen uniformly at random while avoiding isolated nodes and multi-edges.
Selection: Compute the discrete curvature sequences of the current graph and of the mutated graph, and compare each with the target curvature sequence by computing the mean squared error (we found that choosing a different measure to compare the curvature sequences does not have much influence on the convergence speed of the algorithm):
$$\mathrm{MSE}(\mathbf{k}, \mathbf{k}') = \frac{1}{m} \sum_{i=1}^{m} (k_i - k'_i)^2,$$
where $\mathbf{k}$ and $\mathbf{k}'$ denote the two curvature sequences being compared, each with $m$ entries and sorted in ascending order. The mutation is accepted if it results in a decrease in the mean squared error; otherwise it is accepted with a probability set by the variability parameter in (ii). Finally, increase the iteration counter by one, continue with the mutated graph if the mutation is accepted and with the current graph otherwise, and repeat the process until the iteration counter reaches the stopping parameter.
Figure 2 illustrates the evolution of the curvature sequences throughout the algorithm. It shows the histogram of the target sequence and some curvature sequences during the algorithm; note that the final curvature sequence matches the target sequence closely. While this methodology is applicable to any choice of discrete curvature, we focus on Forman–Ricci and augmented Forman–Ricci curvature for computational reasons in our experiments. We found that the algorithm still converges for Ollivier–Ricci curvature and resistance curvature target sequences, but with prohibitively long computation times.
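The three steps of Section 3.1 can be sketched in a few lines of code. This is an illustrative sketch rather than the implementation used for the experiments: it assumes the unweighted Forman–Ricci curvature $4 - d_u - d_v$, it assumes that a non-improving mutation is accepted with a fixed probability (here the hypothetical argument `accept_prob`), and it counts rejected or invalid mutations as steps. All function and parameter names are placeholders.

```python
import random


def forman_sequence(adj):
    """Sorted Forman-Ricci curvatures 4 - d_u - d_v over all edges of the graph."""
    return sorted(4 - len(adj[u]) - len(adj[v])
                  for u in adj for v in adj[u] if u < v)


def mse(seq_a, seq_b):
    """Mean squared error between two equally long, sorted sequences."""
    return sum((a - b) ** 2 for a, b in zip(seq_a, seq_b)) / len(seq_a)


def random_connected_graph(n, m, rng):
    """Initialization: a random tree on n nodes plus random extra edges (m in total)."""
    adj = {i: set() for i in range(n)}
    for v in range(1, n):                  # attach each node to an earlier one
        u = rng.randrange(v)
        adj[u].add(v)
        adj[v].add(u)
    count = n - 1
    while count < m:                       # assumes n - 1 <= m <= n (n - 1) / 2
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:                # avoid multi-edges
            adj[u].add(v)
            adj[v].add(u)
            count += 1
    return adj


def evolve(n, target, accept_prob, steps, seed=0):
    """Evolve a random graph so that its curvature sequence approaches `target`."""
    rng = random.Random(seed)
    target = sorted(target)
    adj = random_connected_graph(n, len(target), rng)
    err = mse(forman_sequence(adj), target)
    for _ in range(steps):
        # Mutation: rewire a random edge {u, v} to {u, w}.
        u, v = rng.choice([(a, b) for a in adj for b in adj[a]])
        w = rng.randrange(n)
        if w == u or w in adj[u] or len(adj[v]) == 1:
            continue                       # self-loop, multi-edge or isolated node
        adj[u].remove(v); adj[v].remove(u)
        adj[u].add(w); adj[w].add(u)
        new_err = mse(forman_sequence(adj), target)
        # Selection: keep improvements, keep other mutations with small probability.
        if new_err < err or rng.random() < accept_prob:
            err = new_err
        else:                              # undo the rewiring
            adj[u].remove(w); adj[w].remove(u)
            adj[u].add(v); adj[v].add(u)
    return adj, err


# Hypothetical usage: the target is the curvature sequence of a 6-cycle (all zeros).
graph, final_error = evolve(n=6, target=[0] * 6, accept_prob=0.01, steps=200, seed=1)
```

Replacing `forman_sequence` by another curvature function leaves the rest of the loop unchanged, which reflects the claim that the methodology does not depend on the choice of curvature.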
3.2 Approximate sampling experiments
We now use the algorithm described in Section 3.1 to explore ensembles of graphs with a given discrete curvature sequence. For both Forman–Ricci and augmented Forman–Ricci curvature, we consider four target curvature sequences and sample graphs. We then compute a number of network summary statistics and explore how they vary over these graphs.
3.2.1 Target sequences
To guarantee target sequences which are realizable, we use curvature sequences of existing graphs. We consider graphs obtained from three widely-studied random graph models [42] and one real-world network. The considered graphs are as follows:
• Erdős–Rényi (ER) random graph. The graph has nodes and every pair of nodes is connected independently with probability . These graphs result in a unimodal target sequence.
• Stochastic block model (SBM) random graph. The graph has nodes divided in two equal-sized groups. Pairs of nodes in the first group are connected with probability , in the second group with probability and pairs of nodes from different groups with probability . These graphs result in a bimodal target sequence.
• Barabási–Albert (BA) random graph. The graph has nodes and is constructed by starting from a star graph with four nodes and then adding new nodes. Each new node is connected to three of the existing nodes with probability proportional to their degrees. These graphs result in a heavy-tailed target sequence.
• C. elegans network. This is the real-world network among the four graphs.
3.2.2 Summary statistics
To summarize the structure of the graphs obtained from our algorithm, we compute four global network statistics:
- •
- •
• Average shortest path length: this is the average, over all pairs of nodes $u$ and $v$, of the smallest number of steps between $u$ and $v$. It is an intuitive summary of the metric structure of a graph.
• First positive eigenvalue: the normalized graph Laplacian is the matrix $L = I - D^{-1/2} A D^{-1/2}$, where $I$ is the identity matrix, $D$ is the diagonal degree matrix and $A$ is the adjacency matrix. The eigenvalues of the Laplacian contain a lot of information about the graph structure and dynamical processes such as diffusion and random walks associated with the graph [13, 41]. In particular, $L$ is a positive semidefinite matrix and its first positive eigenvalue quantifies how well-connected the graph is.
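To make the first of these two statistics concrete, the average shortest path length of a connected graph can be obtained with one breadth-first search per node. The sketch below is only illustrative; the function name and the adjacency-dictionary input format are assumptions, and the experiments may well rely on a standard network library instead.

```python
from collections import deque


def average_shortest_path_length(adj):
    """Average hop distance over all pairs of distinct nodes of a connected graph.

    `adj` maps every node to the set of its neighbors.
    """
    nodes = list(adj)
    total = 0
    for source in nodes:
        # Breadth-first search yields the hop distance from `source` to every node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(nodes):
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    n = len(nodes)
    # `total` counts every unordered pair twice, as does n * (n - 1).
    return total / (n * (n - 1))


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
avg_length = average_shortest_path_length(path_graph)
```

For the path on four nodes the six pairwise distances are 1, 1, 1, 2, 2 and 3, so the average is 10/6.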
3.2.3 Experiments
Before sampling the ensemble of graphs, we determine the optimal values of the two algorithm parameters (the variability parameter and the number of steps) for each of the target sequences. Table 1 shows the parameter values obtained via grid search.
ER | SBM | BA | C. elegans | ||
Forman–Ricci curvature sequence | |||||
augmented Forman–Ricci curvature sequence | |||||
We now consider the four graphs (ER, SBM, BA, C. elegans), compute their Forman–Ricci curvature sequence as a target sequence and then use our algorithm to sample graphs whose curvature sequences approximate this target. As an indication of the quality of the approximation, Table 2 shows the mean squared errors for the sampled graphs. Generally, we observe that the MSE increases as the graphs become less uniform, as is the case for the Barabási–Albert and the C. elegans network.
Figure 4 shows the histograms of the network statistics (in blue) and the value of this statistic for the starting graph (red line). For each statistic, we also plot the standard score $z = (x - \mu)/\sigma$, where $x$ is the value of the statistic for the target graph, $\mu$ is the ensemble average of the statistic and $\sigma$ is the standard deviation of the statistic over the ensemble. This number is a proxy for how typical or atypical the statistics of the target graph are with respect to the ensemble.
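The standard score described above is a one-line computation once the ensemble values are available. The sketch below assumes the population standard deviation; whether the article uses the population or the sample version is not stated, and the numbers are made up purely for illustration.

```python
from statistics import mean, pstdev


def standard_score(target_value, ensemble_values):
    """Standard score of the target statistic relative to the sampled ensemble."""
    mu = mean(ensemble_values)
    sigma = pstdev(ensemble_values)   # assumption: population standard deviation
    return (target_value - mu) / sigma


# Made-up ensemble values with mean 5 and standard deviation 2.
ensemble = [2, 4, 4, 4, 5, 5, 7, 9]
score = standard_score(9, ensemble)
```

Here the target value lies two standard deviations above the ensemble mean, so the score equals 2; scores of large magnitude indicate that the target graph is atypical for the ensemble.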
We note a few observations: the standard scores for the ER-based target sequence are all below . This suggests that the ensemble of graphs obtained from the algorithm is not very different from an ensemble of ER random graphs. Second, for the SBM-based target sequence, we see that the average shortest path length is much smaller for the sampled graphs than for the target graph, and that the first positive eigenvalue is much larger. Both observations suggest that the block structure of the SBM graph, i. e., two groups of nodes which are poorly connected, is not well-reproduced by the sampled graphs. Finally, for the C. elegans-based target sequence, we see that all standard scores are very high. This implies that the structure of this graph is very different from the sampled graphs. This is not surprising since only fixing the degree sums of edges is a very local and coarse structural constraint.
We repeat the same experiment with the augmented Forman–Ricci curvature. The mean squared errors between the sampled and target curvature sequences are shown in Table 2, and the MSE is again higher for the Barabási–Albert graph and the C. elegans network. The results of the experiment are shown in Figure 5.
We note a few observations: compared to Figure 4, the standard scores for the ER-based target sequence have all increased. This is not unexpected, since the graphs are sampled based on the augmented Forman–Ricci curvature which takes into account the presence of triangles in the graph, whereas ER graphs have no correlations between different edges. Secondly, the same observation as before holds for the SBM-based target sequence: the smaller shortest path lengths and larger first positive eigenvalue suggest that the two-block structure of the SBM is not reproduced in the sampled graphs.
Table 2. Mean squared errors (MSE) between the sampled and target curvature sequences.
     | Forman                             | Augmented Forman
MSE  | ER    | SBM   | BA    | C. elegans | ER    | SBM   | BA    | C. elegans
min  | 0.0   | 0.016 | 0.038 | 0.111      | 0.018 | 0.153 | 0.195 | 15.539
max  | 0.005 | 0.039 | 0.193 | 0.416      | 0.191 | 0.285 | 3.757 | 21.469
mean | 0.001 | 0.027 | 0.088 | 0.033      | 0.094 | 0.212 | 1.115 | 18.188
Finally, we note that one should take care in interpreting the histograms of network statistics, since the specific design of our algorithm does not give much control or insight into the distribution from which we sample graphs. Furthermore, we do not have any mixing time results on which we can base our choice of stopping parameter . This makes it hard to control the influence of the initialization on the ensemble of graphs which we sample from. In future work, these issues could be addressed by designing a more tailored algorithm or by trying to find exact expressions for the probability distribution over graphs, for instance using a soft maximum entropy model [52].
4 Exact graph reconstruction from curvatures and degrees
In this section, we take a second approach to the main question (Q1) and consider the exact reconstruction problem of finding graphs whose discrete curvatures precisely match a given multiset of numbers. This problem clearly depends on the notion of discrete curvature, and we will focus on Forman–Ricci curvature . For reasons which will be explained later, we not only consider fixed curvatures but also fixed degrees. In the main result, Theorem 4.12, we find that there exists a finite set of graph rewiring moves which can transform any two multigraphs with the same Forman–Ricci curvatures and degree sequences into each other. This result follows from encoding the degree and curvature constraints algebraically using a joint degree matrix and using the theory of Markov bases. Our approach then decomposes into two parts: first, finding all joint degree matrices that are compatible with the curvature and degree constraints and second, finding all graphs with the same joint degree matrix.
The rest of this section will use the following conventions: a multigraph can have multi-edges and self-loops while a simple graph cannot, and we write “graph” if the distinction is not relevant. The maximum degree is denoted by , the degree sequence is the -tuple and the curvature sequence is the -tuple (# edges with Forman–Ricci curvature ); see Figure 6 for an example. Note that this is different from how we defined the curvature sequence before; it records the frequency of all possible curvatures rather than the multiset of curvatures itself. Since both definitions contain the same information, we will use the same name.
4.1 Joint degree matrix and transpositions
The Forman–Ricci curvature of an edge depends on the sum of the degrees of its end points. This information can be represented by the following matrix: the joint degree matrix (JDM) of a graph $G$ is a symmetric matrix $J$ with entries
$$J_{kl} = \#\big\{ \{u, v\} \in E : \{d_u, d_v\} = \{k, l\} \big\}.$$
We note that the degree and curvature sequences of a graph can be computed from its JDM: the number of nodes with degree $k$ is equal to the row sum $\frac{1}{k}\big(J_{kk} + \sum_{l} J_{kl}\big)$ and the number of edges with Forman–Ricci curvature $4 - s$ is equal to the anti-diagonal sum $\sum_{k \le l,\; k + l = s} J_{kl}$. Figure 6 shows the JDM and associated sequences for a small graph that will be our running example.
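The relation between a graph, its JDM and the two sequences can be checked numerically. The sketch below assumes the convention that each diagonal entry counts an edge between two nodes of equal degree once, so that the degree-weighted node count equals the row sum plus the diagonal entry; this convention and all function names are assumptions for illustration and may differ from the normalization used in the article.

```python
from collections import Counter


def joint_degree_matrix(edges):
    """Symmetric matrix J with J[k][l] = number of edges joining degrees k and l."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    dmax = max(deg.values())
    J = [[0] * (dmax + 1) for _ in range(dmax + 1)]   # row/column 0 is unused
    for u, v in edges:
        k, l = deg[u], deg[v]
        J[k][l] += 1
        if k != l:
            J[l][k] += 1
    return J


def degree_counts(J):
    """Number of nodes of each degree k, recovered from the k-th row of J."""
    dmax = len(J) - 1
    return {k: (sum(J[k][1:]) + J[k][k]) // k for k in range(1, dmax + 1)}


def curvature_counts(J):
    """Number of edges with each Forman-Ricci curvature 4 - k - l (anti-diagonal sums)."""
    dmax = len(J) - 1
    counts = Counter()
    for k in range(1, dmax + 1):
        for l in range(k, dmax + 1):       # upper triangle: count every edge once
            if J[k][l]:
                counts[4 - k - l] += J[k][l]
    return dict(counts)


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
J = joint_degree_matrix([(0, 1), (1, 2), (0, 2), (2, 3)])
node_counts = degree_counts(J)
edge_counts = curvature_counts(J)
```

In this example there is one node of degree 1, two of degree 2 and one of degree 3, and the four edges have curvatures 0, 0, -1 and -1, all of which are recovered from the matrix alone.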
The following result from [54] gives a complete characterization of JDMs.
Theorem 4.1.
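A symmetric matrix $J$ is the JDM of a simple graph if and only if it satisfies, for all distinct $k, l$, the conditions
(i) $n_k = \frac{1}{k}\big(J_{kk} + \sum_{l'} J_{kl'}\big)$ is an integer,
(ii) $J_{kl} \le n_k n_l$, and
(iii) $J_{kk} \le \binom{n_k}{2}$.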
The authors of [54] gave an algorithm to construct a simple graph from a given JDM. More importantly, they also showed that the set of all graphs with the same JDM is connected through the graph operation shown in Figure 7: let be two edges such that and let . The transposition of the edges in is a new graph obtained by rewiring these two edges. Its nodes and edges are
Transpositions do not change the JDM of a graph and the theory of matchings in convex bipartite graphs guarantees that the set of multigraphs with the same JDM is connected through transpositions [16, 54]. In general, transpositions in simple graphs may create multi-edges or self-loops depending on the choice of , but the following result shows that there always exists a choice for which this is not the case.
Theorem 4.2 ([54]).
The set of all simple graphs with a given JDM is connected through transpositions.
In other words, all simple graphs with the same JDM can be found by performing transpositions, in such a way that every intermediate step is also a simple graph. Since transpositions do not change the JDM, these graphs will also have the same curvature and degree sequences. This is the first step towards our main result.
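A transposition can be illustrated with a short sketch. Since the precise condition is given by the formula in the text, the code below relies on one natural reading as an explicit assumption: two edges $\{u, v\}$ and $\{x, y\}$ with $d_u = d_x$ are rewired to $\{u, y\}$ and $\{x, v\}$. Under this assumption the multiset of degree pairs over the edges, and hence the JDM, is unchanged. The helper names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_pairs(edges):
    """Multiset of unordered endpoint-degree pairs; equivalent information to the JDM."""
    deg = degrees(edges)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def is_simple(edges):
    """True if the edge list has no self-loops and no repeated edges."""
    no_loops = all(u != v for u, v in edges)
    return no_loops and len({frozenset(e) for e in edges}) == len(edges)


def transposition(edges, i, j):
    """Rewire edges[i] = (u, v) and edges[j] = (x, y) to (u, y) and (x, v).

    Assumption: the swap is only allowed when u and x have the same degree.
    """
    (u, v), (x, y) = edges[i], edges[j]
    deg = degrees(edges)
    if deg[u] != deg[x]:
        raise ValueError("the first endpoints must have equal degree")
    rewired = list(edges)
    rewired[i], rewired[j] = (u, y), (x, v)
    return rewired


# Hypothetical example: the path 0-1-2-3-4-5; swap on edges (1, 2) and (4, 5).
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
rewired = transposition(path, 1, 4)
```

The rewired graph consists of the path 0-1-5 and the triangle 2-3-4: it is simple and has the same degree pairs as the original path, although its global structure is very different. Choosing edges (1, 2) and (3, 4) instead would create a repeated edge, which is why `is_simple` should be checked before accepting a move.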
We note the following count for the number of multigraphs with a given JDM:
Proposition 4.3.
The number of node-labeled multigraphs with a given JDM is equal to
(3) |
Proof. We have nodes of degree . Each such node has “stubs” that need to be matched to of the “half-edges”. For the first node, there are ways to match its stubs, for the second node there are ways, and so on. This gives a total of
ways to connect nodes of degree to half-edges. This counting procedure distinguishes the half-edges, so to obtain edge-unlabeled multigraphs we have to forget the half-edge labels. This makes any permutation of the edges which are connected to nodes with degree and indistinguishable, so we divide the count by . Second, forgetting the half-edge labels makes any flip of the edges connected to two nodes of equal degree indistinguishable, so we divide the count by . This results in (3) and completes the proof. ∎
4.2 Markov bases
In the previous section, we found that all graphs with a given JDM are connected by transpositions. As a second step, we now want to find all JDMs which are compatible with a given curvature and degree sequence. More precisely, these are all nonnegative integer matrices that satisfy conditions (i)–(iii) in Theorem 4.1 (valid JDMs) and have prescribed row sums (degrees) and anti-diagonal sums (curvatures). We will show how this set of matrices can be obtained using Markov bases.
The theory of Markov bases was developed by Diaconis and Sturmfels in [17] to solve the problem of sampling contingency tables, i. e., nonnegative integer matrices with fixed row and column sums, in the context of statistics. The main result in this theory is that this sampling problem can be solved using concepts and tools from commutative algebra. We first give the main definition and theorem and then apply it to our problem.
Definition 4.4.
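Let $A \in \mathbb{Z}^{d \times n}$ be an integer matrix. A Markov basis for $A$ is a finite set $\mathcal{B}$ of integer vectors in the kernel of $A$, such that for every $u, v \in \mathbb{N}^n$ with $Au = Av$, there exists a sequence $b_1, \dots, b_K \in \pm\mathcal{B}$ that satisfies
(i) $v = u + \sum_{i=1}^{K} b_i$, and
(ii) $u + \sum_{i=1}^{j} b_i \ge 0$ for all $1 \le j \le K$.
For $u \in \mathbb{N}^n$ we call $\{v \in \mathbb{N}^n : Av = Au\}$ the fiber of $u$. The adjective ‘Markov’ reflects that a Markov basis can be used to design a Markov chain on a fiber: points in the fiber are states of the Markov chain and the transitions out of any state $u$ are those vectors $b \in \pm\mathcal{B}$ in the Markov basis for which $u + b \ge 0$. The Markov basis properties guarantee that this chain is ergodic.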
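The Markov chain described above can be sketched generically: pick a random basis move with a random sign and apply it whenever the result stays nonnegative. The example below uses the classical case of 2×2 contingency tables with fixed row and column sums, for which the single move $(1, -1, -1, 1)$ is a well-known Markov basis; it is not the matrix constructed later in this section, and the function name is hypothetical.

```python
import random


def fiber_walk(start, basis, steps, seed=0):
    """Random walk on the fiber of `start` using signed moves from `basis`.

    Returns the set of visited states, each a tuple of nonnegative integers.
    """
    rng = random.Random(seed)
    state = tuple(start)
    visited = {state}
    for _ in range(steps):
        move = rng.choice(basis)
        sign = rng.choice((1, -1))
        candidate = tuple(s + sign * m for s, m in zip(state, move))
        # Moves lie in the kernel, so only nonnegativity has to be checked.
        if min(candidate) >= 0:
            state = candidate
        visited.add(state)
    return visited


# A 2x2 table with rows (a, b) and (c, d), flattened to (a, b, c, d).
# The move below changes neither the row sums nor the column sums.
basis = [(1, -1, -1, 1)]
visited = fiber_walk((2, 0, 0, 2), basis, steps=200)
```

The fiber of the table (2, 0, 0, 2) consists of the three tables (0, 2, 2, 0), (1, 1, 1, 1) and (2, 0, 0, 2), and the walk never leaves this set. A rejected move keeps the chain in place, which is the usual way to make such a walk well defined at the boundary of the fiber.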
The question of how to find a Markov basis for a given integer matrix is addressed by the fundamental theorem of Markov bases (see also Appendix A).
Theorem 4.5 (Fundamental theorem of Markov bases [17]).
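Let $\mathcal{B} \subset \ker_{\mathbb{Z}}(A)$ be a finite set, then $\mathcal{B}$ is a Markov basis for $A$ if and only if the binomials $\{x^{b^+} - x^{b^-} : b \in \mathcal{B}\}$ generate the ideal $I_A$, where $b = b^+ - b^-$ is the decomposition of $b$ into its positive and negative parts and $I_A$ is the toric ideal of $A$.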
Appendix A provides further background on the relevant algebraic concepts, but the details of the theorem are not relevant for the purpose of this article. The most important fact is that a finite Markov basis always exists and that it can be computed using computer algebra systems for sufficiently small matrices . For instance, using the software package 4ti2 [1] we compute that
For larger matrices, these computations can become infeasible in practice and other approaches such as relaxed notions of Markov bases or restrictions to particular fibers may be necessary [25].
We now show that the set of all JDMs with a given degree and curvature sequence is the fiber for some matrix . This means that the associated Markov basis can be used to explore this fiber. We stress that this matrix and its Markov basis only depend on the maximum degree , and not on the specific degree and curvature sequences.
We start by encoding a JDM and its constraints as a vector , where . This vector contains the upper-triangular JDM entries and the slack variables which we define as
with defined as before from . The slack variables are used to encode the inequalities on the JDM entries given in Theorem 4.1: a matrix is a valid JDM if and only if the associated slack variables are nonnegative integers, and thus . The matrix can be reconstructed from by forgetting the slack variables, so we can write .
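Next, to construct this matrix, we consider the expressions in the JDM entries and slack variables that are invariant when fixing the degree and curvature sequences:
• the row sums of the JDM, for every degree (fixed degree sequence; to be precise, each such equation fixes a multiple of the number of nodes with that degree, from which this number can be retrieved),
• the anti-diagonal sums of the JDM, for every possible curvature value (fixed curvature sequence),
• the sum of each JDM entry and its slack variable (definition of slack variables).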
These equations are linear functions of with integer coefficients and they determine the rows of the matrix .
To illustrate, we give explicitly for in Examples 4.8 and 4.9 below. The relevance of this matrix is shown by the following proposition.
Proposition 4.6.
Let be a JDM with corresponding vector . The fiber contains all JDMs with the same degree and curvature sequences as .
Proof. Let be a JDM with corresponding vector . We consider the fiber
Let and write and for the corresponding entries in this vector. From the rows of that encode the fixed degree sequence, we obtain
(4) |
From the rows of corresponding to the off-diagonal slack equations ( and the fact that and thus , we know that
Similarly, from the diagonal slack equations () and , we obtain
By Theorem 4.1, this proves that the matrix is a JDM. The degree sequences and are equal by (4), and the rows of which encode the fixed curvature sequence guarantee that
and thus that the curvature sequences of and are equal. This holds for every which proves that every vector in the fiber indeed corresponds to a JDM with degree and curvature sequences equal to . The same approach shows that every such JDM is also an element of the fiber, which completes the proof. ∎
Following Proposition 4.6, we can thus use a Markov basis corresponding to the matrix to obtain all JDMs with a fixed curvature and degree sequence. At this point, it has become clear why we need to consider fixed degrees: to encode the inequalities in the characterization of JDM matrices using slack variables as , we need and to be constant.
A first result on the Markov basis follows from the specific structure of .
Proposition 4.7.
The entries of a vector corresponding to the slack variables are the negative of the entries corresponding to the variables.
Proof. By construction, the equation is invariant so any change when applying must be compensated by the opposite change . ∎
Following Proposition 4.7, we can just consider the -entries of vectors in the Markov basis and we will furthermore rearrange these as a symmetric matrix and write . We now consider the cases in more detail to illustrate the definitions and results.
Example 4.8 ().
The matrix is a matrix. It can be written as the block-matrix
where is the identity matrix, is the zero matrix and is the matrix whose rows encode the fixed degree sequences (first rows) and fixed curvature sequences (last rows) in terms of the variables:
The top indices show the correspondence between the columns of and the entries of the matrix, while the indices on the right indicate the equation encoded by the row. For instance, the second row reflects the sum ; this equation fixes and thus the number of nodes with degree . The th row reflects the sum ; this equation fixes the number of edges with Forman–Ricci curvature .
Example 4.9 ().
The matrix is a matrix. It can be written in block-matrix form as
where is the identity matrix, is the zero matrix of appropriate dimensions and is the matrix whose rows encode the fixed degree sequences (first rows) and fixed curvature sequences (last rows) in terms of the variables:
For instance, the first row encodes the equation , which keeps the number of nodes with degree fixed. The th row encodes the equation , which keeps the number of edges with Forman–Ricci curvature fixed.
The matrix has rank 27 and using 4ti2 we compute the following Markov basis consisting of elements:
If we define (footnote 4: this notation refers to the fact that is the degree of the homogeneous binomial, see Appendix A; it should not be confused with the degree of a node), we see that there are five basis elements with , two with and two with . We will see later that this corresponds to the number of edges involved in the graph operation associated with .
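The rank stated in Example 4.9 can be checked numerically. The sketch below is not the authors' construction: it assumes that the example concerns maximum degree 5, that the variables are the JDM entries indexed by degree pairs (i, j) with i ≤ j, that a degree row counts edge endpoints at nodes of that degree (so diagonal entries count twice), that an edge between nodes of degrees i and j has Forman–Ricci curvature 4 − i − j, and that the block form is [[I, I], [B, 0]] with the slack equations on top. All function names are illustrative, and the row and column ordering need not match the paper's.

```python
import itertools
import numpy as np

def jdm_variables(d_max):
    """Index pairs (i, j), 1 <= i <= j <= d_max, one per JDM entry."""
    return list(itertools.combinations_with_replacement(range(1, d_max + 1), 2))

def degree_curvature_rows(d_max):
    """Rows fixing the degree sequence (first d_max rows) and the
    curvature sequence (last 2*d_max - 1 rows) in terms of the JDM entries."""
    pairs = jdm_variables(d_max)
    rows = []
    for deg in range(1, d_max + 1):
        # number of edge endpoints that sit at nodes of degree `deg`
        rows.append([(i == deg) + (j == deg) for i, j in pairs])
    for s in range(2, 2 * d_max + 1):
        # number of edges with degree sum s, i.e. assumed curvature 4 - s
        rows.append([int(i + j == s) for i, j in pairs])
    return np.array(rows, dtype=int)

def full_matrix(d_max):
    """Assumed block matrix [[I, I], [B, 0]]: slack equations on top, B below."""
    B = degree_curvature_rows(d_max)
    n = B.shape[1]
    top = np.hstack([np.eye(n, dtype=int), np.eye(n, dtype=int)])
    bottom = np.hstack([B, np.zeros_like(B)])
    return np.vstack([top, bottom])

A5 = full_matrix(5)
print(A5.shape, np.linalg.matrix_rank(A5))  # (29, 30) 27
```

Under these assumptions the computed rank agrees with the value 27 quoted in the example. The Markov basis itself is not computed here; that step requires dedicated software such as 4ti2.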
We now give some further results related to the matrix .
Proposition 4.10.
for all .
Proof. We first give two independent linear dependencies between the rows of to show that . The first linear relation between the rows is
The second linear relation between the rows is
To prove that the latter relation holds, we simplify the first sum to and the second sum to , where . We further simplify
which confirms that the second linear relation equals zero. Next, we show that the matrix has a nonsingular -minor . This implies that which then completes the proof.
First, note that can be written as , where is the zero matrix, is the identity matrix of size and a matrix. The rows corresponding to the identity matrices encode the fixed slack variables and the rows corresponding to encode the fixed degree and curvature sequences; as a result, the columns of correspond to variables. See Examples 4.8 and 4.9 for concrete examples. We show that has a nonsingular -minor which implies that has the required nonsingular -minor .
Consider the following submatrices of :
for . After reordering rows and columns, we can write matrix as
where indicates entries that are not relevant. This is by construction a ()-minor of . It is an upper-triangular matrix with ones along the diagonal, which means that it is nonsingular. As noted, this implies that the -minor is nonsingular, which completes the proof. ∎
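For orientation, the two kinds of linear relations used in the proof can be written out under one possible normalization, which may differ from the one used above. Assume the columns are indexed by degree pairs (k, l) with k ≤ l ≤ d, that the degree row r_i has entry [k = i] + [l = i] in column (k, l), and that the curvature row c_s has entry [k + l = s]; the symbols r_i, c_s and d are introduced here only for illustration.

```latex
\sum_{i=1}^{d} r_i \;=\; 2 \sum_{s=2}^{2d} c_s ,
\qquad
\sum_{i=1}^{d} i \, r_i \;=\; \sum_{s=2}^{2d} s \, c_s .
```

In column (k, l), both sides of the first relation equal 2, since every edge has two endpoints, and both sides of the second relation equal k + l, since the degree sum of an edge can be read off either from its endpoints or from its curvature class.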
Understanding the size of the fiber is a hard problem in general. However, making use of the special property that is one-dimensional, we obtain the following bound on the size of the fiber in this case.
Proposition 4.11.
Let be a JDM with . Then there are at most
JDMs, including , with the same degree and curvature sequences as .
Proof. We recall from Example 4.8 above that the Markov basis for contains a single element
This means that every JDM with the same degree and curvature sequences as is of the form with . For to be a JDM, it is necessary that and for this reason we know that and . These conditions are necessary but in general not sufficient, since alone does not guarantee that the slack variables are also nonnegative. The possible JDMs from the proposition are thus a subset of for within the given bounds, which completes the proof. ∎
We note that the bound is the same for all elements of the fiber. To illustrate that the bound can be strict, consider the JDM in the running example in Figure 6. Here, is a valid JDM but is not and we thus have just two JDMs in the fiber, whereas .
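The counting argument behind Proposition 4.11 can be sketched in code. The block below assumes maximum degree 4, JDM entries indexed by degree pairs (i, j) with i ≤ j, and a single move vector that was derived by hand for this sketch rather than copied from the paper (its sign, and possibly its normalization, may differ from the element in Example 4.8). The JDM `J` is hypothetical. As in the proof, only nonnegativity of the JDM entries is tested, so the result is a list of candidates and not necessarily the fiber itself.

```python
import itertools

D_MAX = 4
PAIRS = list(itertools.combinations_with_replacement(range(1, D_MAX + 1), 2))

# Assumed move (derived by hand for this sketch; its overall sign is arbitrary).
MOVE = {(1, 3): 1, (2, 3): 1, (2, 4): 1, (2, 2): -1, (1, 4): -1, (3, 3): -1}

def constraint_values(jdm):
    """Endpoint counts per degree, then edge counts per degree sum (curvature)."""
    by_degree = [sum(((i == d) + (j == d)) * jdm.get((i, j), 0) for i, j in PAIRS)
                 for d in range(1, D_MAX + 1)]
    by_curvature = [sum(jdm.get((i, j), 0) for i, j in PAIRS if i + j == s)
                    for s in range(2, 2 * D_MAX + 1)]
    return by_degree + by_curvature

def candidate_fiber(jdm):
    """All J + t * MOVE with nonnegative entries; slack variables are NOT checked."""
    t_min = -min(jdm.get(p, 0) for p, c in MOVE.items() if c > 0)
    t_max = min(jdm.get(p, 0) for p, c in MOVE.items() if c < 0)
    return [{p: jdm.get(p, 0) + t * MOVE.get(p, 0) for p in PAIRS}
            for t in range(t_min, t_max + 1)]

# Hypothetical JDM, keyed by degree pair (i, j) with i <= j.
J = {(1, 3): 2, (1, 4): 4, (2, 2): 1, (2, 3): 2, (3, 3): 1}
fiber = candidate_fiber(J)
assert all(v == 0 for v in constraint_values(MOVE))  # the move keeps all constraints
print(len(fiber))  # 2
```

The number of candidates is the length of the admissible range of the multiplier plus one. Whether each candidate is a valid JDM still depends on the slack variables, which is why a bound of this kind can be strict.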
4.3 Markov moves and main result
To combine Theorem 4.2 on transpositions with Proposition 4.6 on JDMs and Markov bases, we make the observation that applying an element of the Markov basis to a JDM can be interpreted as a graph operation; this is illustrated in Figure 8.
We define a Markov move as follows: Let be a graph and an element of the Markov basis such that is a JDM. Consider a pair where is a subset of edges that has edges with degrees whenever , and where is a permutation such that the set has edges with degrees whenever . Existence of such a pair follows from the definition of the Markov basis . A Markov move of is a new graph with nodes and edges
See Figure 8 for an example. We now know that all multigraphs with a given degree and curvature sequence are connected by a finite set of graph operations: transpositions, which leave the JDM unchanged, and Markov moves corresponding to the elements of , which change the JDM. This is summarized in the following theorem.
Theorem 4.12.
The set of multigraphs with a given Forman–Ricci curvature and degree sequence is connected through transpositions and Markov moves. For small , there exist Markov bases of size
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
| size | 0 | 0 | 0 | 1 | 9 | 111 | 2662 | 171964 | ? |
Proof. Let and be two multigraphs with the same curvature and degree sequences. If and have the same JDM then Theorem 4.2 in the case of multigraphs (see also [54, 16]) says that can be transformed into by transpositions. If and do not have the same JDM, i.e., , then by Proposition 4.6 and the fundamental theorem of Markov bases, matrix can be transformed into by adding or subtracting a sequence of elements from the Markov basis . This implies that can be transformed using Markov moves into a graph such that . Using Theorem 4.2 again, can then be transformed into using transpositions. All intermediate steps in this procedure can be guaranteed to be multigraphs. This concludes the proof of the first part. The results on the size of the Markov bases follow from computations: for the matrix has full rank and for we computed the Markov bases using the software 4ti2. ∎
We note that the set of simple graphs with a given curvature and degree sequence sits inside this bigger set of multigraphs. So in particular, the set of simple graphs is connected by transpositions and Markov moves, with intermediate steps potentially being multigraphs. It is an interesting open question whether all intermediate steps can be simple graphs as well, which would imply a version of Theorem 4.12 for simple graphs.
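A Markov move can be illustrated on a small graph. The sketch below is an assumption-laden example rather than the paper's definition: the graph, the chosen edges and the helper names are hypothetical, and the move is implemented as removing a set of edges and re-pairing exactly the endpoints they free. It checks that the degree sequence and the Forman–Ricci curvatures (taken here as 4 − deg(u) − deg(v)) are unchanged while the JDM changes. In this particular instance both the initial and the final graph are simple.

```python
from collections import Counter

def degrees(edges):
    """Node degrees of a multigraph given as a list of edges (u, v)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def jdm(edges):
    """Joint degree counts keyed by the sorted degree pair of each edge."""
    deg = degrees(edges)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)

def curvatures(edges):
    """Sorted Forman-Ricci curvatures, assumed to be 4 - deg(u) - deg(v)."""
    deg = degrees(edges)
    return sorted(4 - deg[u] - deg[v] for u, v in edges)

def rewire(edges, removed, added):
    """Replace the edges in `removed` by those in `added`."""
    remaining = list(edges)
    for e in removed:
        remaining.remove(e)
    # The move must reuse exactly the endpoints it frees, so degrees are kept.
    assert Counter(x for e in removed for x in e) == Counter(x for e in added for x in e)
    return remaining + list(added)

# Hypothetical graph: nodes 0, 1 have degree 2; node 3 has degree 4;
# nodes 4, 5 have degree 3; all other nodes are leaves.
G = [(0, 1), (0, 4), (1, 5), (2, 3), (3, 6), (3, 7), (3, 8), (4, 5), (4, 9), (5, 10)]

# Remove edges with degree pairs (2,2), (1,4), (3,3);
# add edges with degree pairs (1,3), (2,3), (2,4).
H = rewire(G, removed=[(0, 1), (2, 3), (4, 5)], added=[(2, 4), (0, 5), (1, 3)])

before, after = jdm(G), jdm(H)
change = {k: after[k] - before[k] for k in set(before) | set(after) if after[k] != before[k]}
print(degrees(G) == degrees(H), curvatures(G) == curvatures(H))  # True True
print(sorted(change.items()))
```

The three removed and three added edges are consistent with reading the degree of a basis element as the number of edges involved in the move.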
Finally, we note that Theorem 4.12 may be used to sample graphs with a given degree and curvature sequence. Algorithm 1 shows one possible approach.
The performance of any sampling algorithm based on transpositions and Markov moves will depend on how fast the associated Markov chain mixes. The mixing behavior of transpositions is partially understood via its relation to perfect matching sampling [54], but understanding and controlling the mixing behavior of the Markov chain associated with general Markov bases is much more challenging [25, 61].
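To make the sampling idea concrete at the level of JDMs, the following sketch implements the generic random walk associated with a Markov basis, in the spirit of [17]: propose a basis element with a random sign and accept the proposal if all entries stay nonnegative. This is not Algorithm 1. It ignores transpositions, graph realizations and slack variables, and the move vector and starting JDM are hypothetical values chosen for maximum degree 4.

```python
import random

# Assumed single-element basis for maximum degree 4 (hand-derived, sign arbitrary).
MOVE = {(1, 3): 1, (2, 3): 1, (2, 4): 1, (2, 2): -1, (1, 4): -1, (3, 3): -1}

def markov_basis_walk(start, basis, steps, rng=None):
    """Random walk on nonnegative integer vectors stored as dicts.

    Each step proposes +/- one basis element and moves only if every
    entry of the proposal is nonnegative; otherwise the walk stays put.
    """
    rng = rng or random.Random(0)
    state = dict(start)
    visited = [dict(state)]
    for _ in range(steps):
        move = rng.choice(basis)
        sign = rng.choice((1, -1))
        proposal = dict(state)
        for key, coeff in move.items():
            proposal[key] = proposal.get(key, 0) + sign * coeff
        if all(value >= 0 for value in proposal.values()):
            state = proposal
        visited.append(dict(state))
    return visited

# Hypothetical starting JDM, keyed by degree pair (i, j) with i <= j.
J = {(1, 3): 2, (1, 4): 4, (2, 2): 1, (2, 3): 2, (3, 3): 1}
walk = markov_basis_walk(J, [MOVE], steps=50)
distinct = {frozenset((k, v) for k, v in s.items() if v) for s in walk}
print(len(distinct))
```

Rejected proposals count as steps, which keeps the proposal symmetric. How quickly such a walk mixes is exactly the question raised in the paragraph above and is not addressed by this sketch.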
5 Conclusion and discussion
In this work, we consider the problem of constructing graphs with prescribed discrete curvatures. We take a first step in addressing this methodological gap in network geometry and discrete curvature by approaching the problem in two ways. First, in Section 3 we develop an evolutionary MCMC-type algorithm to construct graphs whose curvatures approximate a given set. Using this algorithm, we then explore the ensemble of graphs with a fixed discrete curvature sequence. Second, in Section 4 we solve the exact reconstruction problem for Forman–Ricci curvature and degree sequences. We show that there exists a finite set of graph rewiring moves — Markov moves and transpositions — which connect the set of all graphs with a given Forman–Ricci curvature and degree sequence.
Our work opens several new directions for future research. Most results obtained in this article are limited to specific choices of discrete curvatures, but the same questions are equally relevant for all other notions of discrete curvature. More specifically, our algorithmic approach could be extended by optimizing the expensive computations in the mutation/selection phases for specific curvatures. This would make the sampling algorithm practical for a wider range of curvatures and provide a tool to compare among them. Furthermore, a more careful choice of algorithm design might lead to an understanding of and control over the mixing times and convergence rates of the algorithm.
Our results on exact reconstruction for Forman–Ricci target sequences are not as directly extendable to other discrete curvatures. The analysis and solution depend heavily on the simple combinatorial nature of Forman–Ricci curvature and its connection to joint degrees in a graph. However, an important follow-up question would be to find ways to decrease the prohibitively large size of the Markov bases . Adding constraints to the curvature or degree sequences and thus restricting to subspaces of is one possible approach. If the Markov basis size can be controlled, this would enable a practical implementation of the proposed Algorithm 1.
Acknowledgements: The authors thank Florentin Münch for a discussion that led to the proof of Proposition 4.10.
Appendix A Algebraic background for Markov bases
This brief appendix defines the algebraic concepts that appear in the fundamental theorem of Markov bases, Theorem 4.5, and that may not be commonly known. We refer to [17, 25] for further background.
Let be the ring of polynomials in variables and complex coefficients. For a vector we write for the monomial with exponent vector , and for a vector we write for the vector with entries if and zero otherwise, and .
An ideal in is a subset such that and . An ideal is generated by if every can be written as a polynomial combination for some .
The fundamental theorem of Markov bases is thus an algebraic statement which says that Markov bases are precisely the generators of some ideal associated with the kernel of an integer matrix ; these are called toric ideals.
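As a small worked example of the correspondence between integer vectors and binomials, consider a hypothetical kernel vector b with entries +1 at the positions labelled 13, 23, 24 and −1 at the positions labelled 14, 22, 33, and zero elsewhere. The labels and the unit vectors e_{ij} are illustrative and not taken from the text above.

```latex
b = e_{13} + e_{23} + e_{24} - e_{14} - e_{22} - e_{33}
\quad\Longrightarrow\quad
x^{b^{+}} - x^{b^{-}} \;=\; x_{13}\, x_{23}\, x_{24} \;-\; x_{14}\, x_{22}\, x_{33} .
```

Both monomials have degree 3, so the binomial is homogeneous of degree 3. This is the sense in which a degree is attached to a Markov basis element in Example 4.9.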
References
- [1] 4ti2 team. 4ti2—a software package for algebraic, geometric and combinatorial problems on linear spaces. available at: https://4ti2.github.io.
- [2] D. Bakry, I. Gentil, and M. Ledoux. Analysis and Geometry of Markov Diffusion Operators. Grundlehren der mathematischen Wissenschaften. Springer, Cham, 2013.
- [3] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- [4] M. Barthelemy. The structure and dynamics of cities: Urban data analysis and theoretical modeling. Cambridge University Press, Cambridge, UK, 2016.
- [5] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri. Networks beyond pairwise interactions: Structure and dynamics. Physics Reports, 874:1–92, 2020.
- [6] S. Battiston, M. Puliga, R. Kaushik, P. Tasca, and G. Caldarelli. DebtRank: Too central to fail? Financial networks, the FED and systemic risk. Scientific Reports, 2(1):541–541, 2012.
- [7] F. Bauer, J. Jost, and S. Liu. Ollivier-Ricci curvature and the spectrum of the normalized graph Laplace operator. Math. Res. Lett., 19(6):1185–1205, 2012.
- [8] C. Bick, E. Gross, H. A. Harrington, and M. T. Schaub. What are higher-order networks? SIAM Review, 65(3):686–731, 2023.
- [9] J. Bober, A. Monod, E. Saucan, and K. N. Webster. Rewiring networks for graph neural network training using discrete geometry, 2022. arXiv:2207.08026 [stat.ML].
- [10] M. Boguñá, I. Bonamassa, M. De Domenico, S. Havlin, D. Krioukov, and M. A. Serrano. Network geometry. Nature Reviews Physics, 3(2):114–135, 2021.
- [11] A. D. Broido and A. Clauset. Scale-free networks are rare. Nature Communications, 10(1):1017–1017, 2019.
- [12] G. T. Cantwell, Y. Liu, B. F. Maier, A. C. Schwarze, C. A. Serván, J. Snyder, and G. St-Onge. Thresholding normally distributed data creates complex networks. Phys. Rev. E, 101:062302, 2020.
- [13] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, Providence, RI, US, 1997.
- [14] K. Devriendt and R. Lambiotte. Discrete curvature on graphs from the effective resistance. Journal of Physics: Complexity, 3(2):025008, 2022.
- [15] K. Devriendt, A. Ottolini, and S. Steinerberger. Graph curvature via resistance distance. Discrete Applied Mathematics, 348:68–78, 2024.
- [16] P. Diaconis, R. Graham, and S. P. Holmes. Statistical problems involving permutations with restricted positions. Lecture notes-monograph series, 36:195–222, 2001.
- [17] P. Diaconis and B. Sturmfels. Algebraic algorithms for sampling from conditional distributions. The Annals of Statistics, 26(1):363–397, 1998.
- [18] J. Duch and A. Arenas. Community detection in complex networks using extremal optimization. Phys. Rev. E, 72(2):027104, 2005.
- [19] M. Eidi, A. Farzam, W. Leal, A. Samal, and J. Jost. Edge-based analysis of networks: curvatures of graphs and hypergraphs. Theory in Biosciences, 139:337–348, 2020.
- [20] P. Elumalai, Y. Yadav, N. Williams, E. Saucan, J. Jost, and A. Samal. Graph Ricci curvatures reveal atypical functional connectivity in autism spectrum disorder. Scientific Reports, 12(1), 2022.
- [21] M. Erbar and J. Maas. Ricci curvature of finite Markov chains via convexity of the entropy. Archive for rational mechanics and analysis, 206(3):997–1038, 2012.
- [22] L. Fesser, S. S. de Haro Iváñez, K. Devriendt, M. Weber, and R. Lambiotte. Augmentations of Forman’s Ricci curvature and their applications in community detection, 2023. arXiv:2306.06474 [math.CO].
- [23] R. Forman. Bochner’s method for cell complexes and combinatorial Ricci curvature. Discrete & Computational Geometry, 29:323–374, 2003.
- [24] A. Fornito, A. Zalesky, and E. T. Bullmore. Fundamentals of brain network analysis. Academic Press, London, UK, first edition, 2016.
- [25] F. Almendra-Hernández, J. A. De Loera, and S. Petrović. Markov bases: A 25 year update. Journal of the American Statistical Association, pages 1–32, 2024.
- [26] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
- [27] A. Gosztolai and A. Arnaudon. Unfolding the multiscale structure of networks with dynamical Ollivier-Ricci curvature. Nature Communications, 12:4561, 2021.
- [28] R. Hausmann, C. Hidalgo, S. Bustos, M. Coscia, S. Chung, J. Jimenez, A. Simoes, and M. Yildirim. The Atlas of Economic Complexity. Puritan Press, 2011.
- [29] M. Ipsen and A. S. Mikhailov. Evolutionary reconstruction of networks. Phys. Rev. E, 66:046109, 2002.
- [30] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 407(6804):651–654, 2000.
- [31] J. Jost and S. Liu. Ollivier’s Ricci curvature, local clustering and curvature-dimension inequalities on graphs. Discrete & Computational Geometry, 51(2):300–322, 2014.
- [32] J. Jost and F. Münch. Characterizations of Forman curvature, 2021. arXiv:2110.04554 [math.DG].
- [33] D. Krioukov. Clustering implies geometry in networks. Phys. Rev. Lett., 116:208302, 2016.
- [34] J. Kunegis. Konect: the Koblenz network collection. In Proceedings of the 22nd international conference on world wide web, pages 1343–1350, 2013.
- [35] R. Lambiotte, M. Rosvall, and I. Scholtes. From networks to optimal higher-order models of complex systems. Nature Physics, 15(4):313–320, 2019.
- [36] Y. Lin, L. Lu, and S.-T. Yau. Ricci curvature of graphs. Tohoku Mathematical Journal, 63(4):605–627, 2011.
- [37] Y. Lin and S.-T. Yau. Ricci curvature and eigenvalue estimate on locally finite graphs. Mathematical research letters, 17(2):343–356, 2010.
- [38] S. Liu, F. Münch, and N. Peyerimhoff. Bakry–Émery curvature and diameter bounds on graphs. Calculus of variations and partial differential equations, 57(2):1–9, 2018.
- [39] B. Loisel and P. Romon. Ricci curvature on polyhedral surfaces via optimal transportation. Axioms, 3(1):119–139, 2014.
- [40] M. Mondal, A. Samal, F. Münch, and J. Jost. Bakry-Émery-Ricci curvature: An alternative network geometry measure in the expanding toolbox of graph Ricci curvatures, 2024. arXiv:2402.06616 [physics.comp-ph].
- [41] R. Mulas, D. Horak, and J. Jost. Graphs, simplicial complexes and hypergraphs: Spectral theory and topology. In Higher-Order Systems, Understanding Complex Systems, pages 1–58. Springer International Publishing AG, Switzerland, 2022.
- [42] M. E. J. Newman. Networks: An Introduction. Oxford University Press, Oxford, UK, 2010.
- [43] C. Ni, Y. Lin, F. Luo, and J. Gao. Community detection on networks with Ricci flow. Scientific Reports, 9:9984, 2019.
- [44] Y. Ollivier. Ricci curvature of metric spaces. Comptes Rendus Mathematique, 345(11):643–646, 2007.
- [45] Y. Ollivier. A survey of Ricci curvature for metric spaces and Markov chains. In Probabilistic approach to geometry, volume 57, pages 343–382. Mathematical Society of Japan, 2010.
- [46] R. Overbeek, N. Larsen, G. D. Pusch, M. D’Souza, E. Selkov Jr., N. Kyrpides, M. Fonstein, N. Maltsev, and E. Selkov. WIT: Integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Research, 28(1):123–125, 2000.
- [47] G. A. Pagani and M. Aiello. The power grid as a complex network: A survey. Physica A: Statistical Mechanics and its Applications, 392(11):2688–2700, 2013.
- [48] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. Epidemic processes in complex networks. Rev. Mod. Phys., 87:925–979, 2015.
- [49] A. Samal, H. K. Pharasi, S. J. Ramaia, H. Kannan, E. Saucan, J. Jost, and A. Chakraborti. Network geometry and market instability. Royal Society Open Science, 8(2):201734, 2021.
- [50] A. Samal, R. P. Sreejith, J. Gu, S. Liu, E. Saucan, and J. Jost. Comparative analysis of two discretizations of Ricci curvature for complex networks. Scientific Reports, 8(1):1–16, 2018.
- [51] R. Sandhu, T. Georgiou, E. Reznik, L. Zhu, I. Kolesov, Y. Şenbabaoğlu, and A. Tannenbaum. Graph curvature for differentiating cancer networks. Scientific Reports, 5:12323, 2015.
- [52] T. Squartini and D. Garlaschelli. Maximum-entropy networks: Pattern detection, network reconstruction and graph combinatorics. SpringerBriefs in complexity. Springer, Cham, Switzerland, 2017.
- [53] R.-P. Sreejith, K. Mohanraj, J. Jost, E. Saucan, and A. Samal. Forman curvature for complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2016(6):063206, 2016.
- [54] I. Stanton and A. Pinar. Constructing and sampling graphs with a prescribed joint degree distribution. ACM J. Exp. Algorithmics, 17, 2012.
- [55] S. Steinerberger. Curvature on graphs via equilibrium measures. Journal of Graph Theory, 103(3):415–436, 2023.
- [56] P. Tee and C. A. Trugenberger. Enhanced Forman curvature and its relation to Ollivier curvature. Europhysics Letters, 133(6):60006, 2021.
- [57] J. Topping, F. Di Giovanni, B. P. Chamberlain, X. Dong, and M. M. Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature, 2022. arXiv:2111.14522 [stat.ML].
- [58] I. Voitalov, P. van der Hoorn, R. van der Hofstad, and D. Krioukov. Scale-free networks well done. Phys. Rev. Res., 1:033034, 2019.
- [59] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK, 1994.
- [60] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, 1998.
- [61] T. Windisch. Rapid mixing and Markov bases. SIAM Journal on Discrete Mathematics, 30(4):2130–2145, 2016.
- [62] Z. Wu, G. Menichetti, C. Rahmede, and G. Bianconi. Emergent complex network geometry. Scientific Reports, 5, 2014.