Appendix: The Box Covering Algorithms
Estimating the fractal dimension and the self‐similar features of networks has become a standard part of the study of real‐world systems. For this reason, in the last three years many box covering algorithms have been proposed [64,69]. This section presents four of the main algorithms, along with a brief discussion of the advantages and disadvantages that they offer.
Recalling the original definition of box covering by Hausdorff [14,29,55], for a given network G and box size \( { \ell_{\text{B}} } \), a box is a set of nodes where all distances \( { \ell_{ij} } \) between any two nodes i and j in the box are smaller than \( { \ell_{\text{B}} } \). The minimum number of boxes required to cover the entire network G is denoted by \( { N_{\text{B}} } \). For \( { \ell_{\text{B}} = 1 } \), each box encloses only 1 node and therefore, \( { N_{\text{B}} } \) is equal to the size of the network N. On the other hand, \( { N_{\text{B}}=1 } \) for \( { \ell_{\text{B}} \ge \ell_{\text{B}}^\text{max} } \), where \( { \ell_{\text{B}}^\text{max} } \) is the diameter of the network plus one.
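The definition above can be checked mechanically. The following is a minimal sketch, assuming the network is stored as an adjacency dictionary; the names `bfs_distances` and `is_valid_covering` and the five-node path are hypothetical and serve only as illustration.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path (chemical) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_valid_covering(adj, boxes, l_b):
    """True if the boxes partition the nodes and every in-box distance is < l_b."""
    covered = [n for box in boxes for n in box]
    if sorted(covered) != sorted(adj):
        return False  # a node is missing or appears in two boxes
    dist = {n: bfs_distances(adj, n) for n in adj}
    return all(dist[i].get(j, float("inf")) < l_b
               for box in boxes for i, j in combinations(box, 2))

# Hypothetical example: the path 0-1-2-3-4, whose diameter is 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_valid_covering(path, [[n] for n in path], 1))   # l_B = 1: N boxes of one node
print(is_valid_covering(path, [[0, 1, 2, 3, 4]], 5))     # l_B = diameter + 1: one box
```

Both calls illustrate the two limiting cases stated in the text: single-node boxes for \( { \ell_{\text{B}}=1 } \) and a single box for \( { \ell_{\text{B}}=\ell_{\text{B}}^\text{max} } \).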
The ultimate goal of a box‐covering algorithm is to find the minimum number of boxes \( { N_{\text{B}}(\ell_{\text{B}}) } \) for any \( { \ell_{\text{B}} } \). It has been shown that this problem belongs to the family of NP‐hard problems [34], which means that no algorithm is known that solves it in polynomial time. In other words, for a relatively large network there is no known algorithm that can provide an exact solution in a reasonably short amount of time. This limitation requires treating the box covering problem approximately, using for example optimization algorithms.
The Greedy Coloring Algorithm
The box‐covering problem can be mapped into another NP‐hard problem [34]: the graph coloring problem.
An algorithm that closely approximates the optimal solution of this problem was presented in [64]. For an arbitrary value of \( { \ell_{\text{B}} } \), first construct a dual network \( { G^{\prime} } \), in which two nodes are connected if the distance between them in G (the original network) is greater than or equal to \( { \ell_{\text{B}} } \). Figure 13 shows an example of a network G which yields such a dual network \( { G^{\prime} } \) for \( { \ell_{\text{B}}=3 } \) (upper row of Fig. 13).
Vertex coloring is a well‐known procedure, where labels (or colors) are assigned to each vertex of a network, so that no edge connects two identically colored vertices. It is clear that such a coloring in \( { G^{\prime} } \) gives rise to a natural box covering in the original network G, in the sense that vertices of the same color will necessarily form a box since the distance between them must be less than \( { \ell_{\text{B}} } \). Accordingly, the minimum number of boxes \( { N_{\text{B}}(G) } \) is equal to the minimum required number of colors (or the chromatic number) in the dual network \( { G^{\prime} } \), \( { \chi(G^{\prime}) } \).
In simpler terms, (a) if the distance between two nodes in G is greater than or equal to \( { \ell_{\text{B}} } \), these two nodes cannot belong to the same box. According to the construction of \( { G^{\prime} } \), these two nodes are connected in \( { G^{\prime} } \) and thus cannot have the same color. Since they have different colors, they will not belong to the same box in G. (b) Conversely, if the distance between two nodes in G is less than \( { \ell_{\text{B}} } \), these nodes may belong to the same box. In \( { G^{\prime} } \) these two nodes are not connected, so they are allowed to carry the same color, i. e. they may belong to the same box in G (whether they actually end up in the same box depends on the exact implementation of the coloring algorithm).
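The mapping can be illustrated on a very small network. The sketch below builds the dual network and finds its chromatic number by exhaustive search, which is feasible only for a handful of nodes; the function names and the five-node path are hypothetical, and the brute-force search is a stand-in for illustration, not the method of [64].

```python
from collections import deque
from itertools import product

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def dual_graph(adj, l_b):
    """Connect two nodes in G' when their distance in G is >= l_b."""
    dist = {n: bfs_distances(adj, n) for n in adj}
    return {i: {j for j in adj
                if j != i and dist[i].get(j, float("inf")) >= l_b}
            for i in adj}

def chromatic_number(graph):
    """Smallest number of colors of a proper coloring (exhaustive search)."""
    nodes = list(graph)
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            col = dict(zip(nodes, colors))
            if all(col[u] != col[v] for u in graph for v in graph[u]):
                return k
    return 0  # empty graph

# Hypothetical example: the path 0-1-2-3-4 with l_B = 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
dual = dual_graph(path, 3)
print(dual)                    # node pairs at distance >= 3 are linked
print(chromatic_number(dual))  # equals the minimum number of boxes N_B
```

For this path, two colors suffice, matching the covering by the boxes {0, 1, 2} and {3, 4}.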
The algorithm that follows both constructs the dual network \( { G^{\prime} } \) and assigns the proper node colors for all \( { \ell_{\text{B}} } \) values in one go. For this implementation a two‐dimensional matrix \( { c_{i\ell} } \) of size \( { N\times \ell_{\text{B}}^\text{max} } \) is needed, whose values represent the color of node i for a given box size \( { \ell=\ell_{\text{B}} } \).
1. Assign a unique id from 1 to N to all network nodes, without assigning any colors yet.
2. For all \( { \ell_{\text{B}} } \) values, assign a color value 0 to the node with id=1, i. e. \( { c_{1\ell}=0 } \).
3. Set the id value \( { i=2 } \). Repeat the following until node \( { i=N } \) has been colored.
   (a) Calculate the distance \( { \ell_{ij} } \) from i to all the nodes in the network with id j less than i.
   (b) Set \( { \ell_{\text{B}}=1 } \).
   (c) Select one of the colors not used by any node \( { j<i } \) for which \( { \ell_{ij}\geq\ell_{\text{B}} } \), i. e. a color different from every such \( { c_{j\ell_{\text{B}}} } \). This is the color \( { c_{i\ell_{\text{B}}} } \) of node i for the given \( { \ell_{\text{B}} } \) value.
   (d) Increase \( { \ell_{\text{B}} } \) by one and repeat (c) until \( { \ell_{\text{B}}=\ell_{\text{B}}^\text{max} } \).
   (e) Increase i by 1.
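The steps above can be sketched in a few lines. This is a minimal illustration, assuming a connected network given as an adjacency dictionary; the function names are hypothetical, and always taking the smallest unused color is one possible choice for step (c), not necessarily the one used in [64].

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_coloring_boxes(adj, order=None):
    """Return {l_B: number of boxes} for l_B = 1 .. diameter + 1.

    Assumes a connected network. 'order' is the coloring sequence
    (the node ids of step 1); by default the dictionary order is used.
    """
    nodes = list(adj) if order is None else list(order)
    dist = {n: bfs_distances(adj, n) for n in nodes}
    l_max = max(max(d.values()) for d in dist.values()) + 1
    # color[n][l] plays the role of c_{n l}; index 0 is unused.
    color = {n: [0] * (l_max + 1) for n in nodes}
    for idx, i in enumerate(nodes):
        earlier = nodes[:idx]
        for l_b in range(1, l_max + 1):
            # Colors already taken by earlier nodes that conflict with i.
            used = {color[j][l_b] for j in earlier if dist[i][j] >= l_b}
            c = 0
            while c in used:  # smallest unused color (an assumption)
                c += 1
            color[i][l_b] = c
    return {l_b: len({color[n][l_b] for n in nodes})
            for l_b in range(1, l_max + 1)}

# Hypothetical example: the path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_coloring_boxes(path))
```

Passing a reshuffled `order` gives a different coloring sequence, which is how the dependence on the sequence can be explored.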
The results of the greedy algorithm may depend on the original coloring sequence. The quality of this algorithm was investigated by randomly reshuffling the coloring sequence and applying the greedy algorithm several times and in different models [64]. The result was that the probability distribution of the number of boxes \( { N_{\text{B}} } \) (for all box sizes \( { \ell_{\text{B}} } \)) is a narrow Gaussian distribution, which indicates that almost any implementation of the algorithm yields a solution close to the optimal.
Strictly speaking, the calculation of the fractal dimension \( { d_{\text{B}} } \) through the relation \( { N_{\text{B}}\sim \ell_{\text{B}}^{-d_{\text{B}}} } \) is valid only for the minimum possible value of \( { N_{\text{B}} } \), for any given \( { \ell_{\text{B}} } \) value, so any box covering algorithm must aim to find this minimum \( { N_{\text{B}} } \). Although there is no rule to determine when this minimum value has been actually reached (since this would require an exact solution of the NP‐hard coloring problem) it has been shown [23] that the greedy coloring algorithm can, in many cases, identify a coloring sequence which yields the optimal solution.
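Given the box counts, \( { d_{\text{B}} } \) is usually read off as the negative slope of \( { \log N_{\text{B}} } \) against \( { \log \ell_{\text{B}} } \). The following is a minimal sketch of such a least-squares fit; the function name is hypothetical and the data are synthetic, generated from an exact power law rather than taken from any network.

```python
import math

def fractal_dimension(box_counts):
    """Estimate d_B from {l_B: N_B} by a least-squares fit in log-log scale."""
    xs = [math.log(l) for l in box_counts]
    ys = [math.log(n) for n in box_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # N_B ~ l_B^(-d_B), so d_B is minus the slope

# Synthetic data following N_B = 1000 * l_B^(-2) exactly.
synthetic = {l: 1000.0 * l ** -2 for l in range(2, 10)}
print(fractal_dimension(synthetic))
```

In practice the fit should be restricted to the range of \( { \ell_{\text{B}} } \) where the power law actually holds.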
Burning Algorithms
This section presents three box covering algorithms based on the more traditional breadth‐first search algorithm.
A box is defined as compact when it includes the maximum possible number of nodes, i. e. when there do not exist any other network nodes that could be included in this box. A connected box means that any node in the box can be reached from any other node in this box, without having to leave this box. Conversely, a disconnected box denotes a box where certain nodes can be reached from other nodes in the box only by visiting nodes outside this box. For a demonstration of these definitions see Fig. 14.
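Whether a box is connected can be tested with a breadth‐first search that is not allowed to leave the box. The following is a minimal sketch; the function name and the star-shaped example are hypothetical.

```python
from collections import deque

def is_connected_box(adj, box):
    """True if every node of the box is reachable from any other
    node of the box without stepping outside the box."""
    box = set(box)
    if not box:
        return True
    start = next(iter(box))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in box and v not in seen:  # never leave the box
                seen.add(v)
                queue.append(v)
    return seen == box

# Hypothetical example: a star with hub 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(is_connected_box(star, [1, 2]))     # leaves meet only through the hub
print(is_connected_box(star, [0, 1, 2]))
```

The two leaves are at distance 2, so for \( { \ell_{\text{B}}=3 } \) they form a valid but disconnected box.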
Burning with the Diameter \( \ell_{\text{B}} \), and the Compact‐Box‐Burning (CBB) Algorithm
The basic idea of the CBB algorithm for the generation of a box is to start from a given box center and then expand the box so that it includes the maximum possible number of nodes, satisfying at the same time the maximum distance between nodes in the box \( { \ell_{\text{B}} } \). The CBB algorithm is as follows (see Fig. 15):
1. Initially, mark all nodes as uncovered.
2. Construct the candidate set C of all yet uncovered nodes.
3. Choose a random node p from the candidate set C, remove it from C and add it to the current box.
4. Remove from C all nodes i whose distance from p is \( { \ell_{pi}\geq\ell_{\text{B}} } \), since by definition they cannot belong to the same box as p.
5. Repeat steps (3) and (4) until the candidate set is empty. The nodes p selected along the way form one box and are marked as covered.
6. Repeat from step (2) until the whole network has been covered.
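The steps above can be sketched as follows. This is a minimal illustration, assuming an adjacency-dictionary network with sortable node labels; the function names are hypothetical, and all pairwise distances are precomputed for simplicity, which is affordable only for small networks.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def compact_box_burning(adj, l_b, rng=None):
    """Cover the network with compact boxes of size l_b; returns a list of boxes."""
    rng = rng or random.Random()
    dist = {n: bfs_distances(adj, n) for n in adj}
    uncovered = set(adj)                      # step 1
    boxes = []
    while uncovered:                          # step 6
        candidates = set(uncovered)           # step 2
        box = []
        while candidates:                     # step 5
            p = rng.choice(sorted(candidates))    # step 3
            candidates.remove(p)
            box.append(p)
            # step 4: keep only nodes still compatible with p
            candidates = {i for i in candidates
                          if dist[p].get(i, float("inf")) < l_b}
        uncovered -= set(box)
        boxes.append(box)
    return boxes

# Hypothetical example: the path 0-1-2-3-4 with l_B = 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(compact_box_burning(path, 3, random.Random(1)))
```

Because every surviving candidate is within distance less than \( { \ell_{\text{B}} } \) of all nodes already in the box, each box produced this way is compact among the uncovered nodes.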
Random Box Burning
In 2006, J. S. Kim et al. presented a simple algorithm for the calculation of fractal dimension in networks [42,43,44]:
1. Pick a randomly chosen node in the network as a seed of the box.
2. Search using the breadth‐first search algorithm until distance \( { \ell_{\text{B}} } \) from the seed. Assign all newly burned nodes to the new box. If no new node is found, discard and start from (1) again.
3. Repeat (1) and (2) until all nodes have a box assigned.
This Random Box Burning algorithm has the advantage of being a fast and simple method. However, no inherent optimization is employed during the network coverage. Thus, this simple Monte‐Carlo method is almost certain to yield a solution far from the optimal, and one needs to run many different realizations and retain only the smallest number of boxes found among them.
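A minimal sketch of this procedure, including the repetition over many realizations, is given below. The function names, the number of realizations and the fixed random seed are hypothetical; the sketch reads "until distance \( { \ell_{\text{B}} } \)" as including nodes at distance exactly \( { \ell_{\text{B}} } \) from the seed, which is an assumption.

```python
import random
from collections import deque

def burn(adj, seed, radius):
    """Nodes reached by breadth-first search up to 'radius' from seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not burn beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def random_box_burning(adj, l_b, rng):
    """One realization: returns the list of boxes."""
    nodes = sorted(adj)
    uncovered = set(adj)
    boxes = []
    while uncovered:
        seed = rng.choice(nodes)                   # step 1
        new = [n for n in burn(adj, seed, l_b) if n in uncovered]
        if not new:
            continue                               # discard, pick again
        boxes.append(new)                          # step 2
        uncovered -= set(new)
    return boxes

def best_of(adj, l_b, realizations=100, seed=0):
    """Keep the covering with the fewest boxes over many realizations."""
    rng = random.Random(seed)
    return min((random_box_burning(adj, l_b, rng)
                for _ in range(realizations)), key=len)

# Hypothetical example: the path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(len(best_of(path, 1)))
```

The seed may fall on an already covered node, in which case only the still uncovered nodes within reach are assigned to the new box.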
Burning with the Radius \( r_{\text{B}} \), and the Maximum‐Excluded‐Mass‐Burning (MEMB) Algorithm
A box of size \( { \ell_{\text{B}} } \) includes nodes where the distance between any pair of nodes is less than \( { \ell_{\text{B}} } \). It is possible, though, to grow a box from a given central node, so that all nodes in the box are within distance less than a given box radius \( { r_{\text{B}} } \) (the maximum distance from a central node). This way, one can still recover the same fractal properties of a network. For the original definition of the box, \( { \ell_{\text{B}} } \) corresponds to the box diameter (maximum distance between any two nodes in the box) plus one. Thus, \( { \ell_{\text{B}} } \) and \( { r_{\text{B}} } \) are connected through the simple relation \( { \ell_{\text{B}} = 2 r_{\text{B}}+1 } \). This relation is exact for loopless configurations, but in general there may exist cases where it is not exact (Fig. 14).
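The relation between the two box definitions follows from the triangle inequality. The short derivation below is a sketch under one assumption that is not spelled out in the text: a node belongs to the box of center c when its distance from c is at most \( { r_{\text{B}} } \), the reading under which \( { \ell_{\text{B}} = 2 r_{\text{B}}+1 } \) holds. The symbols \( { \ell_{ic} } \) and \( { \ell_{cj} } \) denote distances to the center.

```latex
% Any two nodes i, j in a box grown from center c with radius r_B satisfy
\ell_{ij} \;\le\; \ell_{ic} + \ell_{cj} \;\le\; 2 r_{\text{B}}
        \;<\; 2 r_{\text{B}} + 1 \;=\; \ell_{\text{B}} ,
% so the box is also a valid box of size \ell_B in the diameter-based sense.
% Equality \ell_{ij} = 2 r_B requires the shortest path from i to j to pass
% through c; a loop that offers a shortcut makes \ell_{ij} < 2 r_B, and the
% relation \ell_B = 2 r_B + 1 then overestimates the actual box diameter plus one.
```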
The MEMB algorithm always yields the optimal solution for non scale‐free homogeneous networks, since the choice of the central node is not important. However, in inhomogeneous networks with wide‐tailed degree distribution, such as scale‐free networks, this algorithm fails to achieve an optimal solution because of the presence of hubs.
The MEMB algorithm, in contrast to the Random Box Burning and CBB algorithms, attempts to locate some optimal central nodes which act as the burning origins for the boxes. It contains as a special case the choice of the hubs as centers of the boxes, but it also allows low‐degree nodes to be burning centers, which is sometimes convenient for finding a solution closer to the optimal.
In the following algorithm we use the basic idea of box optimization, in which each box covers the maximum possible number of nodes. For a given burning radius \( { r_{\text{B}} } \), we define the excluded mass of a node as the number of uncovered nodes within a chemical distance less than \( { r_{\text{B}} } \). First, calculate the excluded mass for all the uncovered nodes. Then, seek to cover the network with boxes of maximum excluded mass. The details of this algorithm are as follows (see Fig. 17):
1. Initially, all nodes are marked as uncovered and non‐centers.
2. For all non‐center nodes (including the already covered nodes) calculate the excluded mass, and select the node p with the maximum excluded mass as the next center.
3. Mark all the nodes with chemical distance less than \( { r_{\text{B}} } \) from p as covered.
4. Repeat steps (2) and (3) until all nodes are either covered or centers.
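The center selection can be sketched as follows. This is a minimal illustration with hypothetical function names; it treats a node as lying within the burning radius when its distance from the center is at most \( { r_{\text{B}} } \), an assumption made so that \( { \ell_{\text{B}} = 2 r_{\text{B}}+1 } \) holds (replace `==` by a smaller cutoff in `burn` to follow a strict reading). Ties in the excluded mass are broken by node label, which is also an arbitrary choice.

```python
from collections import deque

def burn(adj, seed, radius):
    """Nodes within chemical distance <= radius of seed (an assumption)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def memb_centers(adj, r_b):
    """Select box centers by maximum excluded mass; returns them in order."""
    ball = {n: burn(adj, n, r_b) for n in adj}
    uncovered = set(adj)                      # step 1
    centers = []
    while uncovered:                          # step 4
        # step 2: excluded mass = uncovered nodes inside the ball of n,
        # recomputed at every iteration for all non-center nodes.
        p = max((n for n in sorted(adj) if n not in centers),
                key=lambda n: len(ball[n] & uncovered))
        centers.append(p)
        uncovered -= ball[p]                  # step 3
    return centers

# Hypothetical example: the path 0-1-2-3-4 with r_B = 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(memb_centers(path, 1))
```

The number of centers returned is the number of boxes \( { N_{\text{B}} } \) for the corresponding radius.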
Note that the excluded mass has to be recalculated at each step, because covering new nodes may have changed it. A box center can also be an already covered node, since it may lead to a larger box mass. After the above procedure, the number of selected centers coincides with the number of boxes \( { N_{\text{B}} } \) that completely cover the network. However, the non‐center nodes have not yet been assigned to a given box. This is performed in the next step:
1. Give a unique box id to every center node.
2. For all nodes calculate the “central distance”, which is the chemical distance to its nearest center. The central distance has to be less than \( { r_{\text{B}} } \), and the center identification algorithm above guarantees that there will always exist such a center. Obviously, all center nodes have a central distance equal to 0.
3. Sort the non‐center nodes in a list according to increasing central distance.
4. For each non‐center node i, at least one of its neighbors has a central distance less than its own. Assign to i the same id as this neighbor. If there exist several such neighbors, randomly select an id from these neighbors. Remove i from the list.
5. Repeat step (4) according to the sequence from the list in step (3) for all non‐center nodes.
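The assignment step can be sketched with a breadth‐first search started from all centers at once, which yields the central distance of every node. The function name is hypothetical, and the sketch assumes that every node can be reached from at least one center.

```python
import random
from collections import deque

def assign_boxes(adj, centers, rng=None):
    """Return {node: box id}, growing each box outwards from its center."""
    rng = rng or random.Random()
    box_id = {c: k for k, c in enumerate(centers)}   # step 1
    # step 2: multi-source BFS gives the distance to the nearest center.
    central = {c: 0 for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in central:
                central[v] = central[u] + 1
                queue.append(v)
    # step 3: non-center nodes by increasing central distance.
    todo = sorted((n for n in adj if n not in box_id),
                  key=lambda n: central[n])
    for i in todo:                                   # steps 4 and 5
        # Neighbors closer to a center were already assigned a box.
        closer = [box_id[j] for j in adj[i] if central[j] < central[i]]
        box_id[i] = rng.choice(closer)
    return box_id

# Hypothetical example: the path 0-1-2-3-4 with centers 1 and 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(assign_boxes(path, [1, 3], random.Random(0)))
```

Since each node inherits its id from a neighbor that is already in the box, the resulting boxes are connected by construction.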
Comparison Between Algorithms
The choice of the algorithm to be used for a problem depends on the details of the problem itself. If connected boxes are a requirement, MEMB is the most appropriate algorithm; but if one is only interested in obtaining the fractal dimension of a network, the greedy‐coloring or the random box burning are more suitable since they are the fastest algorithms.
As explained previously, any algorithm should aim to find the optimal solution, that is, the minimum number of boxes that cover the network. Figure 18 shows the performance of each algorithm. The greedy‐coloring, the CBB and MEMB algorithms exhibit a narrow distribution of the number of boxes, showing evidence that they cover the network with a number of boxes that is close to the optimal solution. In contrast, the Random Box Burning algorithm returns a wider distribution and its average is far above the average of the other algorithms. Because of the great ease and speed with which this technique can be implemented, it would be useful to show that the average number of covering boxes is overestimated by a fixed proportionality constant. In that case, despite the error, the predicted number of boxes would still yield the correct scaling and fractal dimension.
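The last remark can be made explicit in one line. In the sketch below, \( { N_{\text{B}}^{\text{rand}} } \) and the constant A are hypothetical symbols introduced only for this illustration, and the proportionality itself is the conjecture stated above, not an established result.

```latex
% Suppose the random covering overestimates the optimal one by a factor A > 1
% that does not depend on the box size:
N_{\text{B}}^{\text{rand}}(\ell_{\text{B}}) = A \, N_{\text{B}}(\ell_{\text{B}}),
\qquad N_{\text{B}}(\ell_{\text{B}}) \sim \ell_{\text{B}}^{-d_{\text{B}}} .
% Taking logarithms,
\log N_{\text{B}}^{\text{rand}} = \log A - d_{\text{B}} \log \ell_{\text{B}} + \text{const} ,
% so the constant A only shifts the curve vertically in a log-log plot and
% the slope, and hence the fractal dimension d_B, is unchanged.
```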