1. Introduction
A complex network [1] is the abstract expression of a real system [2,3], where the nodes rely on edges to connect with each other. Node importance, which generally varies from node to node, is an important basis for designing the network structure, improving the system robustness, etc. [4,5,6,7]. When it comes to the analysis of node importance, most of the available methods focus on unweighted networks, in which only a single edge is allowed between any two nodes [8]. Nevertheless, weighted networks also have great applicability [9,10], as they are more similar to the networks abstracted from the real world, such as the transportation networks between cities, collaboration networks of scientists, etc.
Currently, several approaches for evaluating node importance in weighted networks have been proposed, primarily from two perspectives: local characteristics and global characteristics. For the former, the weighted degree centrality, H-degree centrality, and weighted page-rank approaches are gradually becoming mainstream. For the latter, the weighted betweenness centrality method is the most commonly used.
The local characteristics of a node [11] reflect the properties of other nodes directly connected to it. The primary evaluation approach used to determine them is degree centrality [12]. In 2010, Opsahl et al. improved this approach and proposed the weighted degree centrality (WDC) [13] algorithm, which introduced the concept of strength, stating that a node is more important if it has greater strength in the network. However, the definition of the strength index is not unique, which may result in conceptual confusion.
For instance, the strength index of a node can be determined by the sum of the edge weights, the sum of flows [14], the number of the node’s neighbors with a weighted edge [15], or the amount of mutual information [16]. Due to these inconsistencies in concepts, the WDC algorithm is not universal. In addition, unlike the other WDC approaches, the WDC algorithm based on mutual information (MI) evaluates the node’s importance from the perspective of probability and statistics. Although the MI algorithm [16] expresses the interaction strength among nodes in weighted networks, it still considers only directly connected nodes.
In 2011, Zhao S.X. et al. [17] put forward the H-degree centrality (HDC) algorithm, extended from the Lobby index proposed by Korn [18]. If n is the largest number such that at least n edges connected to a node each have a weight of not less than n, then the H-degree centrality of this node is n. H-degree centrality can be seen as a compromise between using the node strength and the degree to measure centrality. However, this method has several shortcomings that limit its usefulness, e.g., the edge weights must lie in an appropriate range or the nodes cannot be sorted effectively by importance. Hence, the various improvements based on degree centrality reflect local information well but cannot reflect global information.
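The H-degree definition above can be illustrated with a minimal sketch. The function name `h_degree` and the sample weights are hypothetical and chosen only for illustration; the second call shows the range problem mentioned above, since weights below 1 always give an H-degree of 0.

```python
def h_degree(edge_weights):
    """Largest n such that at least n incident edges have weight >= n."""
    ranked = sorted(edge_weights, reverse=True)
    h = 0
    for rank, weight in enumerate(ranked, start=1):
        if weight >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical node with five incident edges.
print(h_degree([6, 4, 3, 1, 1]))   # three edges have weight >= 3

# Weights in (0, 1) cannot be ranked: every node gets 0.
print(h_degree([0.2, 0.5, 0.9]))
```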
In addition to the algorithm series of degree centrality, another algorithm focusing on local characteristics is the weighted page-rank (WPR) algorithm, a weighted extension of the Page-Rank algorithm [19] proposed by Page et al. in 1999. The Page-Rank algorithm was originally used by Google to rank webpages. If Webpage A points to Webpage B through a hyperlink, this is equivalent to Webpage A voting for Webpage B, and Page A assigns a part of its own Page-Rank value to Page B. Finally, just as the importance of a paper can be measured by the number of times it is cited by other papers, the importance of a webpage is judged according to its Page-Rank value.
As the degree of the bridge node in a network is likely to be small, the WPR algorithm may easily bring about underestimation of the node importance of the bridge node. Therefore, as the algorithms mentioned above focus greatly on the nodes’ local characteristics, it is difficult for them to evaluate special nodes, such as bridge nodes, which is a disadvantage of these algorithms.
Evaluation approaches reflecting the nodes’ global characteristics [20] make up for the aforementioned shortcomings, and the weighted betweenness centrality (WBC) algorithm [21] is a typical representative of them. The betweenness centrality [22] of a node is the ratio of the number of shortest paths passing through that node to the number of all shortest paths in the network. In a weighted network, the path length between the nodes is determined by the edge weight. In the process of infectious disease propagation, closely connected persons are more likely to infect each other; thus, the reciprocal of the edge weight is often used to measure the distance. For example, if the weight of one edge is twice that of another, then the distance of the former is half that of the latter.
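As a small illustration of the reciprocal rule, the sketch below compares two hypothetical routes between the same pair of nodes. The helper name `path_distance` and the weights are assumptions made for the example; positive weights are assumed throughout.

```python
def path_distance(edge_weights):
    """Length of a path when each edge contributes the reciprocal of its weight."""
    return sum(1.0 / w for w in edge_weights)


# Doubling the weight halves the distance.
print(path_distance([2.0]), path_distance([1.0]))

# A detour over two strong ties can be shorter than one weak direct tie.
direct = path_distance([1.0])
via_neighbor = path_distance([4.0, 4.0])
print(direct, via_neighbor)
```

With this convention a shortest-path search prefers routes through strongly tied nodes, which is what the weighted betweenness counts.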
Based on these principles, the weighted betweenness centrality (WBC) algorithm [21] is applicable to the world wide web (WWW) [23]. Nonetheless, the delay of such networks and the interference among nodes are not sensitive to the number of nodes [24,25,26]. Because the number of nodes is ignored in this way, it is hard to characterize its non-negligible effect on the transmission efficiency of the network. Therefore, when only the global aspect is considered, the local characteristics may be overwhelmed, which is disadvantageous in representing the node’s importance.
Hence, it is difficult to balance the local and global aspects for existing mainstream algorithms when evaluating the node importance. In this paper, we posit that the local and global characteristics should be described not only by directly-connected nodes but by all nodes in the network. Additionally, while both the local and global aspects should be considered, their influences are diverse in the various network structures. Therefore, as a combination of these two aspects, a new node importance evaluation approach “the weighted K-order propagation number (WKPN) algorithm” is proposed in this paper, in which K stands for the propagation stride.
As a comprehensive evaluation index, Q is defined to adjust the combined contribution of both sides in evaluating the node importance. When K is smaller, the K-order propagation number is likely to represent the influence of the local characteristics; when K is larger, it tends to represent the global influence. The K value can vary from 0 to the network diameter d, so that the local and global characteristics are considered comprehensively. In summary, the WKPN algorithm has good universality for different networks and significant effects on preserving both global and local characteristics as much as possible.
In this paper, a detailed description of the creation of the WKPN algorithm and its experiments on various networks is provided. The organization of this paper is as follows. In Section 2, we describe the establishment process of the WKPN algorithm in detail. In Section 3, we present experiments on both simulation networks and real networks and discuss the results. In Section 4, we give the conclusions obtained and the prospects for further research.
2. Weighted K-Order Propagation Number Algorithm
Models such as susceptible infective (SI), susceptible infective susceptible (SIS), and susceptible infective removed (SIR) [27], which were originally applied to the domain of disease transmission, are widely used in information propagation. Whether individuals can be cured and whether they acquire immunity are the important factors that distinguish these models. The SI and SIS models assume that individuals do not have immunity, and the population is divided into susceptible and infected individuals. In addition, the SIS model presumes that infected individuals have a certain possibility to return to a susceptible state and may be re-infected, unlike in the SI model. Based on this, the SIR model adds a new category, called “the immune”, in addition to the two original types. However, in these three models, the disease propagation process is assumed to be random contact, with the topological relationship between individuals ignored.
Inspired by all the above models, we propose the WKPN algorithm by abstracting the simplest disease propagation process where the infected individuals cannot be cured in the complex network.
First, assume an undirected network graph $G = (V, E, W)$, in which $V$ is defined as the node set, $E$ is the edge set, and $W$ is the edge weight set. Among them, $e_{ij}$, with the weight $w_{ij}$, represents the edge between the nodes $v_i$ and $v_j$.
Generally, there are two definition forms for edge weights: similar and dissimilar weight. For similar weight, a higher value corresponds to a shorter distance between the two nodes; for dissimilar weight, a higher value corresponds to a longer distance. In this paper, the edge weight is defined as the disease propagation time: the smaller the edge weight is, the shorter the propagation time is, and the greater the node correlation is. Under the definitions above, this corresponds to the dissimilar weight form, in which the edge weight itself acts as the distance.
We assume that susceptible individuals can only be infected by direct contact with infected individuals. Then, the node and aggregate are defined as an infected source and the adjacent susceptible individuals. With respect to the node , will spread disease to with , spending time affected by the edge weight . In addition, if any node is affected by multiple infected sources in the propagation process, this method comprehensively evaluates that node.
By summarizing the above description of factors, such as the propagation probability and time consumption, the following hypotheses can be made:
Hypothesis 1. The infected individuals can only spread disease to those who are susceptible and adjacent.
Hypothesis 2. The time consumption caused by the disease propagation process is the edge weight between the nodes.
Hypothesis 3. A susceptible node will be transformed into an infected one once it is infected by any of its adjacent nodes.
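Under Hypotheses 1 to 3, the moment at which a node becomes infected is the smallest total edge weight over all paths from the source, so the process can be sketched with Dijkstra's algorithm. The function name `infection_times`, the adjacency-dictionary layout, and the four-node network below are hypothetical and serve only as an illustration; positive edge weights are assumed.

```python
import heapq


def infection_times(adjacency, source):
    """Earliest infection time of every node reachable from `source`.

    `adjacency[u][v]` is the propagation time (edge weight) between u and v.
    """
    times = {source: 0.0}
    frontier = [(0.0, source)]
    while frontier:
        t, node = heapq.heappop(frontier)
        if t > times[node]:
            continue  # stale queue entry; the node was infected earlier
        # Hypothesis 1: only adjacent nodes can be infected.
        for neighbor, weight in adjacency[node].items():
            arrival = t + weight  # Hypothesis 2: the edge weight is the time cost
            # Hypothesis 3: the first infection to arrive is the one that counts.
            if arrival < times.get(neighbor, float("inf")):
                times[neighbor] = arrival
                heapq.heappush(frontier, (arrival, neighbor))
    return times


# Hypothetical undirected network (each edge is listed in both directions).
network = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}
print(infection_times(network, "a"))
```

In this example node c is reached through b at time 3 rather than directly at time 4, which reflects Hypothesis 3.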
When considering the importance of a node, a common method is to measure the time required for that node to infect all nodes in the network. The less time is spent, the higher the importance of the node is. For a connected network, however, the total number of nodes infected from any source will be the same after a sufficiently long time. To cope with this problem, the propagation time K is introduced as another significant parameter. The smaller K is, the more likely the result is to represent local network features, while a larger K is more likely to represent global features. In particular, K = 0 indicates that the propagation process has not yet started.
According to Hypotheses 1 and 3, we can find the number of infected nodes $N_{v_i}^{K}$ after the propagation time of $K$, when setting $v_i$ as the source of infection:

$$N_{v_i}^{K} = \sum_{j=1}^{n} I\left(d_{ij} \le K\right), \tag{1}$$

where $N_{v_i}^{K}$ is named the K-order propagation number, in which $n$ is the number of nodes, $d_{ij}$ represents the weight sum of the edges along the shortest path from $v_i$ to $v_j$, and $I$ is the indicator function. The larger $N_{v_i}^{K}$ is, the more important the node is at the scale of $K$. Equation (1) extends our former research [28] on unweighted networks to weighted networks. Moreover, when $K$ is larger than $d$, which is the diameter of the largest connected part of a network, the $N_{v_i}^{K}$ of any node will not change with $K$. Therefore, the value of $K$ can only fall between 0 and $d$.
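A minimal sketch of the K-order propagation number follows. It assumes that the edge weight acts directly as the propagation time and that the source itself counts as infected, so that every node has a propagation number of 1 at K = 0; the function names and the small network are hypothetical.

```python
import heapq


def shortest_distances(adjacency, source):
    """Dijkstra: smallest weight sum from `source` to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue
        for neighbor, weight in adjacency[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


def propagation_number(adjacency, source, k):
    """Number of nodes whose shortest-path distance from `source` is at most k."""
    return sum(1 for d in shortest_distances(adjacency, source).values() if d <= k)


# Hypothetical network; weights are propagation times.
network = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}
for k in range(5):
    print(k, {node: propagation_number(network, node, k) for node in network})
```

For K at or above the largest shortest-path distance (4 in this example), every node reaches the whole network and the values stop changing, as noted above.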
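It is clear that the value of the propagation time $K$ is the key to the evaluation of node importance. After that, according to $N_{v_i}^{K}$, the K-order structure entropy $H^{K}$ is defined based on the information entropy. In this way, the network heterogeneity can be evaluated [28] as:

$$H^{K} = -\sum_{i=1}^{n} \frac{N_{v_i}^{K}}{\sum_{j=1}^{n} N_{v_j}^{K}} \log \frac{N_{v_i}^{K}}{\sum_{j=1}^{n} N_{v_j}^{K}}. \tag{2}$$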
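The structure entropy can be sketched as the Shannon entropy of the propagation numbers after they are normalized to sum to one. The natural logarithm and the function name `structure_entropy` are assumptions of this sketch; the input lists are hypothetical propagation numbers for a fixed K.

```python
import math


def structure_entropy(propagation_numbers):
    """Shannon entropy of the normalized K-order propagation numbers."""
    total = sum(propagation_numbers)
    shares = [n / total for n in propagation_numbers if n > 0]
    return -sum(p * math.log(p) for p in shares)


# Homogeneous case: every source infects the same number of nodes.
print(structure_entropy([3, 3, 3, 3]))

# Heterogeneous case: one source reaches far more nodes than the others.
print(structure_entropy([4, 1, 1, 1]))
```

The homogeneous case attains the maximum value log(n), and the heterogeneous case gives a smaller value.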
The smaller the K-order structure entropy $H^{K}$ is, the stronger the heterogeneity of the network is [28]. Former research [28] examined the heterogeneity of networks such as small-world (WS) and scale-free (Barabasi-Albert (BA)) networks. From the perspective of the propagation process, the larger the value of $H^{K}$ is, the smaller the difference is among the K-order propagation numbers $N_{v_i}^{K}$ obtained by setting each node $v_i$ as the source of infection. The K-order structure entropy $H^{K}$ needs to be considered for various values of $K$, as both the local and global perspectives of the impact on node importance are required. In summary, a comprehensive evaluation from $K = 0$ to $K = d$ is considered and the node importance $Q_{v_i}$ of node $v_i$ is defined as:

$$Q_{v_i} = \sum_{K=0}^{d} c^{K} \, \tilde{N}_{v_i}^{K}, \tag{3}$$

where $\tilde{N}_{v_i}^{K}$ is the normalized result of $N_{v_i}^{K}$, used to prevent larger $N_{v_i}^{K}$ from masking the smaller ones, since $N_{v_i}^{K}$ usually grows dramatically with the increase of $K$. Therefore, this paper maps $N_{v_i}^{K}$ onto [0, 1], considering only the relative order of node importance. With respect to the weight coefficient $c^{K}$, we consider that the smaller the K-order structure entropy $H^{K}$ is, the larger the weight coefficient $c^{K}$ is. Equation (3) thus pays more attention to the moments when the difference in node importance is relatively large and ignores the moments when the difference is small.
To summarize, $Q$ is the aggregate of node importance calculated via the weighted K-order propagation number algorithm.
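The aggregation step can be sketched as follows. Several choices here are assumptions rather than the definitions of this paper: the propagation numbers are min-max normalized over the nodes for each K, the weight coefficient is taken as one minus the structure entropy divided by its maximum log(n) (one simple form that grows as the entropy shrinks), and K runs over integer steps. The input table is hypothetical.

```python
import math


def min_max(values):
    """Map values onto [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def entropy(values):
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def node_importance(numbers_by_k):
    """numbers_by_k[K][i] is the K-order propagation number of node i (n >= 2)."""
    n = len(numbers_by_k[0])
    scores = [0.0] * n
    for numbers in numbers_by_k:
        # Assumed weight: largest when the entropy is smallest.
        weight = 1.0 - entropy(numbers) / math.log(n)
        for i, normalized in enumerate(min_max(numbers)):
            scores[i] += weight * normalized
    return scores


# Hypothetical propagation numbers of four nodes for K = 0, 1, 2, 3, 4.
table = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [2, 3, 3, 2],
    [3, 4, 4, 3],
    [4, 4, 4, 4],
]
print(node_importance(table))
```

Rows in which all nodes have the same propagation number contribute nothing, so only the values of K at which the nodes differ influence the ranking.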
3. Node Importance Analysis for the WKPN Algorithm Based on a Deliberate Attack Strategy
To measure the features of the WKPN algorithm in the node importance assessment, comparisons were carried out on five networks: a symmetric network with bridge nodes, the Science Museum visitor network [29], the Facebook forum network [30], the non-US airport routing network [31], and the US 500 busiest commercial airports network [32].
The deliberate attack strategy was employed to examine the node importance [33,34,35]; attacking a node means removing all the edges connected to it. The algorithms were then evaluated by how the characteristics of the complex network change under attack. As isolated nodes may appear after the network is attacked, the network efficiency $e$ was selected to evaluate the connectivity of the network. The expression of the network efficiency $e$ is

$$e = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d_{ij}}, \tag{4}$$

where $n$ is the number of nodes and $d_{ij}$ is the shortest path length between the nodes $v_i$ and $v_j$. The larger the value of $e$ is, the higher the network efficiency is; when the network is totally composed of isolated nodes, $e$ takes the minimum value of 0.
Attacks may interrupt connection paths in the network; the shortest paths between the nodes then become longer and the network efficiency decreases accordingly. To reflect the reduction of the network efficiency after the attack more intuitively, the network efficiency decline rate $\varepsilon$ is defined as follows, according to former research [36],

$$\varepsilon = 1 - \frac{e}{e_0}, \tag{5}$$

where $e_0$ is the original network efficiency without an attack. $\varepsilon$ increases from 0 to 1 as the attack progresses: $\varepsilon = 0$ when the network has not been attacked and $\varepsilon = 1$ when all edges have been deleted.
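The efficiency and its decline rate can be sketched as below, assuming the usual global-efficiency form (the mean reciprocal shortest-path length over ordered node pairs, with unreachable pairs contributing zero) and a decline rate of one minus the ratio of attacked to original efficiency. Function names and the small network are hypothetical; positive edge weights are assumed.

```python
import heapq


def shortest_distances(adjacency, source):
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue
        for neighbor, weight in adjacency[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


def network_efficiency(adjacency):
    """Mean of 1/d over ordered node pairs; unreachable pairs add nothing."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adjacency:
        for target, d in shortest_distances(adjacency, source).items():
            if target != source:
                total += 1.0 / d
    return total / (n * (n - 1))


def attack(adjacency, node):
    """Copy of the network with every edge of `node` removed."""
    return {
        u: {} if u == node else {v: w for v, w in nbrs.items() if v != node}
        for u, nbrs in adjacency.items()
    }


def decline_rate(original, attacked):
    return 1.0 - network_efficiency(attacked) / network_efficiency(original)


network = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}
print(decline_rate(network, attack(network, "c")))
print(decline_rate(network, attack(network, "d")))
```

Isolating node c, which holds the only link to d, costs far more efficiency than isolating the peripheral node d.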
3.1. A Symmetric Network with Bridge Nodes
First, a symmetric network with bridge nodes was taken as an example (as shown in Figure 1). The node importance aggregate Q was calculated via the WKPN algorithm and compared with the result of the MI algorithm [16]. Table 1 gives the node importance ranking obtained by the two algorithms.
There were some differences in evaluating node importance between the MI and WKPN algorithms. In the MI algorithm, the node importance of
and
was higher than
and
, but the algorithm proposed in this paper gave the opposite conclusion. We adopted the deliberate attack to measure the node importance of these nodes.
Table 2 gives the average efficiency value change of the network after deleting the corresponding nodes.
It is clear to see the decline of the average network efficiency after deleting any nodes, which indicates that the deletion weakens the information flow of the network to a certain extent. Nonetheless, it is difficult to neglect that the decline rate of the deleting nodes and is more than twice that of the deleting nodes and . Thus, we consider that the node importance of and is higher than that of and .
From the perspective of
Figure 1, the nodes
and
are in the position with the largest global information control capability, which is equivalent to two “bridge nodes”. They have the greatest degree and total edge weight, and the network will no longer be connected if these two nodes are deleted. Thus,
and
are of the greatest importance. However, the degrees of
and
are less than those of
and
. Hence, it is reasonable that the node importance of
and
ranked in second place.
Other sorting results in the WKPN algorithm were also consistent with the information shown in
Figure 1.
,
,
, and
were all connected to
and
, which had exactly the same node importance; however, the total edge weight of former two nodes was higher than the latter two. Thus,
and
were ranked after
and
.
and
were both at the margin of the network, which was intended to suffer less structural damage if they were deleted. Although both nodes were directly connected to the most important nodes,
and
, the edge weight between them was tiny. Hence,
and
were considered to be of the least importance.
Therefore, the WKPN algorithm was more accurate than the MI algorithm in evaluating the node importance of this network.
3.2. Real Networks
To further verify the superiority of the WKPN algorithm, node importance research was conducted on certain real networks: the Science Museum visitor network, the Facebook forum network, the non-US airport routing network, and the US 500 busiest commercial airports network. The basic network features are shown in Table 3. The network graph structures are shown in Figure 2 and the K-order structure entropies are shown in Figure 3.
Due to the large number of nodes, in this section, the deliberate attack strategy refers to attacking the network concerning node importance from high to low. Considering the bias of node importance sorting before and after a deliberate attack, we updated the sorting result after every attack. In addition, if there were multiple nodes with equal node importance, the one with the minimum was selected to attack.
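The attack loop with re-ranking after every attack can be sketched as follows. Two choices are assumptions of this sketch: the scoring function is a stand-in (the sum of incident edge weights) rather than the WKPN score, and ties are broken by the smallest node label. All names and the small network are hypothetical.

```python
def strength(adjacency):
    """Stand-in importance score: sum of incident edge weights."""
    return {u: sum(nbrs.values()) for u, nbrs in adjacency.items()}


def isolate(adjacency, node):
    """Copy of the network with every edge of `node` removed."""
    return {
        u: {} if u == node else {v: w for v, w in nbrs.items() if v != node}
        for u, nbrs in adjacency.items()
    }


def attack_sequence(adjacency, score_fn, attacks):
    """Attack the top-ranked node, then re-rank the damaged network."""
    current = {u: dict(nbrs) for u, nbrs in adjacency.items()}
    removed = []
    for _ in range(attacks):
        candidates = [u for u in current if u not in removed]
        if not candidates:
            break
        scores = score_fn(current)  # ranking is updated after every attack
        # Highest score first; equal scores fall back to the smallest label.
        target = min(candidates, key=lambda u: (-scores[u], u))
        current = isolate(current, target)
        removed.append(target)
    return removed, current


network = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}
order, damaged = attack_sequence(network, strength, attacks=2)
print(order)
```

Any other scoring function with the same signature, mapping an adjacency dictionary to a score per node, can be passed in place of `strength`.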
Furthermore, to analyze the changes in the network topology before and after the attack, the node number of maximum sub-graphs in the network was set as
according to former research [
35]. The WKPN algorithm was applied to these four complex networks mentioned above, and the simulation comparison results (curves of
and
with attacking times) were obtained, as shown in
Figure 4 and
Figure 5. In particular, the damping coefficient of the Page-Rank is 0.5.
As for the Science Museum visitor network, the network efficiency declined the most rapidly when deliberate attacks were carried out according to the ranks of the WKPN algorithm and the WBC algorithm. After approximately 70 attacks, the network efficiency dropped by nearly 90%. The MI algorithm and WPR algorithm required approximately 100 attacks, while the WDC algorithm required 120 attacks, and the HDC algorithm needed 150 attacks to achieve a similar effect. In addition, when the WKPN and WBC algorithm were employed to attack the network, the decline rate of the was much higher than the other methods.
We could also attack the network until it is paralyzed and compare the number of attacks required. Taking the WKPN and WBC algorithms as examples, when the network was attacked 80 times, the node number of the maximum sub-graph was only 8, which is only 4% of the original network. The network was essentially paralyzed. To achieve the same paralysis, the MI algorithm, the WPR algorithm, and the WDC algorithm required 120 attacks, while the HDC algorithm needed more than 160 attacks.
For the Facebook forum network, the damage degree and damage trend of the network were relatively close after the deliberate attacks via the WKPN and WBC algorithm. The network efficiency declined more quickly in the early stage and more moderately in the later stage. For the non-US airport routing networks, the WKPN algorithm gave rise to the fastest decline rate of , the node number of the maximum sub-graph. For the US 500 busiest commercial airports network, the WDC algorithm had the worst attack capability, which was relatively close to the other algorithms. Although decreased slower in the early stage compared with other algorithms, it also paralyzed the network in a small number of times.
In summary, for each network mentioned above, deliberate attacks based on the WKPN algorithm needed to remove fewer nodes with a higher node importance to achieve full damage to the network structure.