Relations between average clustering coefficient and another centralities in graphs

Mikhail Tuzhilin Affiliation: Moscow State University, Electronic address: mtu93@mail.ru;

Abstract

Relations between average clustering coefficient and global clustering coefficient, local efficiency, radiality, closeness, betweenness and stress centralities were obtained for simple graphs.

keywords:

Networks, centralities, local and global properties of graphs, Watts-Strogatz clustering coefficient, global clustering coefficient.

1 Introduction.

The centrality measure was introduced by Bonacich in [1]. A centrality is a local (relative to a vertex) or global (relative to the whole graph) measure on a network. There are many centrality measures (centralities for short), such as local efficiency, radiality, closeness, betweenness and stress centrality. Computing centralities is very useful for finding intrinsic properties of “real” networks (those arising in applications) [2]-[4]. One of the most important centrality measures is the clustering coefficient, which distinguishes “real” graphs (or small-world networks) from randomly generated graphs [5].

There are two definitions of the clustering coefficient: the average clustering coefficient introduced by Watts and Strogatz [5] and the global clustering coefficient. It was shown in [6] that for windmill graphs the average clustering coefficient and the global clustering coefficient differ asymptotically. More precisely, the average clustering coefficient tends to 1 and the global clustering coefficient tends to 0 as the number of vertices increases. In this paper, the author provides two large classes of graphs: one for which the average clustering coefficient is less than or equal to the global clustering coefficient, and one for which the reverse inequality holds.

Nowadays there are many articles in which centrality measures are used to calculate and predict certain network characteristics, but very few with a theoretical basis. In [7], relations between different centralities were obtained, together with an estimate of the local efficiency in terms of the average clustering coefficient. In this article, relations between the average clustering coefficient and other centralities are proved for simple undirected graphs. In particular, it is proved that the estimate of the local efficiency in terms of the average clustering coefficient is in fact an equality.

2 Main definitions.

All subsequent definitions are given for a simple undirected graph $G$ without pendant vertices. They can be extended to simple graphs with pendant vertices by setting every function that has the vertex degree minus $1$ in its denominator equal to $0$ whenever the vertex degree equals $1$, but this is omitted in this article for the sake of brevity.

We introduce the necessary notation. Denote by

• $V(G)$ the set of vertices, $E(G)$ the set of edges and $A=\{a_{ij}\}$ the adjacency matrix of the graph $G$,

• $N(v)$ the neighbourhood of $v$, that is, the set of vertices adjacent to the vertex $v$,

• $N^{\prime}(v)=N(v)\cup\{v\}$ the subgraph of $G$ induced on these vertices,

• $\bar{f}(x_{1},x_{2},\ldots,x_{k})$, where $f$ is any function $V\times V\times\ldots\times V\rightarrow\mathbb{R}$, the restriction of this function to $N^{\prime}(v)$ (for example, $\bar{L}(x,y)$ is the average shortest path length between $x$ and $y$ restricted to the subgraph $N^{\prime}(v)$),

• $d_{i}=\deg(v_{i})$,

• $n=\|V(G)\|,\;\;m=\|E(G)\|$,

• $X(i)=X(v_{i})$ for any set or function $X$ associated with the vertex $v_{i}$.

We now define the centralities:

1. Diameter $diam(G)=\max_{s,t\in V(G)}dist(s,t)$ .

2. Density $D(G)=\frac{\text{number of edges in }G}{\text{maximum possible number of edges in }G}=\frac{2m}{n(n-1)}$ .

3. Global efficiency $E_{glob}(G)=\frac{1}{n(n-1)}\sum\limits_{s\neq t}\frac{1}{dist(s,t)}$ .

4. Average shortest path length $L(G)=\frac{1}{n(n-1)}\sum\limits_{s\neq t}dist(s,t)$ .

5. Local clustering coefficient $c_{i}=c(i)=\frac{\text{number of edges in }N(i)}{\text{maximum possible number of edges in }N(i)}=\frac{2\|E(N(i))\|}{d_{i}(d_{i}-1)}$ .

6. Average clustering coefficient $C_{WS}(G)=\frac{1}{n}\sum\limits_{i\in V(G)}c_{i}=\frac{1}{n}\sum\limits_{i\in V(G)}\frac{2\|E(N(i))\|}{d_{i}(d_{i}-1)}=\frac{1}{n}\sum\limits_{i\in V(G)}\frac{\sum\limits_{j,k\in V(G)}a_{ij}a_{jk}a_{ki}}{d_{i}(d_{i}-1)}$ .

7. Global clustering coefficient $C(G)=\frac{\text{number of closed triplets in }G}{\text{number of all triplets in }G}=\frac{\sum\limits_{i,j,k\in V(G)}a_{ij}a_{jk}a_{ki}}{\sum\limits_{i\in V(G)}d_{i}(d_{i}-1)}$ .

8. Betweenness centrality $BC(i)=\sum\limits_{s,t\in V(G),\;s\neq t\neq i}\frac{\sigma_{st}(i)}{\sigma_{st}}$ , where $\sigma_{st}$ is the total number of shortest paths from $s$ to $t$ and $\sigma_{st}(i)$ is the number of those shortest paths that contain the vertex $i$ .

9. Closeness centrality $Clo(v)=\frac{n-1}{\sum\limits_{t\in V(G)}dist(v,t)}$ .

10. Local efficiency $E_{loc}(G)=\frac{1}{n}\sum\limits_{v\in V(G)}E_{glob}(N(v))$ .

11. Radiality $Rad(v)=\frac{\sum\limits_{t\in V(G),t\neq v}(diam(G)+1-dist(v,t))}{n-1}$ .

12. Stress $Str(i)=\sum\limits_{s,t\in V(G),\;s\neq t\neq i}\sigma_{st}(i)$ , where $\sigma_{st}(i)$ is the total number of shortest paths from $s$ to $t$ that contain the vertex $i$ .

Note that all these quantities are non-negative and that $D(G),E_{glob},E_{loc},c_{i},C_{WS},C(G)$ are at most $1$.
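To make definitions 5 to 7 concrete, the following minimal Python sketch computes $c_{i}$, $C_{WS}(G)$ and $C(G)$ directly from the formulas above. The example graph (two triangles sharing one vertex) and all function names are assumptions introduced here for illustration; they do not come from the article.

```python
from itertools import combinations

# Hypothetical example graph: two triangles sharing vertex 2 (no pendant vertices).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edges_among_neighbours(adj, i):
    """|E(N(i))|: number of edges between neighbours of i."""
    return sum(1 for s, t in combinations(adj[i], 2) if t in adj[s])


def local_clustering(adj, i):
    """c_i = 2|E(N(i))| / (d_i (d_i - 1)); assumes d_i >= 2."""
    d = len(adj[i])
    return 2 * edges_among_neighbours(adj, i) / (d * (d - 1))


def average_clustering(adj):
    """C_WS(G): mean of the local clustering coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def global_clustering(adj):
    """C(G): closed ordered triplets divided by all ordered triplets."""
    closed = sum(2 * edges_among_neighbours(adj, i) for i in adj)
    total = sum(len(adj[i]) * (len(adj[i]) - 1) for i in adj)
    return closed / total


adj = adjacency(EDGES)
c_ws = average_clustering(adj)
c_glob = global_clustering(adj)
print(c_ws, c_glob)
```

In this example the four vertices of degree 2 have $c_{i}=1$ and the shared vertex has $c_{i}=1/3$, so the two coefficients already differ on a five-vertex graph.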

3 Main results.

All subsequent lemmas and theorems are stated for a simple undirected graph $G$ without pendant vertices. They can be extended to simple graphs with pendant vertices by setting every function that has $d_{i}-1$ in its denominator equal to $0$ whenever $d_{i}=1$ .

First we prove a lemma relating the average shortest path length between the vertices in the neighbourhood of $i$ to the local clustering coefficient of this vertex.

Lemma 1.

$$L(N(i))=2-c_{i}.$$

Proof.

$$L(N(i))=\frac{1}{d_{i}(d_{i}-1)}\sum\limits_{s,t\in N(i),s\neq t}dist(s,t)=\frac{1}{d_{i}(d_{i}-1)}\Big(\sum\limits_{(s,t)\in E(N(i))}dist(s,t)+\sum\limits_{s,t\in N(i),\,s\neq t,\,(s,t)\notin E(N(i))}dist(s,t)\Big)=$$

$$=\frac{1}{d_{i}(d_{i}-1)}\Big(2\|E(N(i))\|+\sum\limits_{(s,i),(i,t)\in E(G),\,s\neq t,\,(s,t)\notin E(G)}dist(s,t)\Big)=$$

$$=\frac{1}{d_{i}(d_{i}-1)}\Big(2\|E(N(i))\|+2(d_{i}(d_{i}-1)-2\|E(N(i))\|)\Big)=2-c_{i}.$$

Here the sums run over ordered pairs, so each edge of $N(i)$ is counted twice, and any two non-adjacent neighbours of $i$ are at distance exactly $2$ because of the path through $i$ . Note that shortest paths between vertices of $N(i)$ are taken in the whole graph $G$ . ∎
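Lemma 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the example graph (a 5-cycle with one chord) and the helper names are hypothetical, and distances between neighbours are measured in the whole graph $G$, as in the proof.

```python
from collections import deque
from itertools import permutations

# Hypothetical example: the 5-cycle 0-1-2-3-4-0 with the chord (0, 2).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_dist(adj, src):
    """Shortest-path distances from src, measured in the whole graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def local_clustering(adj, i):
    """c_i as the fraction of ordered neighbour pairs that are adjacent."""
    d = len(adj[i])
    linked = sum(1 for s, t in permutations(adj[i], 2) if t in adj[s])
    return linked / (d * (d - 1))


def neighbourhood_path_length(adj, i):
    """L(N(i)): mean distance over ordered pairs of distinct neighbours of i."""
    d = len(adj[i])
    total = sum(bfs_dist(adj, s)[t] for s, t in permutations(adj[i], 2))
    return total / (d * (d - 1))


adj = adjacency(EDGES)
for i in sorted(adj):
    lhs = neighbourhood_path_length(adj, i)
    rhs = 2 - local_clustering(adj, i)
    print(i, round(lhs, 4), round(rhs, 4))
```

For every vertex the two printed columns agree, as the lemma states.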

We now prove a theorem connecting the local efficiency and the average clustering coefficient of a graph.

Theorem 1.

$$E_{loc}(G)=\frac{1}{2}(1+C_{WS}(G)).$$

Proof.

We give two proofs of this fact.

First proof. Note that by definition $D(N(i))=c_{i}$ . In [7] it was proved that

$$3-L(N(i))\leq 2E_{glob}(N(i))\leq 1+D(N(i)).$$

Using Lemma 1,

$$3-(2-c_{i})\leq 2E_{glob}(N(i))\leq 1+c_{i},$$

so the two bounds coincide and $E_{glob}(N(i))=\frac{1}{2}(1+c_{i})$ . Note that shortest paths between vertices of $N(i)$ are taken in the whole graph $G$ . Averaging over $i$ completes the proof.

Second proof. Rewrite the local clustering coefficient formula, summing over ordered pairs:

$$c_{i}=\frac{\sum\limits_{(s,t)\in E(N(i))}1}{d_{i}(d_{i}-1)},$$

$$\frac{1}{2}(1+c_{i})=\frac{1}{2}\,\frac{\sum\limits_{(s,t)\in E(N(i))}1+\sum\limits_{(s,t)\in E(N(i))}1+\sum\limits_{s,t\in V(N(i)),\,s\neq t,\,(s,t)\notin E(N(i))}1}{d_{i}(d_{i}-1)}=$$

$$=\frac{\sum\limits_{(s,t)\in E(N(i))}1+\sum\limits_{s,t\in V(N(i)),\,s\neq t,\,(s,t)\notin E(N(i))}\frac{1}{2}}{d_{i}(d_{i}-1)}=\frac{\sum\limits_{s,t\in V(N(i)),\,s\neq t}\frac{1}{dist(s,t)}}{d_{i}(d_{i}-1)}=E_{glob}(N(i)).$$

Averaging over $i$ completes the proof.

∎
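The identity of Theorem 1 can be verified exactly on a small graph. This is a minimal sketch under the same convention as the proof: distances between neighbours are taken in the whole graph $G$ (a convention that measures distances inside the induced subgraph on $N(i)$ would in general give a different value). The example graph and function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import permutations

# Hypothetical example: the 5-cycle 0-1-2-3-4-0 with the chord (0, 2).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_dist(adj, src):
    """Shortest-path distances from src, measured in the whole graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def local_clustering(adj, i):
    """c_i as an exact fraction of adjacent ordered neighbour pairs."""
    d = len(adj[i])
    linked = sum(1 for s, t in permutations(adj[i], 2) if t in adj[s])
    return Fraction(linked, d * (d - 1))


def neighbourhood_efficiency(adj, i):
    """E_glob(N(i)) with distances taken in the whole graph G."""
    d = len(adj[i])
    total = sum(
        Fraction(1, bfs_dist(adj, s)[t]) for s, t in permutations(adj[i], 2)
    )
    return total / (d * (d - 1))


adj = adjacency(EDGES)
n = len(adj)
e_loc = sum(neighbourhood_efficiency(adj, i) for i in adj) / n
c_ws = sum(local_clustering(adj, i) for i in adj) / n
print(e_loc, (1 + c_ws) / 2)
```

Using `Fraction` keeps the comparison exact, so the equality is not blurred by floating-point rounding.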

We now prove a theorem connecting the average clustering coefficient and the stress centrality.

Theorem 2.

$$C_{WS}(G)\geq\frac{1}{n}\sum\limits_{i\in V(G)}\Big(1-\frac{Str(i)}{d_{i}(d_{i}-1)}\Big).$$

Proof.

Note that for all $j,k\in N(i)$ , $j\neq k$ , with $(j,k)\notin E(N(i))$ , the path $j\rightarrow i\rightarrow k$ is a shortest path between $j$ and $k$ . Therefore,

$$Str(i)\geq 2\Big(\frac{d_{i}(d_{i}-1)}{2}-\|E(N(i))\|\Big),$$

$$\frac{1}{d_{i}(d_{i}-1)}Str(i)\geq 1-c_{i}.$$

Averaging over $i$ gives

$$C_{WS}(G)\geq\frac{1}{n}\sum\limits_{i\in V(G)}\Big(1-\frac{Str(i)}{d_{i}(d_{i}-1)}\Big).$$

Note that equality holds when $diam(G)=2$ . ∎

We now prove a theorem relating the average clustering coefficient to the betweenness centrality.

Theorem 3.

Let $BC(i,N(i)):=\sum\limits_{j,k\in N(i),\;j\neq k\neq i}\frac{\sigma_{jk}(i)}{\sigma_{jk}}$ . Then

$$C_{WS}(G)\leq\frac{1}{n}\sum\limits_{i\in V(G)}\Big(1-\frac{BC(i,N(i))}{d_{i}(d_{i}-1)}\Big).$$

Proof.

Note that

$$BC(i,N(i))=\sum\limits_{j,k\in N(i),\;j\neq k,\;(j,k)\notin E(N(i))}\frac{1}{\sigma_{jk}}\leq\sum\limits_{j,k\in N(i),\;j\neq k,\;(j,k)\notin E(N(i))}1=d_{i}(d_{i}-1)-2\|E(N(i))\|,$$

$$\frac{BC(i,N(i))}{d_{i}(d_{i}-1)}\leq 1-c_{i}.$$

Averaging over $i$ gives

$$C_{WS}(G)\leq\frac{1}{n}\sum\limits_{i\in V(G)}\Big(1-\frac{BC(i,N(i))}{d_{i}(d_{i}-1)}\Big).$$

Note that equality holds when $\sigma_{jk}=1$ for every pair of non-adjacent neighbours $j,k$ of every vertex $i$ , that is, when $j\rightarrow i\rightarrow k$ is the unique shortest path between them. The condition $diam(G)=2$ alone is not sufficient: in the cycle on four vertices each pair of non-adjacent vertices is joined by two shortest paths, and the inequality is strict. ∎

Combining the per-vertex inequalities from the proofs of Theorems 2 and 3 with Lemma 1 gives an estimate of the average shortest path length in the neighbourhood of $i$ .

Corollary 1.

$$\frac{BC(i,N(i))}{d_{i}(d_{i}-1)}\leq L(N(i))-1\leq\frac{Str(i)}{d_{i}(d_{i}-1)}.$$

Note that shortest paths between vertices of $N(i)$ are taken in the whole graph $G$ .
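The two bounds of Corollary 1 can be illustrated with a short Python sketch that counts shortest paths by breadth-first search. The example graph (a 4-cycle joined to a longer path, diameter 3) and all helper names are assumptions made here; the middle term is computed as $1-c_{i}$, which equals $L(N(i))-1$ by Lemma 1.

```python
from collections import deque
from fractions import Fraction
from itertools import permutations

# Hypothetical example: the 4-cycle 0-1-2-3-0 plus the path 1-4-5-6-3.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4), (4, 5), (5, 6), (6, 3)]


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, src):
    """Return (dist, sigma): distances from src and numbers of shortest paths."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


adj = adjacency(EDGES)
INFO = {s: bfs_counts(adj, s) for s in adj}


def paths_through(i, s, t):
    """sigma_st(i): shortest s-t paths that pass through the inner vertex i."""
    (dist_s, sig_s), (dist_i, sig_i) = INFO[s], INFO[i]
    if dist_s[i] + dist_i[t] != dist_s[t]:
        return 0
    return sig_s[i] * sig_i[t]


def stress(i):
    """Str(i): sum of sigma_st(i) over ordered pairs s != t, both different from i."""
    others = [v for v in adj if v != i]
    return sum(paths_through(i, s, t) for s, t in permutations(others, 2))


def neighbourhood_betweenness(i):
    """BC(i, N(i)): betweenness of i restricted to pairs of its neighbours."""
    return sum(
        Fraction(paths_through(i, j, k), INFO[j][1][k])
        for j, k in permutations(adj[i], 2)
    )


def one_minus_clustering(i):
    """1 - c_i, which equals L(N(i)) - 1 by Lemma 1."""
    d = len(adj[i])
    linked = sum(1 for s, t in permutations(adj[i], 2) if t in adj[s])
    return 1 - Fraction(linked, d * (d - 1))


rows = {}
for i in sorted(adj):
    pairs = len(adj[i]) * (len(adj[i]) - 1)
    rows[i] = (
        neighbourhood_betweenness(i) / pairs,
        one_minus_clustering(i),
        Fraction(stress(i), pairs),
    )
    print(i, *rows[i])
```

In this example vertex 1 shows both inequalities strictly: its neighbours 0 and 2 are joined by two shortest paths, which lowers the betweenness term, and shortest paths of length 3 pass through it, which raises the stress term.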

We now prove a lemma relating the average closeness centrality to the average shortest path length of a graph.

Lemma 2.

$$\frac{1}{n}\sum\limits_{v\in V(G)}Clo(v)\geq\frac{1}{L(G)}.$$

Proof.

By the inequality between the harmonic and arithmetic means,

$$\frac{1}{n}\sum\limits_{v\in V(G)}Clo(v)=\frac{1}{n}\sum\limits_{v\in V(G)}\frac{n-1}{\sum\limits_{t\in V(G)}dist(v,t)}\geq\frac{n(n-1)}{\sum\limits_{v,t\in V(G)}dist(v,t)}=\frac{1}{L(G)}.$$

Note that equality holds when the sum of the distances from a vertex to all remaining vertices is the same for every vertex. ∎
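A quick numerical check of Lemma 2, given as a sketch only: the two example graphs (a 5-cycle, where every vertex has the same distance sum, and the same cycle with one chord) and the function names are hypothetical choices made for illustration.

```python
from collections import deque
from fractions import Fraction


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_dist(adj, src):
    """Shortest-path distances from src to every vertex of a connected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def mean_closeness_and_inverse_length(edges):
    """Return (average closeness, 1 / L(G)) as exact fractions."""
    adj = adjacency(edges)
    n = len(adj)
    # Sum of distances from each vertex to all other vertices.
    farness = [sum(bfs_dist(adj, v).values()) for v in adj]
    mean_clo = sum(Fraction(n - 1, f) for f in farness) / n
    inv_length = Fraction(n * (n - 1), sum(farness))
    return mean_clo, inv_length


# Hypothetical examples: a 5-cycle, and the same cycle with the chord (0, 2).
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
CHORD = CYCLE + [(0, 2)]

cycle_result = mean_closeness_and_inverse_length(CYCLE)
chord_result = mean_closeness_and_inverse_length(CHORD)
print(cycle_result)
print(chord_result)
```

The cycle attains equality, while the chord makes the distance sums unequal and the inequality strict.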

We now prove a theorem relating the average clustering coefficient to the closeness centrality.

Theorem 4.

$$\frac{1}{2-C_{WS}(G)}\leq\frac{1}{n}\sum\limits_{i\in V(G)}\frac{\sum\limits_{v\in N(i)}\overline{Clo}(v)}{d_{i}}.$$

Proof.

By Lemma 2, applied to the neighbourhood $N(i)$ , and Lemma 1,

$$\frac{1}{d_{i}}\sum\limits_{v\in N(i)}\overline{Clo}(v)\geq\frac{1}{L(N(i))}=\frac{1}{2-c_{i}}.$$

By the inequality between the harmonic and arithmetic means (since $0\leq c_{i}\leq 1$ for all $i\in V(G)$ ):

$$\frac{1}{n}\sum\limits_{i\in V(G)}\frac{\sum\limits_{v\in N(i)}\overline{Clo}(v)}{d_{i}}\geq\frac{1}{n}\sum\limits_{i\in V(G)}\frac{1}{2-c_{i}}\geq\frac{n}{\sum\limits_{i\in V(G)}(2-c_{i})}=\frac{1}{2-C_{WS}(G)}.$$

∎

We now prove a lemma relating the average shortest path length to the average radiality.

Lemma 3.

$$\frac{1}{n}\sum\limits_{v\in V(G)}{Rad}(v)=diam(G)+1-L(G).$$

Proof.

The statement follows from the definitions:

$$\frac{1}{n}\sum\limits_{v\in V(G)}{Rad}(v)=\frac{1}{n}\sum\limits_{v\in V(G)}\frac{(n-1)(diam(G)+1)-\sum\limits_{t\in V(G),\;t\neq v}dist(v,t)}{n-1}=diam(G)+1-L(G).$$

∎
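Lemma 3 is a direct rearrangement of the definitions, which the following sketch confirms with exact arithmetic. The example graph and function names are assumptions introduced for illustration.

```python
from collections import deque
from fractions import Fraction

# Hypothetical example: the 5-cycle 0-1-2-3-4-0 with the chord (0, 2).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_dist(adj, src):
    """Shortest-path distances from src to every vertex of a connected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def radiality_check(edges):
    """Return (average radiality, diam(G) + 1 - L(G)) as exact fractions."""
    adj = adjacency(edges)
    n = len(adj)
    dist = {v: bfs_dist(adj, v) for v in adj}
    diam = max(max(d.values()) for d in dist.values())
    rad = {
        v: Fraction(sum(diam + 1 - dist[v][t] for t in adj if t != v), n - 1)
        for v in adj
    }
    mean_rad = sum(rad.values()) / n
    length = Fraction(sum(sum(d.values()) for d in dist.values()), n * (n - 1))
    return mean_rad, diam + 1 - length


mean_rad, rhs = radiality_check(EDGES)
print(mean_rad, rhs)
```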

We now prove a theorem relating the average clustering coefficient to the radiality.

Theorem 5.

$$C_{WS}(G)=\frac{1}{n}\sum\limits_{i\in V(G)}\Big(\frac{1}{d_{i}}\sum\limits_{v\in N(i)}\overline{Rad}(v)-1\Big)+\frac{\#\{N(i)\text{ which are complete graphs}\}}{n}.$$

Proof.

By Lemma 3, applied to the neighbourhood $N(i)$ , and Lemma 1,

$$\frac{1}{d_{i}}\sum\limits_{v\in N(i)}\overline{Rad}(v)=diam(N^{\prime}(i))+1-L(N(i))=diam(N^{\prime}(i))-1+c_{i}=c_{i}+1-\chi_{K_{d_{i}+1}}(N^{\prime}(i)),$$

where $\chi_{K_{d_{i}+1}}(N^{\prime}(i))=\begin{cases}1&\text{if $N^{\prime}(i)=K_{d_{i}+1}$}\\ 0&\text{otherwise}\end{cases}$ . The last equality holds because $diam(N^{\prime}(i))$ equals $1$ if $N^{\prime}(i)$ is a complete graph (equivalently, if $N(i)$ is complete) and $2$ otherwise. Averaging over $i$ completes the proof. ∎
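The identity of Theorem 5 can be checked on a small example. The sketch below rests on one reading of the restricted radiality, which is an assumption made here: for $v\in N(i)$ the sum in $\overline{Rad}(v)$ runs over the other vertices of $N(i)$, the normalisation is $d_{i}-1$, and the diameter used is that of $N^{\prime}(i)$. The example graph and function names are also hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical example: the 5-cycle 0-1-2-3-4-0 with the chord (0, 2).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, i):
    """c_i as an exact fraction of adjacent ordered neighbour pairs."""
    d = len(adj[i])
    linked = sum(1 for s, t in permutations(adj[i], 2) if t in adj[s])
    return Fraction(linked, d * (d - 1))


def restricted_mean_radiality(adj, i):
    """Return ((1/d_i) * sum of restricted Rad over N(i), N(i) is complete).

    Assumed reading: two neighbours of i are at distance 1 if adjacent and 2
    otherwise, and the diameter of N'(i) is 1 if N(i) is complete, else 2.
    """
    nbrs = adj[i]
    d = len(nbrs)
    complete = all(t in adj[s] for s, t in permutations(nbrs, 2))
    diam = 1 if complete else 2
    total = sum(
        diam + 1 - (1 if t in adj[s] else 2) for s, t in permutations(nbrs, 2)
    )
    return Fraction(total, d * (d - 1)), complete


adj = adjacency(EDGES)
n = len(adj)
results = [restricted_mean_radiality(adj, i) for i in adj]
rhs = sum(mean - 1 for mean, _ in results) / n + Fraction(
    sum(1 for _, complete in results if complete), n
)
c_ws = sum(local_clustering(adj, i) for i in adj) / n
print(c_ws, rhs)
```

Only vertex 1 has a complete neighbourhood here, so the correction term contributes $1/5$.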

We now prove two statements relating the average clustering coefficient to the global clustering coefficient.

Theorem 6.

Suppose that for all $i,j\in V(G)$ with $i\leq j$ , $d_{i}\leq d_{j}$ implies $c_{i}\leq c_{j}$ . Then

$$C_{WS}(G)\leq C(G).$$

Proof.

Renumber the vertices so that $d_{i}\leq d_{j}$ for all $i\leq j$ . Note that

$$c_{i}=\frac{\sum\limits_{j,k\in V(G)}a_{ij}a_{jk}a_{ki}}{d_{i}(d_{i}-1)},\;\;C(G)=\frac{\sum\limits_{i,j,k\in V(G)}a_{ij}a_{jk}a_{ki}}{\sum\limits_{i\in V(G)}d_{i}(d_{i}-1)}.$$

Indeed,

$$a_{ij}a_{jk}a_{ki}=\begin{cases}1&\text{if the vertices $j$ and $k$ are adjacent to each other and both are adjacent to the vertex $i$}\\ 0&\text{otherwise.}\end{cases}$$

Therefore,

$$C_{WS}(G)=\frac{1}{n}\sum\limits_{i\in V(G)}\frac{\sum\limits_{j,k\in V(G)}a_{ij}a_{jk}a_{ki}}{d_{i}(d_{i}-1)}.$$

Denote $x_{i}=d_{i}(d_{i}-1)$ . Since $\|E(N(i))\|=\frac{1}{2}\sum\limits_{j,k\in V(G)}a_{ij}a_{jk}a_{ki}$ and the maximum number of edges in the subgraph $N(i)$ equals $\frac{d_{i}(d_{i}-1)}{2}$ , we have $x_{i}\geq 2,\;0\leq c_{i}\leq 1.$ Hence, by Chebyshev’s sum inequality (since $d_{i}\leq d_{j}\Rightarrow x_{i}\leq x_{j}\text{ and }c_{i}\leq c_{j}$ ):

$$\Big(\frac{1}{n}\sum\limits_{i\in V(G)}x_{i}\Big)C_{WS}(G)=\Big(\frac{1}{n}\sum\limits_{i\in V(G)}x_{i}\Big)\Big(\frac{1}{n}\sum\limits_{i\in V(G)}c_{i}\Big)\leq\frac{1}{n}\sum\limits_{i\in V(G)}x_{i}c_{i}=\frac{1}{n}\sum\limits_{i,j,k\in V(G)}a_{ij}a_{jk}a_{ki}.$$

Therefore,

$$C_{WS}(G)\leq\frac{\sum\limits_{i,j,k\in V(G)}a_{ij}a_{jk}a_{ki}}{\sum\limits_{i\in V(G)}d_{i}(d_{i}-1)}=C(G).$$

Equality holds when $d_{i}=d_{j}$ for all $i,j\in V(G)$ , that is, for graphs in which all vertex degrees are equal. ∎

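Corollary 2.

Suppose that for all $i,j\in V(G)$ with $i\leq j$ , $d_{i}\leq d_{j}$ implies $c_{i}\geq c_{j}$ . Then

$$C_{WS}(G)\geq C(G).$$

The proof is the same as that of Theorem 6, with the inequality in Chebyshev’s sum inequality reversed because the sequences $x_{i}$ and $c_{i}$ are now oppositely ordered.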
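Both orderings can be observed on small graphs. The sketch below is an illustration under stated assumptions: the graph of two triangles joined through degree-2 vertices (which satisfies the hypothesis of Theorem 6), the friendship graph of $m$ triangles sharing one vertex (a windmill graph, which satisfies the hypothesis of Corollary 2), and all function names are choices made here, not taken from the article.

```python
from itertools import combinations


def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_summary(edges):
    """Return (C_WS, C, [(d_i, c_i), ...]) for a graph without pendant vertices."""
    adj = adjacency(edges)
    stats = []
    closed = total = 0
    for i in adj:
        d = len(adj[i])
        links = sum(1 for s, t in combinations(adj[i], 2) if t in adj[s])
        stats.append((d, 2 * links / (d * (d - 1))))
        closed += 2 * links
        total += d * (d - 1)
    c_ws = sum(c for _, c in stats) / len(stats)
    return c_ws, closed / total, stats


def is_monotone(stats, increasing=True):
    """Check d_i <= d_j => c_i <= c_j (or c_i >= c_j when increasing is False)."""
    for d1, c1 in stats:
        for d2, c2 in stats:
            if d1 <= d2 and ((c1 > c2) if increasing else (c1 < c2)):
                return False
    return True


def friendship_graph(m):
    """m triangles sharing the centre vertex 0 (a windmill graph)."""
    edges = []
    for k in range(m):
        a, b = 2 * k + 1, 2 * k + 2
        edges += [(0, a), (0, b), (a, b)]
    return edges


# Triangles {0,1,2} and {3,4,5} joined through the degree-2 vertices 6, 7, 8.
LINKED_TRIANGLES = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3),
                    (0, 6), (6, 3), (1, 7), (7, 4), (2, 8), (8, 5)]

for name, edges, increasing in [
    ("linked triangles", LINKED_TRIANGLES, True),
    ("friendship, m=2", friendship_graph(2), False),
    ("friendship, m=50", friendship_graph(50), False),
]:
    c_ws, c_glob, stats = clustering_summary(edges)
    print(name, is_monotone(stats, increasing), round(c_ws, 4), round(c_glob, 4))
```

For the friendship graph the formulas give $C_{WS}=\frac{2m+1/(2m-1)}{2m+1}$ and $C=\frac{3}{2m+1}$, so the gap widens as $m$ grows, in line with the windmill behaviour cited from [6] in the introduction.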

References

[1] Bonacich P. Factoring and weighting approaches to status scores and clique identification //Journal of mathematical sociology. 1972. 2. № 1. 113–120.
[2] Borgatti S. P., Everett M. G. A graph-theoretic perspective on centrality //Social networks. 2006. 28. № 4. 466–484.
[3] Kiss C., Bichler M. Identification of influencers—measuring influence in customer networks //Decision Support Systems. 2008. 46. № 1. 233–253.
[4] Lee S. H. M., Cotte J., Noseworthy T. J. The role of network centrality in the flow of consumer influence //Journal of Consumer Psychology. 2010. 20. № 1. 66–77.
[5] Watts D. J., Strogatz S. H. Collective dynamics of ‘small-world’ networks //Nature. 1998. 393. № 6684. 440–442.
[6] Estrada E. When local and global clustering of networks diverge //Linear Algebra and its Applications. 2016. 488. 249–263.
[7] Strang A. et al. Generalized relationships between characteristic path length, efficiency, clustering coefficients, and density //Social Network Analysis and Mining. 2018. 8. 1–6.