1. Introduction
Network theory is one of the most important frameworks for a meaningful description and an efficient analysis of many complex systems [1,2,3]. The most popular analysis algorithms mine a single network, e.g., using community detection algorithms [4]. In parallel, the need to compare networks has led to the introduction of many algorithms for comparing network structures on both a global and a local scale [5,6]; these fall into the class of network alignment (NA) algorithms. Algorithms of the first kind, known as global network alignment (GNA) algorithms, aim to find the overall similarity among networks. In contrast, algorithms of the second kind, called local network alignment (LNA) algorithms, aim to find (relatively) small regions of similarity. The output of an LNA algorithm is a set of matched regions (or subgraphs) between the two graphs given as the input.
More recently, in many application fields, e.g., mobile and social networks, connectomics, and metabolomics studies, the need has arisen for models more complex than traditional networks [7,8]. In such contexts, nodes may have different classes of interactions among them, and such interactions may also be time-varying. In particular, networks representing multiple different associations among patients can be represented by a multilayer graph comprising multiple interdependent graphs, where each graph represents an aspect or a set of similar interactions [9,10].
Figure 1 represents a simple multilayer graph with three layers. Each layer is a different graph G. Edges of a multilayer graph can be intra-layer, i.e., connecting nodes of the same layer, or inter-layer, i.e., connecting nodes of two different layers [11,12].
Formally, a multilayer network may be described as a tuple $M = (\mathcal{G}, \mathcal{E})$, where $\mathcal{G}$ is a set of layers and $\mathcal{E}$ is a set of edges among layers. For each layer $k$, we have a graph $G_k = (V_k, E_k)$ (intra-layer edges), and for each pair of layers $k, h$ with $k \neq h$, we have a set of edges $E_{kh} \subseteq V_k \times V_h$, i.e., the set of inter-layer edges connecting nodes of layers $k$ and $h$ [13].
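The tuple definition above can be made concrete with a small container. The following Python sketch is only illustrative: the class name `MultilayerGraph`, its fields, and the example layer names are assumptions introduced here, not part of the authors' R implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MultilayerGraph:
    """Hypothetical container for a multilayer graph (names are illustrative)."""
    nodes: dict = field(default_factory=dict)  # layer k -> set of nodes V_k
    intra: dict = field(default_factory=dict)  # layer k -> set of frozenset({u, v}), i.e., E_k
    inter: dict = field(default_factory=dict)  # (k, h) -> set of (u, v) with u in V_k, v in V_h

    def add_intra_edge(self, k, u, v):
        # Intra-layer edge: both endpoints live in the same layer k.
        self.nodes.setdefault(k, set()).update((u, v))
        self.intra.setdefault(k, set()).add(frozenset((u, v)))

    def add_inter_edge(self, k, u, h, v):
        # Inter-layer edge: endpoints live in two different layers k and h.
        if k == h:
            raise ValueError("inter-layer edges must join two different layers")
        self.nodes.setdefault(k, set()).add(u)
        self.nodes.setdefault(h, set()).add(v)
        self.inter.setdefault((k, h), set()).add((u, v))


# Toy example with two layers (layer and node names are made up).
m = MultilayerGraph()
m.add_intra_edge("genes", "g1", "g2")
m.add_intra_edge("drugs", "d1", "d2")
m.add_inter_edge("genes", "g1", "drugs", "d1")
```

Keeping the intra-layer and inter-layer edge sets in separate dictionaries mirrors the distinction between $E_k$ and $E_{kh}$ and makes it easy to process each layer on its own.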
Examples of multilayer networks come from many different fields, from social network analysis to biological networks. For instance, Figure 2 represents an example of a biological multilayer network representing the interplay among diseases, genes, and drugs.
While many efforts have been made to address challenges related to the analysis of a single network, e.g., community detection in multilayer graphs, there is a need for the formalization and introduction of algorithms to compare multilayer networks. A simple strategy is to adapt, or directly reuse, existing algorithms for LNA. Unfortunately, this strategy is unsuitable, as previously demonstrated for heterogeneous networks [14], because the current algorithms are not able to manage the differences among layers.
Thus, we propose local alignment of multilayer networks (MultiLoAl), a novel algorithm for the local alignment of multilayer networks. We define the local alignment problem for multilayer networks and propose a heuristic for solving it. MultiLoAl extends the previous L-HetNetAligner [14] and follows the steps depicted in Figure 3. Our algorithm receives two multilayer networks and a set of similarity relationships among nodes of the same layer in both networks, which are used as seeds to build the alignment.
For instance, in biological networks, similarity relations are represented by orthologs, which the user may find in databases of orthologs (e.g., OrthoMCL). The algorithm produces a set of multilayer graphs, each representing a single local region of the alignment.
The algorithm merges the two input multilayer graphs into a single one, named the multilayer alignment graph: a multilayer graph with the same number of layers as the two inputs, in which each layer is an alignment graph of the corresponding layers of the two inputs. Each node of a layer k of the alignment graph represents a pair of nodes of the input graphs. After building the alignment graph for each layer separately, we analyse the two input graphs to add the inter-layer edges of the multilayer alignment graph. Finally, the algorithm uses a community detection algorithm suitable for multilayer graphs to detect communities, each representing a local region of similarity, i.e., a single region of the alignment. The result of our algorithm is a list of mappings among a subset of nodes of the two networks, i.e., a set of mapped regions between the input graphs.
We had also realized a preliminary implementation of our algorithm in the R programming language. Here, we refine that implementation, also in a high-performance computing (HPC) fashion, and provide deeper experimentation on a larger dataset. The main contributions of this paper are: (i) the implementation of a novel algorithm for the local alignment of multilayer networks, (ii) the definition of the local alignment of multilayer networks, (iii) a heuristic for solving it, and (iv) the implementation of a synthetic multilayer network generator to build the data for the algorithm evaluation.
The rest of this paper is organized as follows. Section 2 discusses the background on multilayer networks and multilayer community detection. Section 3 presents the MultiLoAl algorithm. Section 4 presents and discusses the results. Finally, Section 5 concludes the paper.
3. MultiLoAl Algorithm
Initially, the algorithm takes as input two multilayer networks and a set of similarities among node pairs of the same layer in the input networks. Then, it builds the alignment by performing two main steps: (i) construction of the multilayer alignment graph and (ii) mining of the multilayer alignment graph.
MultiLoAl analyses each pair of corresponding layers of the input graphs separately. For each pair of networks of the same layer, it builds an alignment graph, as previously shown in L-HetNetAligner [14]. Then, it analyses the inter-layer edges of the input networks to add inter-layer edges to the multilayer alignment graph. Once the multilayer alignment graph is built, we use an algorithm for detecting communities in multilayer networks to uncover relevant modules. Figure 4 shows these steps.
As depicted in Figure 3, MultiLoAl builds the alignment in two main steps: (i) construction of the multilayer alignment graph and (ii) mining of the multilayer alignment graph.
Step (i) may be subdivided into two substeps: (i.a) adding nodes and intra-layer edges; (i.b) adding inter-layer edges.
Let us consider two multilayer input graphs, $M_1$ and $M_2$.
Node colours are used to distinguish different types of nodes belonging to two different types of layers. For simplicity, the two multilayer input networks have the same number of nodes.
3.1. Step (1.a): Adding Nodes and Intra-Layer Edges to the Alignment Graph
In the first step, the algorithm considers each pair of corresponding layers separately (see Figure 5). For each layer, it builds an alignment graph following the approach proposed in L-HetNetAligner [14], adapted to the case of one-colour networks, as reported in Figure 6.
At this stage, the algorithm, starting from an initial list of seed nodes, builds one intermediate alignment graph per layer. With two layers, we call these alignment graph layer 1 and alignment graph layer 2; they are built from the two input networks of layer 1 and the two input networks of layer 2, respectively. We define the alignment graph $G_{al} = (V_{al}, E_{al})$ as a graph constructed from two input graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$. Each node $v_{al} \in V_{al}$ represents the matching of a node of $G_1$ with a node of $G_2$, so $V_{al} \subseteq V_1 \times V_2$. The selection of node pairs is guided by the input similarity relationships, i.e., the seed nodes: each node is matched with the most similar node of the other network, so each node of the alignment graph represents a pair of similar nodes from the input networks; see Figure 7.
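The node-selection rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the seed similarities are assumed to arrive as a dictionary from node pairs to scores, the function name `build_alignment_nodes` is hypothetical, and ties are broken by keeping the first pair seen, which the source does not specify.

```python
def build_alignment_nodes(seeds):
    """Pick, for each node of the first network, its most similar partner.

    `seeds` maps (u, v) pairs, with u from the first network and v from the
    second, to a similarity score (assumed format, for illustration only).
    """
    best = {}
    for (u, v), score in seeds.items():
        # Keep the highest-scoring partner seen so far for node u.
        if u not in best or score > best[u][1]:
            best[u] = (v, score)
    # Each alignment-graph node is a pair (u, v) of matched input nodes.
    return sorted((u, v) for u, (v, _) in best.items())


# Toy seed list for one layer (node names and scores are made up).
seeds = {("a1", "b1"): 0.9, ("a1", "b2"): 0.4, ("a2", "b2"): 0.8}
alignment_nodes = build_alignment_nodes(seeds)
```

In this toy case `a1` is paired with `b1` rather than `b2` because the first pair has the higher similarity score.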
Once all nodes have been added to the graph, the algorithm builds the edges of the alignment graphs. For each pair of nodes, the algorithm examines the two input graphs, and it inserts and weights the edges considering three conditions: match, mismatch, and gap. Let us consider a pair of nodes of an alignment graph, $(u_1, u_2)$ and $(v_1, v_2)$, where $u_1, v_1$ belong to the first input graph $G_1$ and $u_2, v_2$ belong to the second input graph $G_2$ (see Figure 6). To determine the presence of an edge, we consider the edge $(u_1, v_1)$ in the first network and the edge $(u_2, v_2)$ in the second network. If $G_1$ and $G_2$ contain these nodes and the nodes are adjacent in both networks, there is a match, which we call, for convenience, a homogeneous match, since the nodes of the two networks are of the same type (see Figure 8a).
Let $\Delta$ be the node distance threshold, i.e., the length of the shortest connecting path used to discriminate between gaps and mismatches. If $G_1$ and $G_2$ contain these nodes, the nodes are adjacent only in a single network, and they are not within distance $\Delta$ in the other network, there is a mismatch, which we call a homogeneous mismatch (Figure 8b).
If $G_1$ and $G_2$ contain these nodes, the nodes are adjacent only in a single network, and they are at a distance less than $\Delta$ (the gap threshold) in the other network, there is a gap, which we call a homogeneous gap (Figure 8c). After the edges of the alignment graphs are added, a weight is assigned to each edge by applying an ad hoc scoring function $F$ and the gap threshold $\Delta$. The function assigns a higher score to matches than to mismatches and gaps. The choice of scoring function has a large influence on the resulting alignment graph and on the alignment itself. The algorithm enables the user to choose other values to optimize the quality of the results. In this work, we set the weight assigned to each edge as follows: homogeneous match equal to 1, homogeneous mismatch equal to 0.5, and homogeneous gap equal to 0.2.
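The three conditions can be illustrated with a short Python sketch. It is not the authors' R code: the adjacency-dictionary representation, the function names `distance` and `classify_edge`, and the parameter name `delta` are assumptions made for the example, and a mismatch is taken here to mean "adjacent in one network and not within the gap threshold in the other". The weights are the ones reported in the text.

```python
from collections import deque

# Edge weights reported in the text for the intra-layer (homogeneous) cases.
WEIGHTS = {"match": 1.0, "mismatch": 0.5, "gap": 0.2}


def distance(adj, source, target):
    """Breadth-first shortest-path length; None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def classify_edge(adj1, adj2, pair_a, pair_b, delta):
    """Return 'match', 'mismatch', 'gap', or None (no edge) for two alignment nodes."""
    (u1, u2), (v1, v2) = pair_a, pair_b
    adjacent1 = v1 in adj1.get(u1, ())
    adjacent2 = v2 in adj2.get(u2, ())
    if adjacent1 and adjacent2:
        return "match"
    if not adjacent1 and not adjacent2:
        return None
    # Adjacent in exactly one network: look at the distance in the other one.
    other = distance(adj2, u2, v2) if adjacent1 else distance(adj1, u1, v1)
    if other is not None and other < delta:
        return "gap"
    return "mismatch"


# Toy layer: a triangle in the first network, a path b1-b2-b3 in the second.
adj1 = {"a1": {"a2", "a3"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2"}}
adj2 = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
kind = classify_edge(adj1, adj2, ("a1", "b1"), ("a3", "b3"), delta=3)
weight = WEIGHTS[kind]
```

Here `a1` and `a3` are adjacent in the first network while `b1` and `b3` are two hops apart in the second, so the pair is a gap for `delta=3` and a mismatch for `delta=2`.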
3.2. Step (1.b): Adding Inter-Layer Edges
The algorithm then adds the inter-layer edges between alignment graph layer 1 and alignment graph layer 2 of the multilayer alignment graph. For each pair of nodes taken from the two alignment graphs, the algorithm examines the inter-layer edges between the corresponding layers of the input graphs. Let us consider a pair of nodes, $(u_1, u_2)$ in alignment graph layer 1 and $(v_1, v_2)$ in alignment graph layer 2, in Figure 8. To determine the presence of an edge, we consider the inter-layer edge $(u_1, v_1)$ in the first input network and the inter-layer edge $(u_2, v_2)$ in the second input network. If the input graphs contain both edges, i.e., the component nodes are adjacent in both networks, there is a match, which we call, for convenience, a heterogeneous match, since the nodes of the two networks are of different types; see Figure 9a.
Let us now consider a pair of nodes $(u_1, u_2)$ and $(v_1, v_2)$ as in Figure 8b. To determine the presence of an edge, we again consider the edge $(u_1, v_1)$ in the first network and the edge $(u_2, v_2)$ in the second network. If the first input network contains the edge $(u_1, v_1)$, while nodes $u_2$ and $v_2$ are disconnected in the second input network, there is a heterogeneous mismatch (Figure 9b). We set the weight assigned to each edge as follows: heterogeneous match equal to 0.9 and heterogeneous mismatch equal to 0.4.
3.3. Step 2: Detection of Communities on the Alignment Graph
Finally, the multilayer alignment graph is mined to discover communities by applying one of the existing community detection algorithms for multilayer networks [27,28,29,30]; see Figure 10. Our methodology has a general design, so the final alignment graph can also be mined with a different method.
In the current version of MultiLoAl, we applied the Infomap algorithm to mine the communities on the alignment graph. However, the user can choose among other community detection algorithms: Generalized Louvain, ABACUS, clique percolation, and mdlp. The output consists of a file that contains the extracted communities as a list of nodes, the weight of each edge, and a string reporting whether the edge is a homogeneous/heterogeneous match, a homogeneous/heterogeneous mismatch, or a homogeneous gap (see an example of the output at https://github.com/mmilano87/MultiLoAl (accessed on 12 August 2022)).
3.4. MultiLoAl vs. L-HetNetAligner
MultiLoAl, despite being based on the previous L-HetNetAligner, differs from it in several respects. First, the algorithms have different scopes: MultiLoAl is a local aligner of multilayer networks, while L-HetNetAligner works only on heterogeneous networks. In building the local alignment, both MultiLoAl and L-HetNetAligner follow two main general steps: (i) construction of the alignment graph; (ii) mining of the alignment graph. The construction of the alignment graph is the first main difference between the two algorithms. MultiLoAl builds a multilayer alignment graph through two substeps: (i) adding nodes and intra-layer edges, following the approach proposed in L-HetNetAligner adapted to the case of one-colour networks; (ii) adding inter-layer edges. This last substep represents the main novelty compared to L-HetNetAligner, because MultiLoAl analyses and adds the edges among different layers of the input networks. In contrast, L-HetNetAligner builds a heterogeneous alignment graph. Initially, L-HetNetAligner defines the nodes of the alignment graph as composite nodes representing pairs of nodes matched by the similarity considerations. It then inserts and weights the edges of the alignment graph according to whether the linked nodes have the same colour and to their distance in the input networks. Finally, once the alignment graph is built, both algorithms mine it to discover modules that represent the local alignment. MultiLoAl applies a community detection algorithm, Infomap, to mine the final alignment graph. The result consists of the extracted communities as a list of nodes, the weight of each edge, and a string reporting whether the edge is a homogeneous/heterogeneous match, a homogeneous/heterogeneous mismatch, or a homogeneous gap. L-HetNetAligner, instead, uses the Markov clustering (MCL) algorithm to cluster the graph. Each extracted module represents a single region of the alignment.
The result of our algorithm is a list of mappings among a subset of nodes of two networks, i.e., a set of mapped regions among input graphs.
4. Results and Discussion
4.1. Evaluation of the Quality of the Alignment
The evaluation of the quality of a network alignment is still a matter of debate even for simple networks [5,31,32]. Many measures exist to evaluate both the correctness and the quality of an alignment [33]. On the other hand, there is no gold standard to benchmark the alignment. Moreover, all the existing measures need to be extended to the multilayer case. Thus, we first introduce novel measures of correctness for the multilayer case (to the best of our knowledge, no other measures are available), and then we assess our method. We first designed a proof of concept to show the ability of our algorithm to map correct nodes and edges by aligning a synthetic network with itself and with some randomised versions.
The correctness of an alignment is usually evaluated by analysing its topological quality, i.e., its ability to reconstruct the underlying true node mapping (when such a mapping is known) and to conserve many edges. For simple networks, F-NC (F-score node correctness) is a measure of node correctness that combines two measures: precision (P-NC) and recall (R-NC). P-NC is calculated as $|M \cap N| / |N|$, and R-NC is defined as $|M \cap N| / |M|$, where $M$ is the set of node pairs that are mapped under the true node mapping and $N$ is the set of node pairs that are aligned under an alignment $f$; F-NC is the harmonic mean of P-NC and R-NC.
We here extend this measure to the multilayer case. We first considered each layer separately and calculated F-NC$_i$ for each layer $i$. Then, we computed the multilayer F-NC as the average of all F-NC$_i$.
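The per-layer computation and the averaging step can be sketched as follows. This is a minimal illustration, assuming the usual precision and recall definitions over the true pairs $M$ and the aligned pairs $N$ and an F-score equal to their harmonic mean; the function names and the toy layers are hypothetical.

```python
def f_nc(true_pairs, aligned_pairs):
    """F-score node correctness of one layer (assumed standard definitions)."""
    m, n = set(true_pairs), set(aligned_pairs)
    common = len(m & n)
    if common == 0:
        return 0.0
    precision = common / len(n)  # fraction of aligned pairs that are true
    recall = common / len(m)     # fraction of true pairs that are recovered
    return 2 * precision * recall / (precision + recall)


def multilayer_f_nc(true_by_layer, aligned_by_layer):
    """Average the per-layer F-NC values over all layers."""
    scores = [
        f_nc(true_by_layer[layer], aligned_by_layer.get(layer, ()))
        for layer in true_by_layer
    ]
    return sum(scores) / len(scores)


# Toy example: layer 1 is aligned perfectly, layer 2 gets one pair wrong.
true_by_layer = {
    1: {("a1", "b1"), ("a2", "b2")},
    2: {("c1", "d1"), ("c2", "d2")},
}
aligned_by_layer = {
    1: {("a1", "b1"), ("a2", "b2")},
    2: {("c1", "d1"), ("c2", "d3")},
}
score = multilayer_f_nc(true_by_layer, aligned_by_layer)
```

In the toy case the per-layer values are 1.0 and 0.5, so the multilayer value is their mean.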
Similarly, the edge correctness in the simple case can be measured by NCV-G, which combines two measures: node coverage (NCV) and generalized $S^3$ (G). NCV is the percentage of nodes from $G_1$ and $G_2$ that are also in $G'_1$ and $G'_2$, and G measures how well edges are conserved between $G'_1$ and $G'_2$, where $G_1$ and $G_2$ are the two graphs and $G'_1$ and $G'_2$ are the subgraphs of $G_1$ and $G_2$ that are induced by the mapping.
We used NCV-G to measure the edge correctness of each layer $i$; then, we averaged these values over all the layers to obtain the multilayer NCV-G.
Finally, we should consider the edge correctness for the inter-layer edges. Without loss of information, we considered all the inter-layer edges as a whole, and we calculated the correctness of all the inter-layer edges as .
4.2. Proof of Concept
As a proof of principle, we applied MultiLoAl to a dataset consisting of ten synthetic multilayer networks that we built with a graph generator implemented ad hoc in R. An example of a multilayer network and the R function are available on the web site of the project (https://github.com/mmilano87/MultiLoAl (accessed on 12 August 2022)).
All the multilayer networks have 30 nodes and 2 layers, whereas the edges are distributed as reported in Table 1.
First, we aligned each network with itself to show the ability to find known regions of similarity; second, we aligned each network with altered versions of itself obtained by adding different levels of noise (5%, 10%, 15%, 20%, and 25%), i.e., by randomly removing edges from the network. The aim of the test was to demonstrate that the alignment algorithm is capable of producing high-quality alignments with an edge conservation of about 90%. Then, we implemented different versions of the MultiLoAl algorithm by varying the strategy applied to mine the communities on the alignment graph. We executed the experiments on an Intel Core i5 processor at 2.9 GHz with 4 GB of main memory, running Ubuntu 18.04. MultiLoAl built 60 alignments, and it completed the whole alignment process in ten seconds.
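The noise procedure can be illustrated with a short Python sketch. It assumes that a noise level p means removing a fraction p of the edges uniformly at random; the function name `remove_random_edges`, the seed handling, and the toy edge list are assumptions for the example and are not taken from the authors' R generator.

```python
import random


def remove_random_edges(edges, noise, seed=None):
    """Return a copy of `edges` with a fraction `noise` of them removed at random."""
    rng = random.Random(seed)        # local generator, so runs are reproducible
    ordered = sorted(edges)          # fixed order makes the sampling deterministic
    n_remove = round(noise * len(ordered))
    removed = set(rng.sample(ordered, n_remove))
    return [edge for edge in ordered if edge not in removed]


# Toy network with 20 edges and the five noise levels used in the text.
edges = [(i, i + 1) for i in range(20)]
levels = (0.05, 0.10, 0.15, 0.20, 0.25)
noisy = {p: remove_random_edges(edges, p, seed=0) for p in levels}
sizes = [len(noisy[p]) for p in levels]
```

With ten networks, one self-alignment plus five noisy alignments per network would account for the 60 alignments reported above, although the source does not spell out this breakdown.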
To measure the performance of the alignments built with the different versions of MultiLoAl, we evaluated the quality of the results by considering the topological aspects of the alignments and the number of communities found. First, the results were evaluated by topological quality.
We computed the NCV-G and F-NC measures for all alignment networks by considering the intra-layer and inter-layer edges. Table 2, Table 3, Table 4 and Table 5 report the results.
Table 6, Table 7, Table 8 and Table 9 report the mean and standard deviation values of the NCV-G and F-NC measures for each synthetic network aligned with its noisy counterpart.
The results show that the quality of the alignment was greater when Infomap was applied to mine the communities. Furthermore, increasing the noise level from 5% to 25% in the original networks caused NCV-G and F-NC to decrease.