Network alignment, which aims at learning a matching between the same entities across multiple information networks, often suffers from challenges ranging from feature inconsistency and high-dimensional features to unstable alignment results. This article presents a novel network alignment framework, Unsupervised Adversarial learning based Network Alignment (UANA), that combines generative adversarial network (GAN) and reinforcement learning (RL) techniques to tackle the above critical challenges. First, we propose a bidirectional adversarial network distribution matching model to perform the bidirectional cross-network alignment translations between two networks, such that the distributions of real and translated networks completely overlap. In addition, two cross-network alignment translation cycles are constructed for training the unsupervised alignment without the need for prior alignment knowledge. Second, in order to address the feature inconsistency issue, we integrate a dual adversarial autoencoder module with an adversarial binary classification model to project two copies of the same vertices with high-dimensional inconsistent features into the same low-dimensional embedding space. This facilitates the translations of the distributions of two networks in the adversarial network distribution matching model. Finally, we develop an RL based optimization approach to solve the vertex matching problem in the discrete space of the GAN model, i.e., directly select the vertices in target networks most relevant to the vertices in source networks, without unstable similarity computation that is sensitive to discriminative features and similarity metrics. Extensive evaluation on real-world graph datasets demonstrates the outstanding capability of UANA to address the unsupervised network alignment problem, in terms of both effectiveness and scalability.

1 Introduction

With the proliferation of social media platforms, such as Facebook, LinkedIn, and Instagram, each user usually participates in multiple social worlds, each with unique structures and semantics. Network alignment has received increasing attention in recent years [13, 53, 56, 59, 67, 91, 94, 114, 122, 132]. Network alignment can provide a comprehensive view of entities by integrating multiple aligned networks into one more complete and compact network. It also offers new insights into how entities interact with and influence each other on different information networks, enabling knowledge transfer across networks. Network alignment techniques have achieved considerable success on a wide variety of real-world problems in different domains, such as biological or protein network analysis [14, 35, 36, 56, 81], image matching in computer vision [15, 26], ontology matching [32, 79], user account linking in different social networks [25, 46, 48, 65, 78, 127, 128], and knowledge translation in multilingual knowledge bases [51, 107, 157]. Despite the great success of existing network alignment techniques, the flood of big networks still poses several unsolved computational challenges.

Feature inconsistency. Traditional network alignment techniques are based on the fundamental assumption of topological and/or attribute consistency [3, 9, 21, 26, 31, 39, 81, 117, 119, 122]: Two vertices in different networks are more likely to have a higher matching score if they share similar topological and/or attribute features in their respective networks. However, it is often observed that the same entities in different networks have very diverse neighbors and behaviors and thus have quite different structural and attribute features. Figure 1 presents an illustrative example of network alignment for two social networks \(G^1\) and \(G^2\). The actual offline names of users in two networks are anonymized and only their virtual online usernames are available. The goal of network alignment is to align different copies of the same users across two networks, denoted by red dashed lines. Due to various functionalities of different networks, say Facebook for socializing with friends vs. LinkedIn for job seeking, the users in \(G^1\) and \(G^2\) have different features, e.g., four users in \(G^1\) and five users in \(G^2\), attributes Locations in \(G^1\) and Affiliations in \(G^2\). In this situation, different copies of the same vertices across two networks are often not the most similar to each other. Without handling feature inconsistency, choosing the vertex pairs with the maximum similarity scores in two networks as the alignment results usually leads to biased or incorrect results. A natural way to address the feature inconsistency issue is to compare pairwise vertices by utilizing common features but ignoring inconsistent features, e.g., MEgo2Vec [112] considers the common neighbors of each pair of vertices from two networks. This strategy is useful for two networks with similar structures. However, two copies of the same vertices on two different networks often share few or no common neighbors. It is challenging to get a clear consensus from different networks with unique characteristics.

Fig. 1.

High-dimensional features. Real-world graph data typically have high-dimensional features. The huge memory consumption dramatically limits the applicability of network alignment methods on large-scale network data. A practical solution is to employ network embedding techniques to learn low-dimensional representations of big networks that preserve the original correlations between vertices in the embedding space, and thus achieve acceptable network alignment results over the network embeddings while further improving the memory performance [13, 31, 48, 53, 60, 88, 112, 132]. PALE [60], FRUI-P [132], and UUIL [48] perform individual embeddings on two networks. In this case, the feature inconsistency issue persists in the individual embedding spaces of the two networks to be aligned. MAH [88], IONE [53], SNNA [47], and CrossMNA [13] project the anchor node pairs with known matching information into the same embedding vectors in a semi-supervised manner. REGAL [31] jointly embeds two graphs based on the similarity computation with the assumption of topological and attribute consistency. MEgo2Vec [112] generates candidate user pairs by considering only the user pairs with similar usernames before executing the network embedding. Can we develop a joint network embedding model to project two networks into the same embedding space for alleviating the feature inconsistency issue without the help of prior alignment and side information in an unsupervised way?

Unsupervised learning. The majority of existing network alignment algorithms are supervised methods [4, 26, 37, 58, 60, 65, 108, 109, 128] or semi-supervised approaches [38, 47, 53, 88, 108, 122, 127], which utilize known anchor links connecting aligned vertices or additional alignment constraints to guide the alignment process. On the other hand, the unsupervised methods are able to provide more generic network alignment solutions in the absence of prior alignment knowledge [3, 9, 31, 39, 48, 115, 116, 128]. However, the unsupervised methods often rely heavily on highly discriminative features to ensure the effectiveness of network alignment, which limits their applicability under different scenarios. In addition, the quality of the network alignment results is sensitive to the similarity computation between vertices across different networks. This may result in unstable network alignment results, due to inadequate discriminative features and user-defined similarity metrics.

With these fundamental challenges in mind, we develop a dual generative adversarial network (GAN) and reinforcement learning (RL) based network alignment approach, Unsupervised Adversarial learning based Network Alignment (UANA), with three original contributions.

A bidirectional cycle-based adversarial network distribution matching model is proposed to train the network alignment task in an unsupervised way. We jointly translate the pairwise copies of the same vertices from one network to another, move the cross-network translated distributions of two networks from two directions, and make the distribution of one network translated from another one as close as possible to the distribution of this network itself. This makes the distributions of real and translated networks completely overlap together. Two cross-network alignment translation cycles are built for training the unsupervised alignment without prior alignment knowledge.

With the goal of alleviating the feature inconsistency issue, a dual adversarial autoencoder module is plugged into an adversarial binary classification model to project two copies of the same vertices with high-dimensional inconsistent features into the same low-dimensional embedding space, such that they are closest to each other for facilitating the translations of the distributions of two networks in the adversarial network distribution matching model.

To avoid the unstable similarity computation that is sensitive to discriminative features and similarity metrics, based on the matched distributions of real and translated networks, a policy gradient-based optimization approach is developed to solve the vertex matching problem in the discrete space of the GAN model, which cannot be optimized by the stochastic gradient descent (SGD) method. The proposed method can directly select the vertices in target networks most relevant to the vertices in source networks.

Extensive evaluation on real graph datasets demonstrates the strength of UANA in addressing the network alignment problem in terms of both effectiveness and scalability.

2 Problem Definition

We formally define our research problem as follows. In this article, we are given two networks \(G^1 = (V^1, E^1, A^1, \mathbf {A}^1, \mathbf {B}^1)\) and \(G^2 = (V^2, E^2, A^2, \mathbf {A}^2, \mathbf {B}^2)\). Each network is denoted as \(G^k = (V^k, E^k, A^k, \mathbf {A}^k, \mathbf {B}^k)\) \((1 \le k \le 2)\), where \(V^k = \lbrace v_1^k, \ldots , v_{N_k}^k\rbrace\) is the set of \(N_k\) vertices, \(E^k = \lbrace (v_i^k, v_j^k) : 1 \le i, j \le N_k, i \ne j\rbrace\) is the set of \(L_k\) edges, and \(A^k = \lbrace a_1^k, \ldots , a_{M_k}^k\rbrace\) is the set of \(M_k\) attributes. \(\mathbf {A}^k\) denotes the \(N_k \times N_k\) adjacency matrix of \(G^k\). \(\mathbf {A}_{ij}^k\) specifies the value of the edge between two vertices \(v_i^k\) and \(v_j^k\) in \(V^k\). \(\mathbf {B}^k\) specifies the \(N_k \times M_k\) attribute matrix of \(G^k\). \(\mathbf {B}_{ij}^k\) denotes the attribute value of vertex \(v_i^k\) on attribute \(a_j^k\). We use symbols \(\mathbf {A}_{i:}^k\) and \(\mathbf {B}_{i:}^k\) to denote the \(i{\mathrm{th}}\) row vectors of \(\mathbf {A}^k\) and \(\mathbf {B}^k,\) respectively. The concatenation of \(\mathbf {A}_{i:}^k\) and \(\mathbf {B}_{i:}^k\), denoted by \(\mathbf {A}_{i:}^k \oplus \mathbf {B}_{i:}^k\), is used to specify the original representation of the copy \(v_i^k \in V^k\) in \(G^k\) of a vertex \(v_i\), consisting of both structural and attribute features. For ease of presentation, we use \(v_i^k\) to denote the vertex copy itself and its original representation \(\mathbf {A}_{i:}^k \oplus \mathbf {B}_{i:}^k\). The problem of UANA contains three analysis objectives.
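The original representation \(\mathbf {A}_{i:}^k \oplus \mathbf {B}_{i:}^k\) can be illustrated with a minimal NumPy sketch. The helper name `original_representation` and the toy matrices are hypothetical and serve only to make the notation concrete.

```python
import numpy as np

def original_representation(adj, attrs):
    """Return the N x (N + M) matrix whose i-th row is A_{i:} concatenated with B_{i:}."""
    adj = np.asarray(adj, dtype=float)
    attrs = np.asarray(attrs, dtype=float)
    if adj.ndim != 2 or adj.shape[0] != adj.shape[1]:
        raise ValueError("adjacency matrix must be square (N x N)")
    if attrs.ndim != 2 or attrs.shape[0] != adj.shape[0]:
        raise ValueError("attribute matrix must have one row per vertex (N x M)")
    return np.hstack([adj, attrs])

# Toy network with N = 3 vertices and M = 2 attributes (illustrative values only).
A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
B1 = np.array([[1, 0],
               [0, 1],
               [1, 1]])
X1 = original_representation(A1, B1)  # row i holds structural, then attribute features
```

Each row of `X1` has length \(N_k + M_k\), which is the dimensionality that the consensus embedding objective below reduces to \(d\).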

(1) Consensus network embedding: Map the original representation \(\mathbf {A}_{i:}^k \oplus \mathbf {B}_{i:}^k\) of each vertex \(v_i^k\) in \(G^k (1 \le k \le 2)\) to a low-dimensional representation \(u_i^k\) with the minimum reconstruction error between the output and the input, i.e.,

and \(d \ll (N_k+M_k)\). In addition, two different copies \(v_i^1\) in \(G^1\) and \(v_i^2\) in \(G^2\) of any vertex \(v_i\) should be projected into the same embedding space for alleviating the feature inconsistency issue, i.e., \(u_i^1 = u_i^2\).

(2) Network distribution matching: Match the distribution of real networks \(G^k\) and the distribution of translated networks \(\tilde{G}^k\) through the alignment translation from \(G^l\) \((1 \le k, l \le 2, k \ne l)\), i.e., continuously move and twist the distributions of \(G^k\) and \(\tilde{G}^k\) and make two distributions as close as possible until two distributions completely overlap together, for ensuring a good network alignment from \(G^l\) to \(G^k\). By following the same idea in existing efforts [46, 47, 106], this article matches the distribution of real networks and the distribution of translated networks by minimizing the Wasserstein distance between two distributions. It is a symmetric distance metric that is equal to zero when two distributions completely overlap together.
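The two properties of the Wasserstein distance used above, symmetry and vanishing when the distributions coincide, can be checked on a small example. The sketch below is a simplification under stated assumptions: it computes the empirical Wasserstein-1 distance between two equal-size one-dimensional samples, whereas the embeddings in this article are \(d\)-dimensional and the distance is handled inside the adversarial model. The function name `wasserstein_1d` is hypothetical.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size samples the optimal transport plan matches sorted values,
    so the distance is the mean absolute difference of the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(x - y)))

real = np.array([0.0, 1.0, 2.0, 3.0])        # stand-in for samples from G^k
translated = np.array([0.5, 1.5, 2.5, 3.5])  # stand-in for samples from translated G^k

d_fwd = wasserstein_1d(real, translated)
d_bwd = wasserstein_1d(translated, real)   # symmetry: same value as d_fwd
d_same = wasserstein_1d(real, real[::-1])  # same distribution, different order
```

Here `d_fwd` and `d_bwd` agree, and `d_same` is zero because reordering the samples does not change the empirical distribution.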

(3) Bidirectional vertex matching: Directly select the most relevant embedding \(u_i^k\) of the copy \(v_i^k\) in \(G^k\) of the same vertex \(v_i\) for the given embedding \(u_i^l\) of the copy \(v_i^l\) in \(G^l\) of \(v_i\) \((1 \le k, l \le 2, k \ne l)\) without unstable similarity computation. In addition, the unsupervised bidirectional network alignment should be supported, i.e., given \(v_i^1\) in \(G^1\), the counterpart \(v_i^2\) in \(G^2\) can be retrieved without the need of ground truth, and vice versa. Although we do not utilize any ground truth (i.e., pre-aligned vertex pairs) in the entire learning process, we still use the ground truth information for evaluation only.

The GAN technique was proposed to generate photographs that look authentic to human observers [29, 77]. In the GAN model, a generator \(\mathbf {G}\) tries to fool a discriminator \(\mathbf {D}\) by generating real-looking fake images, while \(\mathbf {D}\) aims at distinguishing between real and fake images. At the end of the GAN training, \(\mathbf {D}\) cannot differentiate real images from fake images generated by \(\mathbf {G}\). This implies that the generated fake images become indistinguishable from the real images. In other words, the fake images have the same distribution as the real images.

This motivates us to extend the basic GAN model to a dual GAN model to learn the bidirectional matchings between the distributions of two networks for alleviating the feature inconsistency issue as well as train the unsupervised network alignment task. Figure 2 exhibits a toy example of adversarial network alignment. There are the distributions of two networks \(G^1\) and \(G^2\), where each dot denotes a vertex in \(G^1\) or \(G^2\) and two vertices within each network are close/distant if they have similar/dissimilar structural and attribute features. For example, \(v^1_1\) and \(v^2_1\), \(v^1_2\) and \(v^2_2\), and \(v^1_3\) and \(v^2_3\) are three pairs of aligned nodes in \(G^1\) and \(G^2,\) respectively. Two similar nodes \(v^1_1\) and \(v^1_2\) are very close but \(v^1_1\) and \(v^1_3\) are quite distant. As shown in Figure 2(a), two distributions initially have quite different shapes, due to their inconsistent structural and attribute features. In Figure 2(b), an adversarial classification model in Equation (8), containing two encoders \(\mathbf {E_1}\) for \(G^1\) and \(\mathbf {E_2}\) for \(G^2\) and a binary classifier \(\mathbf {C}\), is designed to jointly project \(G^1\) and \(G^2\) into the approximately same embedding space for facilitating the translations of the embeddings of \(G^1\) and \(G^2\). In order to avoid the unstable similarity computation, an adversarial network distribution matching model in Equation (4), consisting of two alignmenters \(\mathbf {M_{12}}\) (or \(\mathbf {M_{21}}\)) for the translation from \(G^1\) to \(G^2\) (or from \(G^2\) to \(G^1\)), is developed to learn the bidirectional alignments between pairwise vertices in \(G^1\) and \(G^2\). 
Therefore, translated \(\tilde{G}^2\) (or \(\tilde{G}^1\)) from \(G^1\) (or \(G^2\)) are moved close to real \(G^2\) (or \(G^1\)) in Figure 2(c) until the distributions of two networks completely overlap together in Figure 2(d), which implies that \(\mathbf {M_{12}}\) (or \(\mathbf {M_{21}}\)) can produce a good network alignment from \(G^1\) to \(G^2\) (or from \(G^2\) to \(G^1\)). By combining the bidirectional alignments \(G^1\stackrel{\mathbf {M_{12}}}{\longrightarrow }\tilde{G}^2\) and \(G^2\stackrel{\mathbf {M_{21}}}{\longrightarrow }\tilde{G}^1 \approx G^1\) together, the network alignment tasks can be trained in an unsupervised way, since we can translate \(G^1\) into \(\tilde{G}^2\) with \(\mathbf {M_{12}}\) and return back to \(\tilde{G}^1 \approx G^1\) with \(\mathbf {M_{21}}\), i.e., \(G^1\stackrel{\mathbf {M_{12}}}{\longrightarrow }\tilde{G}^2\stackrel{\mathbf {M_{21}}}{\longrightarrow }\tilde{G}^1 \approx G^1\), denoted by red line in Figure 2, and vice versa. Therefore, we do not need any prior aligned vertex pairs as the ground truth for the training.

Fig. 2.

3 Network Alignment

3.1 Framework

Figure 3 presents the model architecture, which contains three analytics components. For ease of presentation, we ignore the vertex subscripts and use \(v\) to denote any same vertex in two networks \(G^1\) and \(G^2\). \(v^1\) and \(v^2\) are used to denote two copies in \(G^1\) and \(G^2\) of \(v\) and their original representations, respectively. Two pairs of encoders and decoders are shared by all three components. Concretely, encoders \(u^k = \mathbf {E_k}(v^k)\) and decoders \(\hat{v}^k = \mathbf {\tilde{E}_k}(u^k)\) for \(G^k\) \((1 \le k \le 2)\) output the embedding \(u^k\) and the reconstruction \(\hat{v}^k\) of \(v^k\) in \(G^k\).

Fig. 3.

The three analytics components are as follows. (1) Bidirectional cycle-based adversarial network distribution matching learns the bidirectional matchings between the distributions of real \(G^1\) (or \(G^2\)) and translated \(\tilde{G}^1\) (or \(\tilde{G}^2\)) from \(G^2\) (or \(G^1\)) with the alignmenter \(\mathbf {M_{21}}\) (or \(\mathbf {M_{12}}\)), such that the two distributions completely overlap. Two cross-network alignment translation cycles are built for training the unsupervised alignment without prior knowledge, e.g., one red cycle \(u^1\stackrel{\mathbf {M_{12}}}{\longrightarrow }\tilde{u}^2\stackrel{\mathbf {M_{21}}}{\longrightarrow }\tilde{u}^1 \approx u^1\) and one cyan cycle \(u^2\stackrel{\mathbf {M_{21}}}{\longrightarrow }\tilde{u}^1\stackrel{\mathbf {M_{12}}}{\longrightarrow }\tilde{u}^2 \approx u^2\) in Figure 3. (2) Adversarial binary classification achieves the consensus network embedding of \(G^1\) and \(G^2\): it projects two copies \(v^1\) in \(G^1\) and \(v^2\) in \(G^2\) of the same vertex \(v\) into the same embedding space, i.e., \(u^1 = u^2\). (3) Policy gradient-based optimization of the alignmenters optimizes \(\mathbf {M_{21}}\) (or \(\mathbf {M_{12}}\)) in the discrete space of the GAN model, i.e., it selects the most relevant \(\tilde{u}^1\) (or \(\tilde{u}^2\)) in \(\tilde{G}^1\) (or \(\tilde{G}^2\)) for the given \(u^2\) (or \(u^1\)) in \(G^2\) (or \(G^1\)), based on the matched distributions of real and translated networks.

3.2 Bidirectional Cycle-Based Adversarial Network Distribution Matching

A straightforward GAN model for adversarial network distribution matching is defined as follows:

(1)

At each GAN training, \(\mathbf {D_k}\) is trained to maximize the probability of the embedding \(u^k\) of the copy \(v^k\) of \(v\) in \(G^k\), i.e., \(\mathbf {D_k}(u^k)\). In addition, \(\mathbf {D_k}\) is fooled by using the embedding \(u^l\) of the copy \(v^l\) of \(v\) in \(G^l\) as input to maximize \(\mathbf {D_k}(u^l)\) \((1 \le k, l \le 2, k \ne l)\). The GAN training continuously moves and twists the distributions of \(G^k\) and \(G^l\) and makes two distributions as close as possible until the two distributions completely overlap together, i.e., \(p_{\mathbf {E_l}}(v^l,u^l) = p_{\mathbf {E_k}}(v^k,u^k)\).

The following theorems show that the distributions of two networks can be matched in terms of the embedding features:

Theorem 1.

Given fixed encoders \(\mathbf {E_1}\) and \(\mathbf {E_2}\), the optimal discriminators \(\mathbf {D_1}, \mathbf {D_2}\) for \(\max \limits _{\mathbf {D_1}, \mathbf {D_2}} \mathcal {V}(\mathbf {D_1}, \mathbf {D_2}, \mathbf {E_1}, \mathbf {E_2})\) are \(\mathbf {D_1}^*(u) = f_{12}(v,u)\) and \(\mathbf {D_2}^*(u) = f_{21}(v,u)\), where \(f_{12}\) and \(f_{21}\) are two Radon–Nikodym derivatives.

Theorem 2.

The global minimum of \(\max \limits _{\mathbf {D_1}, \mathbf {D_2}} \mathcal {V}(\mathbf {D_1}, \mathbf {D_2}, \mathbf {E_1}, \mathbf {E_2})\), i.e., \(\min \limits _{\mathbf {E_1}, \mathbf {E_2}} \max \limits _{\mathbf {D_1}, \mathbf {D_2}} \mathcal {V}(\mathbf {D_1}, \mathbf {D_2}, \mathbf {E_1}, \mathbf {E_2})\), is \(-\log 16\) and achieved iff \(p_{\mathbf {E_1}}(v,u) = p_{\mathbf {E_2}}(v,u)\) with \(\mathbf {D_1}^*(u) = \frac{1}{2}\) and \(\mathbf {D_2}^*(u) = \frac{1}{2}\).
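As a quick check of the constant in Theorem 2, assume that the value function in Equation (1) takes the standard four-term GAN form, with \(u^k = \mathbf {E_k}(v^k)\). This form is an assumption chosen to agree with the description of \(\mathbf {D_k}\) above and with the stated optimum, not a quotation of Equation (1):

```latex
\begin{align*}
\mathcal{V}(\mathbf{D_1}, \mathbf{D_2}, \mathbf{E_1}, \mathbf{E_2})
  &= \mathbb{E}\big[\log \mathbf{D_1}(u^1)\big]
   + \mathbb{E}\big[\log\big(1 - \mathbf{D_1}(u^2)\big)\big]
   + \mathbb{E}\big[\log \mathbf{D_2}(u^2)\big]
   + \mathbb{E}\big[\log\big(1 - \mathbf{D_2}(u^1)\big)\big], \\
\mathcal{V}\,\Big|_{\mathbf{D_1}^* = \mathbf{D_2}^* = \frac{1}{2}}
  &= 4 \log \tfrac{1}{2} = -\log 2^4 = -\log 16 .
\end{align*}
```

Under this assumed form, each of the four expectations contributes \(\log \frac{1}{2}\) once the discriminators can no longer separate the two embedding distributions, which reproduces the value \(-\log 16\) stated in Theorem 2.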

The original GAN problem in Equation (1) is easy to solve and optimize, but cannot directly offer the answer of network alignment. We still need to calculate the similarities between vertices across two networks to produce the alignment results. In order to avoid the unstable similarity computation, we convert the original GAN problem in Equation (1) to another equivalent GAN problem for matching the vertices by introducing two alignmenters \(\tilde{u}^1 = \mathbf {M_{21}}(u^2)\) and \(\tilde{u}^2 = \mathbf {M_{12}}(u^1)\).

In the equivalent GAN problem in Equation (4), a discriminator \(\mathbf {D_1}\) outputs the probability of a sample vertex embedding from the true distribution \(p_{\mathbf {E_1}}(u^1|v^1)\) rather than the fake distribution \(p_{\mathbf {E_2}}(u^2|v^2)\) through the translation \(p_{\mathbf {M_{21}}}(\tilde{u}^1|u^2)\), i.e., distinguishes real \(v^1\) in \(G^1\) from fake \(\tilde{v}^1\) through the alignment translation of \(v^2\) in \(G^2\). Given a sample \(u^2\) in \(G^2\), an alignmenter \(\tilde{u}^1 = \mathbf {M_{21}}(u^2)\) selects the most relevant \(\tilde{u}^1\) in \(\tilde{G}^1\) (i.e., the alignment from \(G^2\) to \(G^1\)), in terms of the conditional distribution \(p_{\mathbf {M_{21}}}(\tilde{u}^1|u^2)\), and then the selected \(\tilde{u}^1\) is fed into \(\mathbf {D_1}\) for differentiation. \(\mathbf {D_2}(u^2)\) and \(\tilde{u}^2 = \mathbf {M_{12}}(u^1)\) are the counterparts for the network alignment from \(G^1\) to \(G^2\) with similar meanings.

We train \(\mathbf {D_1}\) (or \(\mathbf {D_2}\)) to maximize the probability of a sample from real \(u^1\) (or \(u^2\)) in \(G^1\) (or \(G^2\)), while optimizing \(\mathbf {M_{21}}\) (or \(\mathbf {M_{12}}\)) to fool \(\mathbf {D_1}\) (or \(\mathbf {D_2}\)) by selecting translated \(\tilde{u}^1\) (or \(\tilde{u}^2\)) from \(u^2\) (or \(u^1\)) in \(G^2\) (or \(G^1\)) by maximizing \(\mathbf {D_1}(\mathbf {M_{21}}(\mathbf {E_2}(v^2)))\) (or \(\mathbf {D_2}(\mathbf {M_{12}}(\mathbf {E_1}(v^1)))\)). As a result, the GAN training will make the distribution of one network translated from another network as close as possible to the distribution of this network itself until the distributions of real and translated networks completely overlap. Thus, \(\mathbf {D_1}\) (or \(\mathbf {D_2}\)) cannot differentiate real \(u^1\) (or \(u^2\)) in \(G^1\) (or \(G^2\)) from translated \(\tilde{u}^1\) (or \(\tilde{u}^2\)) from \(u^2\) (or \(u^1\)) in \(G^2\) (or \(G^1\)). It implies that \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\) can produce good bidirectional network alignment results from \(G^1\) to \(G^2\) and from \(G^2\) to \(G^1\).

Given two close vertices \(v_i^1\) and \(v_j^1\) in \(G^1\) and two close vertices \(v_s^2\) and \(v_t^2\) in \(G^2\) in terms of both structural and attribute features, we have large values of \(\mathbf {A}_{ij}^1\), \(\mathbf {A}_{st}^2\), \(\mathbf {B}_{i:}^1 \cdot \mathbf {B}_{j:}^1\), and \(\mathbf {B}_{s:}^2 \cdot \mathbf {B}_{t:}^2\), where \(\cdot\) denotes the inner product between two attribute vectors. Intuitively, if \(v_i^1\) and \(v_s^2\) are aligned, which implies \(\mathbf {E_1}(v_i^1) \approx \mathbf {E_2}(v_s^2)\) based on the adversarial classification model in Equation (8), then it is highly likely that their close neighbors \(v_j^1\) and \(v_t^2\) are aligned too, i.e., \(\mathbf {E_1}(v_j^1) \approx \mathbf {E_2}(v_t^2)\). The following homophily consistency loss aims at maintaining the vertex homophily consistency between pairwise vertices in both the original and embedding spaces. Namely, strong structural correlations and similar attribute features between pairwise vertices correspond to similar embedding features between them.

\begin{equation} \begin{split}\mathcal {L}_{homo} = {} &\min \limits _{\mathbf {E_1}, \mathbf {E_2}} \sum _{\substack{v_i^1 \leftrightarrow v_s^2,\ v_j^1 \leftrightarrow v_t^2, \\ v_i^1 \wedge v_j^1 \ne \emptyset ,\ v_s^2 \wedge v_t^2 \ne \emptyset }} \mathbf {A}_{ij}^1 \mathbf {A}_{st}^2 \sqrt {\mathbf {B}_{i:}^1 \cdot \mathbf {B}_{j:}^1} \sqrt {\mathbf {B}_{s:}^2 \cdot \mathbf {B}_{t:}^2} \\ &\times \Big | \Vert \mathbf {E_1}(v_i^1) - \mathbf {E_2}(v_s^2)\Vert _2 - \Vert \mathbf {E_1}(v_j^1) - \mathbf {E_2}(v_t^2)\Vert _2 \Big |,\end{split} \end{equation}

(6)

where \(v_i^1 \leftrightarrow v_s^2\) and \(v_j^1 \leftrightarrow v_t^2\) represent two alignments.
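To make the homophily consistency loss in Equation (6) concrete, the following NumPy sketch evaluates it for fixed embeddings. It is an illustration under stated assumptions rather than the training procedure: `align` is a hypothetical dictionary of currently predicted alignments \(v_i^1 \leftrightarrow v_s^2\), attribute values are assumed non-negative so that the square roots are defined, and the side conditions under the sum are approximated by skipping pairs whose weight is zero.

```python
import numpy as np

def homophily_loss(A1, B1, A2, B2, U1, U2, align):
    """Evaluate the weighted sum of Equation (6) for given embeddings U1, U2.

    align maps a vertex index i in G^1 to its aligned index s in G^2.
    The sum runs over ordered pairs, so each unordered pair is counted twice.
    """
    pairs = list(align.items())
    loss = 0.0
    for i, s in pairs:
        for j, t in pairs:
            if i == j:
                continue
            # Weight: edge strengths times square roots of attribute inner products.
            w = (A1[i, j] * A2[s, t]
                 * np.sqrt(B1[i] @ B1[j]) * np.sqrt(B2[s] @ B2[t]))
            if w == 0:
                continue
            d_is = np.linalg.norm(U1[i] - U2[s])  # distance for alignment i <-> s
            d_jt = np.linalg.norm(U1[j] - U2[t])  # distance for alignment j <-> t
            loss += w * abs(d_is - d_jt)
    return float(loss)

# Toy data: two vertices per network, connected, with overlapping attributes.
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
B1 = np.array([[1.0, 0.0], [1.0, 1.0]])
B2 = np.array([[1.0, 1.0], [0.0, 1.0]])
U1 = np.array([[0.0, 0.0], [1.0, 0.0]])
align = {0: 0, 1: 1}

loss_consistent = homophily_loss(A1, B1, A2, B2, U1, U1.copy(), align)
loss_perturbed = homophily_loss(A1, B1, A2, B2, U1,
                                np.array([[0.0, 0.0], [1.0, 2.0]]), align)
```

When both aligned pairs are embedded equally well, the loss is zero; moving one aligned copy away while its neighbor stays matched makes the loss positive.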

A cycle loss is introduced to guarantee the quality of the unsupervised bidirectional network alignment with two alignment translation cycles, as shown in Equation (7). Intuitively, if a vertex embedding \(u^1\) in \(G^1\) can be translated into the counterpart \(\tilde{u}^2\) in \(\tilde{G}^2\) with \(\mathbf {M_{12}}\) and translated back to \(\tilde{u}^1 \approx u^1\) itself with \(\mathbf {M_{21}}\), i.e., \(u^1\) can be returned back to itself through the cycle of \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\), then it is unnecessary to have the prior alignment knowledge between \(u^1\) and \(u^2\) as the ground truth for training. The only focus is whether the original \(u^1\) is the same as the translated \(\tilde{u}^1\) through the translation cycle.

(7)
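The cycle intuition can be sketched in NumPy as follows. This is a simplified illustration, not the exact form of Equation (7): each alignmenter is represented by a hypothetical index array (`m12[i]` is the vertex of \(G^2\) selected for vertex \(i\) of \(G^1\), and `m21` is the reverse direction), and the Euclidean norm is an assumed choice of distance.

```python
import numpy as np

def cycle_loss(U1, U2, m12, m21):
    """Mean distance between each embedding and its image after a full cycle."""
    m12 = np.asarray(m12)
    m21 = np.asarray(m21)
    back1 = U1[m21[m12]]  # u^1 -> selected u~^2 -> selected u~^1
    back2 = U2[m12[m21]]  # u^2 -> selected u~^1 -> selected u~^2
    loss1 = np.linalg.norm(U1 - back1, axis=1).mean()
    loss2 = np.linalg.norm(U2 - back2, axis=1).mean()
    return float(loss1 + loss2)

U1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
U2 = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])  # same points, permuted

good_m12, good_m21 = [1, 2, 0], [2, 0, 1]  # mutually inverse selections
bad_m21 = [0, 0, 1]                        # breaks the return path

loss_good = cycle_loss(U1, U2, good_m12, good_m21)
loss_bad = cycle_loss(U1, U2, good_m12, bad_m21)
```

No ground-truth alignment enters the computation: the loss only compares each embedding with what comes back after the round trip, which is why the cycle can supervise the alignmenters without anchor links.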

3.3 Adversarial Binary Classification

Due to inconsistent structural and attribute features, different copies of the same vertices across two networks are often not the most similar to each other. Training two independent embedding models for two networks with inconsistent features can hardly eliminate the feature inconsistency issue in the individual embedding spaces of the two networks. Thus, an adversarial binary classification model is designed to jointly train the embedding models of two networks, such that two copies of the same vertices in different networks are projected into the same embedding space.

(8)

where \(\mathbf {C}\) is a binary classifier to predict the probability of a sample vertex embedding \(\mathbf {E_1}(v^1)\) (or \(\mathbf {E_2}(v^2)\)) belonging to class \(\mathbf {1}\) (i.e., \(G^1\)) or class \(\mathbf {2}\) (i.e., \(G^2\)). \(H\) is the cross-entropy for adversarial classification.

We train \(\mathbf {C}\) to maximize the probability of a sample \(\mathbf {E_1}(v^1)\) belonging to class \(\mathbf {1}\) or \(\mathbf {E_2}(v^2)\) on class \(\mathbf {2}\) by minimizing \(H\), while optimizing \(\mathbf {E_1}\) and \(\mathbf {E_2}\) by maximizing \(H\) to fool \(\mathbf {C}\). Compared with the basic GAN model, \(\mathbf {C}\) is like the discriminator and \(\mathbf {E_1}\) and \(\mathbf {E_2}\) are like the generators. As a result, \(\mathbf {C}\) cannot differentiate the embeddings \(\mathbf {E_1}(v^1)\) and \(\mathbf {E_2}(v^2)\) in two networks of the same vertices \(v\). This implies that \(v^1\) and \(v^2\) are projected into the same embedding space, i.e., \(\mathbf {E_1}(v^1) \approx \mathbf {E_2}(v^2)\).
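The opposing roles of \(\mathbf {C}\) and the encoders can be illustrated numerically. The sketch below assumes that \(H\) is the sum of the two per-network binary cross-entropy terms; the function `domain_cross_entropy` and the probability values are hypothetical and are not taken from Equation (8).

```python
import numpy as np

def domain_cross_entropy(p1_on_g1, p1_on_g2, eps=1e-12):
    """Cross-entropy H of a binary network classifier C.

    p1_on_g1: C's probability of class 1 for embeddings that come from G^1.
    p1_on_g2: C's probability of class 1 for embeddings that come from G^2.
    """
    p1_on_g1 = np.clip(np.asarray(p1_on_g1, dtype=float), eps, 1.0 - eps)
    p1_on_g2 = np.clip(np.asarray(p1_on_g2, dtype=float), eps, 1.0 - eps)
    h1 = -np.mean(np.log(p1_on_g1))        # true class is 1
    h2 = -np.mean(np.log(1.0 - p1_on_g2))  # true class is 2
    return float(h1 + h2)

# C separates the two networks well: H is small (what C wants).
h_confident = domain_cross_entropy([0.9, 0.95], [0.1, 0.05])
# C is fooled and outputs 1/2 everywhere: H is large (what E_1 and E_2 want).
h_fooled = domain_cross_entropy([0.5, 0.5], [0.5, 0.5])
```

Under this assumed form, a fully fooled classifier gives \(H = 2\log 2 = \log 4\), the magnitude of the optimum \(-\log 4\) that Theorem 5 states for \(\mathcal {W}\).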

The following theorems show that our adversarial classification model can alleviate the feature inconsistency issue:

Theorem 4.

Given fixed encoders \(\mathbf {E_1}\) and \(\mathbf {E_2}\), the optimal classifier \(\mathbf {C}\) for \(\max \limits _{\mathbf {C}} \mathcal {W}(\mathbf {C}, \mathbf {E_1}, \mathbf {E_2})\) is \(\mathbf {C}^*(u) = \frac{p_{\mathbf {E_1}}(v,u)}{p_{\mathbf {E_1}}(v,u) + p_{\mathbf {E_2}}(v,u)}\) for class \(\mathbf {1}\) and \(\mathbf {C}^*(u) = \frac{p_{\mathbf {E_2}}(v,u)}{p_{\mathbf {E_1}}(v,u) + p_{\mathbf {E_2}}(v,u)}\) for class \(\mathbf {2}\).

Theorem 5.

The global minimum of \(\max \limits _{\mathbf {C}} \mathcal {W}(\mathbf {C}, \mathbf {E_1}, \mathbf {E_2})\), i.e., \(\min \limits _{\mathbf {E_1}, \mathbf {E_2}} \max \limits _{\mathbf {C}} \mathcal {W}(\mathbf {C}, \mathbf {E_1}, \mathbf {E_2})\), is \(-\log 4\) and achieved iff \(p_{\mathbf {E_1}}(v,u) = p_{\mathbf {E_2}}(v,u)\) with \(\mathbf {C}^*(u) = \frac{1}{2}\) for class \(\mathbf {1}\) and \(\mathbf {C}^*(u) = \frac{1}{2}\) for class \(\mathbf {2}\).

The proofs of Theorems 3 and 5 are omitted due to the space limit. The proof methods are similar to those of Theorems 1 and 2.

We introduce a loss function to minimize the reconstruction error between the output and the input, and preserve the original network structure in the embedding space.

(9)

Intuitively, we want the embedding semantics of one network to be preserved when translated to another network and returned to itself, say \(\mathbf {E_1}(\mathbf {\tilde{E}_1}(\mathbf {E_2}(v^2))) \approx \mathbf {E_2}(v^2)\). A semantic preservation loss is introduced to maintain this property.

(10)

Therefore, the overall objective function of the adversarial network alignment for jointly training dual GAN models is given as follows:

\begin{equation} \begin{split}\mathcal {L}_{Total} =\ &\min \limits _{\mathbf {E_1}, \mathbf {E_2}, \mathbf {M_{12}}, \mathbf {M_{21}}} \max \limits _{\mathbf {D_1}, \mathbf {D_2}} \mathcal {U}(\mathbf {D_1}, \mathbf {D_2}, \mathbf {E_1}, \mathbf {E_2}, \mathbf {M_{12}}, \mathbf {M_{21}}) \\ & + \ \min \limits _{\mathbf {E_1}, \mathbf {E_2}} \max \limits _{\mathbf {C}} \mathcal {W}(\mathbf {C}, \mathbf {E_1}, \mathbf {E_2}) + \mathcal {L}_{homo} + \mathcal {L}_{cycle} + \mathcal {L}_{reco} + \mathcal {L}_{pres}.\end{split} \end{equation}

(11)

We jointly train the adversarial classification model for alleviating the feature inconsistency issue and the adversarial network distribution matching model for training the unsupervised alignment without pairwise similarity computation.

3.4 Policy Gradient-Based Optimization of \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\)

Traditional network alignment methods aim at transforming or embedding two networks into the same space by maximizing the similarities (or minimizing the distances) between projected or embedded nodes across networks. The node pairs with the largest similarities or smallest distances are selected as the alignment results [46, 47, 60, 106, 108, 128]. Notice that the above network alignment process includes two steps: (1) optimize a trainable embedding function or transform matrix to map two networks into the same space; and (2) compute similarity scores between nodes across networks (based on a certain similarity measure) and output the most similar node pairs (i.e., hard alignment) or top-\(k\) matched nodes for each node (i.e., soft alignment). All of these methods focus on the optimization of Step 1, whereas Step 2 is often deterministic and non-trainable.

In this work, in order to train the network alignment task in an unsupervised way, we propose a bidirectional cycle-based adversarial learning model to translate \(G^1\) into \(\tilde{G}^2\) with \(\mathbf {M_{12}}\) and return back to \(\tilde{G}^1 \approx G^1\) with \(\mathbf {M_{21}}\), i.e., \(G^1\stackrel{\mathbf {M_{12}}}{\longrightarrow }\tilde{G}^2\stackrel{\mathbf {M_{21}}}{\longrightarrow }\tilde{G}^1 \approx G^1\). In our work, encoders \(\mathbf {E_1}\) and \(\mathbf {E_2}\) correspond to the trainable embedding function in the above Step 1, while alignmenters \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\) correspond to the non-trainable selection operation in the Step 2. We want to optimize \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\) to guarantee the quality of unsupervised learning, i.e., ensure that a node can be returned back to itself through the cycle of \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\).

The generators in the traditional GAN models generate fake images from continuous noise space and thus the standard SGD method can be used to optimize them. However, given a vertex embedding in one network, the selection process of the most relevant one among actual vertex embeddings in another network by the alignmenters is discrete and non-trainable. At each time, the alignmenter \(\mathbf {M_{21}}(u^2)\) (or \(\mathbf {M_{12}}(u^1)\)) selects the most relevant one \(\tilde{u}^1\) among actual vertex embeddings in network \(G^1\) (or \(\tilde{u}^2\) in network \(G^2\)) for the given embedding \(u^2\) in \(G^2\) (or \(u^1\) in \(G^1\)). Thus, the SGD method cannot solve this specific optimization problem in the discrete space.

Recall that Step 2 in traditional network alignment methods, which computes the similarity scores and outputs the alignment results, is often non-trainable and thus cannot be optimized. We therefore utilize the policy gradient technique to optimize this step (i.e., the selection of the most similar vertex pairs). RL models essentially aim at optimizing a function mapping from states to actions so as to collect as much reward as possible as the agents transition from state to state, where executing an action in each state provides the agents with a reward. In our case, the states correspond to the vertex embeddings and the actions correspond to the selection of the most similar vertex pairs in terms of their embeddings. The reward encourages every translated \(\tilde{u}_i^1\) from \(u_i^2\) in \(G^2\) to be the same as the actual \(u_i^1\) in \(G^1\). We thus utilize the policy gradient technique to optimize the parameters of \(\mathbf {M_{21}}\) and \(\mathbf {M_{12}}\) by maximizing the expected cumulative reward of the entire network alignment process.

Notice that \(\mathbf {M_{21}}\) and \(\mathbf {M_{12}}\) exist in only Equations (4) and (7), and we ignore irrelevant terms in the overall objective in Equation (11) and optimize \(\mathbf {M_{21}}\), when \(\mathbf {D_1}\), \(\mathbf {E_2}\), and \(\mathbf {\tilde{E}_1}\) are fixed, as follows. For convenience of presentation, let

and

, where \(u^1\) and \(u^2\) are real copies, \(\tilde{u}^1 = \mathbf {M_{21}}(u^2)\) in the training of \(\mathbf {D_1}\), \(\tilde{u}^1 = \mathbf {M_{21}}(\mathbf {M_{12}}(u^1))\) and \(\tilde{u}^2 = \mathbf {M_{12}}(\mathbf {M_{21}}(u^2))\) in the unsupervised learning loss are translated copies.

(12)

We define the environment of policy gradient as follows: (1) the state \(s_t\) at time step \(t\) for the embedding \(u_i^2\) of the copy \(v_i^2\) in network \(G^2\) of a vertex \(v_i\) is defined as the concatenation of the previously-visited embeddings \(\lbrace \tilde{u}_1^1, \ldots , \tilde{u}_{t-1}^1\rbrace\), the currently-visiting embedding \(\tilde{u}_t^1\) in network \(G^1\), and the \(u_i^2\) itself, i.e., \(s_t = \lbrace \tilde{u}_1^1, \ldots , \tilde{u}_t^1, u_i^2\rbrace\); (2) the action space \(a_t\) at step \(t\) contains two actions of \(select\) and \(skip\) to determine whether \(\tilde{u}_t^1\) in \(G^1\) will be selected for the GAN training or not, according to a logistic policy; (3) the policy model \(\pi _\theta (a_t|s_t)\) chooses an action \(a_t\) given the current state \(s_t\) at step \(t\).

\begin{equation} \pi _\theta (a_t|s_t) = \log p_{\mathbf {M_{21}}}\left(\tilde{u}_t^1|u_i^2\right) . \end{equation}

(13)

We use the alignmenter \(\tilde{u}_t^1 = \mathbf {M_{21}}(u_i^2)\) as the policy model to perform the vertex matching operation; and (4) the network alignment model follows the policy to determine which action (\(select\) or \(skip\) an embedding for the alignment) to take at each state, and finally receives a delayed reward \(r(s_ta_t)\) at the terminal state \(T\) only when all \(N_1\) and \(N_2\) vertices in \(G^1\) and \(G^2\) are processed. The reward is zero at all other states.

\begin{equation} r(s_ta_t) = \frac{1}{N_2} \sum _{i=1}^{N_2} \Big [\log \frac{1}{1-\mathbf {D_1}(\tilde{u}_i^1)} - \left|\left|u_i^2 - \tilde{u}_i^2\right|\right|_2 \Big ] - \frac{1}{N_1} \sum _{i=1}^{N_1} \left|\left|u_i^1 - \tilde{u}_i^1\right|\right|_2, \ t = T. \end{equation}

(14)
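As a reading aid, the terminal reward in Equation (14) can be evaluated as in the sketch below. It assumes the discriminator outputs and the real and translated embeddings are already available as plain Python lists; the function and argument names are hypothetical, and the discriminator scores are assumed to be strictly below 1 so that the logarithm is defined.

```python
import math

def l2_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def terminal_reward(d1_scores, u2, u2_cycle, u1, u1_cycle):
    """Delayed reward of Equation (14), evaluated at the terminal state T.

    d1_scores[i] : D_1 applied to the translated copy of u_i^2, in [0, 1)
    u2, u2_cycle : real and cycle-translated embeddings of G^2 (N_2 rows)
    u1, u1_cycle : real and cycle-translated embeddings of G^1 (N_1 rows)
    """
    n2, n1 = len(u2), len(u1)
    # First sum: adversarial term minus the cycle error on G^2.
    term2 = sum(
        math.log(1.0 / (1.0 - d)) - l2_distance(a, b)
        for d, a, b in zip(d1_scores, u2, u2_cycle)
    ) / n2
    # Second sum: cycle error on G^1.
    term1 = sum(l2_distance(a, b) for a, b in zip(u1, u1_cycle)) / n1
    return term2 - term1

# Toy values: perfect cycle on G^2, one G^1 node displaced by distance 5.
reward = terminal_reward(
    d1_scores=[0.5, 0.5],
    u2=[[0.0, 0.0], [1.0, 1.0]], u2_cycle=[[0.0, 0.0], [1.0, 1.0]],
    u1=[[0.0, 0.0]], u1_cycle=[[3.0, 4.0]],
)
```

The reward grows when the discriminator is fooled by the translated copies and shrinks with either cycle reconstruction error, matching the three terms of the equation.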

We aim at maximizing the average reward for all vertices to be aligned, i.e., making every translated \(\tilde{u}_i^1\) from \(u_i^2\) in \(G^2\) the same as the actual \(u_i^1\) in \(G^1\). At each time step, based on the current state \(u_i^2\), we follow the policy \(\pi _\theta (a_t|s_t)\) to take action \(\tilde{u}_t^1\) with the reward of \(r(s_ta_t)\). Now, we can perform gradient ascent to optimize the policy parameters of \(\mathbf {M_{21}}\) by using the REINFORCE algorithm [99]. We also use the same strategy to optimize \(\mathbf {M_{12}}\).

(15)
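For readers unfamiliar with REINFORCE, the sketch below shows the generic update rule on a deliberately tiny problem. It is not the alignmenter network of this article: the policy is a softmax over a handful of candidate vertices with one logit each, and the reward is a toy 0/1 signal that fires when the correct candidate is selected. All names and hyperparameters are hypothetical.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(num_candidates, correct, steps=200, lr=0.5, seed=0):
    """Gradient ascent with theta += lr * reward * grad log pi(action)."""
    rng = random.Random(seed)
    theta = [0.0] * num_candidates
    for _ in range(steps):
        probs = softmax(theta)
        # Sample a discrete selection from the current policy.
        action = rng.choices(range(num_candidates), weights=probs)[0]
        reward = 1.0 if action == correct else 0.0  # toy reward
        for j in range(num_candidates):
            # For a softmax policy: d log pi(a) / d theta_j = 1[j == a] - pi_j.
            grad_log_pi = (1.0 if j == action else 0.0) - probs[j]
            theta[j] += lr * reward * grad_log_pi
    return softmax(theta)

final_probs = reinforce(num_candidates=3, correct=1)
```

The sampling step is discrete, so no gradient flows through the selection itself; the update only needs the gradient of the log-probability of the sampled action, which is why this family of methods suits the discrete vertex selection described above.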

4 Experimental Evaluation

In this section, we perform a set of experiments to evaluate the performance of UANA on real graph datasets. All experiments were done on a Linux server with two Intel Xeon E5-2650 v4 CPUs at 2.66 GHz, 128 GB main memory, a 1 TB HDD, and four NVIDIA GeForce GTX 1080 Ti GPUs (4 × 11 GB memory). The algorithm is implemented with TensorFlow v1.9.0 and Python v3.6.3.

Datasets. We collect five groups of real-world graph datasets from different fields, e.g., social platforms and academics, as given in Table 1.

Dataset	Network	#Nodes	#Edges	#Matched Nodes
SNS	Myspace	10,733	22,162	267
SNS	Flickr	12,974	32,298	267
SNS	Last.fm	15,436	32,638	452
SNS	Flickr	12,974	32,298	452
Academia: DBLP17K	2013	13,786	51,240	10,443
Academia: DBLP17K	2014	17,363	58,024	10,443
Academia: DBLP50K	2013	53,249	195,383	32,377
Academia: DBLP50K	2014	54,299	197,627	32,377
Academia: DBLP100K	2013	100,870	701,179	67,641
Academia: DBLP100K	2014	100,876	813,804	67,641

Table 1. Statistics of the Datasets

– Social networks. Myspace [122] is a social network offering an interactive, user-submitted network of friends, where nodes specify users and edges represent the friendships between users. Last.fm [122] is a music-oriented online social network that provides a radio streaming service, where vertices represent users and edges denote their following relationships. Flickr [122] is an online social network that offers image and video hosting services, where nodes denote users and edges specify their friendships. The same users occurring in two networks are used as the ground truth for evaluation only.

– Academic networks. DBLP is a computer science bibliography website that offers an online search and retrieval service for the academic community.¹ We collect three groups of DBLP datasets with highly prolific \(17K\), \(50K\), and \(100K\) authors from all research areas. We split each dataset group into two coauthor networks in different years (2013 and 2014), respectively, where nodes denote authors and edges specify their coauthorships. The same authors existing in two networks of each dataset group are used as the ground truth for evaluation.

Baseline methods. We compare our UANA algorithm with six state-of-the-art network alignment methods.

– FINAL [117] includes a family of algorithms to leverage both structure and attribute information for solving the alignment problem of two attributed networks. It formulates the problem from an optimization perspective based on the alignment consistency principle.

– iNEAT [120] integrates both network alignment and network completion into a unified learning model by jointly training two tasks and mutually enhancing each other.

– CAlign [9] jointly trains multiple tasks of network alignment, community discovery, and community alignment in an unsupervised way.

– REGAL [31] is an unsupervised network alignment framework that jointly embeds two graphs into low-dimensional space and infers the soft alignment based on the similarity computation with the assumption of topological and attribute consistency.

– Deep graph matching consensus (DGMC) [26] is a supervised graph matching procedure which aims at reaching a data-driven neighborhood consensus between matched node pairs.

– RANA [75] is a preliminary version of this manuscript that was published in the proceedings of the 2019 IEEE International Conference on Data Mining (ICDM’19). It is a supervised learning algorithm without policy gradient-based optimization.

For the two supervised learning algorithms, DGMC and RANA (Reinforcement learning and Adversarial learning based Network Alignment), 10% of the ground truth is randomly selected as the training data.

Variants of our model. We evaluate three variants of UANA to show the strengths of adversarial learning and RL one by one.

– UANA is a complete version with the support of bidirectional cycle-based adversarial network distribution matching, adversarial classification, and policy gradient-based optimization.

– UANA-R is a partial version without adversarial classification. It fails to project two networks to be aligned into the same low-dimensional embedding space for alleviating the feature inconsistency issue as well as facilitating the translations of the distributions of two networks.

– UANA-C is a partial version without policy gradient-based optimization. It has to carefully choose discriminative features and similarity metrics to calculate the similarity scores between pairwise vertices across two networks to generate the alignment results. Existing works that use the Wasserstein distance to measure the distance between two discrete distributions often utilize the Cosine distance measure to compare node embeddings across networks [46, 47, 106]. In this case, by following the similar strategy, we calculate the Cosine distances between node embeddings across networks. The pairs of node embeddings with the smallest distances are selected as the alignment results. Thus, UANA-C can only optimize encoders \(\mathbf {E_1}\) and \(\mathbf {E_2}\) but not alignmenters \(\mathbf {M_{12}}\) and \(\mathbf {M_{21}}\) and it is difficult to guarantee the quality of bidirectional cycle-based adversarial network distribution matching.

Evaluation metrics. We use three measures to evaluate the quality of network alignment results generated by different methods. For each test, we perform five independent trials on each dataset group at each setting. The average performance scores of six competitor methods and three variants of our model in five trials over the same dataset groups are reported. Please refer to the cited papers for the detailed definitions of Accuracy [9, 117, 120] and F1 [112]. A larger Accuracy or F1 value indicates a better network alignment. We adopt the RMSE between the real copy \(v_i^1\) in a network \(G^1\) of a vertex \(v_i\) and the translated copy \(\tilde{v}_i^1\) by the alignment algorithms from the copy \(v_i^2\) in another network \(G^2\) of \(v_i\).

\begin{equation} RMSE = \sqrt {\frac{\sum _{i=1}^{N_1} ((v_i^1 - \tilde{v}_i^1)/v_i^1)^2}{N_1}} , \end{equation}

(16)

where \(v_i^1\) denotes a vertex itself and its original features. The smaller the RMSE value, the better the quality. In addition, we evaluate the execution time of the different network alignment algorithms.
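The relative RMSE above can be computed as in the following sketch. It makes a simplifying assumption that each vertex is summarized by a single nonzero scalar value, since the article does not spell out how the ratio is taken over feature vectors; the function name `relative_rmse` and the sample values are hypothetical.

```python
import math

def relative_rmse(real, translated):
    """Root mean square of the relative errors (real - translated) / real."""
    if len(real) != len(translated) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in real):
        raise ValueError("real values must be nonzero for a relative error")
    total = sum(((r - t) / r) ** 2 for r, t in zip(real, translated))
    return math.sqrt(total / len(real))

# Two vertices, each off by 10% in relative terms.
example = relative_rmse([1.0, 2.0], [1.1, 1.8])
```

Because each error is divided by the real value, the measure is scale-free: a vertex with large feature values does not dominate the score.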

Parameter setting. We embed high-dimensional original features of each vertex into low-dimensional latent features. For two networks to be aligned, their encoders are jointly trained, which ensures that the two copies of the same vertex in the two networks, despite their inconsistent features, are projected into the same embedding space to facilitate the alignment translation.

In our current implementation, all players in the dual GAN models in Equations (4) and (8) are implemented as three-layer perceptrons (input-hidden-output). The number of neurons in the hidden layer for discriminators \(\mathbf {D_k}\), encoders \(\mathbf {E_k}\) and decoders \(\mathbf {\tilde{E}_k}\), classifier \(\mathbf {C}\), and alignmenters \(\mathbf {M_{lk}}\) \((1 \le k, l \le 2, k \ne l)\) is set to \(d_1 = 20\), \(d_2 = 20\), \(d_3 = 40\), and \(d_4 = 40,\) respectively. Notice that both \(\mathbf {E_k}\) and \(\mathbf {\tilde{E}_k}\) have the same number of neurons in the hidden layer (i.e., \(d_2 = 20\)). The dimension of the embedding space is equal to 20. We solve the adversarial learning problem iteratively by first training \(\mathbf {D_1}\) and \(\mathbf {D_2}\) with fixed \(\mathbf {E_1}\), \(\mathbf {\tilde{E}_1}\), \(\mathbf {E_2}\), \(\mathbf {\tilde{E}_2}\), \(\mathbf {M_{12}}\), and \(\mathbf {M_{21}}\) using a mini-batch of size 100 and then training \(\mathbf {E_1}\), \(\mathbf {\tilde{E}_1}\), \(\mathbf {E_2}\), \(\mathbf {\tilde{E}_2}\), \(\mathbf {M_{12}}\), and \(\mathbf {M_{21}}\) with fixed \(\mathbf {D_1}\) and \(\mathbf {D_2}\) using a mini-batch of size 200. Similarly, we alternately train \(\mathbf {C}\) with \(\mathbf {E_1}\) and \(\mathbf {E_2}\) in the adversarial classification model.

4.1 Effectiveness Analysis

Figure 4 exhibits the network alignment quality of nine algorithms on the DBLP17K dataset when aligning its 2013 version with its 2014 version. In order to verify the robustness to noisy data, we perturb the two versions of DBLP17K at different noise levels (i.e., the ratio of modified edges and attribute values to the original clean ones), say 5% (i.e., 0.05), by randomly adding structure noise (i.e., edge addition and deletion) and attribute noise (i.e., attribute value modification), each at half of the noise level. We perform the network alignment test for all nine algorithms at different noise levels.
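A possible way to generate such perturbations is sketched below. The exact procedure used in the experiments is not specified beyond the description above, so the details here are assumptions: edge changes are split evenly between deletions and additions, attributes are binary and are flipped, and all function names are hypothetical.

```python
import random

def add_structure_noise(edges, num_nodes, ratio, seed=0):
    """Delete and add edges; ratio is the fraction of original edges changed."""
    rng = random.Random(seed)
    original = {tuple(sorted(e)) for e in edges}
    edge_set = set(original)
    num_changes = int(round(ratio * len(original)))
    num_delete = num_changes // 2
    num_add = num_changes - num_delete
    for e in rng.sample(sorted(original), num_delete):
        edge_set.remove(e)
    # Assumes the graph is sparse enough that new edges can be found.
    while num_add > 0:
        u, v = rng.sample(range(num_nodes), 2)
        e = (min(u, v), max(u, v))
        if e not in edge_set and e not in original:
            edge_set.add(e)
            num_add -= 1
    return sorted(edge_set)

def add_attribute_noise(attrs, ratio, seed=0):
    """Flip a fraction `ratio` of the binary attribute values."""
    rng = random.Random(seed)
    noisy = [row[:] for row in attrs]
    cells = [(i, j) for i, row in enumerate(noisy) for j in range(len(row))]
    for i, j in rng.sample(cells, int(round(ratio * len(cells)))):
        noisy[i][j] = 1 - noisy[i][j]
    return noisy

# Toy graph: a ring of 20 nodes with 2 binary attributes per node.
level = 0.4  # exaggerated so the effect is visible on a tiny graph
ring = [(i, (i + 1) % 20) for i in range(20)]
attrs = [[0, 1] for _ in range(20)]
noisy_edges = add_structure_noise(ring, num_nodes=20, ratio=level / 2)
noisy_attrs = add_attribute_noise(attrs, ratio=level / 2)
```

Splitting the level in half between the two noise types mirrors the description in the text; with the 5% level used as an example there, each type would receive a ratio of 0.025.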

Fig. 4.

It is observed from Figure 4 that, among all nine methods, the network alignment results generated by our proposed UANA method achieve the best quality in most experiments, showing the power of UANA in the presence of feature inconsistency and high-dimensional features. In most experiments, the performance of our unsupervised UANA method is comparable to that of our supervised RANA algorithm trained with 10% of the ground truth. A reasonable explanation is that the RANA method fails to utilize the policy gradient-based optimization approach to select the vertex pairs in two networks. The selection of discriminative features and similarity metrics may not be optimal when calculating the similarity scores between pairwise vertices across two networks.

Figures 5 and 6 present the network alignment quality on two pairs of social networks. Similar trends are observed for the network alignment comparison in these two figures: UANA achieves the largest Accuracy values (\(\gt\)0.712) and the smallest RMSE (\(\lt\)0.0101), which are better than those of the other eight methods in most tests. Notice that even if the noise level is very high, such as 0.25 and 0.3, UANA can still achieve considerable quality improvement. This demonstrates that UANA is robust to noisy data. This advantage is very important for network alignment applications since each information network often has its unique structural and attribute features.

Fig. 5.

Fig. 6.

In addition, we observe from Figures 4–6 that the performance achieved by most network alignment algorithms decreases with increasing noise. This is consistent with the findings in our recent studies [74, 124, 125, 156]: network alignment models are highly vulnerable to natural noise and adversarial attacks, i.e., small input perturbations in graph structures and node attributes can cause model failures. In these works, we have evaluated tens of representative network alignment models, and all of them are fairly sensitive to both random noise and adversarial attacks. In Figures 4–6, although the performance of Fast attrIbuted Network ALignment (FINAL) and Incomplete NEtwork AlignmenT (iNEAT) is relatively stable under different noise levels, their performance scores are fairly low compared with our UANA method. In contrast, UANA achieves relatively stable performance across noise levels: its Accuracy and F1 scores oscillate within ranges of only 6.3% and 3.4%, respectively, demonstrating the effectiveness and robustness of UANA.

4.2 Ablation Study

Figures 4–6 also evaluate the performance of network alignment on three groups of datasets with three variants of our UANA model. We have observed that UANA achieves, on average, an 8.7\(\%\) Accuracy boost, a 21.5\(\%\) RMSE improvement, and a 6.9\(\%\) F1 growth compared to the best network alignment results of all other algorithms. Notice that the performance of UANA-R and UANA-C is worse than that of UANA. A reasonable explanation is that UANA-R, which does not utilize the adversarial classification, fails to project the two networks to be aligned into the same low-dimensional embedding space and thus increases the difficulty of translating between two networks with inconsistent features. On the other hand, UANA-C, which does not use policy gradient-based optimization, relies on highly discriminative features and user-defined similarity metrics and thus may produce unstable network alignment results. These results illustrate that both adversarial classification and policy gradient-based optimization are very important in solving the network alignment problem.

4.3 Effectiveness Analysis on Flickr with Enough Ground Truth

By following the same strategy as the papers [8, 31, 70], we construct a new graph dataset with adjacency matrix \(\mathbf {A}^{\prime } = \mathbf {P} \mathbf {A} \mathbf {P}^T\), where \(\mathbf {A}\) is the adjacency matrix of the original Flickr dataset and \(\mathbf {P}\) is a randomly-generated permutation matrix. The nonzero elements in \(\mathbf {P}\) denote ground-truth vertex pairs. We add structural noise to \(\mathbf {A}^{\prime }\) by randomly adding or removing edges with a fixed probability. In addition, we add noise to attributes by flipping binary values, or choosing categorical attribute values uniformly at random from the remaining possible values with another fixed probability. Table 2 exhibits the performance of network alignment by different algorithms over Flickr and its perturbed version with enough ground truth. It is still observed that our UANA model outperforms all baseline methods in terms of three effectiveness measures and running time.
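The construction \(\mathbf {A}^{\prime } = \mathbf {P} \mathbf {A} \mathbf {P}^T\) can be sketched without forming \(\mathbf {P}\) explicitly, as below. The sketch assumes a dense adjacency matrix stored as nested lists and represents the permutation as an index list `perm` with \(\mathbf {P}_{i,\,perm[i]} = 1\); the names and the toy graph are hypothetical, and the noise steps described above are omitted.

```python
import random

def random_permutation(n, seed=0):
    """Random relabeling of n nodes, returned as an index list."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def permute_adjacency(adj, perm):
    """Return A' = P A P^T, where P[i][perm[i]] = 1.

    Entry (i, j) of A' equals entry (perm[i], perm[j]) of A, so node i of
    the new graph is the ground-truth match of node perm[i] of the original.
    """
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

# Toy graph: the path 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
perm = random_permutation(len(adj))
permuted = permute_adjacency(adj, perm)
ground_truth = list(enumerate(perm))  # (new node, original node) pairs
```

Because the second graph is only a relabeling of the first, every vertex has a known counterpart, which is what provides the "enough ground truth" setting used in this experiment.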

Measure	FINAL	CAlign	REGAL	iNEAT	DGMC	RANA	UANA
Accuracy	0.605	0.432	0.678	0.552	0.646	0.714	0.755
RMSE	0.015	0.020	0.012	0.015	0.012	0.011	0.009
F1	0.603	0.507	0.684	0.619	0.693	0.726	0.771

Table 2. Effectiveness Analysis on Flickr with Enough Ground Truth

4.4 Parameter Sensitivity

Figure 7(a) shows the impact of the number of vertex embedding dimensions in our UANA model over the DBLP17K dataset. We have observed that both Accuracy and F1 scores initially rise (and RMSE values initially decrease) when the number of dimensions increases. Intuitively, embeddings with more dimensions can carry enough information for sparse graphs, and thus help improve the quality of network alignment running on the latent representations. Later on, the performance curves remain relatively stable as the number of dimensions continues to increase. A reasonable explanation is that the additional dimensions are unnecessary if we already have enough information for network alignment analysis. In addition, fewer dimensions can help improve the efficiency of both representation learning and network alignment. Thus, it is important to determine the optimal number of dimensions for the network embedding. In this experiment, the Accuracy, RMSE, and F1 scores oscillate within the range of 21.4%, 42.8%, and 12.3%, respectively.

Fig. 7.

The weights of the three loss functions in the overall objective in Equation (11) were all fixed to 1 by default in our previous experiments. Figures 7(b)–(d) measure the effect of the three loss functions with different weights on the network alignment performance. In each figure, we vary one weighting factor, say the weight of the cycle loss, from 0 to 2 and fix the other two weights to 1. It is observed that both Accuracy and F1 scores (or RMSE values) have concave (or convex) curves when we continuously increase the weight values. The Accuracy scores oscillate within the range of 22.7%, the RMSE values vary within the range of 52.8%, and the F1 scores fluctuate within the range of 34.6%. This demonstrates that the weight settings for the cycle loss for unsupervised alignment, the homophily loss for maintaining the vertex homophily consistency in the embedding space, and the preservation loss for cross-network embedding translation play important roles in handling unsupervised adversarial network alignment on pairwise networks with inconsistent features.

4.5 Scalability Analysis

Complexity analysis. We use the sparse tensors of TensorFlow to implement the neural networks, which allows fast computation on the sparse storage by iterating over only the non-zero elements in the matrices, i.e., the edges in the graphs. In our experiments, all players in the dual GAN models in Equations (4) and (8) are implemented as three-layer perceptrons. \(D\) is the dimension of the embedding vectors. \(N_k\), \(L_k\), and \(M_k\) \((1 \le k, l \le 2, k \ne l)\) have the same definitions as in Section 2. \(d_1, d_2, d_3\), and \(d_4\) have the same definitions as in the parameter setting. Thus, the training cost of the neural networks in \(\mathbf {E_k}\) and \(\mathbf {\tilde{E}_k}\) is \(O(L_kd_2+N_kM_kd_2+N_kd_2D)\). The training complexity of the other neural networks is as follows: \(O(N_kDd_1+N_kd_1)\) for \(\mathbf {D_k}\), \(O(N_lDd_4+N_kDd_4)\) for \(\mathbf {M_{lk}}\), and \(O(\sum _{k=1}^2 N_k(Dd_3+d_3))\) for \(\mathbf {C}\), respectively.

Thus, the total complexity is equal to \(O(\sum _{k=1}^2 [(d_1(D+1)+d_2(2D+2M_k)+d_3(D+1)+2d_4D)N_k + 2d_2L_k])\). Let \(\alpha _k = d_1(D+1)+d_2(2D+2M_k)+d_3(D+1)+2d_4D\) and \(\beta = 2d_2\); then the total cost simplifies to \(O(\sum _{k=1}^2 (\alpha _k N_k+\beta L_k))\). Since the other variables \(d_1, d_2, d_3, d_4, k, D, M_k \ll N_k, L_k\), the computational cost is almost a linear combination of the number of vertices and the number of edges in the two networks.
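The linear dependence on \(N_k\) and \(L_k\) can be checked numerically with the short sketch below, which simply evaluates \(\sum _k (\alpha _k N_k + \beta L_k)\). The hidden-layer sizes and embedding dimension default to the values from the parameter setting; the value used for \(M_k\) and the network sizes in the example are hypothetical placeholders, and the result is an operation-count estimate up to constant factors, not a measured running time.

```python
def alpha(d1, d2, d3, d4, D, M_k):
    """Per-vertex cost coefficient alpha_k from the complexity analysis."""
    return d1 * (D + 1) + d2 * (2 * D + 2 * M_k) + d3 * (D + 1) + 2 * d4 * D

def total_cost(networks, d1=20, d2=20, d3=40, d4=40, D=20):
    """Sum of alpha_k * N_k + beta * L_k over the given networks.

    networks: list of (N_k, L_k, M_k) triples, one per network.
    """
    beta = 2 * d2
    return sum(
        alpha(d1, d2, d3, d4, D, M_k) * N_k + beta * L_k
        for N_k, L_k, M_k in networks
    )

# Hypothetical network sizes; M_k = 50 is a placeholder value.
small = total_cost([(1000, 4000, 50), (1200, 5000, 50)])
large = total_cost([(2000, 8000, 50), (2400, 10000, 50)])
```

Doubling both the vertex and edge counts doubles the estimate, which is the near-linear scaling claimed in the text as long as the coefficients stay much smaller than \(N_k\) and \(L_k\).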

Tables 3 and 4 exhibit the scalability analysis results over two groups of large-scale DBLP datasets, where N/A indicates that an algorithm either fails to terminate normally due to an “out of memory” issue or cannot finish the experiment within 24 hours. We use Accuracy, F1, and RMSE to evaluate the quality of network alignment results generated by different methods. A larger Accuracy or F1 value indicates a better network alignment. The smaller the RMSE value, the better the quality. We observe that UANA works well on both groups of large-scale DBLP datasets. In particular, as given in Table 4, for the largest dataset (DBLP100K), only REGAL, RANA, and UANA can produce network alignment results. Our proposed UANA method achieves much better quality than REGAL, although UANA is slightly slower than REGAL over DBLP100K.

Measure	FINAL	CAlign	REGAL	iNEAT	DGMC	RANA	UANA
Accuracy	N/A	0.546	0.563	0.257	0.322	0.636	0.654
RMSE	N/A	0.008	0.010	0.010	0.011	0.008	0.009
F1	N/A	0.551	0.679	0.261	0.336	0.762	0.723

Table 3. Scalability on DBLP50K 2013 vs. 2014

Measure	FINAL	CAlign	REGAL	iNEAT	DGMC	RANA	UANA
Accuracy	N/A	N/A	0.654	N/A	N/A	0.774	0.783
RMSE	N/A	N/A	0.007	N/A	N/A	0.004	0.003
F1	N/A	N/A	0.627	N/A	N/A	0.740	0.749

Table 4. Scalability on DBLP100K 2013 vs. 2014

4.6 Case Study

Table 5 presents some details of the experiment results by seven network alignment algorithms on the DBLP17K dataset. We examine the network alignment results of 10 experts from 3 research areas: database, data mining, and information retrieval. The first column lists the authors in the DBLP17K 2013 dataset to be matched, and the other columns list the authors matched by the seven algorithms in the DBLP17K 2014 dataset. We use the bold font to mark wrong alignment results produced by each method. Our UANA algorithm can correctly align all of 10 author pairs, while other methods produce from 2 to 4 wrong alignment results among 10 predictions. This demonstrates the strength of UANA for addressing network alignment under the environment of high-dimensional inconsistent feature space.

Ground truth	FINAL	CAlign	REGAL	iNEAT	DGMC	RANA	UANA
W. Bruce Croft	W. Bruce Croft	W. Bruce Croft	W. Bruce Croft	W. Bruce Croft	W. Bruce Croft	W. Bruce Croft	W. Bruce Croft
David J. DeWitt	David J. DeWitt	David J. DeWitt	David J. Duke	David J. DeWitt	Daniel J. Duke	David J. DeWitt	David J. DeWitt
Christos Faloutsos	Christos Faloutsos	Christos Faloutsos	Christos Faloutsos	Christos Faloutsos	Christos Faloutsos	Christos Faloutsos	Christos Faloutsos
Daniel A. Keim	Torben Bach Pedersen	Daniel A. Keim	Tobias Schreck	Torben Bach Pedersen	Daniel A. Keim	Daniel A. Keim	Daniel A. Keim
Bing Liu	Bing Liu	Bing Liu	Bing Liu	Bing Liu	Bing Liu	Bing Liu	Bing Liu
Hector Garcia-Molina	Jürg Nievergelt	Jeffrey D. Ullman	Jennifer Widom	Jürg Nievergelt	Andreas Paepcke	Hector Garcia-Molina	Hector Garcia-Molina
M. Tamer Özsu	Tim Andrews	M. Tamer Özsu	Lei Chen	Tim Andrews	M. Tamer Özsu	M. Tamer Özsu	M. Tamer Özsu
Michael Stonebraker	Michael Stonebraker	Michael Stonebraker	Michael Stonebraker	Michael Stonebraker	Michael Stonebraker	Michael Stonebraker	Michael Stonebraker
Wei Wang	Wei Wang	Wei Wang	Wei Wang	Wei Wang	Xiang Zhang	Wei Wang	Wei Wang
Chengqi Zhang	Ting Wang	Ting Wang	Chengqi Zhang	Ting Wang	Chengqi Zhang	Chengqi Zhang	Chengqi Zhang

Table 5. Network Alignment Results by Seven Algorithms on DBLP17K 2013 vs. 2014

5 Related Work

GAN [29, 77] has become an active research topic in different domains, such as speech recognition, image classification, and natural language processing. The GAN model is a powerful tool to generate high-quality synthetic images without the need for multi-step Markov chains, which lowers the computational cost, to learn deep representations without extensively annotated training data, and to enable machine learning to work with multi-modal outputs. Multiple GAN variations have been proposed to address important research problems in image, audio, video, and text data processing, such as CGAN [62], InfoGAN [7], StackGAN [111], and MaskGAN [24] for conditional adversarial nets; DCGAN [71], GAN-CLS [73], StackGAN [111], and Progressive GAN [34] for image generation; DTN [87] and CoGAN [54] for image translation; GP-GAN [100] for image blending; Improved GAN [77] and CatGAN [96] for semi-supervised learning; MGAN [77] for texture synthesis; Perceptual GAN [50] for object detection; SRGAN [66] for image super-resolution; and VGAN [6] and MoCoGAN [90] for video generation.

While the standard GAN model learns a mapping from a latent space to the data distribution, inverse models such as Adversarial Autoencoders [57] and Bidirectional GAN [18] also learn a mapping from data to the latent space. This inverse mapping allows real or generated data examples to be projected back into the latent space, similar to the encoder of a variational autoencoder. The techniques have been applied to various real-world problems, including interpretable machine learning [7], semi-supervised learning [23], neural machine translation [123], 3D human motion prediction [40], cross-lingual sentence representation [28], and collaborative filtering [17].

Graph data analysis has attracted active research in the last decade [1, 5, 10, 11, 12, 30, 33, 41, 42, 52, 68, 69, 74, 75, 83, 84, 101, 102, 103, 110, 124, 125, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156]. Network alignment has received much attention in recent years. Existing research efforts on graph matching can be classified into three broad categories: (1) Structure-based techniques, which rely only upon the topological information to match two or multiple input graphs [27, 46, 49, 56, 67, 72, 85, 106, 119, 120]. IONE [53] and PALE [60] are supervised learning models to keep major structural regularities of networks for anchor link prediction. FRUI-P [132] is an unsupervised structure-based user identification algorithm, which calculates the similarities of all the candidate identical users between two social networks based on their friend features. DeepLink [128] is a deep RL based algorithm that leverages the duality of mapping between two networks in a semi-supervised manner. CrossMNA [13] is a cross-network embedding based supervised network alignment method that learns inter/intra-embedding vectors for each node and computes pairwise node similarity scores across networks. (2) Attribute-based approaches, which utilize highly discriminative structure and/or attribute features for ensuring the matching effectiveness [22, 61, 64, 65, 82, 108, 118, 121, 127]. FINAL [117] leverages both structure and attribute information to solve the attributed network alignment problem. Community-based network Alignment (CAlign) [9] jointly trains multiple learning tasks of network alignment, community discovery, and community alignment in one framework for analyzing large attributed networks. FASTEN [20] is a fast Krylov subspace based network alignment algorithm to speed up and scale up the computation of the Sylvester equation for the alignment of two perturbation networks. UUIL [48] solves the unsupervised user identity linkage problem by learning a projection function to minimize the Earth Mover’s distance between the distributions of users in two social networks. REpresentation learning-based Graph ALignment (REGAL) [31] is an unsupervised network alignment framework that jointly embeds two graphs into low-dimensional space and infers soft alignments by comparing node embeddings across graphs. SNNA [47] is an adversarial learning framework to solve the weakly-supervised identity alignment problem by incorporating available annotations as the learning guidance. DGMC [26] is a supervised graph matching procedure which aims at reaching a data-driven neighborhood consensus between matched node pairs. And (3) heterogeneous methods, which employ heterogeneous structural, content, spatial, and temporal features to further improve the matching performance [55, 76, 97, 104, 126, 129, 130]. SCAN-PS [113] aims at predicting social links for new users across two aligned heterogeneous social networks. MNA [37] predicts ranking scores between pairwise nodes and infers anchor links across heterogeneous social networks by considering social, spatial, temporal, and text features. COSNET [122] is an energy based model to connect multiple heterogeneous social networks by considering both local and global consistency among them. DPLink [25] is an end-to-end deep learning framework to complete the user identity linkage task for heterogeneous mobility data.

RANA [75] is a preliminary version of this manuscript that was published in the proceedings of the ICDM’19. It is a supervised learning algorithm without policy gradient-based optimization. It needs some known aligned vertices in different networks as training data and thus limits its applicability. In addition, it has to carefully choose discriminative features and similarity metrics to calculate the similarity scores between pairwise vertices across two networks to generate the alignment results.

With the burst of novel artificial intelligence techniques, RL techniques have recently become a powerful tool to answer many challenging questions in robotics [44, 45], game playing [63, 80], indoor navigation [158], power consumption management [89], machine translation [159], and so on. RL provides a very intuitive and comprehensive solution for autonomous decision making, in which software agents take actions in an environment, consisting of a set of states and a set of actions per state, so as to maximize a cumulative reward or reinforcement signal [2]. RL models essentially aim at learning and optimizing a function mapping from states to actions so as to collect as much reward as possible as the agents transition from state to state, where executing an action in each state provides the agents with a reward. RL techniques have been leveraged to solve many graph optimization problems, such as knowledge graph reasoning [92, 105], adversarial attacks on graph-structured data [16, 86], graph matching [98], medicine combination prediction [95], recommendation [43, 131], robust spammer detection [19], and graph data management [93].

6 Conclusions

We have presented a dual GAN and RL based unsupervised network alignment framework. First, we propose a bidirectional adversarial network distribution matching model that performs bidirectional cycle-based cross-network translations to match the distributions of real and translated networks in an unsupervised way. Second, we plug a dual adversarial autoencoder module into an adversarial binary classification model to project two copies of the same vertices into the same low-dimensional embedding space. Finally, we develop an RL based optimization method to solve the vertex matching problem in the discrete space of the GAN model.

Footnote

http://dblp.uni-trier.de/xml/.

References

[1] Xianqiang Bao, Ling Liu, Nong Xiao, Yang Zhou, and Qi Zhang. 2015. Policy-driven autonomic configuration management for NoSQL. In Proceedings of the 2015 IEEE International Conference on Cloud Computing. 245–252.
