1 Introduction

Knowledge graph

A knowledge graph (KG) is a structured representation of real-world knowledge formed by entities, attributes, relationships, and semantic descriptions, typically stored as triples of the form (subject entity, relation, object entity). An entity refers to a real-world object, an attribute describes a characteristic of an entity, a relationship connects entities and thus forms the graph, and semantic descriptions, such as entity names, values, and descriptive strings, supplement the information about entities in KGs. Existing KGs such as DBpedia [1], Freebase [3] and YAGO [28] are notably large and freely available. KGs are becoming increasingly important because they simplify knowledge acquisition, storage, and reasoning, which are crucial in applications such as question answering [13] and recommendation systems [29].

Knowledge in existing KGs must be updated as new information becomes available. In addition, downstream tasks require the integration of multiple KGs for knowledge fusion. Considering the long-term maintenance of KGs after their construction, KG fusion relies on different knowledge sources according to their knowledge domains [45].

Entity alignment in KGs

Entity alignment aims to link entities representing the same real-world objects in different KGs. EA plays a crucial role in knowledge fusion and updates knowledge by detecting equivalent entities across KGs from different knowledge domains [7].

State-of-the-art

Recently, many works have focused on representation learning over graphs and KGs to build embedding models, e.g., the translation-based method MTransE [9] and the Graph Neural Network (GNN) method GCN-Align [36]. However, these methods rely heavily on labelled (aligned) data. Even though some self-supervised methods succeed in exploiting features from KGs, the noise introduced when generating entity pairs has not been handled, which can mislead the training of entity alignment models. Although semi-supervised learning, including self-learning [16] and reinforcement learning [41], can utilize both labelled and unlabelled data for better performance, the acquisition of labelled data is costly and error-prone. Several existing unsupervised entity alignment methods have explored different ideas. Mao et al. [18] convert alignment into an assignment problem and resolve it without a graph neural network. Liu et al. [15] consider different views of a knowledge graph to develop an EA method. Qi et al. [23] adopt the conventional system PARIS [27] for entity mapping and an MTransE-based embedding model for structure embedding. However, while structural information is utilized, attribute information has not received enough attention.

Challenges

Bearing these state-of-the-art works in mind, unsupervised EA faces three challenges.

Challenge 1.:

How to explore the potential to align entities across KGs without labelled data? Existing supervised methods embed KGs and minimize the distance between labelled entities. The goal of their learning objectives is to make aligned entities more similar to each other according to the labelled data available in the KGs. The labelled entity pairs act as the ground truth in EA model training and label reinforcement learning. Intuitively, embedding-based entity alignment methods face two fundamental issues with human-labelled data: accuracy and labour cost. Besides, Web-scale KGs further increase the difficulty of label supervision for entity alignment. Hence, it is necessary to achieve entity alignment accurately and independently, without label supervision.

Challenge 2.:

How to reduce the effect of noise in unsupervised labelled data generation? Existing unsupervised learning methods utilize an independent module to generate aligned (labelled) entity pairs. Therefore, one of the most critical steps is generating reliable pseudo-labelled data for training. However, noise is unavoidable, which means we must account for its existence in an unsupervised setting. Otherwise, the pseudo-labelled data, which serves as the ‘ground truth’ for model training, can be harmful and lead the model to worse performance. Thus, the second challenge is to deal with noise in entity pair generation.

Challenge 3.:

How to effectively exploit entity information in multi-lingual KGs? In multi-lingual KGs, the same real-world entities are described in different languages and connected by multiple relations. The influence of such multi-relation entities should differ across their links in the KGs. Existing entity alignment works rely on entity embeddings to estimate semantic similarity when dealing with multi-lingual KGs. Specifically, they mostly rely on independent KG structure features and context information to build entity alignment models for one-to-one alignment between two languages. An example of the shortcoming of one-to-one multi-lingual KG entity alignment is as follows: a single model is applied to English-to-Chinese and English-to-Japanese alignment independently, which cannot capture and utilize the potential connections among Chinese, English, and Japanese. We argue that multi-lingual KG pairs can provide crucial clues and entity pairs for reliable entity alignment. A model trained with cross-lingual entity pairs can offer reliable alignment evidence and help improve the accuracy of language pairs that would otherwise perform poorly in multi-lingual KG entity alignment tasks. Hence, the third challenge is how to fully exploit the information of multi-lingual KGs to achieve entity alignment.

Our method

To address the above challenges, we propose GAEA, a Generative Adversarial Network for Unsupervised Multi-lingual Knowledge Graph Entity Alignment, i.e., a generative adversarial network (GAN) for entity alignment on multi-lingual KGs without a supervised dataset. Motivated by the popularity of graph representation learning with GANs, we propose a structure embedding module with a GAN to encode the structure information in KGs while decreasing the effect of noise from entity pair generation.

To address Challenges 1 and 2, we apply a pre-trained language model, trained on 109 different languages, for context information encoding. The highly confident attribute embedding module, including the pre-trained language model and the neighbour attribute embedding module, acts as an entity pair generator that provides pseudo-labelled data for further training in the structure embedding module, where the pseudo-labelled data serves the same role as labelled data in supervised learning methods. To reduce the impact of noise in the pseudo-labelled data, we propose the structure embedding module, a generative graph representation learning model with node-level and edge-level strategies, to eliminate errors in the pseudo-labelled data caused by noise. The structure embedding module has two components: the generator and the discriminator. The generator creates a set of fake nodes and edges by learning the distribution of entities from the pseudo-labelled entity pairs. The discriminator distinguishes between the pseudo-labelled samples and the new entity samples from the generator to improve training performance. Updating both the generator and the discriminator improves performance during the adversarial training in our method.

To deal with Challenge 3, we design an entity pair generation module for multi-lingual KGs. To collect equivalent entities across multi-lingual KGs, both context and neighbour attributes are embedded. The inferred alignment probabilities of multi-lingual entities are injected into the first training set for the discriminator in the structure embedding module for entity alignment. In this way, the generator and discriminator can be utilized to adjust the pseudo-labelled data and guide graph representation learning, thus reducing noise and fully leveraging both attribute and structure information.

We conduct extensive experiments on the benchmark datasets DBP15K and OpenEA to evaluate the effectiveness of our GAEA method. Experimental results demonstrate that our framework achieves consistently better performance compared to state-of-the-art methods, and that the GAN model can effectively reduce the impact of noise and improve the performance of entity alignment.

Contributions

Our principal contributions are summarized as follows.

  • To get rid of label supervision, we utilize contrastive learning on the multi-lingual attribute embedding and local structure embedding to generate pseudo-labelled data, which makes self-supervised learning possible.

  • To achieve the goal of entity alignment with limited training samples, we develop a generative training module considering structure information, which can be trained with small amounts of pseudo-labelled data.

  • Our model also exploits the multi-relation entities among different KGs. The results of the attribute embedding module and the structure embedding module interactively serve as refined inputs for each other to improve the training effectiveness.

  • Extensive experiments are conducted to demonstrate that our method can achieve consistently better performance compared to most state-of-the-art methods.

The remaining paper is organized as follows. Section 2 discusses the related work. Section 3 introduces preliminaries and formally defines the studied problem. In Section 4, we discuss our GAEA method. Section 5 presents the experimental evaluations of GAEA against state-of-the-art methods. Section 6 concludes this paper.

2 Related works

2.1 Knowledge graph representation learning

Knowledge graph representation learning models structure and semantic information to address the EA problem. TransE [4] is a translational distance model and a milestone in knowledge graph representation learning. It embeds both entities and relationships into the same vector space and performs effectively on sparse knowledge graphs. Later on, many translational distance models, such as TransH [37], were proposed to deal with the weakness of TransE in modelling complex relationships.

On the other hand, there are translational models based on the polar coordinate system, such as RotatE [30], HAKE [43] and MuRP [2]. The polar coordinate system is a two-dimensional coordinate system in which each point on a plane is determined by a distance from a reference point and an angle from a reference direction. In KGs, translational models based on the polar coordinate system mainly represent entities and relationships as points and rotation angles with unit modulus. These models perform well on knowledge graph representation for transitive, anti-transitive and symmetric relations.

As noted in several reviews and surveys, KG representation methods based on neural networks are widely employed in KG representation learning. ConvE [10] and ConvKB [20] utilize convolutional neural networks to combine entity and relationship information for comparison. R-GCN [26] introduces a graph neural network based method that treats each relationship as a matrix for mapping neighbourhood features, which captures structural information effectively. CapsE [21] uses capsule networks to model the triples in KGs. KBGAT [44] uses a graph attention network to integrate encoding and decoding information of the entities’ multi-hop neighbourhoods. Existing graph representation learning methods focus on structure information and data mining on graphs. Since KGs are heterogeneous graphs with text information and local relationships, our framework adopts the advantages of existing graph representation learning methods and exploits attribute information at the same time.

2.2 Embedding-based entity alignment

MTransE [9] is a translation-based model for multi-lingual knowledge graph embedding that provides a simple and automated solution. By encoding the entities and relations of each language in a separate embedding space, MTransE provides transitions from each embedding vector to its cross-lingual counterparts in other spaces while preserving the functionalities of monolingual embedding. IPTransE [46] jointly encodes both entities and relations of various KGs into a unified low-dimensional semantic space according to a small seed set of aligned entities. BootEA [31] is a bootstrapping approach for embedding-based entity alignment; it iteratively labels likely entity alignments as training data for learning alignment-oriented knowledge graph embeddings. GCN-Align [36] performs cross-lingual knowledge graph alignment via graph convolutional networks. MRAEA [19] directly models cross-lingual entity embeddings by attending to each node’s incoming and outgoing neighbours and the meta semantics of its connected relations. HGCN [6] is a GCN-based framework for learning both entity and relation representations. RDGCN [39] incorporates relation information via attentive interactions between the knowledge graph and its dual relation counterpart and further captures neighbouring structures to learn better entity representations. GMNN [40] is a graph-attention-based solution that first matches all entities in two topic entity graphs and then jointly models the local matching information to derive a graph-level matching vector. Since embedding-based methods concentrate on relation semantics and the global structure information of KGs based on labelled data, the unsupervised entity alignment methods MultiKE [14] and SelfKG [17] have been proposed to reduce human labour in labelling and data filtering. However, noisy entities and fake links, which are inevitable during entity source generation, are harmful to the training of these models. Therefore, our method, which jointly considers structural embedding and semantic information embedding, is built to deal with the noise of pseudo-labelled entity pairs in the cross-KG information interaction via seed entity pair generation.

2.3 Graph generative adversarial neural network

Generative Adversarial Networks (GANs) are widely used to obtain information from lower-dimensional structures, and they have also been widely applied to graph neural networks. SGAN [22] first introduces adversarial learning to semi-supervised learning on the image classification task. GAN-FM [25] gives a heuristic understanding of the non-convergence problem by introducing feature-matching and minibatch techniques. GraphGAN [35] proposes a framework for the graph embedding task; specifically, GraphGAN can generate the link relations for a centre node. However, GraphGAN cannot be applied to attributed graphs. MolGAN [5] proposes a framework for generating the attributed graph of a molecule by generating the adjacency matrix and feature matrix independently; it then uses a score for the generated molecule as a reward function in an auxiliary reinforcement learning model to choose a suitable combination of attributes and topology. GraphSGAN [11] proposes a graph Laplacian regularization-based classifier with a GAN to solve graph-based semi-supervised learning tasks. In GraphSGAN, fake samples are generated in the feature space of the hidden layer, so it cannot be applied to convolution-based classifiers. In contrast, our model generates fake nodes directly and is adaptive to convolution-based classifiers. GraphCGAN [42] can generate the attributes and adjacency matrix of an attributed graph jointly, capturing the correlation between attributes and topology. DGI [34] proposes a general approach for learning node representations within graph-structured data in an unsupervised manner; its fake nodes are created by applying a pre-specified corruption function to the original nodes. Since GANs on graphs have advantages in dealing with unsourced data and distribution processing, our method adopts a similar idea with more effective interactions in KGs, which contain pseudo-labelled data based on attributes and generated noise data from the GAN.

3 Preliminary and problem statement

Attributed knowledge graph

In this paper, we consider a KG as G = (E,S,Y ), where E is the set of entities, S is the set of relational triples, and Y is the set of attributive triples. S and Y are defined formally as follows.

Relational triples. :

Given a set of relations R, \(S\subseteq E\times R\times E\). W.l.o.g., a triple \(s \in S\) is (h,r,t), where \(h,t \in E\) and \(r \in R\); h and t are the head and tail entities respectively.

Attributive triples. :

Given a set of attributes A, and for each attribute \(a \in A\), a has a corresponding value set \(V_{a}\subseteq V\), \(Y\subseteq E\times A\times V\).

Both relational and attributive triples describe important information about entities. We consider both of them in the task of cross-lingual KG alignment.

Entity alignment

Given two KGs G1 = (E1,S1,Y1) and G2 = (E2,S2,Y2) that need to be aligned, the set of aligned entity pairs is defined as \(M = \{(e_{1},e_{2}) \mid e_{1}\in E_{1}, e_{2}\in E_{2}, e_{1}\Leftrightarrow e_{2}\}\), where ⇔ denotes entity equivalence. In this paper, we consider \(e_{1}\Leftrightarrow e_{2}\) if e1 and e2 are structurally and attributively similar according to their relational triples and attributive triples. The goal of entity alignment between G1 and G2 is to identify all aligned entity pairs between them, if any exist.

We propose GAEA to solve the multi-lingual KG entity alignment task with one framework consisting of two main modules. In the attribute embedding module, node attribute embedding and neighbour attribute embedding are utilized to generate the pseudo-labelled entity pair set. In the structure embedding module, we build a generative adversarial model which contains a generator and a discriminator. The generator aims to learn the distribution of pseudo-labelled entity pairs and produce fake nodes and edges, while the discriminator aims to distinguish the fake nodes and edges from the generator from the pseudo-labelled entity pairs.

4 Methodology

In this section, we first introduce the framework of our proposed GAEA for aligning entities from multi-lingual knowledge graphs. Then, we elaborate on the technical details of its modules. Last but not least, we discuss how the two modules interactively improve training performance.

4.1 Framework

Figure 1 presents the framework of our multi-lingual generative adversarial entity alignment method. Our proposed GAEA comprises two major components: 1) the attribute embedding module and 2) the structure embedding module.

Figure 1: Framework of GAEA method

The attribute embedding module is designed to generate an initial set of pseudo-labelled entity pairs based on the attribute information of KGs. Then, the structure-based embedding module utilizes the pseudo-labelled entity pairs to train a generator G and a discriminator D (generative adversarial model) based on structure information of KGs, where D and G play a non-cooperative game to improve the classification effectiveness of D.

The above training results do not directly serve as the output. Instead, we propose an interactive learning method. The results of the structure-based embedding module are sent back to the attribute embedding module so that the set of pseudo-labelled entity pairs is refined, which then serves as the input of the structure-based embedding module again for improving classification effectiveness. This interactive process will be repeated and will continuously improve the classification effectiveness of D till a stop condition is satisfied. Then the trained model is returned.

Compared to existing unsupervised EA methods [14, 17], GAEA is less sensitive to noise entities and fake links that naturally exist in real datasets, thanks to the newly introduced generative adversarial model and interactive training. Next, we discuss each component and the interactive training process in great detail.

4.2 Attribute embedding module

In this subsection, we first propose an effective and reliable attribute embedding for the unlabelled entities to be aligned. The embedding results are then used to generate pseudo-labelled entity pairs for training the generative adversarial model introduced later.

Attribute embedding

For a node (an entity e) in KG, we propose embedding its attributes and the attributes of the neighbours of e together to represent e. To do so, we first apply a pre-trained language model consisting of a BERT model to each entity, which can embed attributes of an entity, such as name, description, etc., into a vector. Then, we adopt a simple graph attention network with one layer to aggregate embedding for e and neighbours of e.

Node attribute embedding.:

A uni-space attribute embedding across different KGs can greatly benefit entity alignment tasks. Consider DBP15K, a widely used multi-lingual entity alignment dataset, as an example: the basic information available to an unsupervised entity alignment method consists of entity names & descriptions, entity relational triples, and entity attributive triples. DBP15K is built from DBpedia with multi-lingual entities in each alignment dataset, e.g., zh_en and ja_en, and its cross-lingual entity links also inspire our design. The attributes we utilize contain entity names & descriptions and attribute names & values. Attribute values are not only numbers but also words. To handle entities’ multi-lingual semantic information in the same embedding space, we employ the Language-agnostic BERT Sentence Embedding (LaBSE) model for attribute embedding. Numerical attribute values remain in their original format. After the attribute embedding module, the set of entity name embeddings \(\overrightarrow {E_{en}}\), the set of attribute name embeddings \(\overrightarrow {E_{an}}\) and the set of attribute value embeddings \(\overrightarrow {E_{av}}\) can be obtained separately for an entity set. We use \(\overrightarrow {e_{an}}\) and \(\overrightarrow {e_{av}}\) to denote the attribute name and value embeddings of a single entity.
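A minimal sketch of this embedding step, assuming the public LaBSE checkpoint from the sentence-transformers library; the checkpoint name and toy entity strings below are illustrative assumptions rather than our exact pipeline:

```python
from sentence_transformers import SentenceTransformer

# Language-agnostic BERT Sentence Embedding (LaBSE); the checkpoint name is an
# assumption based on the public sentence-transformers release.
encoder = SentenceTransformer("sentence-transformers/LaBSE")

# Toy entity texts: the name/description plus "attribute name: attribute value" strings.
entity_name = "Kyoto (city in Japan)"
attribute_pairs = ["country: Japan", "population: 1463723"]

# Each string is embedded into the same multi-lingual vector space, so Chinese,
# Japanese, French and English surface forms remain directly comparable.
name_vec = encoder.encode(entity_name)        # numpy array of shape (768,)
attr_vecs = encoder.encode(attribute_pairs)   # numpy array of shape (2, 768)
```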

For each entity with multiple attributes, we first convert every pair of attribute name and corresponding attribute value into an embedding via the pre-trained language model. The attribute embedding module uses a single-layer feed-forward network to combine and concatenate the embeddings of every (attribute name, attribute value) pair of the entity, which can be expressed as follows,

$$ \overrightarrow {e_{a}}=\left | \right |^{N}_{k=1} W[\overrightarrow {e_{an}} , \overrightarrow {e_{av}}] $$
(1)

where N is the number of attributes of the entity, and W is a trainable parameter.

To obtain the aggregation of the entity name embedding and the concatenated attribute embedding of a single entity, we implement a simple weighted concatenation to combine the feature embeddings into the representation \(\overrightarrow{e_{ai}}\) of an entity.

$$ \overrightarrow {e_{ai}}= \left[\overrightarrow {e_{en}}, \bigoplus_{n \in N}\left[\beta_{n}\times \overrightarrow {{e_{a}^{n}}}\right]\right] $$
(2)

where N is the number of attributes of the entity, and β is for balancing the weight of multiple attributes of entities, which is defined as:

$$ \beta_{n} =\frac{{\sum}_{j=1}^{N} \| \frac{\overrightarrow {{e_{a}^{n}}} + \overrightarrow {{e_{a}^{j}}}}{\overrightarrow {{e_{a}^{n}}}\times \overrightarrow {{e_{a}^{j}}}}\|}{N} . $$
(3)

The aggregation of attribute embedding and entity name embedding can represent the context information of entities.
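A condensed PyTorch sketch of the aggregation in (1)-(2); the embedding dimension and the single linear layer standing in for W are assumptions, and the weights β are taken as given from (3):

```python
import torch
import torch.nn as nn

class AttributeAggregator(nn.Module):
    """Combine attribute (name, value) embeddings with the entity-name embedding."""

    def __init__(self, dim: int = 768):
        super().__init__()
        # W in (1): a single-layer feed-forward over each concatenated (name, value) pair.
        self.W = nn.Linear(2 * dim, dim, bias=False)

    def forward(self, e_en, e_an, e_av, beta):
        # e_en: (dim,) entity-name embedding
        # e_an, e_av: (N, dim) attribute-name / attribute-value embeddings
        # beta: (N,) attribute weights, assumed to come from (3)
        e_a = self.W(torch.cat([e_an, e_av], dim=-1))       # (N, dim), Eq. (1)
        weighted = (beta.unsqueeze(-1) * e_a).reshape(-1)   # concatenation of beta_n * e_a^n
        return torch.cat([e_en, weighted], dim=-1)          # Eq. (2)

# Toy usage with random features (3 attributes, dimension 8).
agg = AttributeAggregator(dim=8)
e_ai = agg(torch.randn(8), torch.randn(3, 8), torch.randn(3, 8),
           torch.softmax(torch.randn(3), dim=0))
```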

Neighbour attribute embedding.:

Node attribute embedding over the attributes of KGs can suggest similar entity pairs across languages. However, since different real-world entities may have similar names and descriptions, aligning entities based on attributes alone may cause errors. Here we bring in the entity neighbour information of each KG. Instead of building a complex graph neural network for embedding, we take the neighbour attributes from the 1-hop graph structure of each entity as its neighbour attribute embedding. Aggregating the 1-hop neighbours of entities builds the local structure, and the graph embedding aims to learn a low-dimensional representation of entities and their relations while preserving their node attribute features. For an entity embedding \(\overrightarrow {e_{i}}\) and its 1-hop neighbours \(\overrightarrow {e_{j}} \in \mathcal {N}_{i}\), we employ graph attention networks [33] over the concatenated entity and structure embeddings:

$$ \mathbf{e_{i}} =\|_{k=1}^{N} \sigma\left( \sum\limits_{\overrightarrow{e_{j}}\in \mathcal{N}_{i}}\alpha_{ij}^{k} W^{k} \mathbf{e_{j}}\right) $$
(4)

where || represents concatenation in the multi-head attention, Wk is the corresponding input linear transformation’s weight matrix, and α is defined as:

$$ \alpha_{i j}=\frac{\exp \left( \text{LeakyReLU}\left( \overrightarrow{\mathbf{a}}^{T}\left[\mathbf{W^{h}} \mathbf{E_{i}} \| \mathbf{W^{t}} \mathbf{E_{j}}\right]\right)\right)}{{\sum}_{k \in \mathcal{N}_{i}} \exp \left( \text{LeakyReLU}\left( \overrightarrow{\mathbf{a}}^{T}\left[\mathbf{W^{h}} \mathbf{E_{i}} \| \mathbf{W^{t}} \mathbf{E_{k}}\right]\right)\right)} $$
(5)

where \(\overrightarrow {a}\) is a one-dimensional vector to map input into a scalar, and Wh, Wt are linear transition matrices for head and tail entity representation of relations respectively.
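A minimal sketch of this neighbour aggregation, using PyTorch Geometric's GATConv as a stand-in for the attention of (4)-(5); the dimensions, head count, and toy edge list are assumptions:

```python
import torch
from torch_geometric.nn import GATConv

# One-layer multi-head graph attention over 1-hop neighbours, standing in for
# (4)-(5); using PyTorch Geometric's GATConv rather than our exact attention
# parameterisation is an assumption for brevity.
dim, heads = 8, 2
gat = GATConv(in_channels=dim, out_channels=dim, heads=heads, concat=True)

x = torch.randn(5, dim)                       # node attribute embeddings e_i
edge_index = torch.tensor([[1, 2, 3, 4],      # neighbour j ...
                           [0, 0, 2, 2]])     # ... sends a message to entity i
neighbour_emb = gat(x, edge_index)            # (5, dim * heads), cf. Eq. (4)
```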

Generating pseudo-labelled entity pairs

One critical issue in the unsupervised entity alignment task is generating the pseudo-labelled entity pair set, which is derived from the embeddings produced above. For each entity, the candidate with the shortest embedding distance can be selected from a greedy entity pair list, which indicates a high probability of alignment. The pair-wise distance between entity embeddings is calculated as follows:

$$ Dis = \sigma ||\mathbf{{e^{1}_{1}}} - \mathbf{{e^{2}_{1}}}|| + (1-\sigma)\max\left( 0, m-||\mathbf{{e^{1}_{2}}} - \mathbf{{e^{2}_{2}}}||\right) $$
(6)

where e1 and e2 refer to the embeddings discussed above, and \(\mathbf {{e^{1}_{1}}}\) refers to the embedding of the first entity in an entity pair. To limit the impact of imbalanced neighbourhoods, we set a hyper-parameter σ and a margin m on the neighbour embedding distance.

We also consider erroneous links caused by noise and multi-relational entity pairs, i.e., an entity in one KG linking to more than one entity in the other KG. We start with a strict proportion threshold to select entity pairs and modify the distance in the interactive learning process according to the structure-based module. Detailed steps are shown in Algorithm 1.

Algorithm 1: Pseudo-labelled entity pairs generation
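A simplified sketch of this selection step under our assumptions (mutual-nearest-neighbour selection and a distance threshold; the exact procedure follows Algorithm 1):

```python
import torch

def generate_pseudo_pairs(attr1, attr2, nei1, nei2, sigma=0.5, m=1.0, threshold=0.917):
    """Greedy mutual-nearest selection with the pair-wise distance of (6).

    attr1/attr2 and nei1/nei2 are the node-attribute and neighbour-attribute
    embeddings of the two KGs, shaped (n1, d) and (n2, d); treating the threshold
    as an upper bound on the distance is an assumption.
    """
    # Dis = sigma * ||e1_1 - e2_1|| + (1 - sigma) * max(0, m - ||e1_2 - e2_2||)
    dis = sigma * torch.cdist(attr1, attr2) \
        + (1 - sigma) * torch.clamp(m - torch.cdist(nei1, nei2), min=0)

    nn12 = dis.argmin(dim=1)          # nearest KG2 entity for every KG1 entity
    nn21 = dis.argmin(dim=0)          # nearest KG1 entity for every KG2 entity
    pairs = []
    for i, j in enumerate(nn12.tolist()):
        # keep mutually nearest pairs whose distance passes the strict threshold
        if nn21[j].item() == i and dis[i, j].item() < threshold:
            pairs.append((i, j))
    return pairs

# Toy usage with random embeddings for 10 and 12 entities.
pairs = generate_pseudo_pairs(torch.randn(10, 8), torch.randn(12, 8),
                              torch.randn(10, 8), torch.randn(12, 8))
```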

4.3 Structure embedding module

In this section, we discuss how to embed entities (nodes) in KGs according to structure information in KGs and the generated pseudo-labelled entity pairs, and therefore serve as training input for the classifier for entity alignment purposes.

As discussed in Section 2, graph embedding based entity alignment methods have been widely studied. These graph structure embedding methods utilize aligned entity pair datasets with ground truth. In contrast, the pseudo-labelled entity pair set from the attribute embedding module depends heavily on pre-trained language models and contains inevitable noise. As such, the goal of our method is to design a structure embedding method on KGs that can reduce the effect of this noise.

We propose a generative adversarial model for entity alignment over KG structure, which utilizes the limited entity pairs of the pseudo-labelled entity pair set together with noisy samples from the generator. This structure embedding module aims to deal with the small-scale training set and noisy examples to achieve better performance.

4.3.1 Overview

The structure embedding module utilizes structure information for knowledge graph embedding based on information of relational triples. We propose a graph generative adversarial method on KGs, which consists of a generator and a discriminator. The generator learns the distributions of entities and generates fake nodes and edges. The discriminator compares the distances between samples of pseudo-labelled entity pairs and the fake data to lift the performance of the embedding serving for the EA purpose.

To be specific, the generator can create a feature vector \(x_{0} \in \mathbb{R}^{d}\) and an adjacency vector \(a_{0} \in \mathbb{R}^{n}\) from relations. We set \(a_{0,i} = a_{i,0} = 1\) for new nodes from the generator that are connected to a real node \(v_{i}\), and \(a_{0,i} = 0\) otherwise. The discriminator learns from the negative samples \((x_{1},x_{0},a_{1},a_{0})\), \((x_{2},x_{0},a_{2},a_{0})\) and the positive sample \((x_{1},x_{2},a_{1},a_{2})\) with global structure information from KG1 and KG2.

4.3.2 Generative adversarial model

In this subsection, we detail the generator and discriminator in our proposed generative adversarial module.

Generator

The generator generates a set of fake nodes and edges to simulate noise and updates them by learning the distribution of the entities in the pseudo-labelled entity pair set. As new nodes and edges affect the structure of the KGs, we set two strategies for the generator: node-level and edge-level. For a given entity pair e1 and e2, the node-level generator creates fake nodes with features x0, and the edge-level generator modifies the adjacency matrix a0|x0 conditioned on the created features. The node-level strategy guides the generator to create new neighbour nodes, and the edge-level strategy guides the generator to link new relations from existing entities to the fake nodes, as shown in Figure 2.

Figure 2: Generator strategy

The node-level strategy can create a set of fake nodes \((v_{0,1},v_{0,2},\ldots,v_{0,N})\), and the matrix with fake nodes is denoted as \(X_{0} = (x_{0,1}^{\top }, x_{0,2}^{\top },\ldots, x_{0,N}^{\top })^{\top }\). Under the edge-level strategy, the matrix with fake edges is denoted as \(A_{0} = (a_{0,1}^{\top },a_{0,2}^{\top },\ldots,a_{0,N}^{\top })^{\top }\). The adjacency matrix and feature vector with the fake nodes and features from the generator are denoted as:

$$ \tilde{A} = \begin{bmatrix} A & A_{0}\\ {A_{0}^{T}} & I_{B} \end{bmatrix} \in \mathbb{R}^{(i+N)\times (i+N) } $$
(7)
$$ \tilde{X} = \begin{bmatrix} X \\ X_{0} \end{bmatrix} \in \mathbb{R}^{(i+N)} $$
(8)

The loss function of the generator is:

$$ \mathcal{L}_{G} = Y_{-}^{1} + Y_{-}^{2} = D(X_{1},X_{0},A_{1},A_{0}) +D(X_{2},X_{0},A_{2},A_{0}) $$
(9)

where D is the discriminator discussed below, \(\tilde{A}\) in (7) combines the source data with the fake data generated by the generator, and A1, A2 denote the adjacency matrices of the two KGs to be aligned.
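A compact sketch of the node-level and edge-level generation and the combined matrices of (7)-(8); the MLP architecture, noise dimension, and soft (sigmoid) adjacency are assumptions:

```python
import torch
import torch.nn as nn

class FakeGraphGenerator(nn.Module):
    """Generate fake node features X0 and their links A0 to the real entities."""

    def __init__(self, n_real, n_fake, dim, z_dim=16):
        super().__init__()
        self.n_real, self.n_fake, self.dim = n_real, n_fake, dim
        # Node-level strategy: features of the fake nodes.
        self.node_head = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                       nn.Linear(64, n_fake * dim))
        # Edge-level strategy: (soft) links from real entities to the fake nodes.
        self.edge_head = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                       nn.Linear(64, n_real * n_fake), nn.Sigmoid())

    def forward(self, z):
        x0 = self.node_head(z).view(self.n_fake, self.dim)      # X0
        a0 = self.edge_head(z).view(self.n_real, self.n_fake)   # A0
        return x0, a0

def combine(x, a, x0, a0):
    """Build the extended matrices of (7)-(8) from real (X, A) and fake (X0, A0)."""
    a_tilde = torch.cat([torch.cat([a, a0], dim=1),
                         torch.cat([a0.t(), torch.eye(x0.size(0))], dim=1)], dim=0)
    x_tilde = torch.cat([x, x0], dim=0)
    return x_tilde, a_tilde

# Toy usage: 6 real nodes, 2 fake nodes, feature dimension 8.
gen = FakeGraphGenerator(n_real=6, n_fake=2, dim=8)
x0, a0 = gen(torch.randn(16))
x_tilde, a_tilde = combine(torch.randn(6, 8), torch.eye(6), x0, a0)
```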

Discriminator

The discriminator in this module works on the KGs globally to measure the distance between entities from the KGs and from the generator. A global convolution-based GNN is built for this purpose. Existing research [47] has shown two main features of KG structure information: n-hop neighbour information and the global KG structure. To fully utilize global KG information, the discriminator module contains three parts: neighbour entity aggregation, (h,r,t) graph embedding, and the discriminator optimizer.

Neighbour entity aggregation obtains the information of entities from the KGs. GNN models propagate node features across nodes and their neighbours in each iteration, which is defined as a layer in GNN models. To obtain basic entity representations, we utilize GCNs to explicitly encode entities in KGs with structure information. Specifically, in a GCN, the layer-wise propagation rule can be defined as follows:

$$ X^{l+1} = \sigma(D^{-1/2}AD^{-1/2}X^{l}W^{l}+b^{l}) $$
(10)

where σ is an activation function; A is the n × n adjacency matrix of the primal graph with added self-connections; D is the diagonal node degree matrix of A; Wl is the trainable weight matrix of the primal graph convolutional layer at the l-th layer. From the output of the last layer of (10), the embeddings xi of nodes, which incorporate their neighbours, are available.
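A minimal dense-matrix sketch of the propagation rule in (10); the ReLU activation and toy dimensions are assumptions:

```python
import torch

def gcn_layer(x, adj, weight, bias):
    """One propagation step of (10): X^{l+1} = sigma(D^{-1/2} A D^{-1/2} X^l W^l + b^l)."""
    adj = adj + torch.eye(adj.size(0))              # add self-connections
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    norm_adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
    return torch.relu(norm_adj @ x @ weight + bias) # ReLU as the activation sigma

# Toy usage: 4 nodes, feature dimension 8 kept across the layer.
n, d = 4, 8
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 0.],
                    [1., 0., 0., 0.]])
out = gcn_layer(torch.randn(n, d), adj, torch.randn(d, d), torch.zeros(d))
```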

(h,r,t) graph embedding.:

The (h,r,t) graph embedding aims to find relation representations covering entity-to-relation, relation-to-entity and entity-to-entity links. To represent this relational information, we employ GATs separately and concatenate their representations as the (h,r,t) graph embedding.

Here we adopt a 2-layer multi-head GAT as in (5) for the entity representations \(\overrightarrow {e^{h}},\overrightarrow {e^{t}}\). For the representation of a relation r, consisting of \({x_{i}^{h}}\), xi and \({x_{i}^{t}}\), duplicate links between entities and relations should be taken into consideration, following previous work [39, 47]. The relation representation consisting of \({x_{i}^{h}}\), xi and \({x_{i}^{t}}\) is calculated as follows:

$$ \mathbf{x_{i}} = RELU\left( \sum\limits_{_{j}\in \mathcal{T}_{e_{i}}}\sum\limits_{r_{k}\in\mathcal{R}_{e_{i}e_{j}}}\alpha_{ik}(\mathbf{{e^{h}_{k}}}+\mathbf{{e^{t}_{k}}})\right) $$
(11)
$$ \mathbf{x_{i}}^{h} = RELU\left( \sum\limits_{_{j}\in \mathcal{T}_{e_{i}}}\sum\limits_{r_{k}\in\mathcal{R}_{e_{i}e_{j}}}\alpha_{ik}(\mathbf{e_{k}}+\mathbf{{e^{t}_{k}}})\right) $$
(12)
$$ \mathbf{x_{i}}^{t} = RELU\left( \sum\limits_{_{j}\in \mathcal{T}_{e_{i}}}\sum\limits_{r_{k}\in\mathcal{R}_{e_{i}e_{j}}}\alpha_{ik}(\mathbf{{e^{h}_{k}}}+\mathbf{e_{k}})\right) $$
(13)

where αik represents the attention weight from relations to entities, \(\mathcal {T}_{e_{i}}\) is the set of tail entities, and \(\mathcal {R}_{e_{i}e_{j}}\) is the set of relations in (h,r,t).

$$ \alpha_{i k}=\frac{\exp \left( \text{LeakyReLU}\left( \overrightarrow{\mathbf{a}}^{T}\left[\mathbf{x_{i}}\right]\left[\mathbf{r_{k}}\right]\right)\right)}{{\sum}_{j\in \mathcal{T}_{e_{i}}}{\sum}_{r_{k^{\prime}}\in\mathcal{R}_{e_{i}e_{j}}} \exp \left( \text{LeakyReLU}\left( \overrightarrow{\mathbf{a}}^{T}\left[ \vec{x_{i}}\right]\left[ \vec{r_{k^{\prime} }}\right]\right)\right)} $$
(14)

Both head and tail entity representations are denoted as \(\mathbf {x_{i}}^{h}\) and \(\mathbf {x_{i}}^{t}\), where i indexes entities. Combining the (h,r,t) embeddings into a representation \(x_{i}^{out}\), we get:

$$ \mathbf{x_{i}}^{out} = [\mathbf{x_{i}}|| \mathbf{x_{i}}^{h} || \mathbf{x_{i}}^{t}] $$
(15)

where \(\overrightarrow {x_{i}}^{out}\) fully concatenates the relation representation including \(\mathbf {{x_{i}^{h}}}\), xi and \(\mathbf {{x_{i}^{t}}}\).
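A small sketch of the relation-aware views in (11)-(13) and their concatenation in (15); the attention weights, tensor shapes, and random stand-in features are assumptions:

```python
import torch

def relation_view(alpha, left, right):
    """Attention-weighted sum over the triples incident to an entity, as in (11)-(13).

    alpha: (T,) attention weights over T incident triples; left/right: (T, d)
    representations of the two sides added inside the sum.
    """
    return torch.relu((alpha.unsqueeze(-1) * (left + right)).sum(dim=0))

# Toy inputs: 4 incident triples, dimension 8.
T, d = 4, 8
alpha = torch.softmax(torch.randn(T), dim=0)
e_h, e, e_t = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)

x_i  = relation_view(alpha, e_h, e_t)   # Eq. (11)
x_ih = relation_view(alpha, e,   e_t)   # Eq. (12)
x_it = relation_view(alpha, e_h, e)     # Eq. (13)
x_out = torch.cat([x_i, x_ih, x_it], dim=-1)   # Eq. (15): x_i^out = [x_i || x_i^h || x_i^t]
```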

Discriminator optimizer. :

Firstly, we employ the L2 distance to measure the distance between entities:

$$ dis(e_{i},e_{j}) = ||\mathbf{x_{i}}^{out} - \mathbf{x_{j}}^{out}||_{2} $$
(16)

In training iterations, the loss function of the discriminator optimizer aims at maximizing the distances of the negative pairs between entities and fake nodes, \(Y_{-}^{1} = D(X_{1},X_{0},A_{1},A_{0})\) and \(Y_{-}^{2} = D(X_{2},X_{0},A_{2},A_{0})\), while minimizing the distance of the entity pairs Y+ = D(X1,X2,A1,A2) at the same time. Finally, we set our loss function as:

$$ L = \sum\limits_{(X_{i},X_{j},A_{i},A_{j})\in KG}\max(\beta - Y_{-}^{1},\beta - Y_{-}^{2},Y_{+}, 0) $$
(17)

where β is a hyper-parameter. The discriminator optimizer works on entities and fake nodes iteratively according to the loss function. Detailed steps are shown in Algorithm 2.

Algorithm 2: Structure embedding module
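A minimal sketch of the margin objective in (16)-(17) for a single pseudo-labelled pair and one fake node, assuming D returns the L2 distance between the structure embeddings \(x^{out}\):

```python
import torch

def discriminator_loss(x1_out, x2_out, x0_out, beta=1.0):
    """Margin objective of (17) for one pseudo-labelled pair and one fake node.

    x1_out, x2_out: structure embeddings of a pseudo-labelled entity pair;
    x0_out: structure embedding of a fake node from the generator.
    """
    y_pos = torch.norm(x1_out - x2_out, p=2)     # Y+ : distance of the real pair, cf. (16)
    y_neg1 = torch.norm(x1_out - x0_out, p=2)    # Y-^1: distance to the fake node
    y_neg2 = torch.norm(x2_out - x0_out, p=2)    # Y-^2
    zero = torch.zeros(())
    # max(beta - Y-^1, beta - Y-^2, Y+, 0): push fake nodes away, pull the pair together.
    return torch.max(torch.stack([beta - y_neg1, beta - y_neg2, y_pos, zero]))

# In training, this term is summed over all pairs sampled from both KGs.
loss = discriminator_loss(torch.randn(24), torch.randn(24), torch.randn(24))
```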

In summary, the generator G generates fake nodes by producing feature vectors and adjacency relations jointly. For the generated nodes, the generated feature matrix and adjacency matrix have been defined, so the combined adjacency matrix and combined feature vector can be denoted as in (7) and (8). The discriminator, described above, starts with a manual parameter discussed in Section 5.1. During the iterative training, the discriminator extracts the distances of both pseudo-labelled entity pairs and fake nodes. Then, according to the loss in (17), the iterative training is progressively updated with gradient descent. When the iterative training finishes, G generates a new set of nodes and is trained by its loss function to reduce the dissimilarity to the pseudo-labelled entity pairs. The whole module stops when the epochs end.

4.4 Interactive learning

Knowledge in the real world is expressed in different languages but with similar structures. For the multi-lingual entity alignment task, we set up interactive training between the attribute embedding and structure embedding modules with the proposed generator and discriminator.

4.4.1 Attribute embedding module training

At first, the attribute embedding module processes node attributes into embeddings with the pre-trained language model. Then we collect the neighbour attribute embeddings of entities with a simple GAT as in (5). Once both node attribute embeddings and neighbour attribute embeddings have been collected, we employ a contrastive loss function to generate pseudo-labelled entity pairs from greedy entity pairs. In the interactive learning period, labels from the structure embedding module can be collected, which drive the optimizer of the neighbour attribute embeddings. At the same time, we apply the node-level strategy to extend the entity attribute scale with noise. Gaussian noise for node extension is applied initially, and the attribute embedding is trained separately with the noise embeddings as negative labels.

4.4.2 Structure embedding module training

For the structure embedding module, the edge-level strategy is applied to extend the relation scale with noise. Fake edges from random walks for edge extension on the KGs are applied initially, and the triples with noise serve as the negative samples in entity alignment.

We apply a dual aggregation function to extract the similarity features along both the node-level and edge-level strategies. Unlike words in sentences, the neighbours are unordered and independent of each other.

4.4.3 Interactive learning optimizer

As mentioned above, the pseudo-labelled entity pairs from the attribute embedding module are defined by their node attributes and neighbour attributes. For every iteration of interactive learning, the neighbour attribute model in the attribute embedding module requires an update. Inspired by [12], contrastive learning can be applied to distinguish the embeddings of entities from different KGs.

Given the entities \({e^{1}_{i}}\) and \({e^{2}_{i}}\) from the two KGs, their attribute embeddings are \(\overrightarrow {{e^{1}_{i}}}\) and \(\overrightarrow {{e^{2}_{i}}}\), where i indexes the entities in the KGs. During interactive learning, the pseudo-labelled entity pairs and the labelled entity pairs from the structure embedding module update the neighbour attribute model in the attribute embedding module based on the contrastive loss:

$$ \mathcal{L}_{i} = -\log_{}{\frac{e^{sim(\overrightarrow{{e^{1}_{i}}},\overrightarrow{{e^{2}_{i}}})/\beta }}{{\sum}_{j=1}^{N}e^{sim(\overrightarrow{{e^{1}_{i}}},\overrightarrow{{e^{2}_{j}}})/\beta} } } $$
(18)

where β is a temperature hyper-parameter and \(sim(\overrightarrow {{e^{1}_{i}}},\overrightarrow {{e^{2}_{i}}})\) is the cosine similarity:

$$ sim(\overrightarrow{{e^{1}_{i}}},\overrightarrow{{e^{2}_{i}}}) = \frac{{\overrightarrow{{e^{1}_{i}}}}^{\top} \overrightarrow{{e^{2}_{i}}}}{\left \| \overrightarrow{{e^{1}_{i}}} \right \| \cdot \left \| \overrightarrow{{e^{2}_{i}}} \right \| } $$
(19)
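A minimal PyTorch sketch of the in-batch contrastive objective in (18)-(19); treating rows with the same index as aligned pairs and β as a temperature are assumptions about the batching, not additional model components:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(e1, e2, beta=0.05):
    """In-batch contrastive objective of (18)-(19).

    e1, e2: (N, d) attribute embeddings where row i of e1 is assumed to be aligned
    with row i of e2; beta is the temperature hyper-parameter.
    """
    e1 = F.normalize(e1, dim=-1)
    e2 = F.normalize(e2, dim=-1)
    sim = e1 @ e2.t() / beta                  # pairwise cosine similarities, Eq. (19)
    targets = torch.arange(e1.size(0))        # the diagonal holds the aligned pairs
    return F.cross_entropy(sim, targets)      # mean over i of the loss in Eq. (18)

# Toy usage with 16 pseudo-aligned pairs of dimension 32.
loss = contrastive_loss(torch.randn(16, 32), torch.randn(16, 32))
```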

Then we set the goal of the structure embedding module to minimize the distance in (16) between entities from the two KGs, while taking the inevitable noise into consideration.

For updating the generator in the structure embedding module during the interactive learning, we set the same loss function for both node-level and edge-level strategies which is:

$$ \mathcal{L} = \sum\limits_{i=1}^{N}\sum\limits_{j=1}^{M}\frac{\mathbf{x_{i}}^{out} \mathbf{x_{j}}^{out}}{\mathbf{x_{i}}^{out}+\mathbf{x_{j}}^{out}} $$
(20)

where N is the number of fake nodes, M is the number of fake edges, and \(x^{out}\) refers to the global structure embedding output from the discriminator.

5 Experiments

In this section, we conduct extensive experiments to justify the effectiveness of the proposed GAEA.

5.1 Experimental settings

Datasets

We evaluate GAEA on two widely acknowledged public benchmarks, DBP15K and the OpenEA benchmark dataset (V1), introduced as follows.

DBP15K.:

It consists of three cross-lingual datasets from DBpedia: DBP15K\(_{zh\_en}\), DBP15K\(_{ja\_en}\), and DBP15K\(_{fr\_en}\), where zh, en, ja and fr denote Chinese, English, Japanese, and French. DBP15K is created from multi-lingual DBpedia, and each dataset has 15,000 reference entity alignments and about four hundred thousand triples. The detailed information of DBP15K is shown in Table 1.

OpenEA.:

It contains two cross-lingual data sources from multi-lingual DBpedia, English-French and English-German, and two monolingual data sources from popular KGs, DBpedia-Wikidata and DBpedia-YAGO. Each data source has two sizes, with 15K and 100K pairs of reference entities and their relation triples. Here we only use the cross-lingual data sources for unsupervised training and hold out 70% of the labelled data for testing. The detailed information of OpenEA is shown in Table 2.

Table 1 Details of DBP15K
Table 2 Details of OpenEA V1

DBpedia, a large-scale multi-lingual KG with inter-language links for English-Chinese, English-Japanese and English-French, is selected to build the three cross-lingual datasets in our experiment. In our experiments, 15 thousand inter-language links have been extracted, and the attribute infobox triples for each entity have already been collected. Attribute information is extracted from the attribute triples, which have the form (Entity, Attribute Name, Attribute Value). For OpenEA, we use the names and descriptions in the dataset, whose number matches the number of entities.

Baseline models

We consider two state-of-the-art unsupervised methods and eight supervised methods as baselines, discussed below.

Unsupervised methods.:

MultiKE [14] is an unsupervised method to divide the various features of KGs into multiple views, which are complementary to each other. SelfKG [17] is a self-supervised learning objective for entity alignment method with efficient strategies to optimize this objective for aligning entities without label supervision.

Supervised methods.:

MTransE [9] is a translation-based model for multi-lingual knowledge graph embeddings that provides a simple and automated solution. IPTransE [46] jointly encodes both entities and relations of various knowledge graphs into a unified low-dimensional semantic space according to a small seed set of aligned entities. BootEA [31] is a bootstrapping approach to embedding-based entity alignment. GCN-Align [36] performs cross-lingual knowledge graph alignment via graph convolutional networks. MRAEA [19] directly models cross-lingual entity embeddings by attending to the node’s incoming and outgoing neighbours and its connected relations’ meta semantics. HGCN [6] is a GCN-based framework for learning both entity and relation representations. RDGCN [39] incorporates relation information via attentive interactions between the knowledge graph and its dual relation counterpart and further captures neighbouring structures to learn better entity representations. GMNN [40] is a graph-attention-based solution, which first matches all entities in two topic entity graphs.

Moreover, in order to highlight the improvement of our method compared to existing attribute-based methods, AttrBased [32] and JarKA [8] are also included in our experiments.

Our model variants.:

To evaluate the effectiveness of each module in our work, we provide the following different variants of GAEA:

  • BERT: the original BERT model, which can show the impact of multi-lingual pre-trained BERT.

  • w/o G: our method without the generator in the structure embedding module.

  • w/o GAT: our method without multi-hop graph attention network.

Implementation details

DBP15K is preprocessed in the same way as in SelfKG. The ratio of aligned neighbours of a pair of aligned entities indicates how noisy the neighbourhood information is. We run a 5-fold experiment for each method and record its final result.

For the well-structured OpenEA library, we initialize the trainable parameters with Xavier initialization and optimize the loss with Adam (Figure 3).

Figure 3: Model performance with different batch sizes

Parameters settings

We employ the BERT model via the Hugging Face transformers module, which contains the multi-lingual BERT model pre-trained on different languages. Model performance with different parameters is illustrated in Figure 4. The first threshold for generated labelled pairs in the attribute embedding module is set to 0.917. The dimension of the generated matrix from the generator is 1000. We use a learning rate of \(10^{-6}\) with Adam on an Ubuntu server, and the momentum m is set to 0.999. The batch size is set to 32. Training batch size and epoch configurations are illustrated in Figures 3 and 5. The similarity score is calculated using the L2 distance between the embeddings of two entities.

Figure 4: Model performance with different parameters

Figure 5: Model performance with different training epochs

5.2 Results

Evaluation metrics

We use the hit ratio (H@k) and mean reciprocal rank (MRR) to measure effectiveness; for both metrics and all methods, larger is better.

In the top-k recommendation, hit ratio [38] is a commonly used indicator to measure the recall rate, and the calculation formula is:

$$ Hit@K=\frac{Number\ Of\ Hits@K}{Number\ Of\ Records} $$
(21)

The mean reciprocal rank [24] is a measure to evaluate systems that return a ranked list of answers to queries. For a single query, the reciprocal rank is \(\frac{1}{rank_{i}}\), where ranki is the position of the highest-ranked correct answer. If no correct answer is returned for the query, the reciprocal rank is 0. For multiple queries Q, the mean reciprocal rank is the mean of the Q reciprocal ranks:

$$ \text{MRR}=\frac{1}{Q} \sum\limits_{i=1}^{Q} \frac{1}{\text{rank}_{i}} $$
(22)
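A small helper illustrating both metrics from (21)-(22), assuming the 1-based ranks of the correct counterparts are already available:

```python
import numpy as np

def hits_and_mrr(ranks, k=10):
    """Compute Hits@K (21) and MRR (22) from the 1-based ranks of the true counterparts."""
    ranks = np.asarray(ranks, dtype=float)
    hits_at_k = float(np.mean(ranks <= k))    # fraction of queries hit within the top K
    mrr = float(np.mean(1.0 / ranks))         # mean of the reciprocal ranks
    return hits_at_k, mrr

# Example: three queries whose correct entities were ranked 1st, 3rd and 12th.
print(hits_and_mrr([1, 3, 12], k=10))         # (0.666..., 0.472...)
```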

In this section, we report the results of our model and the baselines on DBP15K. For all baselines, we take the scores reported in the corresponding papers with public code, shown in Tables 3 and 4. We categorize all the models into two types: supervised, where 100% of the aligned entity links in the training set are utilized, and unsupervised, where 0% of the training set is utilized.

Table 3 Entity alignment results on DBP15K datasets
Table 4 Entity alignment results on OpenEA datasets

5.3 Overall performance

As shown in Table 3, our method GAEA achieves better results than most supervised methods, i.e., MTransE, BootEA and TransEdge. Besides, GAEA also has comparable performance to GNN-based methods, including GCN-Align, MRAEA, RDGCN, HGCN and HMN. Our method also has promising results on the zh_en and ja_en datasets. Compared with the unsupervised methods, both SelfKG and our method GAEA show significant improvements over MultiKE. Our method GAEA outperforms all the other unsupervised methods except on \(DBP15K_{fr\_en}\). For example, on the \(DBP15K_{zh\_en}\) dataset, GAEA achieves a gain of 2.9% in Hits@1 and 0.9% in Hits@10 compared with SelfKG. This is because GAEA can integrate multiple structural contexts, and the generative adversarial network can further improve the performance of the generator.

The results for OpenEA are shown in Table 4. We mainly focus on comparisons with supervised methods. All the results of the compared methods are from the corresponding papers. In comparison with most supervised methods, GAEA achieves comparable results. For example, our method outperforms 5 methods in EN-FR-15K and 4 methods in EN-DE-15K. We also apply our method to monolingual datasets D-W-15K and D-Y-15K in OpenEA. In comparison with most supervised methods, GAEA also achieves comparable results.

As an unsupervised method, we observe that GAEA achieves comparable results among most supervised methods and has advantages in \(DBP15K_{zh\_en}\) and \(DBP15K_{ja\_en}\) datasets compared with the two unsupervised methods SelfKG and MultiKE.

GAEA vs. PRASE

As mentioned in Section 1, [23] proposes an advanced unsupervised entity alignment method based on probability estimates. Since PRASE adopts the F1-score instead of Hit@K, we apply our method to D-W-100K and D-Y-100K with the F1-score as the evaluation metric. The result is illustrated in Figure 6, which shows that GAEA outperforms PRASE on these datasets.

Figure 6: F1-score of GAEA and PRASE

Discussion

There are several reasons why our method is superior. 1. We consider the effect of noise in unsupervised learning. When generating pseudo-labelled training samples, existing methods choose to trust their own learning results unconditionally. However, the noise and wrong samples in these training results affect model training, which is one of the reasons why results fluctuate greatly when we reproduce the above experiments. Our research considers the influence of noise, which not only improves accuracy but also enhances robustness. 2. Our research relies as little as possible on the form of the data itself; instead, it uses the pre-trained language model to summarize as many attributes, descriptions, and attribute values as possible and combines them with the structural information of the knowledge graph to achieve entity alignment. Experiments show that, thanks to this wide range of data sources, our model can achieve results competitive with partially supervised learning.

5.4 Ablation study

In Table 5, we present the ablation study for GAEA on DBP15K, including ablation of pre-trained attribute embedding, generator in structure embedding, multi-hop neighbours and neighbour attribute embedding.

Table 5 Ablation study of pre-trained language model on DBP15K

5.4.1 Impact of the pre-trained attribute embedding

To analyze the impact of the pre-trained attribute embedding, we replace it with the original BERT, i.e., the uncased BERT model, as illustrated in Table 5. The results show that the pre-trained attribute embedding module brings at least a 12% further improvement.

5.4.2 Impact of the generative graph model

To explore the impact of the generative graph model on our method, we compare GAEA with entity alignment using only the discriminator, as shown in Table 6. Generally speaking, the generator in the structure embedding module acts as a training set provider that can extend the KG structure with new nodes and links. To be more specific, we manually extend the dataset in this ablation study to compare against our framework without the generative graph model. When the number of seed entity pairs decreases, GAEA shows more stable performance than the variant without the generator. Moreover, GAEA performs stably across different datasets.

Table 6 Ablation study of generator impact on DBP15K

5.4.3 Impact of multi-hop neighbours

To analyze the impact of multi-hop neighbours, we change the GCN structure with different numbers of layers. According to former research [15], a more complex graph neural network can improve the performance of an entity alignment method. Table 7 illustrates the impact of multi-hop neighbours on our method. As the number of layers increases, the 2-hop GCN reaches the highest Hits@1 score, while models with three or more hops perform worse.

Table 7 Ablation study on multi-hop GNN on DBP15K

5.4.4 Impact of neighbour attribute embeddings

To explore the importance of combining attribute embedding and neighbour attribute embedding, we conduct an ablation study by removing the neighbour attribute embedding from our module. According to former research, leveraging more data can improve the performance of entity alignment, and so can the combination of attribute embedding and neighbour attribute embedding. Table 8 illustrates the impact of combining the neighbour attribute embeddings in our method. The results show that the neighbour attribute embedding brings at least a 5% improvement.

Table 8 Ablation study of neighbour attribute embeddings on DBP15K

6 Conclusion

In this work, we investigated the entity alignment problem, which targets aligning entities with identical meanings across different knowledge graphs. We developed an unsupervised entity alignment algorithm that automatically aligns entities without training labels. The experiments on two widely-used benchmarks DBP15K and OpenEA showed that our model could beat or match most of the supervised alignment methods, which utilize 100% of the training datasets. Our discovery indicates the potential to get rid of supervision in the entity alignment problem, and more studies are expected for a deeper understanding of unsupervised learning.