Cross-Lingual Knowledge Graph Entity Alignment Based On Relation Awareness and Attribute Involvement
https://doi.org/10.1007/s10489-022-03797-6
Abstract
Entity alignment is an effective means of matching entities from various knowledge graphs (KGs) that represent the
equivalent real-world object. With the development of representation learning, recent entity alignment methods learn entity
structure representation by embedding KGs into a low-dimensional vector space, and then entity alignment relies on
the distance between entity vectors. In addition to the graph structures, relations and attributes are also critical to entity
alignment. However, most existing approaches ignore the helpful features included in relations and attributes. Therefore, this
paper presents a new solution RAEA (Relation Awareness and Attribute Involvement for Entity Alignment), which includes
relation and attribute features. Relation representation is incorporated into entity representation by Dual-Primal Graph CNN
(DPGCNN), which alternates convolution-like operations on the original graph and its dual graph. Structure representation
and attribute representation are learned by graph convolutional networks (GCNs). To further enrich the entity embedding, we
integrate the textual information of the entity into the entity graph embedding. Moreover, we fine-tune the entity similarity
matrix by integrating fine-grained features. Experimental results on three benchmark datasets from real-world KGs show
that our approach has superior performance to other representative entity alignment approaches in most cases.
Keywords Entity alignment · Knowledge graph · Representation learning · Relation · Attribute · Textual information
Machine learning-based entity alignment methods rely on feature construction techniques, where features are carefully designed by hand for specific problems. Most of these features are difficult to transfer to other scenarios and are not suitable for large-scale knowledge graph entity alignment. The basic idea of the string similarity-based approach is to calculate the string similarity of two entities to determine whether they represent the same entity. Representative models include Mapper [12], RuleMiner [13], SILK [14], and LIMES [15]. In addition, the similarity metric-based entity alignment method introduces chunking [16] and iterative matching [17], which avoid comparing all entity pairs between two knowledge graphs. Most entity alignment methods based on string similarity are limited to string descriptions of entities, definitions, and attributes, and cannot quantify the graph structure.

Recently, embedding-based entity alignment approaches have removed the reliance on manually constructed features or rules. They fall into three main groups: methods based on translation models, methods based on graph neural networks, and methods based on random walks. The first two map entities into a low-dimensional vector space, and entity alignment is performed by calculating the distance between entity nodes in that space. The last relies on path similarity. TransE [18] is a classical representation learning approach that regards relations as translations from head entities to tail entities, but it does not perform well on complex relation patterns such as one-to-many. Most recently, graph neural networks (GNNs) [19–21] have been widely used to learn structure representations for KGs by recursively aggregating the vector representations of an entity's neighbors. The drawback of GNNs-based entity alignment methods is that the convolution operations are applied only to node features, so they cannot capture edge and attribute features, which are also useful for entity alignment. This may have serious consequences in many cases. Random Walk (RW) learns node embeddings in the embedding region of the network by generating sequences, and its core innovation is to optimize node embeddings [22]. If nodes appear together on a random walk path in the graph, they are taken to have similar embeddings.

2 Motivation and contribution

The major challenges that remain to be addressed for entity alignment are listed as follows:

Challenge 1: Effective use of the semantics of relations. As Fig. 1 shows, the source entity $e_3$ in KG1 and the target entity $e_7$ in KG2 are not actually equivalent. However, some methods will misalign $e_3$ with $e_7$ because they only consider KG structure, and the three neighbors of the two central entities are equivalent. The relation between $e_3$ and $e_2$ is $r_2$, while the relation between $e_7$ and $e_6$ is $r_5$, and $r_2$ and $r_5$ have different semantics. Thus, the first challenge of entity alignment is how to effectively use the relations between entities. If the relations linked to the two central entities are taken into account, $e_3$ and $e_7$ will have a larger distance even though their neighbors can be aligned.

Challenge 2: Structure heterogeneity of knowledge graphs. As Fig. 1 shows, the source entity $e_3$ in KG1 and the target entity $e_{11}$ in KG3 are in fact equivalent. However, some methods do not align $e_3$ and $e_{11}$ because the two central entities have no neighbors that can be aligned; the structures and relations alone do not provide sufficient information for aligning the two central entities. Thus, the second challenge of entity alignment is how to mitigate the negative impact of knowledge graph structure heterogeneity on model performance. Besides relation triples, there are also a large number of attribute triples in the knowledge graph. If attribute information is utilized effectively, the two central entities will be aligned even though their neighbors are different.

Challenge 3: Effective use of textual information. The more information about the entities that can be obtained, the better the entity alignment model can perform. In addition to structure, relations, and attributes, knowledge graphs provide textual information for entities that can provide positive signals for entity alignment. However, this important information is ignored by most existing models, mainly because this descriptive information is cross-lingual and presented in a textual form that is not easy to handle. Figure 2 shows an example of a pair of aligned entities, triples, and entity descriptions. Therefore, it is a challenge to cross language barriers and compute the semantic similarity of entity textual information in the source and target knowledge graphs.

In addition to the challenges mentioned above, most existing methods perform entity alignment directly on the similarity matrix between entities and fail to incorporate fine-grained features. However, fine-grained features can improve the performance of the model. It is worth exploring how to optimize the similarity matrix.

Solution. Motivated by the above observations, we propose a semi-supervised RAEA framework for cross-lingual entity alignment in this paper. Specifically, we capture relation features by an improved DPGCNN [23], which is an extension of the graph attention mechanism to edges using the dual graph, whose vertices correspond to the edges of the original graph. Then, structure and attribute features are both learned by GCNs in a simple but effective way. In addition, we enhance the entity representation by using a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model [24] and a GloVe model [25] to handle textual information. Finally, we perform softmax
have been proposed, including TransH [29], TransR [30], and TransD [31], etc. Given the insufficient representation power of the Trans-family models, neural network models were proposed for semantic matching, such as MLP [32], GAT [21], GCNs [33], etc. Compared to the translation distance models, the neural network models perform better because they can aggregate the neighbor structure information of the entities.

To avoid losing the information contained in the seed set, we embed the two knowledge graphs to be aligned into a unified low-dimensional vector space. The smaller the distance between two entity vectors, the higher the probability that the two entities can be aligned.

3.2 Embedding for EA

Early embedding-based methods follow similar processes to align entities from different knowledge graphs: they rely on TransE to embed KG structure and then calculate the distance between vectors. For example, JAPE [34] provides transitions for each embedding vector to its cross-lingual counterparts in other spaces while preserving the functionalities of monolingual embedding. To improve alignment accuracy, some methods, such as BootEA [35], use an iterative strategy to cope with data scarcity. The iterative strategy adds the newly aligned entities obtained from each training round to the seed set, expanding the seed set and thus guiding the subsequent training process. However, it increases the complexity of the model and the training overhead. In addition, some models, such as MEEA [36], are jointly and iteratively trained with different modules to improve the alignment.

Non-translational GNNs-based methods have also been proposed to align entities. GCN-Align [37] is among the first attempts in this direction. GCN-based entity alignment methods [38–43] incorporate the information of an entity's neighbors, so they do not need many pre-aligned entity pairs. GCNs alone fail to capture relation features, which are crucial for entity alignment. More recently, a few works such as RDGCN [44] and RAGA [45] have extended GCNs to consider relation features, but they fail to incorporate complicated relations and attributes. Based on the assumption that GNN's approach is isomorphic, SEU [46] treats cross-lingual entity alignment as a task assignment problem, preserving only the basic graph convolution operations for feature propagation, but SEU only makes use of entity names and relation types, not information such as entity attributes and entity descriptions. PSR [47] addresses the high time complexity caused by negative sampling by using a method that does not require negative sampling and in which backpropagation is restricted on one side of the Siamese networks, but PSR does not take full advantage of multiple kinds of information. EMGCN [48] incorporates relation and attribute information, but does not make use of textual information. Moreover, most existing methods perform alignment prediction directly on the original similarity matrix and do not fine-tune the similarity matrix before performing entity alignment.

4 Problem formulation

Definition 1 (Knowledge Graph). A knowledge graph can be formalized as KG = (E, R, T), where E, R, and T represent entities, relations, and triples, respectively.

Definition 2 (Relation Triple). A relation triple can be formalized by the form (E, R, E), where E and R represent entities and relations, respectively. A relation triple such as (Lily, Classmate, John) conveys that the relation between head entity Lily and tail entity John is Classmate.

Definition 3 (Attribute Triple). An attribute triple can be formalized by the form (E, A, V), where E, A, and V represent entities, attributes, and attribute values, respectively. An attribute triple such as (Lily, Gender, Female) denotes that the Gender of entity Lily is Female.

Definition 4 (Seed Set). We define the seed set as the set of pre-aligned entity pairs between two cross-lingual knowledge graphs.

Definition 5 (Cross-lingual Entity Alignment). Given two cross-lingual knowledge graphs and a seed set, cross-lingual entity alignment refers to automatically discovering and aligning more entities based on the two known cross-lingual knowledge graphs and the seed set.

Definition 6 (Dual Graph). Let $G^o = (V^o, E^o)$ be the given original graph. The dual graph of $G^o$ is denoted by $G^d = (V^d = E^o, E^d)$, where each dual node $(i, j) \in V^d$ corresponds to an original edge $(i, j) \in E^o$; two dual vertices $(i, j), (i', j') \in V^d$ are connected by an edge in $G^d$ if they share the same head or tail entities in $G^o$ [23].

5 Our approach

5.1 Overview

This subsection introduces how to jointly capture relation, KG structure, attribute features, and textual information for entity alignment. Figure 3 presents the composition of the RAEA framework. Given two cross-lingual KGs and the seed set, dual graphs are generated to consider relation features for entity alignment, whose nodes correspond to
the edges in the original KGs. In the first part, the graph attention mechanism (GAT) is introduced to facilitate the interaction between the dual graph and the original graph to get the relation-aware entity embedding of the original graph. Moreover, RAEA utilizes multi-layer GCNs with highway gates to embed the entities of two heterogeneous KGs and obtain the representation of the entities' topological structure. The optimized representation of an entity is computed by concatenating the entity embedding and the neighboring structure embedding. In the second part, attribute features are learned by multi-layer GCNs. In the third part, textual information is learned to improve the embedding. Then we combine the optimized embedding, attribute embedding, and textual embedding to generate the embedding-based similarity matrix. Finally, for the entity to be aligned, the framework calculates the embedding-based similarities between it and all candidate entities. Fine-grained features are incorporated into the similarity matrix.

We assume there are one-to-one alignments between testing source entities and testing target entities in this paper. That is, for each testing source entity, there is one and only one testing target entity corresponding to it.

5.2 Relation-aware entity embedding

An improved DPGCNN is applied in our model to get the relation-aware entity representation. Different from RDGCN [44], we model more complicated relation features and obtain a more accurate representation by using the multi-head attention mechanism in the learning process of both the dual relation graph and the original graph. To better link the dual relation graph and the original graph, we introduce an attention mechanism to capture their interaction information. During the process of learning on the dual graph, we utilize a GAT on the dual graph to obtain relation features of the original graph. These original relation features
6164 B. Zhu et al.
are utilized in the learning process of the original graph to compute attention scores for another GAT, generating the original node representation. The details of learning on the dual graph and the original graph are described as follows.

5.2.1 Learning on dual graph with dual convolution

Although GCNs can embed head and tail entities, they cannot directly represent relations. The knowledge graph is stored in the form of triples, each of which can be represented as $(e_1, r, e_2)$, where $e_1$ represents the head entity, $r$ represents the relation, and $e_2$ represents the tail entity. This indicates that the relation $r$ links the head entity $e_1$ and the tail entity $e_2$, and that the head and tail entities associated with $r$ reflect the semantics of $r$ to some extent [49]. Therefore, we approximate the relation representation based on the head and tail entity representations to improve entity alignment. For a relation $r_i$, its vector representation can be derived as:

$$R_i = \mathrm{concat}(h_{r_i}, t_{r_i}), \tag{1}$$

where $h_{r_i}$ and $t_{r_i}$ are the sets of embeddings of head entities and tail entities of relation $r_i$ from the original graph; $\mathrm{concat}(\cdot)$ is a function to concatenate vectors.

The similarity between node $i$ and neighbor node $j \in N_i^d$ is computed as:

$$e_{ij}^{d} = w_{ij}^{d}\, a^{d}\big(\mathrm{concat}(R_i, R_j)\big), \tag{2}$$

where $a^d$ maps the concatenated high-dimensional features to a real number, implemented by a single-layer feedforward neural network; $R_i$ and $R_j$ are node representations in $G^d$ which correspond to relation representations in $G^o$; $\mathrm{concat}(\cdot)$ is the concatenation function; $w_{ij}^d$ is the weight for each relation in the dual relation graph, which is used to distinguish the semantics of different relations for better entity alignment. $w_{ij}^d$ is calculated based on the possibility that two different relations ($V_i^d$ and $V_j^d$) share similar head entities or tail entities in the original graph $G^o$. The Jaccard coefficient measures exactly the characteristics that are common between samples, so we compute $w_{ij}^d$ by adding the Jaccard coefficient of the head entities and the Jaccard coefficient of the tail entities corresponding to the two relations.

To make the model more stable, this layer introduces a multi-head attention mechanism based on RDGCN, using $K$ separate attention mechanisms for each node. The output representation at dual node $v_i^d$ (corresponding to relation $r_i \in G^o$) is $\tilde{x}_i^d$, which is calculated as:

$$\tilde{x}_i^{d} = \sigma^{d}\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in N_i^d} \phi\big(e_{ij}^{dk}\big)\, x_j^{d}\right), \tag{3}$$

where $\sigma^d$ is the activation function, ReLU; $K$ is the number of heads; $N_i^d$ is the index set of neighbors of $v_i^d$; $\phi$ is the Softmax; $e_{ij}^{dk}$ is the correlation coefficient computed by the $k$-th head; $x_j^d$ denotes the representation of node $v_j^d$ in the dual graph.

5.2.2 Learning on original graph with original convolution

A multi-head GAT is also used during the process of learning on the original graph. To incorporate the relation information generated by learning on the dual graph into the original node representation, we calculate the attention scores by using the dual node representations in $G^d$, which correspond to the edges in the original graph $G^o$.

First, we apply a linear transformation to the input node so that the node can express higher-level features. Then we calculate the similarity coefficient as follows.

$$e_{mn}^{o} = a^{o}\big(\tilde{x}_{mn}^{d}\big), \tag{4}$$

where $a^o$ maps the concatenated high-dimensional features to a real number, implemented by a single-layer feedforward neural network; $\tilde{x}_{mn}^d$ denotes the dual representation for the relation $r_{mn}$ between entities $e_m$ and $e_n$.

As in the case of learning on the dual graph, we also use multi-head attention to integrate the entity representation in the original graph $G^o$. For an entity $e_m$ in the original graph $G^o$, its representation $\tilde{x}_m^o$ can be computed by:

$$\tilde{x}_m^{o} = \sigma^{o}\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{n \in N_m^o} \phi\big(e_{mn}^{ok}\big)\, x_n^{o}\right), \tag{5}$$

where $\sigma^o$ is the activation function; $K$ is the number of heads; $N_m^o$ is the index set of neighbors of entity $e_m$ in $G^o$; $\phi$ is Softmax; $e_{mn}^{ok}$ is the similarity coefficient computed by the $k$-th head; $x_n^o$ denotes the node representation of $v_n^o$ in the original graph.

5.3 Representation learning of neighboring structure

We incorporate relation features into the entity representation by learning on the dual and original graphs. Then, we further leverage GCNs [20, 37] to encode the neighboring structure information of KGs in this paper. GCNs are based on the spectral domain and can extract the spatial features of graph-structured data by using the Laplacian matrix. In addition, GCNs adopt an end-to-end learning approach to improve efficiency, which can automatically learn features from the original data. In other words, feature extraction is integrated into the algorithm without manual intervention.

The Laplacian matrix of a graph is calculated from the degree matrix $D$ of the graph nodes and the adjacency matrix $A$. $L = D - A$
is the combinatorial Laplacian. This form concerns the gap between adjacent nodes. In practice, we use the following calculation:

$$\tilde{L} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}, \tag{6}$$

where $\tilde{A} = A + I$ and $I$ is the identity matrix; that is, on the basis of the adjacency matrix $A$, the diagonal elements are set to 1, which adds a self-loop to each node and increases the weight of the node's own features during propagation; $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$; $\tilde{L}$ realizes symmetric normalization and eliminates the influence of node degree.

Let $H_s^{(l)}$ denote the entity node representation in the $l$-th layer. The hidden state update formula is as follows:

$$H_s^{(l+1)} = \sigma\big(\tilde{L} H_s^{(l)} W_s^{(l)}\big), \tag{7}$$

where $W_s^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l+1)}}$ is a layer-specific trainable weight matrix of the $l$-th layer in the GCN, $d^{(l+1)}$ is the number of features in the $(l+1)$-th layer; $\sigma$ is an activation function, ReLU.

In this paper, GCNs are used to embed structure, but as the network becomes deeper, the gradient information flow is blocked and too much neighborhood information is incorporated, which leads to network training difficulties. To control the noise propagation, we also apply layer-wise highway networks [50] during the process of entity embedding to let more information be returned directly to the input without a nonlinear transformation [49]. Specifically, two nonlinear transformation layers are added: one is $T$ (transform gate) and the other is $C$ (carry gate). The update formula can be defined as follows:

$$H^{(l+1)} = C\big(H^{(l)}\big) \cdot H^{(l)} + T\big(H^{(l)}\big) \cdot H^{(l+1)}, \tag{8}$$

where $H^{(l)}$ is the input to layer $l+1$; $\cdot$ is element-wise multiplication; $C$ represents the part of the original input information that is retained and $C = 1 - T$; $T$ represents the part of the input information that is transformed by the convolutional or recurrent information:

$$T\big(H^{(l)}\big) = \delta\big(W_T^{(l)} H^{(l)} + b_T^{(l)}\big), \tag{9}$$

where $H^{(l)}$ is the input to layer $l+1$; $W_T^{(l)}$ is the weight matrix; $b_T^{(l)}$ is the bias vector; $\delta$ is a Sigmoid function.

Highway networks process only a portion of the input, while the rest of the input passes directly through the network. The special cases are as follows: when $T = 0$, all the original input information is retained without any change, and when $T = 1$, all the original information is converted without retaining the original information, which is equivalent to an ordinary neural network:

$$H^{(l+1)} = \begin{cases} H^{(l)}, & T\big(H^{(l)}\big) = 0, \\ H^{(l+1)}, & T\big(H^{(l)}\big) = 1. \end{cases} \tag{10}$$

5.4 Attribute embedding

Intuitively, if an entity has numerous aligned neighbors and the distance between the source entity and the target entity is very small, it is easy to find the target entity by embedding the extensive structure of the entity, and the precision is very high. But in fact, not all entities have sufficient neighbors that can provide valid information for entity alignment. As Fig. 1 shows, the central entity $e_3$ in KG1 and the central entity $e_{11}$ in KG3 have no aligned neighbors, and the number of neighbors of the two central entities is also different. In this case, current embedding-based entity alignment methods, which only take advantage of entity or relation information when aligning entities, have little ability to match the entities that should have been aligned. Attribute triples make up a large percentage of the knowledge graph. Therefore, by considering the attribute information of entities, we can distinguish the candidate entities in the alignment stage from a global perspective, and the correct target entity will have a higher probability of being found. Studies such as Inga [43] and GCN-Align [37] have shown that using the same method to learn entity structures and semantics, where possible, can improve the effectiveness of the model.

Based on the above, we employ GCNs to learn the attribute embedding in this paper. The details of attribute learning are shown in Algorithm 1. It is worth noting that dual graphs are designed to fuse relation information, while learning attribute information does not require dual graphs as input. Attribute embedding is independent of entity embedding, so we set two different feature vectors for structure and attribute, respectively. Let $H_a^{(l)}$ denote the attribute representation in the $l$-th layer. The convolutional computation is as follows:

$$H_a^{(l+1)} = \sigma\big(\tilde{L} H_a^{(l)} W_a^{(l)}\big), \tag{11}$$

where $\tilde{L}$ is the Laplacian matrix from Eq. (6); $W_a^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l+1)}}$ is a layer-specific trainable weight matrix of the $l$-th layer in the GCN, $d^{(l+1)}$ is the number of features in the $(l+1)$-th layer; $\sigma$ is an activation function, which is also set as ReLU.

5.5 Textual embedding

In the real world, many knowledge graphs [1, 51] provide description information which is presented in the form of
where $e_{opt}$ is the optimized entity embedding of entity $e$, which incorporates the relation feature and the structure feature; $h_a$ is the embedding of attributes; $h_t$ is the embedding of textual information; $\mathrm{concat}(\cdot)$ is a function to concatenate vectors; $\alpha$ is a trade-off coefficient to balance the importance of the optimized embedding and the attribute embedding; $\beta$ is a trade-off coefficient to balance the importance of the graph embedding and the textual information.

where $[x]_+ = \max\{x, 0\}$ calculates the maximum of $x$ and 0; $\gamma > 0$ is a margin hyper-parameter of the distance between positive and negative samples; $L$ and $L'$ represent the sets of positive and negative triples, respectively. Instead of randomly sampling negative instances [18], where the replacement entity is drawn from the whole entity set, we obtain more discriminative negative samples $L'$ by nearest-neighbor sampling [53], which limits the range of sampling.
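The loss equation itself is not reproduced in this part of the text, so the following is only a minimal sketch of a generic margin-based objective of the kind described above, $[d_{pos} - d_{neg} + \gamma]_+$, combined with nearest-neighbor negative sampling. The function names, the use of plain Python lists, the choice to corrupt only the target side, and the use of Manhattan distance inside the loss are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def manhattan(u: Vector, v: Vector) -> float:
    """L1 (Manhattan) distance between two embedding vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_negatives(anchor: Vector, candidates: List[Vector],
                      true_index: int, k: int) -> List[int]:
    """Indices of the k candidates closest to `anchor`, excluding the true match."""
    order = sorted(
        (i for i in range(len(candidates)) if i != true_index),
        key=lambda i: manhattan(anchor, candidates[i]),
    )
    return order[:k]


def margin_loss(source: List[Vector], target: List[Vector],
                seeds: List[Tuple[int, int]], gamma: float, k: int) -> float:
    """Sum of [d(pos) - d(neg) + gamma]_+ over seed pairs and their k hard negatives.

    Assumption: only the target entity is replaced when building negatives.
    """
    total = 0.0
    for s, t in seeds:
        pos = manhattan(source[s], target[t])
        for n in nearest_negatives(source[s], target, t, k):
            neg = manhattan(source[s], target[n])
            total += max(pos - neg + gamma, 0.0)
    return total


# Toy embeddings: two source entities, three target candidates.
source_emb = [[0.0, 0.0], [5.0, 5.0]]
target_emb = [[0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
seed_pairs = [(0, 0), (1, 2)]
loss = margin_loss(source_emb, target_emb, seed_pairs, gamma=3.0, k=1)
```

Hard negatives that lie close to the anchor contribute a non-zero term until they are pushed at least $\gamma$ farther away than the true counterpart, which is why restricting sampling to nearest neighbors gives a more informative training signal than uniform sampling.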
Specifically, for the entity $e$ to be aligned, the model computes the K-nearest entities of $e$ in the embedding space as candidates during training. We calculate the top-K nearest neighbors of entities with the Manhattan distance described in Eq. 13.

2, 3, 4}. The best parameter settings are selected according to the values of the evaluation metrics Hits@1 and Hits@10. The best configuration is $\lambda = 0.001$, $d_e = 300$, $d_s = 300$, $d_a = 100$, $\omega = 125$, $\gamma = 3$, $\alpha = 0.8$, $\beta = 0.8$, $\xi = 200$, $\zeta = 1000$, a negative-sample update interval of 10 epochs, $\eta = 2$, $\theta = 2$, and $\kappa = 2$.
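Since model selection above relies on Hits@1 and Hits@10, a short sketch of how Hits@k is commonly computed may help. Hits@k is the percentage of source entities whose correct target appears among the k closest candidates. The function name, the dense distance-matrix input, and the optimistic tie handling (only strictly closer candidates count against the gold target) are assumptions of this sketch.

```python
from typing import Sequence


def hits_at_k(distance: Sequence[Sequence[float]],
              gold: Sequence[int], k: int) -> float:
    """Percentage of source entities whose gold target is among the k closest candidates.

    distance[i][j] is the distance between source entity i and target candidate j;
    gold[i] is the index of the correct target for source entity i.
    """
    hits = 0
    for row, g in zip(distance, gold):
        # Rank of the gold target = number of candidates strictly closer than it.
        rank = sum(1 for d in row if d < row[g])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(gold)


# Toy example with three source entities and three candidates each.
toy_distance = [
    [0.1, 0.5, 0.9],  # gold target 0 is the closest
    [0.4, 0.3, 0.2],  # gold target 1 is second closest
    [0.7, 0.6, 0.8],  # gold target 2 is the farthest
]
toy_gold = [0, 1, 2]
hits1 = hits_at_k(toy_distance, toy_gold, 1)
hits2 = hits_at_k(toy_distance, toy_gold, 2)
hits3 = hits_at_k(toy_distance, toy_gold, 3)
```

Under the one-to-one alignment assumption, each source entity has exactly one gold target, so Hits@1 coincides with the fraction of entities aligned correctly by taking the single nearest candidate.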
6 Experiments
6.1.1 Datasets
For comparison, we have selected some representative models as baselines. These baselines can be roughly classified into two categories: TransE-based models and GNNs-based models.

TransE-based models. AKE [56], JAPE [34], JTMEA [55], JETEA [57], AlignE [35], RTEA [58], MEEA [36], BootEA [35], NAEA [42], MMR [59] and JarKA [60].

GNNs-based models. GCN-Align [37], KECG [39], MuGNN [40], Inga [43], LatsEA [61], AliNet [41], HMEA [62], GTEA [63], GM [38], RDGCN [44], HGCN [49], NMN [64], EMGCN [48], RAGA [45], PSR [47], HMAN [24], SEU [46] and BERT-INT [65].

6.1.5 Impact of alignment direction

Table 3 reports the performance of our model RAEA in both directions on three different datasets for comparison. For example, ZH-EN in the table means that the source knowledge graph is a Chinese dataset and the target knowledge graph is an English dataset. Knowledge graph alignment proceeds as follows: given the source and target knowledge graphs, for each entity in the source knowledge graph, the model searches the target knowledge graph for the equivalent target entity. Therefore we explore whether different directions have an impact on the performance of
Table 2 Hyperparameter settings of the proposed technique

Name of hyperparameter                                                            Symbol
Learning rate                                                                     λ
Dimension of the hidden representation of entity embedding                        de
Dimension of the hidden representation of structural embedding                    ds
Dimension of the hidden representation of attribute embedding                     da
Number of negative samples                                                        ω
Marginal hyperparameter between positive and negative samples                     γ
Balance parameter for entity structure embedding and attribute embedding          α
Balance parameter for graph embedding and textual embedding                       β
The number of the epochs for joint entity and structure training                  ξ
The number of the epochs for attribute training                                   ζ
Update the interval epoch of negative samples
Number of layers of GCNs                                                          η
The interaction times between dual and original graph                             θ
Number of heads of multi-headed attention                                         κ
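For reproducibility, the hyperparameters of Table 2 can be gathered into a single configuration object. The sketch below fills it with the best configuration reported in the text; the class name and field names are hypothetical, and the mapping of the value 10 to the negative-sample update interval is inferred from the ordering of Table 2 because the symbol did not survive in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RAEAConfig:
    """Hypothetical container for the Table 2 hyperparameters (names are illustrative)."""
    learning_rate: float = 0.001        # lambda
    entity_dim: int = 300               # d_e
    structure_dim: int = 300            # d_s
    attribute_dim: int = 100            # d_a
    num_negative_samples: int = 125     # omega
    margin: float = 3.0                 # gamma
    alpha: float = 0.8                  # structure vs. attribute balance
    beta: float = 0.8                   # graph vs. textual balance
    joint_epochs: int = 200             # xi
    attribute_epochs: int = 1000        # zeta
    negative_update_interval: int = 10  # symbol lost in the text; mapping is assumed
    gcn_layers: int = 2                 # eta
    dual_original_interactions: int = 2 # theta
    attention_heads: int = 2            # kappa


config = RAEAConfig()
```

Freezing the dataclass keeps a run's settings immutable, so a variant for an ablation has to be created explicitly with `dataclasses.replace(config, ...)` rather than by mutating shared state.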
aligned entities. The experimental results show that when the target knowledge graph is an English dataset, the values of both Hits@1 and Hits@10 are somewhat higher, and this holds on all three datasets. Therefore, we use the English knowledge graph as the target knowledge graph in this paper.

Table 3 The influence of alignment direction on results

Model   Dataset         Alignment direction   Hits@1   Hits@10
RAEA    DBP15K ZH-EN    ZH-EN                 95.1     99.2
                        EN-ZH                 90.9     98.6
        DBP15K JA-EN    JA-EN                 95.9     99.5
                        EN-JA                 92.6     97.2
        DBP15K FR-EN    FR-EN                 98.4     99.8
                        EN-FR                 95.3     96.4

6.2 Overall results

Table 4 presents the overall performance of the models on entity alignment. The prior approaches are divided into two categories, TransE-based methods and GNNs-based methods, which are shown in the top part of the table. The experimental data for all comparison models were obtained from the original papers. The results of our models are listed at the bottom of the table. The evaluation metric values are reported as percentages (%). We compare the models using Hits@1 and Hits@10; for both, higher values indicate better performance. The experimental results show that RAEA outperforms almost all baselines on the three datasets. Hits@1 is similar to the precision of alignment. Therefore, the outstanding results in Hits@1 demonstrate the superiority of our proposed method on the three datasets. We also highlight the best scores for the TransE-based and GNNs-based methods, respectively, by underlining the numbers on all metrics in each dataset.

Specifically, among all TransE-based models, JarKA performs best. JarKA's Hits@1 is firmly in first place on all three datasets. Although its Hits@10 is not as good as that of NAEA on DBP15K JA-EN and DBP15K FR-EN, it is not much different from that of NAEA. Hits@1 is equivalent to the accuracy rate, and the Hits@1 of JarKA is much higher than that of NAEA, which indicates that JarKA performs better overall. JarKA bootstraps the models iteratively by the extended alignments. Notably, although RAEA does not apply an iterative strategy, it achieves much better performance than JarKA. So this confirms the iterative strategy is helpful to improve the performance in the entity alignment task, but it has some limitations. The key lies in how to make better use of the entity's extensive structural information and semantic information, and fusing the entity's external semantic information is more useful than using only the entity's structural information.

Moreover, BERT-INT performs best among all baselines on nearly all metrics. Although BERT-INT was not the best performer on Hits@10 for the DBP15K FR-EN dataset, it was only 0.1% below the best value. On Hits@10 for DBP15K JA-EN, BERT-INT differs from HMAN by only 0.3%. The experimental results show that the Hits@1 of our proposed model is slightly worse than that of the BERT-INT model, which uses BERT for embedding both entity descriptions and entity names. However, considering the computational complexity, our model only uses BERT for embedding entity descriptions.

Our model RAEA exceeds all comparative models except for the BERT-INT model. Even though several better-performing GNNs-based methods, RDGCN, HGCN, and NMN, incorporate entity neighbors or relations into entity embeddings, they neither consider attributes nor refine the similarity matrix. EMGCN, RAGA, PSR, and SEU perform second only to the optimal model; this may be because none of these four models makes use of textual embedding information. Even though HMAN also uses relations, attributes, and entity descriptions, the large difference between the HMAN model and our model supports the validity of our graph embedding and textual embedding, because we consider the case where entity description information may not be available. This confirms that relation awareness, attribute involvement, textual embedding, and fine-tuning the matrix can effectively improve the performance of entity alignment.

6.3 Ablation experiments

To better analyze the usefulness of our models, we conduct four ablated experiments in detail by ourselves. More specifically, in Table 5, we report the results of RAEA w/o HG, which removes highway networks from RAEA. RAEA w/o SE and RAEA w/o AE represent the removal of structural embedding and attribute embedding from the RAEA model, respectively. RAEA w/o TE represents the removal of textual embedding from the RAEA model. RAEA w/o MF represents no fine-tuning of the similarity matrix.

6.3.1 Effect of highway networks

To prove the effectiveness of highway networks used in our approach, we remove highway networks from the model
and only use the original GCNs for structural embedding. RAEA w/o HG in Table 5 refers to the model RAEA without highway gates. From the experimental results, we can conclude that the performance of RAEA w/o HG drops, which shows that incorporating highway networks into the original GCNs can effectively address the training difficulties caused by the deepening of the network and the obstructed backward flow of gradient information.

6.3.2 Effect of neighbor structure embedding

To analyze the influence of neighbor structure embedding on entity alignment, we only consider the relations and attributes of entities in the graph embedding part. RAEA w/o SE in Table 5 is the ablated model. From the experimental results, we find that the performance of the ablated model drops, which suggests that neighbor structure embedding is necessary for better entity alignment. Neighboring structures of entities provide broad information; relations, attributes, and textual information provide semantic information; and structures and semantics are mutually reinforcing.

6.3.3 Effect of attribute embedding

To verify whether the attribute information contributes to entity alignment, we made an ablation model RAEA w/o AE in Table 5. Based on the experimental results, we find that incorporating attribute information improves the model, especially on Hits@1, where the improvement is more significant. This indicates that, although the performance of entity alignment relies heavily on topological structure information, additional attribute information is also crucial.

6.3.4 Effect of textual embedding

To analyze whether textual embedding affects the performance of the model, we made an ablation model RAEA w/o TE in Table 5. Removing the textual embedding brings down the performance on all metrics and datasets. Table 5 shows that RAEA w/o TE differs most from RAEA among the ablation experiments, which demonstrates the importance and effectiveness of textual embedding for entity alignment.

6.3.5 Effect of similarity matrix fine-tuning strategy

To examine whether the softmax operation can improve the performance of the model, we made an ablation model RAEA w/o MF in Table 5. Removing the softmax operation on the similarity matrix brings down the performance on all metrics and datasets, supporting the validity of the fine-tuning strategy for entity alignment. Experimental results demonstrate that the softmax operation can indeed incorporate fine-grained features into the feature matrix.

6.4 Discussion

6.4.1 Impact of the ratio of available seed sets

The performance of the model is heavily influenced by the number of seed sets, so we compare our model with NMN, HGCN, RDGCN, and BootEA by increasing the seed set
Fig. 6 The effects of information balance parameter on Hits@1
Fig. 7 The effects of information balance parameter on Hits@10
Fig. 8 The effects of information balance parameter on Hits@1
Fig. 9 The effects of information balance parameter on Hits@10
Fig. 10 The effects of interaction times on Hits@1
Fig. 11 The effects of interaction times on Hits@10
Fig. 13 The effects of different GCN layers on Hits@10
Fig. 14 Consumed time test
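
The Hits@1 and Hits@10 metrics reported throughout this section, and the softmax-based fine-tuning of the similarity matrix examined in the ablation study, can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the function names `hits_at_k` and `bidirectional_softmax` are hypothetical, the toy similarity matrix is invented for illustration, the true counterpart of source entity i is assumed to be target entity i, and the refinement shown (adding a row-wise and a column-wise softmax) is only one plausible form of a softmax operation on the similarity matrix, not necessarily the exact formulation used by RAEA.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def hits_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of source entities whose true counterpart ranks in the top k.

    Assumption: row i scores source entity i against every target entity,
    and the true counterpart of source i is target i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the true target = 1 + number of targets scored strictly higher
        # (ties are resolved in favour of the true target).
        rank = 1 + sum(1 for score in row if score > row[i])
        if rank <= k:
            hits += 1
    return 100.0 * hits / len(sim)


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_softmax(sim: Matrix) -> Matrix:
    """Hypothetical refinement: row-wise softmax plus column-wise softmax.

    The column-wise term penalises a target that many sources score highly,
    which is one way a softmax can sharpen a raw similarity matrix.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    by_row = [softmax(row) for row in sim]
    by_col = [softmax([sim[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    return [
        [by_row[i][j] + by_col[j][i] for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    # Toy 3x3 similarity matrix (invented): source 1 is wrongly closest to target 0.
    raw = [
        [0.9, 0.1, 0.0],
        [0.8, 0.7, 0.1],
        [0.2, 0.3, 0.6],
    ]
    refined = bidirectional_softmax(raw)
    print(round(hits_at_k(raw, 1), 2))      # 66.67
    print(round(hits_at_k(raw, 2), 2))      # 100.0
    print(round(hits_at_k(refined, 1), 2))  # 100.0
```

In this toy case the refinement corrects the one wrong top-1 match because target 0 is already strongly claimed by source 0. This illustrates the mechanism only and says nothing about the size of the effect on real datasets.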
6174 B. Zhu et al.
Netw Anal Min, Vancouver, British Columbia, Canada, 27–30 49. Wu Y, Liu X, Feng Y, Wang Z, Zhao D (2019) Jointly learning
August 2019 entity and relation representations for entity alignment. Paper
34. Sun Z, Hu W, Li C (2017) Cross-lingual entity alignment via Presented at the 2019 Conf Empir Methods Nat Lang Process
joint attribute-preserving embedding. Paper Presented at the 16th and the 9th International Joint Conference on Natural Language
International Semantic Web Conference, Vienna, Austria, 21–25 Processing, Hong Kong China, 3–7 November 2019
October 2017 50. Srivastava RK, Greff K, Schmidhuber J (2015) Highway
35. Sun Z, Hu W, Zhang Q, Qu Y (2018) Bootstrapping entity Networks. arXiv:1505.00387
alignment with knowledge graph embedding. Paper Presented at 51. Mahdisoltani F, Biega J, Suchanek FM (2015) YAGO3 : A
the 27th Int Jt Conf Artif Intell, Stockholm, Sweden, 13-19 July, knowledge base from multilingual wikipedias. Paper Presented at
2018 the 7th Bienn Conf Innov Data Syst Res, Asilomar, CA USA, 4–7
36. Chen L, Tian X, Tang X, Cui J (2021) Multi-information January 2015
embedding based entity alignment. Appl Intell 51(12):8896–8912. 52. Munne RF, Ichise R (2020) Joint entity summary and attribute
https://doi.org/10.1007/s10489-021-02400-8 embeddings for entity alignment between knowledge graphs.
37. Wang Z, Lv Q, Lan X, Zhang Y (2018) Cross-lingual Paper Presented at the Hybrid Artificial Intelligent Systems - 15th
knowledge graph alignment via graph convolutional networks. International Conference Spain, 11–13 November 2020
Paper Presented at the 2018 Conf Empir Methods Nat Lang 53. Kotnis B, Nastase V (2017) Analysis of the impact of negative
Process, Brussels, Belgium, 31 October–4 November 2018 sampling on link prediction in knowledge graphs. arXiv:1708.
38. Xu K, Song L, Feng Y, Song Y, Yu D (2020) Coordinated 06816
reasoning for cross-lingual knowledge graph alignment. Paper 54. Chen M, Tian Y, Yang M, Zaniolo C (2017) Multilingual
Presented at the 32nd Innov Appl Artif Intell Conf, New York, NY, Knowledge Graph Embeddings for Cross-lingual Knowledge
USA, 7–12 February, 2020 Alignment. Paper Presented at the 26th Int Jt Conf Artif Intell,
39. Li C, Cao Y, Hou L, Shi J, Li J, Chua T (2019) Semi-supervised Melbourne, Australia, 19–25 August 2017
entity alignment via joint knowledge embedding model and cross- 55. Lu G, Zhang L, Jin M, Li P, Huang X (2021) Entity alignment via
graph model. Paper Presented at the 2019 Conf Empir Methods knowledge embedding and type matching constraints for knowl-
Nat Lang Process and the 9th International Joint Conference on edge graph inference. J Ambient Intell Humaniz Comput,(4),
Natural Language Processing, Hong Kong, China, 3-7 November pp 1–11
2019 56. Lin X, Yang H, Wu J, Zhou C, Wang B (2019) Guiding cross-
40. Cao Y, Liu Z, Li C, Liu Z, Li J, Chua T (2019) Multi-channel lingual entity alignment via adversarial knowledge embedding.
graph neural network for entity alignment. Paper Presented Paper Presented at Int Conf Data Min, Beijing, China, 8–11
at the 57th Conference of the Association for Computational November 2019
Linguistics, Florence, Italy, 28 July – 2 August, 2019 57. Song X, Zhang H, Bai L (2021) Entity alignment between
41. Sun Z, Wang C, Hu W, Chen M, Dai J, Zhang W, Qu Y knowledge graphs using entity type matching. Paper Presented
(2020) Knowledge graph alignment network with gated multi- at Knowledge Science, Engineering and Management - 14th
hop neighborhood aggregation. Paper Presented at the 32nd Innov International Conference, KSEM 2021, Tokyo, Japan, 14–16
Appl Artif Intell Conf, New York, NY, USA, 7–12 February 2020 August 2021
42. Zhu Q, Zhou X, Wu J, Tan J, Guo L (2019) Neighborhood-aware 58. Jiang T, Bu C, Zhu Y, Wu X (2019) Two-stage entity alignment:
attentional representation for multilingual knowledge graphs. combining hybrid knowledge graph embedding with similarity-
Paper Presented at the 28th Int. Jt Conf Artif Intell, Macao, China, based relation alignment. Paper Presented at the 16th Pacific Rim
10–16 August 2019 International Conference on Artificial Intelligence, Cuvu, Yanuca
43. Pang N, Zeng W, Tang J, Tan Z, Zhao X (2019) Iterative Island, Fiji, 26–30 August 2019
entity alignment with improved neural attribute embedding. Paper 59. Shi X, Xiao Y (2019) Modeling Multi-mapping Relations for
Presented at the 16th Extended Semantic Web Conference 2019, Precise Cross-lingual Entity Alignment. Paper Presented at the
Portoroz, Slovenia, 2 June 2019 2019 Conf Empir Methods Nat Lang Process and the 9th
44. Wu Y, Liu X, Feng Y, Wang Z, Yan R, Zhao D (2019) Relation- International Joint Conference on Natural Language Processing,
aware entity alignment for heterogeneous knowledge graphs. Hong Kong, China, 3–7 November 2019
Paper Presented at the 28th Int Jt Conf Artif Intell, Macao, China, 60. Chen B, Zhang J, Tang X, Chen H, Li C (2020) JarKA: modeling
10–16 August 2019 attribute interactions for cross-lingual knowledge alignment.
45. Zhu R, Ma M, Wang P (2021) RAGA: relation-aware graph Paper Presented At Advances In Knowledge Discovery And Data
attention networks for global entity alignment. Paper Presented at Mining - 24th Pacific-Asia Conference, Singapore, pp 11–14, May
25th Pacific-Asia Conference, Virtual Event, 11–14 May 2021 2020
46. Mao X, Wang W, Wu Y, Lan M (2021) From alignment to 61. Chen W, Chen X, Xiong S (2021) Global entity alignment with
assignment: Frustratingly Simple Unsupervised Entity Alignment. gated latent space neighborhood aggregation. Paper Presented At
Paper Presented at the 2021 Conf Empir Methods Nat Lang The 20Th China National Conference, Hohhot, China, pp 13–15
Process, Virtual Event / Punta Cana Dominican Republic, 7–11 62. Guo H, Tang J, Zeng W, Zhao X, Liu L (2021) Multi-modal entity
November 2021 alignment in hyperbolic space. Neurocomputing 461:598–607.
47. Mao X, Wang W, Wu Y, Lan M (2021) Are negative https://doi.org/10.1016/j.neucom.2021.03.132
samples necessary in entity alignment?: An Approach with High 63. Jiang S, Nie T, Shen D, Kou Y, Yu G (2021) Entity alignment
Performance, Scalability and Robustness. Paper Presented at the of knowledge graph by joint graph attention and translation
30th ACM Int Conf Inf Knowl Manag, Virtual Event, Queensland representation. Paper Presented At The 18Th International
Australia, 1–5 November 2021 Conference, Kaifeng, China, pp 24–26
48. Tam NT, Trung HT, Yin H, Vinh TV, Sakong D, Zheng B, 64. Wu Y, Liu X, Feng Y, Wang Z, Zhao D (2020) Neighborhood
Hung NQV (2021) Multi-order graph convolutional networks for Matching Network for Entity Alignment. Paper Presented At The
knowledge graph alignment. Paper Presented at the 37th IEEE Int 58th Annual Meeting Of The Association For Computational
Conf Data Eng, Chania Greece, 19–22 April 2021 Linguistics Online, pp 5–10
Cross-lingual knowledge graph entity alignment based on relation awareness and attribute involvement 6177
Affiliations
Beibei Zhu1 · Tie Bao1 · Lu Liu2,3 · Jiayu Han4 · Junyi Wang1 · Tao Peng1,3
Beibei Zhu
zhubb20@mails.jlu.edu.cn
Tie Bao
baotie@jlu.edu.cn
Lu Liu
liulu@jlu.edu.cn
Jiayu Han
jyhan126@uw.edu
Junyi Wang
wangjy20@mails.jlu.edu.cn
1 College of Computer Science and Technology, Jilin University,
Qianjin Street, Changchun, 130012, Jilin, China
2 College of Software, Jilin University, Qianjin Street,
Changchun, 130012, Jilin, China
3 Key Laboratory of Symbol Computation and Knowledge
Engineering for Ministry of Education, Jilin University,
Qianjin Street, Changchun, 130012, Jilin, China
4 Department of Linguistics, University of Washington,
WA98195-3770, Seattle, 98195, Washington, USA