
Applied Intelligence (2023) 53:6159–6177

https://doi.org/10.1007/s10489-022-03797-6

Cross-lingual knowledge graph entity alignment based on relation awareness and attribute involvement
Beibei Zhu1 · Tie Bao1 · Lu Liu2,3 · Jiayu Han4 · Junyi Wang1 · Tao Peng1,3

Accepted: 22 May 2022 / Published online: 6 July 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
Entity alignment is an effective means of matching entities from different knowledge graphs (KGs) that represent the same real-world object. With the development of representation learning, recent entity alignment methods learn entity structure representations by embedding KGs into a low-dimensional vector space, and entity alignment then relies on the distance between entity vectors. In addition to the graph structure, relations and attributes are also critical to entity alignment. However, most existing approaches ignore the helpful features contained in relations and attributes. Therefore, this paper presents a new solution, RAEA (Relation Awareness and Attribute Involvement for Entity Alignment), which incorporates relation and attribute features. Relation representations are incorporated into entity representations by Dual-Primal Graph CNN (DPGCNN), which alternates convolution-like operations on the original graph and its dual graph. Structure representations and attribute representations are learned by graph convolutional networks (GCNs). To further enrich the entity embedding, we integrate the textual information of the entity into the entity graph embedding. Moreover, we fine-tune the entity similarity matrix by incorporating fine-grained features. Experimental results on three benchmark datasets from real-world KGs show that our approach outperforms other representative entity alignment approaches in most cases.

Keywords Entity alignment · Knowledge graph · Representation learning · Relation · Attribute · Textual information

1 Introduction

In recent years, knowledge graphs such as DBpedia [1], YAGO [2], and BabelNet [3] have been widely used in many fields. Different organizations choose data sources according to their own business needs. In addition, the methods of constructing knowledge graphs in different fields do not follow unified industry standards, which leads to problems of heterogeneity and redundancy among different knowledge graphs. For example, the entity http://dbpedia.org/resource/Russes and the entity http://fr.dbpedia.org/resource/Russians in DBpedia refer to the same real-world identity. Therefore, to make full use of entity information, more and more researchers integrate different knowledge graphs by aligning entities [4]. Entity alignment intends to automatically match equivalent entities in different knowledge graphs, which is beneficial to knowledge-driven applications like information extraction [5], machine translation [6], and intelligent question answering [7]. Entity alignment is also known as entity resolution or entity matching [8].

Traditional entity alignment methods rely on machine translation or feature engineering, which are labor-intensive and require substantial resources. In addition, hand-designed features are subjective, so the accuracy of these methods depends heavily on the quality of the translation and the definition of the features. Traditional entity alignment methods fall into three categories: those based on traditional probabilistic models, those based on machine learning, and those based on string similarity. When traditional probabilistic models are used for entity alignment, there is no need to consider the relations between entities; only the field weights of the log-likelihood ratio are used to determine the similarity of attributes and assign link states to candidate pairs, which is an unsupervised approach that does not rely on training data [9]. Traditional probabilistic model-based entity alignment methods [10, 11] have poor robustness and significant limitations. Machine learning-based entity alignment methods focus on transforming the entity alignment problem into a binary classification problem. They rely on feature construction techniques, where features are carefully designed manually for specific problems, but most of these features are difficult to migrate to other scenarios and are not suitable for large-scale knowledge graph entity alignment. The basic idea of the string similarity-based approach is to calculate the string similarity of two entities to determine whether they represent the same entity; representative models include Mapper [12], RuleMiner [13], SILK [14], and LIMES [15]. In addition, similarity metric-based entity alignment methods introduce chunking [16] and iterative matching [17], which avoid comparing all entity pairs between two knowledge graphs. Most entity alignment methods based on string similarity are limited to string descriptions of entities, definitions, and attributes, and cannot quantify the graph structure.

Recently, embedding-based entity alignment approaches have gotten rid of the reliance on manually constructed features or rules. Embedding-based entity alignment methods are mainly divided into three families: those based on translation models, those based on graph neural networks, and those based on random walks. The first two map entities into a low-dimensional vector space, and entity alignment is performed by calculating the distance between entity nodes in that space; the last relies on path similarity. TransE [18] is a classical representation learning approach that regards relations as translations from head entities to tail entities, but it does not perform well on complex relation graphs such as one-to-many. More recently, graph neural networks (GNNs) [19-21] have been widely used to learn structure representations for KGs by recursively aggregating the vector representations of neighboring entities. The drawback of GNN-based entity alignment methods is that the convolution operations are applied only to node features; they cannot capture edge and attribute features, which are also useful for entity alignment, and this can have serious consequences in many cases. Random Walk (RW) methods learn node embeddings by generating node sequences over the network, and their core idea is to optimize node embeddings [22]: nodes that appear together on random walk paths in the graph obtain similar embeddings.

Corresponding author: Tao Peng (tpeng@jlu.edu.cn). Extended author information is available on the last page of the article.
2 Motivation and contribution

The major challenges that remain to be addressed for entity alignment are listed as follows:

Fig. 1 An example to illustrate the importance of relations and attributes when aligning entities

Challenge 1: Effective use of the semantics of relations. As Fig. 1 shows, the source entity e3 in KG1 and the target entity e7 in KG2 are actually not equivalent, yet some methods will misalign e3 with e7 because they only consider KG structure, and the three neighbors of the two central entities are equivalent. The relation between e3 and e2 is r2, while the relation between e7 and e6 is r5, and r2 and r5 have different semantics. Thus, the first challenge of entity alignment is how to effectively use the relations between entities. If the relations linked by the two central entities are taken into account, e3 and e7 will have a larger distance even though their neighbors can be aligned.

Challenge 2: Structure heterogeneity of knowledge graphs. As Fig. 1 shows, the source entity e3 in KG1 and the target entity e11 in KG3 are in fact equivalent. However, some methods cannot align e3 and e11 because the two central entities have no neighbors that can be aligned; structures and relations alone do not provide sufficient information for aligning the two central entities. Thus, the second challenge of entity alignment is how to mitigate the negative impact of knowledge graph structure heterogeneity on model performance. Besides relation triples, there are also a large number of attribute triples in the knowledge graph. If attribute information is utilized effectively, the two central entities can be aligned even though their neighbors differ.

Challenge 3: Effective use of textual information. The more adequate the information about the entities that can be obtained, the better the performance of the entity alignment model. In addition to structure, relations, and attributes, knowledge graphs provide textual information for entities that can supply positive signals for entity alignment. However, this important information is ignored by most existing models, mainly because this descriptive information is cross-lingual and presented in a textual form that is not easy to handle. Figure 2 shows an example of a pair of aligned entities, triples, and entity descriptions. Therefore, it is a challenge to cross language barriers and compute the semantic similarity of entity textual information in the source and target knowledge graphs.

Fig. 2 An example of a pair of aligned entities, triples and entity descriptions

In addition to the challenges mentioned above, most existing methods perform entity alignment directly on the similarity matrix between entities and fail to incorporate fine-grained features. However, fine-grained features can improve the performance of the model, so it is worth exploring how to optimize the similarity matrix.

Solution. Motivated by the above observations, we propose a semi-supervised RAEA framework for cross-lingual entity alignment in this paper. Specifically, we capture relation features by an improved DPGCNN [23], which extends the graph attention mechanism to edges using the dual graph, whose vertices correspond to the edges of the original graph. Then, structure and attribute features are both learned by GCNs in a simple but effective way. In addition, we enhance the entity representation by using a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model [24] and the GloVe model [25] to handle textual information. Finally, we perform softmax calculations on the entity similarity matrix to incorporate fine-grained features into the similarity matrix.
Contribution. The main contributions of the paper are as follows:

1. We propose a representation learning-based entity alignment model, RAEA, that preserves relation awareness and attribute embedding, and we incorporate a textual information enhancement strategy.
2. We require only entity seed sets in this paper, which reduces the overhead of building pre-aligned relation pairs and attribute pairs.
3. We provide an informative entity representation using the structure of entities and multiple kinds of semantic information, so the research in this paper is also beneficial for other entity representation-based tasks.
4. We evaluate RAEA on public benchmark datasets, including three cross-lingual datasets. Compared with conventional entity alignment baselines, the experimental results and an ablation study demonstrate the outperformance and robustness of RAEA.

3 Related work

3.1 KG embedding

KGs contain knowledge about the world and provide a structural representation of knowledge [26]. To learn the representation of KGs [27], KG embedding maps entities and relations from the knowledge graph to a continuous vector space, making it more convenient to operate on the knowledge graph in downstream tasks [28].

The methods of KG embedding are divided into two categories: translation distance models and neural network models. Translation distance models mainly use a distance-based scoring function, which regards the relation vector as a translation between the head entity vector and the tail entity vector. Among them, the most representative is TransE [18]. TransE is simple and performs well in one-to-one link prediction tasks. However, when the relation is one-to-many, many-to-one, or many-to-many, the representations of entities or relations cannot be learned accurately. To overcome this shortcoming of TransE, other Trans-family models have been proposed, including TransH [29], TransR [30], and TransD [31]. Given the insufficient representation power of the Trans-family models, neural network models were proposed for semantic matching, such as MLP [32], GAT [21], and GCNs [33]. Compared to the translation distance models, the neural network models have better performance because they can aggregate the neighbor structure information of the entities.

To avoid losing the information contained in the seed set, we embed the two knowledge graphs to be aligned into a unified low-dimensional vector space. The smaller the distance between two entity vectors, the higher the probability that the two entities can be aligned.
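To make the translation-distance idea concrete, the following minimal NumPy sketch scores a triple with TransE's criterion, which prefers embeddings satisfying h + r ≈ t. It illustrates the general technique only and is not code from this paper or from [18]; the dimensionality and the random vectors are arbitrary assumptions.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, norm: int = 1) -> float:
    """TransE plausibility: distance ||h + r - t||; smaller means more plausible."""
    return float(np.linalg.norm(h + r - t, ord=norm))

# Toy illustration with random 50-d embeddings (dimension is arbitrary here).
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))
print(transe_score(h, r, t))      # raw translation distance
print(transe_score(h, r, h + r))  # ~0: a "perfect" translation
```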
3.2 Embedding for EA

Early embedding-based methods follow similar processes to align entities from different knowledge graphs: they rely on TransE to embed the KG structure and then calculate the distance between vectors. For example, JAPE [34] provides transitions for each embedding vector to its cross-lingual counterparts in other spaces while preserving the functionalities of monolingual embedding. To improve alignment accuracy, some methods adopt an iterative strategy to cope with the data scarcity challenge, such as BootEA [35]. The iterative strategy adds the newly aligned entities obtained from each round of training to the seed set, expanding the size of the seed set and thus guiding the subsequent training process. However, it increases the complexity of the model and the training overhead. In addition, some models jointly and iteratively train different modules to improve the alignment, such as MEEA [36].

Non-translational GNN-based methods have also been proposed to align entities. GCN-Align [37] is among the first attempts in this direction. GCN-based entity alignment methods [38-43] incorporate the information of an entity's neighbors, so they do not need as many pre-aligned entity pairs. However, GCNs alone fail to capture relation features, which are crucial for entity alignment. More recently, a few works like RDGCN [44] and RAGA [45] have extended GCNs to consider relation features, but they fail to incorporate complicated relations and attributes. Based on the assumption that the graphs processed by GNNs are isomorphic, SEU [46] treats cross-lingual entity alignment as a task assignment problem, preserving only the basic graph convolution operations for feature propagation; however, SEU only makes use of entity names and relation types, not information such as entity attributes and entity descriptions. PSR [47] addresses the high time complexity caused by negative sampling by using a method that does not require it, and backpropagation on one side of the Siamese networks is restricted; however, PSR does not take full advantage of multiple sources of information. EMGCN [48] incorporates relation and attribute information but does not make use of textual information. Moreover, most existing methods perform alignment prediction directly on the original similarity matrix and do not fine-tune the similarity matrix before performing entity alignment.

4 Problem formulation

Definition 1 (Knowledge Graph). A knowledge graph can be formalized as KG = (E, R, T), where E, R, and T represent entities, relations, and triples, respectively.

Definition 2 (Relation Triple). A relation triple has the form (E, R, E), where E and R represent entities and relations, respectively. A relation triple such as (Lily, Classmate, John) conveys that the relation between the head entity Lily and the tail entity John is Classmate.

Definition 3 (Attribute Triple). An attribute triple has the form (E, A, V), where E, A, and V represent entities, attributes, and attribute values, respectively. An attribute triple such as (Lily, Gender, Female) denotes that the Gender of the entity Lily is Female.

Definition 4 (Seed Set). We define the seed set as the set of pre-aligned entity pairs between two cross-lingual knowledge graphs.

Definition 5 (Cross-lingual Entity Alignment). Given two cross-lingual knowledge graphs and a seed set, cross-lingual entity alignment refers to automatically discovering and aligning more entities based on the two known cross-lingual knowledge graphs and the seed set.

Definition 6 (Dual Graph). Let $G^o = (V^o, E^o)$ be the given original graph. The dual graph of $G^o$ is denoted by $G^d = (V^d = E^o, E^d)$, where each dual node $(i, j) \in V^d$ corresponds to an original edge $(i, j) \in E^o$; two dual vertices $(i, j), (i', j') \in V^d$ are connected by an edge in $G^d$ if they share the same head or tail entities in $G^o$ [23].
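The construction in Definition 6 can be sketched directly: every original edge becomes a dual node, and two dual nodes are connected when their original edges share an endpoint entity. The snippet below is one straightforward reading of the definition, not the authors' implementation; it assumes a small hypothetical triple list and treats a shared entity in either role as the connecting condition.

```python
from itertools import combinations

def build_dual_graph(triples):
    """Dual nodes are original edges (h, t); dual edges connect original
    edges that share a head or tail entity (cf. Definition 6)."""
    dual_nodes = [(h, t) for h, _, t in triples]
    dual_edges = [
        (a, b)
        for a, b in combinations(dual_nodes, 2)
        if set(a) & set(b)  # the two original edges share an entity
    ]
    return dual_nodes, dual_edges

# Hypothetical toy KG: e3 -r2-> e2, e3 -r1-> e1, e2 -r4-> e4
triples = [("e3", "r2", "e2"), ("e3", "r1", "e1"), ("e2", "r4", "e4")]
print(build_dual_graph(triples))
```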
5 Our approach

5.1 Overview

This subsection introduces how to jointly capture relation, KG structure, and attribute features as well as textual information for entity alignment. Figure 3 presents the specific composition of the RAEA framework. Given two cross-lingual KGs and the seed set, dual graphs whose nodes correspond to the edges in the original KGs are generated to bring relation features into entity alignment. In the first part, the graph attention mechanism (GAT) is introduced to facilitate the interaction between the dual graph and the original graph and obtain the relation-aware entity embedding of the original graph. Moreover, RAEA utilizes multi-layer GCNs with highway gates to embed the entities of the two heterogeneous KGs and obtain representations of the entities' topological structure. The optimized representation of an entity is computed by concatenating its entity embedding and its neighboring structure embedding. In the second part, attribute features are learned with multi-layer GCNs. In the third part, textual information is learned to improve the embedding. We then combine the optimized embedding, attribute embedding, and textual embedding to generate the embedding-based similarity matrix. Finally, for each entity to be aligned, the framework calculates the embedding-based similarities between it and all candidate entities, and fine-grained features are incorporated into the similarity matrix. We assume one-to-one alignment between testing source entities and testing target entities in this paper; that is, for each testing source entity there is one and only one corresponding testing target entity.

Fig. 3 Outline of our proposed model

5.2 Relation-aware entity embedding

An improved DPGCNN is applied in our model to obtain relation-aware entity representations. Different from RDGCN [44], we model more complicated relation features and obtain a more accurate representation by using a multi-head attention mechanism in the learning process on both the dual relation graph and the original graph. To better link the dual relation graph and the original graph, we introduce an attention mechanism to capture their interaction information. During learning on the dual graph, we utilize a GAT on the dual graph to obtain relation features of the original graph. These original relation features are then utilized in the learning process on the original graph to compute attention scores for another GAT, generating the original node representations. The details of learning on the dual graph and the original graph are described as follows.
5.2.1 Learning on dual graph with dual convolution

Although GCNs can embed head and tail entities, they cannot directly represent relations. The knowledge graph is stored in the form of triples (e1, r, e2), where e1 represents the head entity, r the relation, and e2 the tail entity. This indicates that the relation r links the head entity e1 and the tail entity e2, and the head and tail entities associated with r can reflect the semantics of r to some extent [49]. Therefore, we approximate the relation representation based on the representations of its head and tail entities to improve entity alignment. For a relation $r_i$, its vector representation can be derived as:

$$R_i = \mathrm{concat}(h_{r_i}, t_{r_i}), \tag{1}$$

where $h_{r_i}$ and $t_{r_i}$ are the sets of embeddings of the head entities and tail entities of relation $r_i$ from the original graph, and $\mathrm{concat}(\cdot)$ is a function that concatenates vectors.

The similarity between node $i$ and a neighbor node $j \in N_i^d$ is computed as:

$$e_{ij}^{d} = w_{ij}^{d}\, a^{d}\big(\mathrm{concat}(R_i, R_j)\big), \tag{2}$$

where $a^d$ maps the concatenated high-dimensional features to a real number and is implemented by a single-layer feedforward neural network; $R_i$ and $R_j$ are node representations in $G^d$ that correspond to relation representations in $G^o$; $\mathrm{concat}(\cdot)$ is the concatenation function; and $w_{ij}^{d}$ is the weight of each relation in the dual relation graph, used to distinguish the semantics of different relations for better entity alignment. $w_{ij}^{d}$ is calculated based on the possibility that two different relations ($v_i^d$ and $v_j^d$) share similar head entities or tail entities in the original graph $G^o$. Since the Jaccard coefficient is concerned only with the characteristics that are common between samples, we compute $w_{ij}^{d}$ by adding the Jaccard coefficient of the head entities and the Jaccard coefficient of the tail entities corresponding to the two relations.

To make the model more stable, this layer introduces a multi-head attention mechanism based on RDGCN, using $K$ separate attention mechanisms for each node. The output representation at dual node $v_i^d$ (corresponding to relation $r_i \in G^o$) is $\tilde{x}_i^d$, which is calculated as:

$$\tilde{x}_i^d = \sigma^d\Bigg(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in N_i^d}\varphi\big(e_{ij}^{dk}\big)\, x_j^d\Bigg), \tag{3}$$

where $\sigma^d$ is the activation function, ReLU; $K$ is the number of heads; $N_i^d$ is the index set of neighbors of $v_i^d$; $\varphi$ is the softmax function; $e_{ij}^{dk}$ is the correlation coefficient computed by the $k$-th head; and $x_j^d$ denotes the representation of node $v_j^d$ in the dual graph.

5.2.2 Learning on original graph with original convolution

A multi-head GAT is devised for the learning process on the original graph. To incorporate the relation information generated by learning on the dual graph into the original node representations, we calculate the attention scores using the dual node representations in $G^d$, which correspond to the edges in the original graph $G^o$.

First, we apply a linear transformation to the input node so that it has a higher capability of expressing advanced features. Then we calculate the similarity coefficient as follows:

$$e_{mn}^{o} = a^{o}\,\tilde{x}_{mn}^{d}, \tag{4}$$

where $a^o$ maps the concatenated high-dimensional features to a real number and is implemented by a single-layer feedforward neural network, and $\tilde{x}_{mn}^{d}$ denotes the dual representation for the relation $r_{mn}$ between entities $e_m$ and $e_n$.

As in the case of learning on the dual graph, we also use multi-head attention to integrate the entity representations in the original graph $G^o$. For an entity $e_m$ in $G^o$, its representation $\tilde{x}_m^o$ can be computed by:

$$\tilde{x}_m^o = \sigma^o\Bigg(\frac{1}{K}\sum_{k=1}^{K}\sum_{n \in N_m^o}\varphi\big(e_{mn}^{ok}\big)\, x_n^o\Bigg), \tag{5}$$

where $\sigma^o$ is the activation function; $K$ is the number of heads; $N_m^o$ is the index set of neighbors of entity $e_m$ in $G^o$; $\varphi$ is the softmax function; $e_{mn}^{ok}$ is the similarity coefficient computed by the $k$-th head; and $x_n^o$ denotes the node representation of $v_n^o$ in the original graph.
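As a concrete reading of Eqs. (1)-(2), the sketch below builds a relation representation from the head- and tail-entity sets and computes the dual edge weight as the sum of two Jaccard coefficients. Mean pooling over the entity sets is an assumption made here for illustration (the paper only states that the head and tail embedding sets are concatenated), and all data are hypothetical.

```python
import numpy as np

def relation_embedding(ent_emb, head_ids, tail_ids):
    """Eq. (1): approximate a relation by concatenating (mean-pooled)
    embeddings of its head-entity set and tail-entity set."""
    h = ent_emb[head_ids].mean(axis=0)
    t = ent_emb[tail_ids].mean(axis=0)
    return np.concatenate([h, t])

def dual_edge_weight(heads_i, tails_i, heads_j, tails_j):
    """w_ij^d for Eq. (2): Jaccard overlap of the head sets plus Jaccard
    overlap of the tail sets of two relations."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0
    return jaccard(heads_i, heads_j) + jaccard(tails_i, tails_j)

# Toy data (hypothetical): 5 entities with 4-d embeddings.
rng = np.random.default_rng(1)
ent_emb = rng.normal(size=(5, 4))
R1 = relation_embedding(ent_emb, head_ids=[0, 1], tail_ids=[2])
print(R1.shape)                                    # (8,): head part + tail part
print(dual_edge_weight([0, 1], [2], [1, 3], [2]))  # shared head 1, identical tails
```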

5.3 Representation learning of neighboring structure

We incorporate relation features into the entity representation by learning on the dual and original graphs. We then further leverage GCNs [20, 37] to encode the neighboring structure information of the KGs. GCNs are based on the spectral domain and can extract spatial features of graph-structured data by using the Laplace matrix. In addition, GCNs adopt an end-to-end learning approach to improve efficiency: features are learned automatically from the original data, so feature extraction is integrated into the algorithm without manual intervention.

The Laplace matrix of a graph is calculated from the degree matrix of the graph nodes $D$ and the adjacency matrix $A$; $L = D - A$ is the combinatorial Laplacian, which concerns the gap between adjacent nodes. In practice, we use the following calculation:

$$\tilde{L} = \tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}, \tag{6}$$

where $\tilde{A} = A + I$ and $I$ is the identity matrix: on the basis of the adjacency matrix $A$, the diagonal elements are set to 1, which adds a self-loop to each node and preserves the node's own characteristics during propagation; $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$; and $\tilde{L}$ realizes symmetric normalization and eliminates the influence of node degree.

Let $H_s^{(l)}$ denote the entity node representation in the $l$-th layer; the hidden state update formula is as follows:

$$H_s^{(l+1)} = \sigma\big(\tilde{L}\,H_s^{(l)}\,W_s^{(l)}\big), \tag{7}$$

where $W_s^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l+1)}}$ is a layer-specific trainable weight matrix of the $l$-th GCN layer, $d^{(l+1)}$ is the number of features in the $(l+1)$-th layer, and $\sigma$ is an activation function, ReLU.

In this paper, GCNs are used to embed the structure, but as the network gets deeper the gradient information flow is blocked, too much neighborhood information is incorporated, and network training becomes difficult. To control the noise propagation, we also apply layer-wise highway networks [50] during entity embedding so that more information can be passed on directly without a nonlinear transformation [49]. Specifically, two gating layers are added: T (the transform gate) and C (the carry gate). The update formula is defined as follows:

$$H^{(l+1)} = C\big(H^{(l)}\big)\cdot H^{(l)} + T\big(H^{(l)}\big)\cdot H^{(l+1)}, \tag{8}$$

where $H^{(l)}$ is the input to layer $l+1$; $\cdot$ is element-wise multiplication; $C$ represents the part of the original input information that is retained, with $C = 1 - T$; and $T$ represents the part of the input information that is transformed by the convolutional or recurrent layer:

$$T\big(H^{(l)}\big) = \delta\big(W_T^{(l)} H^{(l)} + b_T^{(l)}\big), \tag{9}$$

where $H^{(l)}$ is the input to layer $l+1$, $W_T^{(l)}$ is the weight matrix, $b_T^{(l)}$ is the bias vector, and $\delta$ is the sigmoid function.

Highway networks transform only a portion of the input, while the rest passes directly through the network. As special cases, when $T = 0$ all the original input information is retained without any change, and when $T = 1$ all the original information is transformed without retaining the input, which is equivalent to an ordinary neural network:

$$H^{(l+1)} = \begin{cases} H^{(l)}, & T\big(H^{(l)}\big) = 0, \\ H^{(l+1)}, & T\big(H^{(l)}\big) = 1. \end{cases} \tag{10}$$
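Equations (6)-(9) can be summarized in a few lines of NumPy. The sketch below normalizes the adjacency matrix with self-loops and applies one GCN propagation step followed by a highway gate; all shapes and weights are illustrative assumptions rather than the paper's trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def normalized_laplacian(A):
    """Eq. (6): L~ = D~^{-1/2} (A + I) D~^{-1/2}, with self-loops added."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_highway_layer(L, H, W, W_T, b_T):
    """Eqs. (7)-(9): one GCN propagation step, then a highway gate mixing
    the new representation with the layer input (carry gate C = 1 - T)."""
    H_new = np.maximum(L @ H @ W, 0.0)  # Eq. (7) with ReLU
    T = sigmoid(H @ W_T + b_T)          # transform gate, Eq. (9)
    return T * H_new + (1.0 - T) * H    # Eq. (8)

# Toy example (all shapes and random weights are illustrative assumptions).
rng = np.random.default_rng(2)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))
W, W_T = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
b_T = np.zeros(4)
print(gcn_highway_layer(normalized_laplacian(A), H, W, W_T, b_T).shape)  # (3, 4)
```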

5.4 Attribute embedding

Experience suggests that if an entity has numerous aligned neighbors and the distance between the source entity and the target entity is very small, it is easy to find the target entity by embedding the extensive structure of the entity, and the precision is very high. In fact, however, not all entities have sufficient neighbors to provide valid information for entity alignment. As Fig. 1 shows, the central entity e3 in KG1 and the central entity e11 in KG3 have no aligned neighbors, and the numbers of neighbors of the two central entities also differ. In this case, current embedding-based entity alignment methods that only take advantage of entity or relation information have little ability to match entities that should be aligned. Attribute triples make up a large percentage of the knowledge graph. Therefore, by considering the attribute information of entities, we can distinguish the candidate entities in the alignment stage from a global perspective, and the correct target entity will have a higher probability of being found. Many studies, such as Inga [43] and GCN-Align [37], have shown that using the same method to learn entity structures and semantics can improve the effectiveness of the model where possible.

Based on the above, we employ GCNs to learn attribute embeddings in this paper. The details of attribute learning are shown in Algorithm 1. It is worth noting that dual graphs are designed to fuse relation information, while learning attribute information does not require dual graphs as input. Attribute embedding is independent of entity embedding, so we set two different feature vectors for structure and attributes, respectively. Let $H_a^{(l)}$ denote the attribute representation in the $l$-th layer; the convolutional computation is as follows:

$$H_a^{(l+1)} = \sigma\big(\tilde{L}\,H_a^{(l)}\,W_a^{(l)}\big), \tag{11}$$

where $\tilde{L}$ is the normalized Laplacian of Eq. (6); $W_a^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l+1)}}$ is a layer-specific trainable weight matrix of the $l$-th GCN layer; $d^{(l+1)}$ is the number of features in the $(l+1)$-th layer; and $\sigma$ is an activation function, also set to ReLU.

5.5 Textual embedding

In the real world, many knowledge graphs [1, 51] provide description information presented in the form of text. The description information of an entity is a detailed elaboration of the entity, which contains rich semantics and helps enhance the performance of entity alignment. However, obtaining the semantic relevance of entity descriptions in different languages is a tough task, and most existing deep models cannot handle textual information well.

Given the recent excellent performance of BERT in natural language processing, where it surpasses existing deep neural networks in terms of embedding quality, we use BERT to embed entity descriptions and obtain vector representations containing the rich semantics of the entities. There have been studies using BERT to learn entity textual information, such as JSAE [52] and HMAN [24]. However, our approach differs from these. A knowledge graph does not provide descriptive information for all entities, but the entity name of each entity is always present. Therefore, when we consider textual information embedding, we consider not only the descriptive information but also the textual information of entity names, which is not utilized by JSAE and HMAN. If the description text of an entity does not exist, the embedding of the entity name is used as the result of the textual embedding part.

The original BERT design is computationally intensive, so we use a pair-level BERT model like HMAN. The information of entities is fed into the pair-wise BERT model to obtain the embedded representations of entity descriptions, and the relevance of entity descriptions is represented by the distance between description vectors. Different from HMAN, in addition to entity descriptions we also use the pre-trained word vectors of the GloVe model [25] to compute the vector representations of entity names. That is, our model accounts for the absence of entity description information and remedies this deficiency of HMAN.

5.6 Joint embedding

For an entity e, we combine the optimized entity representation, the attribute representation, and the textual representation to generate the final entity representation $e_{joint}$, which can be formally computed as follows:

$$e_{joint} = \mathrm{concat}\big(\beta\,\mathrm{concat}\big(\alpha\, e_{opt},\, (1-\alpha)\, h_a\big),\, (1-\beta)\, h_t\big), \tag{12}$$

where $e_{opt}$ is the optimized entity embedding of entity $e$, which incorporates relation and structure features; $h_a$ is the attribute embedding; $h_t$ is the textual embedding; $\mathrm{concat}(\cdot)$ is a function that concatenates vectors; $\alpha$ is a trade-off coefficient balancing the importance of the optimized embedding and the attribute embedding; and $\beta$ is a trade-off coefficient balancing the importance of the graph embedding and the textual information.
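The joint embedding of Eq. (12) is a weighted two-stage concatenation, as the short sketch below illustrates; the vector dimensions are arbitrary, and α = β = 0.8 follows the best configuration reported in Sect. 6.1.2.

```python
import numpy as np

def joint_embedding(e_opt, h_a, h_t, alpha=0.8, beta=0.8):
    """Eq. (12): weighted concatenation of the optimized (structure+relation),
    attribute, and textual embeddings; alpha/beta are the trade-off weights."""
    graph_part = np.concatenate([alpha * e_opt, (1.0 - alpha) * h_a])
    return np.concatenate([beta * graph_part, (1.0 - beta) * h_t])

# Illustrative dimensions only; the resulting vector concatenates all parts.
rng = np.random.default_rng(3)
e = joint_embedding(rng.normal(size=300), rng.normal(size=100), rng.normal(size=300))
print(e.shape)  # (700,)
```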
5.7 Entity alignment

5.7.1 Model training

End-to-end training is used in our method. For an entity $e_i$ from $KG_1$ and an entity $e_j$ from $KG_2$, we employ the Manhattan distance to calculate their distance:

$$dis(e_i, e_j) = f(h_{e_i}, h_{e_j}), \tag{13}$$

where $f(m, n) = \lVert m - n \rVert_{L_1}$, and $h_{e_i}$ and $h_{e_j}$ denote the vector representations of $e_i$ and $e_j$, respectively.

We use a ranking scoring function that pushes aligned entity pairs as near as possible while pushing negative alignment pairs as far as possible. The ranking loss is defined as:

$$Loss = \sum_{(p,q)\in L}\;\sum_{(p',q')\in L'}\big[\,dis(p,q) - dis(p',q') + \gamma\,\big]_{+}, \tag{14}$$

where $[x]_{+} = \max\{x, 0\}$; $\gamma > 0$ is a margin hyper-parameter for the distance between positive and negative samples; and $L$ and $L'$ represent the sets of positive and negative pairs, respectively. Instead of randomly sampling negative instances [18], which draws the replacement entity from the whole entity set, we obtain more discriminative negative samples $L'$ by nearest-neighbor sampling [53], which limits the range of sampling. Specifically, for an entity $e$ to be aligned, the model computes the K nearest entities of $e$ in the embedding space as candidates during training. We calculate the top-K nearest neighbors of entities with the Manhattan distance described in Eq. 13.
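A minimal sketch of Eqs. (13)-(14) follows; it pairs each positive pair with one sampled negative for brevity, whereas the paper sums over the positive and negative sets, and the embeddings and index pairs are hypothetical.

```python
import numpy as np

def manhattan(u, v):
    """Eq. (13): L1 (Manhattan) distance between two entity vectors."""
    return np.abs(u - v).sum()

def ranking_loss(pos_pairs, neg_pairs, emb1, emb2, gamma=3.0):
    """Eq. (14): hinge loss pushing aligned pairs closer than negatives by a
    margin gamma, with [x]+ = max(x, 0)."""
    loss = 0.0
    for (p, q), (p_, q_) in zip(pos_pairs, neg_pairs):
        margin = manhattan(emb1[p], emb2[q]) - manhattan(emb1[p_], emb2[q_]) + gamma
        loss += max(margin, 0.0)
    return loss

# Toy usage with hypothetical indices; in RAEA the negatives would come from
# nearest-neighbour sampling rather than uniform sampling.
rng = np.random.default_rng(4)
emb1, emb2 = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
print(ranking_loss([(0, 0), (1, 1)], [(0, 3), (1, 4)], emb1, emb2))
```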

5.7.2 Refine similarity matrix to optimize alignment

Based on the final entity embeddings, entity similarity is computed by measuring the distance between two entity nodes in the vector space. Each element of the embedding-based entity similarity matrix $Sim_{JE}$ represents the similarity between two entities. Entity alignment aligns the entities in the source and target knowledge graphs in both directions: the rows of the entity similarity matrix represent the source entities, and the columns represent the target entities. The softmax function involves an exponential operation, which normalizes with a magnification effect and is beneficial to entity alignment [45]. Therefore, to highlight each element of the entity similarity matrix, we perform softmax operations on both the rows and the columns of the matrix and sum the results to obtain the new similarity matrix $Sim'_{JE}$ (the optimized matrix). We then rank the elements in each row of $Sim'_{JE}$, from which the rank of the correct target entity is obtained. The process of entity alignment is described in Algorithm 2.
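The fine-tuning step described above amounts to summing a row-wise and a column-wise softmax of the similarity matrix. The following sketch shows this operation under that reading; the toy matrix is hypothetical.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def refine_similarity(sim):
    """Sect. 5.7.2 fine-tuning: softmax over rows (source -> target) plus
    softmax over columns (target -> source), summed."""
    return softmax(sim, axis=1) + softmax(sim, axis=0)

# Toy 3x3 similarity matrix; rows are source entities, columns target entities.
sim = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.3, 0.7]])
refined = refine_similarity(sim)
print(refined.argmax(axis=1))  # predicted target index for each source entity
```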
6 Experiments

6.1 Experimental settings

6.1.1 Datasets

We use the public DBP15K as the dataset in this paper, which is a widely used cross-lingual dataset for entity alignment tasks. DBP15K is built upon DBpedia and includes DBP15K_ZH-EN (Chinese-English), DBP15K_JA-EN (Japanese-English), and DBP15K_FR-EN (French-English). Each dataset contains 15,000 inter-language links matching equivalent entities in different knowledge graphs. Table 1 provides details of the datasets.

Table 1 Statistics of the DBP15K datasets

Datasets        Language   Ents      Rels    Attrs   Rel. triples   Attr. triples
DBP15K_ZH-EN    Chinese    66,469    2,830   8,113   153,929        379,684
                English    98,125    2,317   7,173   237,674        567,755
DBP15K_JA-EN    Japanese   65,744    2,043   5,882   164,373        354,619
                English    95,680    2,096   6,066   233,319        497,230
DBP15K_FR-EN    French     66,858    1,379   4,547   192,191        528,665
                English    105,889   2,209   6,422   278,590        576,543

6.1.2 System settings

The hyperparameters of the proposed technique are shown in Table 2. We search among the following values: λ in {0.0001, 0.0005, 0.001, 0.005, 0.01}; d_e, d_s, and d_a in {100, 200, 300, 400}; ω in {5, 25, 125, 250}; γ in {1, 2, 3, 4, 5}; α and β in {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}; ξ in {200, 400, 600}; ζ in {200, 400, 600, 800, 1000}; the negative-sample update interval in {5, 10, 15}; η and θ in {1, 2, 3}; and κ in {1, 2, 3, 4}. The best parameter settings are chosen according to the values of the evaluation metrics Hits@1 and Hits@10. The best configuration is λ = 0.001, d_e = 300, d_s = 300, d_a = 100, ω = 125, γ = 3, α = 0.8, β = 0.8, ξ = 200, ζ = 1000, a negative-sample update interval of 10, η = 2, θ = 2, and κ = 2.

Table 2 Hyperparameter settings of the proposed technique

Name of hyperparameter                                                      Symbol
Learning rate                                                               λ
Dimension of the hidden representation of entity embedding                  d_e
Dimension of the hidden representation of structural embedding              d_s
Dimension of the hidden representation of attribute embedding               d_a
Number of negative samples                                                  ω
Margin hyperparameter between positive and negative samples                 γ
Balance parameter for entity structure embedding and attribute embedding    α
Balance parameter for graph embedding and textual embedding                 β
Number of epochs for joint entity and structure training                    ξ
Number of epochs for attribute training                                     ζ
Update interval (in epochs) of negative samples                             –
Number of layers of GCNs                                                    η
Number of interactions between dual and original graph                      θ
Number of heads of multi-head attention                                     κ

Like most entity alignment work [37, 40, 54, 55], we use 30% of the aligned seed entity pairs as the training set and the remaining aligned seed entity pairs as the test set. Section 6.4.1 shows that the performance of the model increases as the percentage of the training set increases. In this paper, we choose 30% of the aligned seed entity pairs as the training set for a fair comparison with existing work, since almost all models in the knowledge graph entity alignment field use 30% of the aligned seed entity pairs for training and the remaining 70% for testing; some work, such as JAPE [34], regards this division as the gold standard. We use Adaptive Moment Estimation (Adam) in the joint entity and structure embedding and Stochastic Gradient Descent (SGD) in the attribute embedding to improve performance. We use TensorFlow, an end-to-end open-source machine learning platform that contains various tools, libraries, and community resources. Our experiments are conducted on a personal workstation with an Intel(R) Core(TM) i9-9900K 3.60 GHz CPU, 128 GB of memory, and an NVIDIA 2080Ti GPU (driver version 430.64, as reported by nvidia-smi).

6.1.3 Baselines

For comparison, we have selected some representative models as baselines. These baselines can be roughly classified into two categories: TransE-based models and GNN-based models.

TransE-based models: AKE [56], JAPE [34], JTMEA [55], JETEA [57], AlignE [35], RTEA [58], MEEA [36], BootEA [35], NAEA [42], MMR [59], and JarKA [60].

GNN-based models: GCN-Align [37], KECG [39], MuGNN [40], Inga [43], LatsEA [61], AliNet [41], HMEA [62], GTEA [63], GM [38], RDGCN [44], HGCN [49], NMN [64], EMGCN [48], RAGA [45], PSR [47], HMAN [24], SEU [46], and BERT-INT [65].

6.1.4 Evaluation metrics

We use Hits@k as the metric to evaluate the performance of RAEA. Hits@k is the proportion of correct alignments ranked in the top k; k is set to 1 and 10 as in previous work. A higher value of Hits@k means better performance of the model. If the correct alignment entity appears among the top-k candidate entities, the count for Hits@k increases by 1.
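The Hits@k metric can be computed directly from the similarity matrix, as in the short sketch below; sim and gold are hypothetical placeholders for the model's similarity matrix and the ground-truth alignment indices.

```python
import numpy as np

def hits_at_k(sim, gold, k=10):
    """Hits@k: fraction of source entities whose correct target (gold[i])
    appears among the top-k most similar candidates in row i."""
    topk = np.argsort(-sim, axis=1)[:, :k]
    return float(np.mean([gold[i] in topk[i] for i in range(len(gold))]))

# Toy check: an identity-like similarity matrix gives perfect Hits@1.
sim = np.eye(4) + 0.01
print(hits_at_k(sim, gold=[0, 1, 2, 3], k=1))   # 1.0
print(hits_at_k(sim, gold=[0, 1, 2, 3], k=10))  # 1.0
```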

6.1.5 Impact of alignment direction

Table 3 reports the performance of our model RAEA in both directions on the three datasets. For example, ZH-EN in the table means that the source knowledge graph is the Chinese dataset and the target knowledge graph is the English dataset. Knowledge graph alignment proceeds as follows: given the source and target knowledge graphs, for each entity in the source knowledge graph we search the target knowledge graph for the equivalent target entity. We therefore explore whether different directions have an impact on the performance of aligning entities. The experimental results show that when the target knowledge graph is the English dataset, the values of both the Hits@1 and Hits@10 metrics are somewhat higher, and this phenomenon occurs on all three datasets. Therefore, we use the English-language knowledge graph as the target knowledge graph in this paper.

Table 3 The influence of alignment direction on results

Model   Datasets        Alignment direction   Hits@1   Hits@10
RAEA    DBP15K_ZH-EN    ZH-EN                 95.1     99.2
                        EN-ZH                 90.9     98.6
        DBP15K_JA-EN    JA-EN                 95.9     99.5
                        EN-JA                 92.6     97.2
        DBP15K_FR-EN    FR-EN                 98.4     99.8
                        EN-FR                 95.3     96.4

6.2 Overall results

Table 4 presents the overall performance of the models on entity alignment. The prior approaches are divided into two categories, TransE-based methods and GNN-based methods, shown in the top part of the table. The experimental data for all comparison models were obtained from the original papers. The results of our model are listed at the bottom of the table. All evaluation metrics are reported in percentage (%). We compare the performance of each model based on the values of the evaluation metrics Hits@1 and Hits@10, where higher values represent better performance. The experimental results show that RAEA outperforms almost all baselines on the three datasets. Hits@1 is similar to the precision of alignment, so the outstanding results on Hits@1 demonstrate the superiority of our proposed method on the three datasets. We also highlight the best scores for the TransE-based and GNN-based methods by underlining the numbers on all metrics in each dataset.

Specifically, among all TransE-based models, JarKA performs best. The Hits@1 of the JarKA model is firmly in first place on all three datasets; although its Hits@10 is not as good as that of the NAEA model on DBP15K_JA-EN and DBP15K_FR-EN, the gap is small. Hits@1 is equivalent to the accuracy rate, and the Hits@1 of JarKA is much higher than that of NAEA, which shows that the JarKA model performs better overall. JarKA bootstraps the models iteratively using the extended alignments. Notably, although RAEA does not apply an iterative strategy, it achieves much better performance than JarKA. This confirms that the iterative strategy is helpful in the entity alignment task but has limitations; the key lies in making better use of an entity's extensive structural information and semantic information, and fusing an entity's external semantic information is more useful than using only its structural information.

Moreover, BERT-INT performs best among all baselines on nearly all metrics. Although BERT-INT is not the best performer on the Hits@10 metric for the DBP15K_FR-EN dataset, it is only 0.1% below the best value, and on the Hits@10 metric for DBP15K_JA-EN, BERT-INT differs from HMAN by only 0.3%. Observing the experimental results, we find that the Hits@1 of our proposed model is slightly worse than that of BERT-INT, which uses BERT to embed both entity descriptions and entity names; considering the computational complexity, our model only uses BERT to embed entity descriptions. Our model RAEA exceeds all comparison models except BERT-INT. Even though several better-performing GNN-based methods, RDGCN, HGCN, and NMN, incorporate entity neighbors or relations into entity embeddings, they neither consider attributes nor refine the similarity matrix. EMGCN, RAGA, PSR, and SEU perform second only to the optimal model, possibly because none of these four models makes use of textual embedding information. Even though HMAN also uses relations, attributes, and entity descriptions, the large gap between HMAN and our model demonstrates the validity of our graph embedding and textual information embedding, because we consider the case where entity description information may be unavailable. This confirms that relation awareness, attribute involvement, textual embedding, and fine-tuning the similarity matrix can effectively improve the performance of entity alignment.

Table 4 Performance on entity alignment

              DBP15K_ZH-EN        DBP15K_JA-EN        DBP15K_FR-EN
Methods       Hits@1   Hits@10    Hits@1   Hits@10    Hits@1   Hits@10
AKE           24.9     66.5       28.1     69.1       24.0     67.4
JAPE          41.2     74.5       36.3     68.5       32.4     66.7
JTMEA         42.2     75.9       38.7     72.4       37.1     75.6
JETEA         42.7     75.0       36.4     72.4       36.5     71.8
AlignE        47.2     79.2       44.8     78.9       48.1     82.4
RTEA          57.3     86.4       53.4     85.7       53.8     86.8
MEEA          -        -          55.6     79.0       57.8     81.1
BootEA        62.9     84.8       62.2     85.4       65.3     87.4
NAEA          65.0     86.7       64.1     87.3       67.3     89.4
MMR           65.5     86.1       62.8     84.9       64.3     87.9
JarKA         70.6     87.8       64.6     85.5       70.4     88.8
GCN-Align     41.3     74.4       39.9     74.5       37.3     74.5
KECG          47.8     83.5       49.0     84.4       48.6     85.1
MuGNN         49.4     84.4       50.1     85.7       49.5     87.0
Inga          50.5     79.4       51.5     79.5       50.5     79.4
LatsEA        52.2     76.3       53.9     77.2       53.8     78.7
AliNet        53.9     82.6       54.9     83.1       55.2     85.2
HMEA          54.0     87.9       53.1     87.5       48.4     86.5
GTEA          59.6     76.5       66.2     79.1       60.3     75.6
GM            67.9     78.5       74.0     87.2       89.4     95.2
RDGCN         70.8     84.6       76.7     89.5       88.6     95.7
HGCN          72.0     85.7       76.6     89.7       89.2     96.1
NMN           73.3     86.9       78.5     91.2       90.2     96.7
EMGCN         86.3     94.6       86.6     95.2       94.0     98.9
RAGA          87.3     -          90.9     -          96.6     -
PSR           88.3     98.2       90.8     98.7       95.8     99.7
HMAN          87.1     98.7       93.5     99.4       97.3     99.8
SEU           90.0     96.5       95.6     99.1       98.8     99.9
BERT-INT      96.8     99.0       96.4     99.1       99.2     99.8
RAEA          95.1     99.2       95.9     99.5       98.4     99.8

6.3 Ablation experiments

To better analyze the usefulness of our model, we conduct five ablation experiments. More specifically, in Table 5 we report the results of RAEA w/o HG, which removes the highway networks from RAEA. RAEA w/o SE and RAEA w/o AE represent the removal of the structural embedding and the attribute embedding from the RAEA model, respectively. RAEA w/o TE represents the removal of the textual embedding from the RAEA model, and RAEA w/o MF represents no fine-tuning of the similarity matrix.

Table 5 Ablation experiments

              DBP15K_ZH-EN        DBP15K_JA-EN        DBP15K_FR-EN
Methods       Hits@1   Hits@10    Hits@1   Hits@10    Hits@1   Hits@10
RAEA w/o HG   92.9     98.5       93.4     97.3       96.1     98.4
RAEA w/o SE   92.1     98.3       92.6     98.2       95.2     97.9
RAEA w/o AE   92.4     98.4       92.2     98.1       95.8     98.3
RAEA w/o TE   80.5     91.9       84.0     94.4       91.5     97.5
RAEA w/o MF   93.2     98.6       93.1     97.9       96.5     97.2
RAEA          95.1     99.2       95.9     99.5       98.4     99.8

6.3.1 Effect of highway networks

To prove the effectiveness of the highway networks used in our approach, we remove the highway networks from the model and use only the original GCNs for the structural embedding; RAEA w/o HG in Table 5 refers to the RAEA model without highway gates. From the experimental results, we can conclude that the performance of RAEA w/o HG drops, which shows that incorporating highway networks into the original GCNs can effectively address the network training difficulties caused by deepening the network and the resulting obstruction of the gradient information flowing back.

6.3.2 Effect of neighbor structure embedding

To analyze the influence of the neighbor structure embedding on entity alignment, we consider only the relations and attributes of entities in the graph embedding part; RAEA w/o SE in Table 5 is the ablated model. From the experimental results, we find that the performance of the ablated model drops, which suggests that the neighbor structure embedding is necessary for better entity alignment. Neighboring structures of entities provide broad information; relations, attributes, and textual information provide semantic information; and structure and semantics are mutually reinforcing.

6.3.3 Effect of attribute embedding

To verify whether the attribute information contributes to entity alignment, we build the ablated model RAEA w/o AE in Table 5. Based on the experimental results, we find that incorporating attribute information does optimize the model, especially on the Hits@1 metric, where the performance improves most significantly. This indicates that the performance of entity alignment relies heavily on topological structure information, while additional attribute information is also very crucial.

6.3.4 Effect of textual embedding

To analyze whether the textual embedding affects the performance of the model, we build the ablated model RAEA w/o TE in Table 5. Removing the textual embedding brings down the performance on all metrics and datasets. Observing Table 5, RAEA w/o TE differs from RAEA the most among all the ablation experiments, which proves the importance and effectiveness of the textual embedding for entity alignment.

6.3.5 Effect of the similarity matrix fine-tuning strategy

To demonstrate whether the softmax operation can improve the performance of the model, we build the ablated model RAEA w/o MF in Table 5. Removing the softmax operation on the similarity matrix brings down the performance on all metrics and datasets, proving the validity of the fine-tuning strategy for entity alignment. The experimental results demonstrate that the softmax operation can indeed incorporate fine-grained features into the similarity matrix.

6.4 Discussion

6.4.1 Impact of the ratio of available seed sets

The performance of the model is heavily influenced by the size of the seed set, so we compare our model with NMN, HGCN, RDGCN, and BootEA while increasing the seed-set proportion from 10% to 40% in steps of 10%. As can be observed from Figs. 4 and 5, the performance of the model on all three datasets shows a continuous upward trend as the size of the seed set increases. RAEA also consistently achieves better results than the other entity alignment methods.

Fig. 4 The effects of different seed proportions on Hits@1

Fig. 5 The effects of different seed proportions on Hits@10

Fig. 6 The effects of information balance parameter on Hits@1

Fig. 7 The effects of information balance parameter on Hits@10

Fig. 8 The effects of information balance parameter on Hits@1

Fig. 9 The effects of information balance parameter on Hits@10

Fig. 10 The effects of interaction times on Hits@1

Fig. 11 The effects of interaction times on Hits@10

6.4.2 Balance within the graph embedding part

Figures 6 and 7 show the effects of different α values on Hits@1 and Hits@10. The more weight α is applied to the internal information of an entity, the better the performance of the model; each weight denotes an importance. This shows that the internal information of an entity is more important than external information for entity alignment.

6.4.3 Balance of graph embedding and textual embedding

Figures 8 and 9 show the effects of different β values on Hits@1 and Hits@10. The weight β reflects the balance between the graph embedding and the textual embedding. From the figures, we can see that the larger the value of β, the better the performance of the model; however, when β reaches 0.9, the performance decreases slightly. This shows that the performance of knowledge graph entity alignment depends mainly on the graph embedding, but the textual embedding also plays a positive role.

6.4.4 Impact of the number of interactions for relations

Figures 10 and 11 show the effect of the number of interactions between learning on the dual graph and learning on the original graph. The experimental results show that with three interactions the model starts to show a downward trend, which proves that two interactions already integrate the relation information well into the final representation.

6.4.5 Impact of the number of GCN layers

Figures 12 and 13 show the effect of the number of GCN layers. The experimental results show that with three GCN layers the performance begins to decline. This is because increasing the number of GCN layers causes the embedding representations of the nodes to become similar, which is not conducive to distinguishing the features of different entities.

Fig. 12 The effects of different GCN layers on Hits@1

Fig. 13 The effects of different GCN layers on Hits@10

and textual embedding to optimize entity representation.


Moreover, a similarity matrix fine-tuning strategy is applied
to improve alignment. Finally, we conduct experiments on
three cross-lingual datasets. Compared with existing mod-
els, the model proposed in this paper requires less training
data and can obtain richer entity representations using struc-
tural, relational, attribute, and textual information, and thus
has better robustness and accuracy. The model in this paper
achieves better performance on three publicly available
cross-lingual datasets. Our model assumes that the given
knowledge graph must provide textual information for the
entities, however, the descriptive information of the enti-
ties is not always present, so our model has limitations. The
performance of our proposed model RAEA depends on the
Fig. 15 Consumed memory test number of the seed set, and there is no method designed in
this paper to increase the size of the seed set to improve the
6.4.6 Impact of the number of attention heads performance of the model, which is a limitation of RAEA.
Hence in the future, we will explore how to use a semi-
Multi-head attention helps the network capture richer supervised iterative strategy to generate seed entity pairs as
features and can balance the biases that may arise from training data. And we will test whether the entity initial-
the same attention mechanism, thus effectively balancing ization embedding can be optimized using the pre-trained
the expressive power of the model. Multiple attention is language model BERT.
computed independently, acting as an integration to prevent
Acknowledgements Thanks to all the authors for their hard work.
overfitting. In Table 6, we report the results with 1 to 4
This work is supported by the National Natural Science Foundation
attention heads. However, experimental data demonstrates of China under grant No.61872163 and 61806084, Jilin Province Key
that it is not the case that the larger the number of attention Scientific and Technological Research and Development Project under
heads is, the better the model works. When the number of grant No.20210201131GX, and Jilin Provincial Education Department
attention heads exceeds 2, the model performance starts to project under grant No.JJKH20190160KJ.
degrade in most cases. It is because the larger the number
Author Contributions Beibei Zhu: conduct experiments, write the first
of attention heads is, the more noise will be inevitably draft of the article and revise the manuscript; Tie Bao: formulation of
introduced. overarching research goals, oversight for the research activity planning
and execution; Jiayu Han: analyze and synthesize data; Lu Liu: provide
6.4.7 Consumed time and memory test funding for publication of the article and revise the article; Junyi
Wang: visualization of data and production of graphs for the article;
Tao Peng: revise the article, provide financial support and validate the
Figures 14 and 15 show the time and memory consumption experimental results.
of our model and some comparative models respectively.
Through observation, we can find that some models Funding This work is supported by the National Natural Science
improve efficiency and memory consumption at the expense Foundation of China under grant No.61872163 and 61806084, Jilin
Province Key Scientific and Technological Research and Develop-
of performance. For example, GCN-Align model consumes
ment Project under grant No.20210201131GX, and Jilin Provincial
the least time and memory, and its performance is the worst. Education Department project under grant No.JJKH20190160KJ.
Our model achieves a balance between performance and
efficiency or memory occupation. The high performance of Materials Availability The data used or analyzed during the current
our model is not based on significantly increasing time and study are available from the corresponding author after the paper is
memory. accepted for publication.

Code Availability Code is available from the corresponding author


after the paper is accepted for publication.
7 Conclusions and future work

In this paper, we propose a new RAEA framework for the cross-lingual entity alignment problem. RAEA makes full use of relation-aware entity embeddings, attribute embeddings, structure embeddings, and the textual information of entities. However, the textual information of entities is not always present, so our model has limitations. Moreover, the performance of RAEA depends on the size of the seed set, and this paper designs no method for enlarging the seed set to improve the performance of the model, which is a further limitation of RAEA. Hence, in the future, we will explore how to use a semi-supervised iterative strategy to generate seed entity pairs as training data, and we will test whether the entity initialization embeddings can be optimized using the pre-trained language model BERT.
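As a concrete reading of that planned strategy, here is a minimal, hypothetical sketch (Python) of one bootstrapping round; the names (mutual_nearest_pairs, expand_seeds), the cosine-similarity criterion, and the threshold are our illustrative assumptions rather than a design committed to in this paper. The idea, in the spirit of bootstrapping approaches such as [35], is to promote mutually nearest, high-confidence entity pairs into the seed set and retrain.

    # Hedged sketch of semi-supervised iterative seed expansion; all names
    # and the similarity criterion are illustrative assumptions.
    import numpy as np

    def mutual_nearest_pairs(emb1, emb2, threshold):
        """Yield (i, j) pairs that are mutual nearest neighbors by cosine."""
        a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
        b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
        sim = a @ b.T
        for i, j in enumerate(sim.argmax(axis=1)):
            if sim[:, j].argmax() == i and sim[i, j] >= threshold:
                yield i, j

    def expand_seeds(emb1, emb2, seeds, threshold=0.9):
        """One bootstrapping round: add confident mutual matches to seeds."""
        return seeds | set(mutual_nearest_pairs(emb1, emb2, threshold))

    # Toy usage: after each (re)training round, call expand_seeds with the
    # freshly learned entity embeddings, then retrain on the enlarged seeds.
    rng = np.random.default_rng(0)
    e1 = rng.normal(size=(4, 8))
    e2 = e1 + 0.01 * rng.normal(size=(4, 8))
    print(expand_seeds(e1, e2, seeds={(0, 0)}))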
Acknowledgements Thanks to all the authors for their hard work. This work is supported by the National Natural Science Foundation of China under grant No.61872163 and 61806084, Jilin Province Key Scientific and Technological Research and Development Project under grant No.20210201131GX, and Jilin Provincial Education Department project under grant No.JJKH20190160KJ.

Author Contributions Beibei Zhu: conduct experiments, write the first draft of the article and revise the manuscript; Tie Bao: formulation of overarching research goals, oversight for the research activity planning and execution; Jiayu Han: analyze and synthesize data; Lu Liu: provide funding for publication of the article and revise the article; Junyi Wang: visualization of data and production of graphs for the article; Tao Peng: revise the article, provide financial support and validate the experimental results.

Funding This work is supported by the National Natural Science Foundation of China under grant No.61872163 and 61806084, Jilin Province Key Scientific and Technological Research and Development Project under grant No.20210201131GX, and Jilin Provincial Education Department project under grant No.JJKH20190160KJ.

Materials Availability The data used or analyzed during the current study are available from the corresponding author after the paper is accepted for publication.

Code Availability Code is available from the corresponding author after the paper is accepted for publication.

Declarations

Conflict of Interests No potential conflict of interest was reported by the authors.
References

1. Bizer C, Lehmann J, Kobilarov G, Auer S, Becker C, Cyganiak R, Hellmann S (2009) DBpedia - a crystallization point for the web of data. J Web Semant 7:154–165. https://doi.org/10.1016/j.websem.2009.07.002
2. Suchanek FM, Kasneci G, Weikum G (2008) YAGO: a large ontology from Wikipedia and WordNet. J Web Semant 6:203–217. https://doi.org/10.1016/j.websem.2008.06.001
3. Navigli R, Ponzetto SP (2012) BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif Intell 193:217–250. https://doi.org/10.1016/j.artint.2012.07.001
4. Lin L, Liu J, Lv Y, Guo F (2020) A similarity model based on reinforcement local maximum connected same destination structure oriented to disordered fusion of knowledge graphs. Appl Intell 50(9):2867–2886. https://doi.org/10.1007/s10489-020-01673-9
5. Hoffmann R, Zhang C, Ling X, Zettlemoyer LS, Weld DS (2011) Knowledge-based weak supervision for information extraction of overlapping relations. Paper presented at the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, USA, 19–24 June 2011
6. Moussallem D, Wauer M, Ngomo AN (2018) Machine translation using semantic web technologies: a survey. J Web Semant 51:1–19. https://doi.org/10.1016/j.websem.2018.07.001
7. Zhang Y, Dai H, Kozareva Z, Smola AJ, Song L (2018) Variational reasoning for question answering with knowledge graph. Paper presented at the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Louisiana, USA, 2–7 February 2018
8. Mishra S, Saha S, Mondal S (2017) GAEMTBD: genetic algorithm based entity matching techniques for bibliographic databases. Appl Intell 47(1):197–230. https://doi.org/10.1007/s10489-016-0874-z
9. Fellegi IP, Sunter AB (1969) A theory for record linkage. J Am Stat Assoc 64(328):1183–1210
10. Ong TC, Mannino MV, Schilling LM, Kahn MG (2014) Improving record linkage performance in the presence of missing linkage data. J Biomed Inform 52:43–54
11. Daggy J, Xu H, Hui S, Grannis S (2014) Evaluating latent class models with conditional dependence in record linkage. Stat Med 33(24):4250–4265
12. Raimond Y, Sutton C, Sandler MB (2008) Automatic interlinking of music datasets on the semantic web. Paper presented at the WWW 2008 Workshop on Linked Data on the Web, Beijing, China, 22 April 2008
13. Niu X, Rong S, Wang H, Yu Y (2012) An effective rule miner for instance matching in a web of data. Paper presented at the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 29 October – 2 November 2012
14. Volz J, Bizer C, Gaedke M, Kobilarov G (2009) Discovering and maintaining links on the web of data. Paper presented at the 8th International Semantic Web Conference, Chantilly, VA, USA, 25–29 October 2009
15. Ngomo ACN, Auer S (2011) LIMES – a time-efficient approach for large-scale link discovery on the web of data. Paper presented at the 22nd Int Jt Conf Artif Intell, Barcelona, Spain, 16–22 July 2011
16. Papadakis G, Alexiou G, Papastefanatos G, Koutrika G (2015) Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data. Proc VLDB Endow 9(4):312–323
17. Lacoste-Julien S, Palla K, Davies A, Kasneci G, Graepel T, Ghahramani Z (2012) SiGMa: simple greedy matching for aligning large knowledge bases. Paper presented at the 19th International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2012
18. Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. Paper presented at the 27th Adv Neural Inf Process Syst, Lake Tahoe, Nevada, United States, 5–8 December 2013
19. Bruna J, Zaremba W, Szlam A, LeCun Y (2014) Spectral networks and locally connected networks on graphs. Paper presented at the 2nd Int Conf Learn Represent, Banff, AB, Canada, 14–16 April 2014
20. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. Paper presented at the 5th Int Conf Learn Represent, Toulon, France, 24–26 April 2017
21. Velickovic P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2018) Graph attention networks. Paper presented at the 6th Int Conf Learn Represent, Vancouver, BC, Canada, 30 April – 3 May 2018
22. Chen J, Gu B, Li Z, Zhao P, Liu A, Zhao L (2020) SAEA: self-attentive heterogeneous sequence learning model for entity alignment. Paper presented at Database Systems for Advanced Applications - 25th International Conference, Jeju, South Korea, 24–27 September 2020
23. Monti F, Shchur O, Bojchevski A, Litany O, Günnemann S, Bronstein MM (2018) Dual-primal graph convolutional networks. arXiv:1806.00770
24. Yang H, Zou Y, Shi P, Lu W, Lin J, Sun X (2019) Aligning cross-lingual entities with multi-aspect information. Paper presented at the 2019 Conf Empir Methods Nat Lang Process and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019
25. Zhu Y, Liu H, Wu Z, Du Y (2020) Relation-aware neighborhood matching model for entity alignment. arXiv:2012.08128
26. Kazemi SM, Poole D (2018) Simple embedding for link prediction in knowledge graphs. Paper presented at the 31st Adv Neural Inf Process Syst, Montréal, Canada, 3–8 December 2018
27. Zhang Y, Yao Q, Chen L (2020) Interstellar: searching recurrent architecture for knowledge graph embedding. Paper presented at the 33rd Adv Neural Inf Process Syst, virtual, 6–12 December 2020
28. Wang Q, Mao Z, Wang B, Guo L (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Trans Knowl Data Eng 29:2724–2743. https://doi.org/10.1109/TKDE.2017.2754499
29. Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. Paper presented at the 28th Association for the Advance of Artificial Intelligence, Québec City, Québec, Canada, 27–31 July 2014
30. Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015) Learning entity and relation embeddings for knowledge graph completion. Paper presented at the 29th Association for the Advance of Artificial Intelligence, Austin, Texas, USA, 25–30 January 2015
31. Ji G, Liu K, He S, Zhao J (2016) Knowledge graph completion with adaptive sparse transfer matrix. Paper presented at the 30th Association for the Advance of Artificial Intelligence, Phoenix, Arizona, USA, 12–17 February 2016
32. Dong X, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. Paper presented at the 20th Proc ACM SIGKDD Int Conf Knowl Discov Data Min, New York, NY, USA, 24–27 August 2014
33. Ghorbani M, Baghshah MS, Rabiee HR (2019) MGCN: semi-supervised classification in multi-layer graphs with graph convolutional networks. Paper presented at Int Conf Adv Soc Netw Anal Min, Vancouver, British Columbia, Canada, 27–30 August 2019
34. Sun Z, Hu W, Li C (2017) Cross-lingual entity alignment via joint attribute-preserving embedding. Paper presented at the 16th International Semantic Web Conference, Vienna, Austria, 21–25 October 2017
35. Sun Z, Hu W, Zhang Q, Qu Y (2018) Bootstrapping entity alignment with knowledge graph embedding. Paper presented at the 27th Int Jt Conf Artif Intell, Stockholm, Sweden, 13–19 July 2018
36. Chen L, Tian X, Tang X, Cui J (2021) Multi-information embedding based entity alignment. Appl Intell 51(12):8896–8912. https://doi.org/10.1007/s10489-021-02400-8
37. Wang Z, Lv Q, Lan X, Zhang Y (2018) Cross-lingual knowledge graph alignment via graph convolutional networks. Paper presented at the 2018 Conf Empir Methods Nat Lang Process, Brussels, Belgium, 31 October – 4 November 2018
38. Xu K, Song L, Feng Y, Song Y, Yu D (2020) Coordinated reasoning for cross-lingual knowledge graph alignment. Paper presented at the 32nd Innov Appl Artif Intell Conf, New York, NY, USA, 7–12 February 2020
39. Li C, Cao Y, Hou L, Shi J, Li J, Chua T (2019) Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. Paper presented at the 2019 Conf Empir Methods Nat Lang Process and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019
40. Cao Y, Liu Z, Li C, Liu Z, Li J, Chua T (2019) Multi-channel graph neural network for entity alignment. Paper presented at the 57th Conference of the Association for Computational Linguistics, Florence, Italy, 28 July – 2 August 2019
41. Sun Z, Wang C, Hu W, Chen M, Dai J, Zhang W, Qu Y (2020) Knowledge graph alignment network with gated multi-hop neighborhood aggregation. Paper presented at the 32nd Innov Appl Artif Intell Conf, New York, NY, USA, 7–12 February 2020
42. Zhu Q, Zhou X, Wu J, Tan J, Guo L (2019) Neighborhood-aware attentional representation for multilingual knowledge graphs. Paper presented at the 28th Int Jt Conf Artif Intell, Macao, China, 10–16 August 2019
43. Pang N, Zeng W, Tang J, Tan Z, Zhao X (2019) Iterative entity alignment with improved neural attribute embedding. Paper presented at the 16th Extended Semantic Web Conference 2019, Portoroz, Slovenia, 2 June 2019
44. Wu Y, Liu X, Feng Y, Wang Z, Yan R, Zhao D (2019) Relation-aware entity alignment for heterogeneous knowledge graphs. Paper presented at the 28th Int Jt Conf Artif Intell, Macao, China, 10–16 August 2019
45. Zhu R, Ma M, Wang P (2021) RAGA: relation-aware graph attention networks for global entity alignment. Paper presented at the 25th Pacific-Asia Conference, Virtual Event, 11–14 May 2021
46. Mao X, Wang W, Wu Y, Lan M (2021) From alignment to assignment: frustratingly simple unsupervised entity alignment. Paper presented at the 2021 Conf Empir Methods Nat Lang Process, Virtual Event / Punta Cana, Dominican Republic, 7–11 November 2021
47. Mao X, Wang W, Wu Y, Lan M (2021) Are negative samples necessary in entity alignment?: an approach with high performance, scalability and robustness. Paper presented at the 30th ACM Int Conf Inf Knowl Manag, Virtual Event, Queensland, Australia, 1–5 November 2021
48. Tam NT, Trung HT, Yin H, Vinh TV, Sakong D, Zheng B, Hung NQV (2021) Multi-order graph convolutional networks for knowledge graph alignment. Paper presented at the 37th IEEE Int Conf Data Eng, Chania, Greece, 19–22 April 2021
49. Wu Y, Liu X, Feng Y, Wang Z, Zhao D (2019) Jointly learning entity and relation representations for entity alignment. Paper presented at the 2019 Conf Empir Methods Nat Lang Process and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019
50. Srivastava RK, Greff K, Schmidhuber J (2015) Highway networks. arXiv:1505.00387
51. Mahdisoltani F, Biega J, Suchanek FM (2015) YAGO3: a knowledge base from multilingual wikipedias. Paper presented at the 7th Bienn Conf Innov Data Syst Res, Asilomar, CA, USA, 4–7 January 2015
52. Munne RF, Ichise R (2020) Joint entity summary and attribute embeddings for entity alignment between knowledge graphs. Paper presented at Hybrid Artificial Intelligent Systems - 15th International Conference, Spain, 11–13 November 2020
53. Kotnis B, Nastase V (2017) Analysis of the impact of negative sampling on link prediction in knowledge graphs. arXiv:1708.06816
54. Chen M, Tian Y, Yang M, Zaniolo C (2017) Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. Paper presented at the 26th Int Jt Conf Artif Intell, Melbourne, Australia, 19–25 August 2017
55. Lu G, Zhang L, Jin M, Li P, Huang X (2021) Entity alignment via knowledge embedding and type matching constraints for knowledge graph inference. J Ambient Intell Humaniz Comput, pp 1–11
56. Lin X, Yang H, Wu J, Zhou C, Wang B (2019) Guiding cross-lingual entity alignment via adversarial knowledge embedding. Paper presented at Int Conf Data Min, Beijing, China, 8–11 November 2019
57. Song X, Zhang H, Bai L (2021) Entity alignment between knowledge graphs using entity type matching. Paper presented at Knowledge Science, Engineering and Management - 14th International Conference, KSEM 2021, Tokyo, Japan, 14–16 August 2021
58. Jiang T, Bu C, Zhu Y, Wu X (2019) Two-stage entity alignment: combining hybrid knowledge graph embedding with similarity-based relation alignment. Paper presented at the 16th Pacific Rim International Conference on Artificial Intelligence, Cuvu, Yanuca Island, Fiji, 26–30 August 2019
59. Shi X, Xiao Y (2019) Modeling multi-mapping relations for precise cross-lingual entity alignment. Paper presented at the 2019 Conf Empir Methods Nat Lang Process and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019
60. Chen B, Zhang J, Tang X, Chen H, Li C (2020) JarKA: modeling attribute interactions for cross-lingual knowledge alignment. Paper presented at Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference, Singapore, 11–14 May 2020
61. Chen W, Chen X, Xiong S (2021) Global entity alignment with gated latent space neighborhood aggregation. Paper presented at the 20th China National Conference, Hohhot, China, 13–15 August 2021
62. Guo H, Tang J, Zeng W, Zhao X, Liu L (2021) Multi-modal entity alignment in hyperbolic space. Neurocomputing 461:598–607. https://doi.org/10.1016/j.neucom.2021.03.132
63. Jiang S, Nie T, Shen D, Kou Y, Yu G (2021) Entity alignment of knowledge graph by joint graph attention and translation representation. Paper presented at the 18th International Conference, Kaifeng, China, 24–26 September 2021
64. Wu Y, Liu X, Feng Y, Wang Z, Zhao D (2020) Neighborhood matching network for entity alignment. Paper presented at the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020
65. Tang X, Zhang J, Chen B, Yang Y, Chen H, Li C (2020) BERT-INT: a BERT-based interaction model for knowledge graph alignment. Paper presented at the 29th International Joint Conference on Artificial Intelligence, Yokohama, Japan, 11–17 July 2020

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Affiliations

Beibei Zhu1 · Tie Bao1 · Lu Liu2,3 · Jiayu Han4 · Junyi Wang1 · Tao Peng1,3

Beibei Zhu
zhubb20@mails.jlu.edu.cn
Tie Bao
baotie@jlu.edu.cn
Lu Liu
liulu@jlu.edu.cn
Jiayu Han
jyhan126@uw.edu
Junyi Wang
wangjy20@mails.jlu.edu.cn

1 College of Computer Science and Technology, Jilin University, Qianjin Street, Changchun, 130012, Jilin, China
2 College of Software, Jilin University, Qianjin Street, Changchun, 130012, Jilin, China
3 Key Laboratory of Symbol Computation and Knowledge Engineering for Ministry of Education, Jilin University, Qianjin Street, Changchun, 130012, Jilin, China
4 Department of Linguistics, University of Washington, Seattle, WA 98195-3770, USA