IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion
Abstract.
Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, which lets them capture the evolution of knowledge precisely and reflect the dynamic nature of the real world. TKGs typically contain complex, interwoven geometric structures. However, existing Temporal Knowledge Graph Completion (TKGC) methods either model TKGs in a single space or neglect the heterogeneity of different curvature spaces, which limits their ability to capture these intricate structures. In this paper, we propose a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks. Concretely, IME models TKGs in multi-curvature spaces, including hyperspherical, hyperbolic, and Euclidean spaces. IME then incorporates two key properties: a space-shared property and a space-specific property. The space-shared property learns commonalities across different curvature spaces and alleviates the spatial gap caused by their heterogeneity, while the space-specific property captures the characteristic features of each space. IME also proposes an Adjustable Multi-curvature Pooling (AMP) approach to retain important information effectively. Furthermore, IME designs similarity, difference, and structure loss functions to achieve these objectives. Experimental results demonstrate the superior performance of IME over existing state-of-the-art TKGC models.
1. Introduction
Knowledge Graphs (KGs) are structured collections of entities and relations that provide a semantic representation of knowledge. They are a powerful tool for organizing and representing real-world information in a form that machines can process. Knowledge in KGs is typically represented as triplets, where nodes represent entities and directed edges between nodes represent relations. For example, in the triplet (Albert Einstein, born_in, Germany), Albert Einstein and Germany are the head and tail entities, and born_in is the relation between them. KGs are applied in a wide range of domains, including recommendation systems (Ko et al., 2022), information retrieval (Dalton et al., 2014), and semantic search (Guha et al., 2003). They enable machines to reason about entities and their relations, uncover patterns, and make informed decisions based on the structured knowledge they encapsulate.
Because information changes constantly, Temporal Knowledge Graphs (TKGs) have emerged as a natural extension of traditional KGs. In contrast to their static counterparts, TKGs introduce a temporal dimension that makes it possible to track how knowledge evolves over time. Specifically, TKGs attach a temporal attribute to each triplet to form a quadruplet, e.g., (Albert Einstein, born_in, Germany, 1879-03-14), where 1879-03-14 is the timestamp. The temporal dimension thus allows a systematic depiction of trends and changes in events, enabling more context-aware and precise knowledge representation.
Although TKGs such as ICEWS (Boschee et al., 2015) and GDELT (Leetaru and Schrodt, 2013) contain millions or even billions of quadruplets, knowledge keeps evolving as real-world events unfold, so these TKGs remain far from complete. This incompleteness substantially hinders knowledge-driven systems, which makes Temporal Knowledge Graph Completion (TKGC) an essential task. The goal of TKGC is to improve the completeness and accuracy of TKGs by predicting missing relations, entities, or temporal attributes that change over time.
The quality of embedding representations in TKGs depends on how well the geometry of the embedding space matches the structure of the data. As shown in (Wang et al., 2021; Cao et al., 2022), spaces of different curvature have different effects when embedding different types of structured data. Specifically, hyperspherical space (Wilson et al., 2014; Liu et al., 2017) excels at capturing ring structures, hyperbolic space (Nickel and Kiela, 2017; Sala et al., 2018) is highly effective at representing hierarchies, and Euclidean space is well suited to chain-like structures. In practice, however, TKGs may exhibit complex and diverse structures, resembling trees in some regions and forming rings in others. Nonetheless, most TKGC methods model TKGs within a single space, which makes it difficult to capture the intricate geometric structures inherent in TKGs.
A second challenge is how to integrate information from different curvature spaces effectively. Current TKGC methods (Han et al., 2020; Zhang et al., 2023) typically overlook the spatial gap among different curvature spaces. Despite their significant advances, this gap remains a substantial constraint on their expressive capacity.
The last challenge is feature fusion. Existing methods (Yue et al., 2023; Wang et al., 2023a) mainly develop sophisticated fusion mechanisms, which incur high computational complexity. Simple pooling approaches such as average pooling and max pooling reduce this complexity, but their fixed pooling strategies make it difficult to preserve important information.
This paper proposes a novel Integrating Multi-curvature shared and specific Embedding (IME) model to address the above challenges. As shown in Figure 1, IME simultaneously models TKGs in hyperspherical, hyperbolic, and Euclidean spaces, and introduces the quadruplet distributor (Wang et al., 2023a) within each space to aggregate and distribute information among entities, relations, and timestamps. In addition, IME learns two distinct properties for each space: a space-shared property and a space-specific property. The space-shared property mitigates the spatial gap by capturing the information about entities, relations, and timestamps that is shared across curvature spaces, whereas the space-specific property captures the complementary information exclusive to each curvature space. Finally, IME introduces an Adjustable Multi-curvature Pooling (AMP) approach, which learns pooling weights to obtain a better pooling strategy and thereby retains important information more effectively. We use AMP to aggregate the space-shared and -specific representations of entities, relations, and timestamps into a joint vector for downstream predictions.
The main contributions of this paper are summarized as follows:
- This paper designs a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks, which learns two key properties: a space-shared property and a space-specific property. The space-shared property learns the commonalities across distinct curvature spaces and mitigates the spatial gaps among them, while the space-specific property captures characteristic features;
- This paper proposes an adjustable multi-curvature pooling module that learns a better pooling strategy during training to retain important information effectively;
- To the best of our knowledge, we are the first to introduce a structure loss into TKGC tasks, which encourages quadruplets to have similar structures across the different curvature spaces;
- Experimental results on several widely used datasets demonstrate that IME achieves competitive performance compared to state-of-the-art TKGC methods.
2. Related work
In this section, we provide an overview of KGC methods from two perspectives (Wang et al., 2023c; Pan et al., 2024): Euclidean embedding-based methods and Non-Euclidean embedding-based methods.
2.1. Euclidean Embedding-based Methods
Euclidean embedding-based KGC methods typically model the KGs in the Euclidean space. Depending on the types of knowledge, we can categorize them into static knowledge graph completion for triplets and temporal knowledge graph completion for quadruplets.
Static knowledge graph completion (SKGC) focuses on static KGs, in which the information about entities and relations does not change over time. SKGC methods aim to predict missing triplets (e.g., relations between entities) from known information. Popular SKGC methods include McRL (Wang et al., 2022), TDN (Wang et al., 2023b), and ConvE (Dettmers et al., 2018).
Translation-based methods treat the relation as a translation from the head entity to the tail entity, such as TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019); RotatE regards the relation as a rotation from the head entity to the tail entity in complex space. Building on TransE, TransR (Lin et al., 2015) learns relation-specific projection matrices that map entities into the corresponding relation spaces. SimplE (Kazemi and Poole, 2018) improves upon Canonical Polyadic (CP) decomposition (Hitchcock, 1927) by enabling the interdependent learning of the two embeddings of each entity. Furthermore, BoxE (Abboud et al., 2020) introduces box embeddings to model the uncertainty and diversity inherent in knowledge.
Semantic matching-based methods use a similarity-based scoring function to evaluate the plausibility of triplets, such as DistMult (Yang et al., 2015) and McRL (Wang et al., 2022). DistMult uses a bilinear product with a diagonal relation matrix to model the interaction between entities and relations. ComplEx (Trouillon et al., 2016) computes triplet scores in complex space. CapsE (Vu et al., 2019) introduces a capsule network to capture hierarchical relations and semantic information among entities. TuckER (Balažević et al., 2019) applies Tucker decomposition to the SKGC task. In addition, McRL captures the complex conceptual information hidden in triplets to learn accurate representations of entities and relations. MLI (Wang et al., 2024) simultaneously captures coarse-grained and fine-grained information to enhance information interaction.
Convolutional neural network-based methods use CNNs to capture the inherent correlations within triplets. ConvE (Dettmers et al., 2018) is the first to apply a CNN to the SKGC task. R-GCN (Schlichtkrull et al., 2018) uses a graph neural network to update entity embeddings. Moreover, TDN (Wang et al., 2023b) designs a triplet distributor to facilitate information transmission between entities and relations.
Temporal knowledge graph completion (TKGC) predicts unknown quadruplets in TKGs from known information, including entities, relations, and timestamps. Classic TKGC methods include ChronoR (Sadeghian et al., 2021), TeLM (Xu et al., 2021), and BoxTE (Messner et al., 2022).
TTransE (Leblay and Chekol, 2018) models the pair of relation and timestamp as a translation between the head entity and the tail entity. TA-TransE and TA-DistMult (García-Durán et al., 2018) use recurrent neural networks to encode relations together with timestamps into time-aware relation representations. Building upon RotatE, ChronoR represents relation-timestamp pairs as rotations from the head entity to the tail entity. Similarly, TuckERTNT (Shao et al., 2022) extends TuckER's third-order tensor to a fourth-order tensor to model quadruplets. More recently, BoxTE (Messner et al., 2022) has been introduced to enable more versatile and flexible knowledge representation.
HyTE (Dasgupta et al., 2018) is the first to explore the dynamic evolution of entities and relations by projecting them onto timestamp-specific hyperplanes. TeRo (Xu et al., 2020b) models the temporal evolution of entities as a rotation in complex vector space and handles facts involving time intervals with dual complex embeddings for relations. Based on ComplEx, TComplEx (Lacroix et al., 2020) extends the third-order tensor to a fourth-order tensor to perform TKGC. DE-SimplE (Goel et al., 2020) designs a diachronic entity embedding function to capture how entities evolve over time and then uses SimplE to predict missing items. ATiSE (Xu et al., 2020a) decomposes the evolution of entities and relations over time into trend, seasonal, and irregular components. TeLM (Xu et al., 2021) uses multivector embeddings and a linear temporal regularizer to obtain entity and timestamp embeddings, respectively. EvoExplore (Zhang et al., 2022) incorporates two factors for understanding the evolution of TKGs: the local structure, which describes the formation of the graph structure in detail, and the global structure, which reflects the dynamic topology of TKGs. BDME (Yue et al., 2023) uses the interaction among entities, relations, and timestamps for coarse-grained embeddings and block decomposition for fine-grained embeddings. In particular, QDN (Wang et al., 2023a) extends the triplet distributor (Wang et al., 2023b) into a quadruplet distributor and designs a fourth-order tensor decomposition to facilitate information interaction among entities, relations, and timestamps.
2.2. Non-Euclidean Embedding-based Methods
Non-Euclidean embedding-based methods typically embed KGs into non-Euclidean space, effectively capturing the complex geometric structure inherent to them. Some classic non-Euclidean embedding methods include ATTH (Chami et al., 2020), MuRMP (Wang et al., 2021), and BiQCap (Zhang et al., 2023).
For SKGC, ATTH models the KG within the hyperbolic space to capture both hierarchical and logical patterns. BiQUE (Guo and Kok, 2021) utilizes biquaternions to incorporate multiple geometric transformations, including Euclidean rotation, which is valuable for modeling patterns like symmetry, and hyperbolic rotation, which proves effective in capturing hierarchical relations. MuRMP and GIE (Cao et al., 2022) simultaneously model the KG within multi-curvature spaces to capture the complex structure.
For TKGC, DyERNIE (Han et al., 2020) embeds TKGs into multi-curvature spaces to explore the dynamic evolution guided by velocity vectors defined in the tangent space. BiQCap (Zhang et al., 2023) simultaneously models each relation in Euclidean and hyperbolic spaces to represent hierarchical semantics and other relation patterns of TKGs.
3. Problem Definition
A temporal knowledge graph $\mathcal{G}$ is a collection of quadruplets defined over an entity set $\mathcal{E}$, a relation set $\mathcal{R}$, and a timestamp set $\mathcal{T}$. Each quadruplet in $\mathcal{G}$ is denoted as $(s, r, o, t)$, where $s, o \in \mathcal{E}$ are the head and tail entities, $r \in \mathcal{R}$ is the relation, and $t \in \mathcal{T}$ is the timestamp. The primary objective of the TKGC task is to predict the missing tail entity given a query $(s, r, ?, t)$, or the missing head entity given a query $(?, r, o, t)$.
4. Methodology
In this section, we present a detailed description of IME, which can be segmented into three main stages: Multi-curvature Embeddings, Space-shared and -specific Representations, and Adjustable Multi-curvature Pooling. The whole framework of IME is illustrated in Figure 2.
4.1. Multi-curvature Embeddings
TKGs typically contain intricate geometric structures, including ring, hierarchical, and chain structures, and each of these structures is modeled best by a different geometric space. We therefore model TKGs simultaneously in multi-curvature spaces to capture these complex structures.
Inspired by QDN (Wang et al., 2023a), we introduce a quadruplet distributor in each curvature space to aggregate and distribute information among entities, relations, and timestamps. This is needed because, within each curvature space, entities, relations, and timestamps typically lie in distinct semantic spaces, which hinders information transmission among them.
Given the entity, relation, and timestamp embeddings and a quadruplet distributor initialized as a zero tensor, we perform information aggregation followed by information distribution.
Information Aggregation dynamically aggregates the information of entities, relations, and timestamps into the quadruplet distributor through gating functions,
(1)
where $\sigma(\cdot)$ denotes the sigmoid activation function and $\odot$ denotes element-wise multiplication.
Subsequently, we use a residual connection to aggregate the information of the entity, relation, and timestamp into the quadruplet distributor,
(2)
where the summation is performed element-wise.
Information Distribution distributes the aggregated quadruplet distributor back to the entity, relation, and timestamp through gating functions,
(3)
Finally, we distribute the information in the quadruplet distributor into the entities, relations, and timestamps,
(4)
Through this aggregation and distribution process, we obtain updated representations of entities, relations, and timestamps.
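The aggregation and distribution steps can be illustrated with a small pure-Python sketch. The exact forms of Eqs. (1)-(4) are not reproduced in this text, so the gate parameterization below (one hypothetical d×d matrix per role for aggregation, `W_agg`, and one for distribution, `W_dis`, each followed by a sigmoid) is an assumption chosen only to show the data flow: gated inputs are summed into a zero-initialized distributor, and the gated distributor is added back to each input.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    # Plain matrix-vector product: one dot product per row of W.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def gate(W, x):
    # Hypothetical gate sigma(W x): one value in (0, 1) per dimension.
    return [sigmoid(z) for z in matvec(W, x)]

def hadamard(a, b):
    return [ai * bi for ai, bi in zip(a, b)]

def vsum(*vectors):
    # Element-wise sum of any number of equal-length vectors.
    return [sum(col) for col in zip(*vectors)]

def quadruplet_distributor(e, r, t, W_agg, W_dis):
    """Hypothetical gated aggregation/distribution for one curvature space.

    W_agg, W_dis: dicts mapping 'e', 'r', 't' to d x d matrices (assumed).
    Returns the updated entity, relation, timestamp vectors and the distributor.
    """
    inputs = {"e": e, "r": r, "t": t}
    q = [0.0] * len(e)  # distributor starts as a zero vector
    # Aggregation: gate each input and add it to q (residual, element-wise sum).
    q = vsum(q, *(hadamard(gate(W_agg[k], x), x) for k, x in inputs.items()))
    # Distribution: gate q separately for each role and add it back to that input.
    updated = {k: vsum(x, hadamard(gate(W_dis[k], q), q)) for k, x in inputs.items()}
    return updated["e"], updated["r"], updated["t"], q

rng = random.Random(0)
d = 4

def random_matrix():
    return [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]

W_agg = {k: random_matrix() for k in "ert"}
W_dis = {k: random_matrix() for k in "ert"}
e, r, t = ([rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(3))
e_new, r_new, t_new, q = quadruplet_distributor(e, r, t, W_agg, W_dis)
```

With all-zero gate matrices every gate equals 0.5, so the distributor becomes half the element-wise sum of the three inputs, which is a quick way to sanity-check the data flow.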
We perform the same information aggregation and distribution in each of the multi-curvature spaces, namely the Euclidean, hyperbolic, and hyperspherical spaces. For each entity, relation, and timestamp, we thus obtain one feature per curvature space, giving nine features in total (three element types times three spaces).
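The section does not state how a vector is placed in the hyperbolic and hyperspherical spaces. One common choice, assumed here and not confirmed by the source, is the exponential map at the origin of the κ-stereographic model: curvature κ < 0 gives the Poincaré ball, κ > 0 gives a stereographic projection of the sphere, and κ = 0 leaves Euclidean vectors unchanged. The function name `exp_map_origin` is hypothetical.

```python
import math

def exp_map_origin(v, kappa):
    """Exponential map at the origin of the kappa-stereographic model (assumed).

    kappa < 0: Poincare ball of curvature kappa.
    kappa > 0: stereographic model of the sphere of curvature kappa.
    kappa == 0: Euclidean space, so the vector is returned unchanged.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if kappa == 0 or norm == 0:
        return list(v)
    s = math.sqrt(abs(kappa)) * norm
    if kappa < 0:
        scale = math.tanh(s) / s  # result stays inside the ball of radius 1/sqrt(-kappa)
    else:
        if s >= math.pi / 2:
            # Beyond this length the map reaches the antipode, outside the chart.
            raise ValueError("vector too long for the stereographic chart")
        scale = math.tan(s) / s
    return [scale * x for x in v]

v = [0.3, 0.4]
points = {kappa: exp_map_origin(v, kappa) for kappa in (-1.0, 0.0, 1.0)}
```

The same tangent vector is stretched on the sphere (κ > 0) and shrunk in the ball (κ < 0), which reflects how the two curvatures trade off room for rings against room for hierarchies.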
4.2. Space-Shared and -Specific Representations
To learn the commonalities across different curvature spaces while comprehensively capturing the characteristic features unique to each space, we use encoding functions that capture both space-shared and space-specific properties. Given the updated representations of the entity, relation, and timestamp in the different curvature spaces, we implement these encoding functions with a gate attention mechanism.
Space-shared property focuses on recognizing commonalities across curvature spaces to reduce the spatial gaps among them. Specifically, the encoding function shares its parameters across spaces to obtain space-shared representations. The encoding process can be written as
(5)
where the parameters are shared across all three curvature spaces, and the encoder combines feature concatenation, element-wise multiplication, and the sigmoid function. Applying this shared encoder to the entity, relation, and timestamp in each of the three spaces yields nine space-shared representations.
Space-specific property comprehensively captures the characteristic features unique to each curvature space. It uses an analogous encoding function to obtain space-specific representations,
(6)
where the parameters are specific to each curvature space, with one set each for the Euclidean, hyperbolic, and hyperspherical spaces. As with the space-shared property, this yields nine space-specific representations.
Through the two encoding functions, we obtain eighteen space-shared and -specific vectors for the entity, relation, and timestamp.
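The difference between the two properties comes down to parameter sharing. The sketch below assumes a simple gate σ(Wx) ⊙ x as the encoder (the paper's encoder also uses feature concatenation, whose exact inputs are not given here); `gated_encoder`, `encode_all`, and the random matrices are hypothetical. One matrix is reused for all three spaces (space-shared), while each space gets its own matrix (space-specific).

```python
import math
import random

SPACES = ("euclidean", "hyperbolic", "hyperspherical")

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_encoder(W, x):
    # Hypothetical gate: sigma(W x) multiplied element-wise with x.
    gates = [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]
    return [g * xi for g, xi in zip(gates, x)]

def random_matrix(d, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)]

def encode_all(features, W_shared, W_specific):
    """features: space name -> vector of one entity, relation, or timestamp."""
    shared = {s: gated_encoder(W_shared, features[s]) for s in SPACES}
    specific = {s: gated_encoder(W_specific[s], features[s]) for s in SPACES}
    return shared, specific

rng = random.Random(0)
d = 3
W_shared = random_matrix(d, rng)                         # one matrix for all spaces
W_specific = {s: random_matrix(d, rng) for s in SPACES}  # one matrix per space
x = [0.2, -0.1, 0.4]
features = {s: x for s in SPACES}  # same input everywhere, to expose the difference
shared, specific = encode_all(features, W_shared, W_specific)
```

Feeding the same vector to every space makes the effect visible: the three shared outputs coincide, while the specific outputs differ. Running `encode_all` for the entity, the relation, and the timestamp gives the 9 + 9 vectors described above.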
4.3. Adjustable Multi-curvature Pooling
After obtaining the space-shared and -specific representations of entities, relations, and timestamps, the pooling approach is employed to aggregate them into a joint vector for downstream predictions. We first introduce two simple pooling approaches: Average Pooling (AP) and Max Pooling (MP). Then we introduce the proposed Adjustable Multi-curvature Pooling (AMP) approach.
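As shown in Figure 3, given $K$ input features $\{\mathbf{x}_k\}_{k=1}^{K}$ with $\mathbf{x}_k \in \mathbb{R}^{d}$, we first sort each dimension across the features in descending order to bring the most significant values to the front, obtaining the sorted features $\{\hat{\mathbf{x}}_k\}_{k=1}^{K}$. We then introduce pooling weights $\{\theta_k\}_{k=1}^{K}$, which are used to compute a weighted sum over $\{\hat{\mathbf{x}}_k\}$ that yields the pooled feature $\mathbf{v}$:
$$\mathbf{v} = \sum_{k=1}^{K} \theta_k \hat{\mathbf{x}}_k \tag{7}$$
Average Pooling sets all pooling weights to $1/K$ to obtain the pooled feature $\mathbf{v}_{\mathrm{avg}}$:
$$\mathbf{v}_{\mathrm{avg}} = \frac{1}{K} \sum_{k=1}^{K} \hat{\mathbf{x}}_k \tag{8}$$
Max Pooling sets the first pooling weight to $1$ and the others to $0$ to obtain the pooled feature $\mathbf{v}_{\mathrm{max}}$:
$$\mathbf{v}_{\mathrm{max}} = \hat{\mathbf{x}}_1 \tag{9}$$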
However, both approaches rely on fixed pooling strategies, which makes it difficult to ensure that important information is retained.
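The sort-then-weight formulation, with average and max pooling as special cases, can be sketched in pure Python. The function names are hypothetical; the point is that both fixed strategies are just particular weight vectors over the per-dimension sorted values.

```python
def sorted_weighted_pool(features, weights):
    """features: K vectors of length d; weights: K pooling weights.

    Each dimension is sorted in descending order across the K vectors,
    and the k-th largest value in that dimension is scaled by weights[k].
    """
    assert len(weights) == len(features)
    pooled = []
    for j in range(len(features[0])):
        column = sorted((f[j] for f in features), reverse=True)
        pooled.append(sum(w * v for w, v in zip(weights, column)))
    return pooled

def average_pool(features):
    K = len(features)
    return sorted_weighted_pool(features, [1.0 / K] * K)

def max_pool(features):
    K = len(features)
    return sorted_weighted_pool(features, [1.0] + [0.0] * (K - 1))

feats = [[1.0, 5.0], [3.0, -1.0], [2.0, 2.0]]
print(max_pool(feats))  # [3.0, 5.0]
```

Because sorting is done per dimension, the result does not depend on the order of the input features; only the weight vector decides how much of the top, middle, or bottom values survive.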
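Adjustable Multi-curvature Pooling automatically adjusts the pooling weights to obtain a better pooling strategy that retains important information effectively. To learn a weight $\theta_k$ for each position $k = 1, \ldots, K$ of the sorted features $\hat{\mathbf{x}}_k$, we first use the positional encoding strategy of (Vaswani et al., 2017; Chen et al., 2021) to obtain a positional encoding $\mathbf{p}_k \in \mathbb{R}^{d_p}$ for each position. This encoding carries prior information about the relations between position indices and is formulated as
$$p_{k,2i} = \sin\!\left(\frac{k}{10000^{2i/d_p}}\right), \qquad p_{k,2i+1} = \cos\!\left(\frac{k}{10000^{2i/d_p}}\right) \tag{10}$$
where $i$ indexes the dimensions of $\mathbf{p}_k$ and $d_p$ is the dimension of the positional encoding. We then feed the sequence of positional encodings into a Bi-GRU (Schuster and Paliwal, 1997) followed by a Multi-Layer Perceptron (MLP) to obtain the pooling weights:
$$\{\mathbf{h}_k\}_{k=1}^{K} = \mathrm{BiGRU}\big(\mathbf{p}_1, \ldots, \mathbf{p}_K\big), \qquad \theta_k = \mathrm{MLP}(\mathbf{h}_k) \tag{11}$$
The weights $\theta_k$ are then normalized as follows,
(12)
Based on the learned pooling weights, we obtain the pooled feature $\mathbf{v}$:
$$\mathbf{v} = \sum_{k=1}^{K} \theta_k \hat{\mathbf{x}}_k \tag{13}$$
Combining (10), (11), (12), and (13), the entire computation of AMP can be summarized as
$$\mathbf{v} = f_{\mathrm{AMP}}\big(\{\mathbf{x}_k\}_{k=1}^{K}; \Theta\big) \tag{14}$$
where $\Theta$ denotes all the learnable parameters.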
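The AMP pipeline (positional encodings, position-wise weights, normalization, weighted sum over sorted features) can be sketched as follows. To keep the example short and dependency-free, it swaps the Bi-GRU + MLP for a single hypothetical linear scorer (`w`, `b`), and it assumes a softmax with a temperature for the normalization step, since Eq. (12) is not reproduced in this text. All function names are hypothetical.

```python
import math

def positional_encoding(K, d_pe):
    """Sinusoidal encodings for positions 1..K in the style of Vaswani et al. (2017)."""
    rows = []
    for k in range(1, K + 1):
        row = []
        for i in range(d_pe):
            # Dimensions 2m and 2m+1 share the frequency 1 / 10000^(2m / d_pe).
            angle = k / (10000 ** (2 * (i // 2) / d_pe))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        rows.append(row)
    return rows

def softmax(scores, temperature=1.0):
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def amp_weights(K, w, b, temperature=0.1):
    # Stand-in for Bi-GRU + MLP: one hypothetical linear scorer applied per position.
    pe = positional_encoding(K, len(w))
    scores = [sum(wi * pi for wi, pi in zip(w, p)) + b for p in pe]
    # Assumed normalization: softmax with a temperature.
    return softmax(scores, temperature)

def amp_pool(features, w, b, temperature=0.1):
    """Sort each dimension in descending order, then take the theta-weighted sum."""
    theta = amp_weights(len(features), w, b, temperature)
    pooled = []
    for j in range(len(features[0])):
        column = sorted((f[j] for f in features), reverse=True)
        pooled.append(sum(t * v for t, v in zip(theta, column)))
    return pooled, theta

feats = [[1.0, 5.0], [3.0, -1.0], [2.0, 2.0]]
pooled, theta = amp_pool(feats, w=[0.5, -0.2, 0.1, 0.3], b=0.0)
```

Because the weights depend only on positions, the same K weights are applied to every feature dimension, so the learned strategy can land anywhere between max pooling and average pooling. In training, `w` and `b` (or the Bi-GRU and MLP they stand in for) would be learned by backpropagation.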
Table 1. Statistics of the datasets.
Dataset | #Entities | #Relations | #Timestamps | Time Span | Granularity | #Training | #Validation | #Test |
---|---|---|---|---|---|---|---|---|
ICEWS14 | 6,869 | 230 | 365 | 2014 | 1 day | 72,826 | 8,941 | 8,963 |
ICEWS05-15 | 10,094 | 251 | 4,017 | 2005-2015 | 1 day | 368,962 | 46,275 | 46,092 |
GDELT | 500 | 20 | 366 | 2015-2016 | 1 day | 2,735,685 | 341,961 | 341,961 |
Pooling Procedure. We concatenate the space-shared and space-specific representations of the entity, relation, and timestamp, obtaining one vector for each. We then apply AMP to retain the important information among entities, relations, and timestamps, and define the score function as follows,
(15)
where $\langle \cdot, \cdot \rangle$ denotes the inner product.
4.4. Loss Function
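The overall loss of IME is defined as
$$\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \alpha\,\mathcal{L}_{\mathrm{sim}} + \beta\,\mathcal{L}_{\mathrm{diff}} + \gamma\,\mathcal{L}_{\mathrm{str}} \tag{16}$$
where $\alpha$, $\beta$, and $\gamma$ are hyper-parameters that weight the similarity, difference, and structure losses, respectively. Each component of the loss is responsible for achieving one of the desired properties.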
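Task Loss. Following the strategy in (Xu et al., 2021), we use the cross-entropy loss together with the standard data augmentation protocol, which adds a reciprocal quadruplet for every training quadruplet, and treat link prediction as a multi-class classification task over all entities,
(17)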
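A minimal sketch of this protocol as we understand it: each training quadruplet (s, r, o, t) also yields a reciprocal quadruplet (o, r + |R|, s, t), so head queries become tail queries, and every query is scored against all entities and trained with softmax cross-entropy. The reciprocal-id offset and the function names are assumptions for illustration, and the paper's Eq. (17) may differ in detail.

```python
import math

def add_reciprocals(quads, num_relations):
    """For each (s, r, o, t), also add (o, r + num_relations, s, t).

    A head query (?, r, o, t) then becomes the tail query (o, r + num_relations, ?, t).
    """
    return quads + [(o, r + num_relations, s, t) for (s, r, o, t) in quads]

def cross_entropy(scores, target):
    """-log softmax(scores)[target], computed stably; scores cover all entities."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(x - m) for x in scores))
    return log_z - scores[target]

train = [(0, 1, 2, 5), (3, 0, 1, 7)]
augmented = add_reciprocals(train, num_relations=3)
```

Treating every query as a classification over the full entity set avoids negative sampling; the price is that each training step scores all entities for each query.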
Similarity Loss. The similarity loss minimizes the discrepancies among the shared features of different curvature spaces in order to bridge the spatial gaps among them. We use Central Moment Discrepancy (CMD) (Zellinger et al., 2017), a distance metric that measures the similarity between two distributions via the differences in their central moments; a smaller CMD value indicates a higher similarity between the two distributions. Let $X$ and $Y$ be bounded, independent and identically distributed random samples from two probability distributions $p$ and $q$ on the compact interval $[a, b]^N$. The CMD is defined as
$$\mathrm{CMD}_K(X, Y) = \frac{1}{|b-a|}\big\|\mathbf{E}(X) - \mathbf{E}(Y)\big\|_2 + \sum_{k=2}^{K} \frac{1}{|b-a|^{k}}\big\|C_k(X) - C_k(Y)\big\|_2 \tag{18}$$
where $\mathbf{E}(X) = \frac{1}{|X|}\sum_{x \in X} x$ is the empirical expectation vector of $X$, and $C_k(X) = \mathbf{E}\big((x - \mathbf{E}(X))^{k}\big)$ is the vector of all $k$-th order central moments of the coordinates of $X$.
In our case, we compute the similarity loss with CMD between the space-shared representations of different curvature spaces,
(19)
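The CMD definition of Zellinger et al. (2017) translates directly into code. The sketch below follows that definition for samples stored as lists of vectors. Averaging CMD over all pairs of curvature spaces in `similarity_loss` is an assumption about Eq. (19), which is not reproduced here, and K = 5 is an illustrative default; both function names are hypothetical.

```python
import math
from itertools import combinations

def mean_vec(X):
    # Column-wise mean of a list of equal-length vectors.
    return [sum(col) / len(X) for col in zip(*X)]

def central_moment(X, k):
    # Vector of k-th order central moments, one per coordinate.
    mu = mean_vec(X)
    return [sum((x[j] - mu[j]) ** k for x in X) / len(X) for j in range(len(mu))]

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cmd(X, Y, K=5, a=0.0, b=1.0):
    """Central Moment Discrepancy of order K for samples on [a, b]^N."""
    span = abs(b - a)
    total = l2(mean_vec(X), mean_vec(Y)) / span
    for k in range(2, K + 1):
        total += l2(central_moment(X, k), central_moment(Y, k)) / span ** k
    return total

def similarity_loss(shared_by_space, K=5):
    """Hypothetical: average CMD over all pairs of curvature spaces."""
    pairs = list(combinations(shared_by_space.values(), 2))
    return sum(cmd(X, Y, K) for X, Y in pairs) / len(pairs)
```

CMD matches distributions moment by moment without estimating densities, which keeps the loss cheap to compute on mini-batches.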
Difference Loss. The difference loss encourages the model to capture the characteristic features of each curvature space. Specifically, we impose a soft orthogonality constraint not only between the shared and specific features of each space but also between the specific features of different spaces. The difference loss is calculated as
(20)
where $\|\cdot\|_F^2$ denotes the squared Frobenius norm.
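The soft orthogonality penalty can be sketched as the squared Frobenius norm of AᵀB, where the rows of A and B are the shared or specific representations of the same batch of samples. Summing over each space's shared/specific pair and over every pair of space-specific matrices follows the description above, but the exact form of Eq. (20), including any centering or normalization, is not reproduced here; the function names are hypothetical.

```python
def frob_sq_of_product(A, B):
    """||A^T B||_F^2 for an n x d_a matrix A and an n x d_b matrix B (rows are samples)."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            entry = sum(a_row[i] * b_row[j] for a_row, b_row in zip(A, B))
            total += entry * entry
    return total

def difference_loss(shared, specific):
    """shared, specific: space name -> n x d matrix for the same batch of samples."""
    spaces = list(shared)
    # Shared vs. specific features within each space.
    loss = sum(frob_sq_of_product(shared[s], specific[s]) for s in spaces)
    # Specific features of every pair of spaces.
    for a in range(len(spaces)):
        for b in range(a + 1, len(spaces)):
            loss += frob_sq_of_product(specific[spaces[a]], specific[spaces[b]])
    return loss
```

The penalty is zero exactly when every feature column of one matrix is orthogonal to every feature column of the other over the batch, which pushes the two representations to encode non-overlapping information.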
Structure Loss. The structure loss (Gao et al., 2022) encourages quadruplets to have similar structures across the various curvature spaces. Specifically, we define the relation among a triplet of samples as the following cosine value:
(21)
where each of the three arguments is a sample. The structure loss is then calculated as
(22)
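One way to realize a triplet-based structure loss is an angle-wise comparison: compute the cosine of the angle formed at the middle sample of every ordered triplet in each space, then penalize differences between spaces. Taking the middle sample as the vertex and using a squared-error penalty are assumptions, since Eqs. (21) and (22) are not reproduced here; the samples must be distinct points so that no difference vector is zero, and the function names are hypothetical.

```python
import math
from itertools import permutations

def angle_cos(xi, xj, xk):
    """Cosine of the angle at xj between the directions to xi and to xk."""
    u = [a - b for a, b in zip(xi, xj)]
    v = [c - b for c, b in zip(xk, xj)]
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def structure_loss(X, Y):
    """Mean squared gap between triplet cosines of the same samples in two spaces."""
    gaps = [
        (angle_cos(X[i], X[j], X[k]) - angle_cos(Y[i], Y[j], Y[k])) ** 2
        for i, j, k in permutations(range(len(X)), 3)
    ]
    return sum(gaps) / len(gaps)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```

Angles are unchanged by uniform scaling and translation, so a loss of this kind compares the relative arrangement of samples rather than their absolute positions, which suits spaces whose coordinates are not directly comparable.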
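Table 2. Link prediction results on ICEWS14, ICEWS05-15, and GDELT.
Model | ICEWS14 MRR | ICEWS14 Hit@1 | ICEWS14 Hit@3 | ICEWS14 Hit@10 | ICEWS05-15 MRR | ICEWS05-15 Hit@1 | ICEWS05-15 Hit@3 | ICEWS05-15 Hit@10 | GDELT MRR | GDELT Hit@1 | GDELT Hit@3 | GDELT Hit@10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|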
TransE (2013) | 0.280 | 0.094 | – | 0.637 | 0.294 | 0.090 | – | 0.663 | 0.113 | 0.0 | 0.158 | 0.312 |
DistMult (2015) | 0.439 | 0.323 | – | 0.672 | 0.456 | 0.337 | – | 0.691 | 0.196 | 0.117 | 0.208 | 0.348 |
SimplE (2018) | 0.458 | 0.341 | 0.516 | 0.687 | 0.478 | 0.359 | 0.539 | 0.708 | 0.206 | 0.124 | 0.220 | 0.366 |
RotatE (2019) | 0.418 | 0.291 | 0.478 | 0.690 | 0.304 | 0.164 | 0.355 | 0.595 | – | – | – | – |
TA-DistMult (2018) | 0.477 | 0.363 | – | 0.686 | 0.474 | 0.346 | – | 0.728 | 0.206 | 0.124 | 0.219 | 0.365 |
ATiSE (2019) | 0.550 | 0.436 | 0.629 | 0.750 | 0.519 | 0.378 | 0.606 | 0.794 | – | – | – | – |
TeRo (2020) | 0.562 | 0.468 | 0.621 | 0.732 | 0.586 | 0.469 | 0.668 | 0.795 | 0.245 | 0.154 | 0.264 | 0.420 |
ChronoR (2021) | 0.625 | 0.547 | 0.669 | 0.773 | 0.675 | 0.596 | 0.723 | 0.820 | – | – | – | – |
TeLM (2021) | 0.625 | 0.545 | 0.673 | 0.774 | 0.678 | 0.599 | 0.728 | 0.823 | – | – | – | – |
TuckERTNT (2022) | 0.604 | 0.521 | 0.655 | 0.753 | 0.638 | 0.559 | 0.686 | 0.783 | 0.381 | 0.283 | 0.418 | 0.576 |
BoxTE (2022) | 0.613 | 0.528 | 0.664 | 0.763 | 0.667 | 0.582 | 0.719 | 0.820 | 0.352 | 0.269 | 0.377 | 0.511 |
EvoExplore (2022) | 0.725 | 0.653 | 0.778 | 0.852 | 0.790 | 0.719 | 0.843 | 0.915 | 0.514 | 0.353 | 0.602 | 0.748 |
BDME (2023) | 0.635 | 0.555 | 0.683 | 0.778 | – | – | – | – | 0.278 | 0.191 | 0.299 | 0.448 |
QDN (2023) | 0.643 | 0.567 | 0.688 | 0.784 | 0.692 | 0.611 | 0.743 | 0.838 | 0.545 | 0.481 | 0.576 | 0.668 |
DyERNIE (2020) | 0.669 | 0.599 | 0.714 | 0.797 | 0.739 | 0.679 | 0.773 | 0.855 | 0.457 | 0.390 | 0.479 | 0.589 |
BiQCap (2023) | 0.643 | 0.563 | 0.687 | 0.798 | 0.691 | 0.621 | 0.738 | 0.837 | 0.273 | 0.183 | 0.308 | 0.469 |
IME | 0.819 | 0.790 | 0.835 | 0.872 | 0.796 | 0.750 | 0.821 | 0.875 | 0.624 | 0.485 | 0.754 | 0.791 |
5. Experiment
In this section, we provide detailed information about the datasets, describe the experimental setups, present experimental results, and conduct a comprehensive analysis of experimental results.
5.1. Datasets
We use three commonly used TKG datasets, whose key statistics are summarized in Table 1. ICEWS14 and ICEWS05-15 (García-Durán et al., 2018) are subsets of the Integrated Crisis Early Warning System (ICEWS), which contains political events together with their timestamps. GDELT (Leetaru and Schrodt, 2013) is a subset of the larger Global Database of Events, Language, and Tone (GDELT), which records data on human social relationships.
5.2. Baselines
The proposed model is compared with classic KGC methods, including both SKGC and TKGC methods.
- SKGC methods: TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), SimplE (Kazemi and Poole, 2018), and RotatE (Sun et al., 2019).
- TKGC methods: TA-DistMult (García-Durán et al., 2018), TeRo (Xu et al., 2020b), ChronoR (Sadeghian et al., 2021), ATiSE (Xu et al., 2020a), TeLM (Xu et al., 2021), TuckERTNT (Shao et al., 2022), BoxTE (Messner et al., 2022), BDME (Yue et al., 2023), EvoExplore (Zhang et al., 2022), DyERNIE (Han et al., 2020), BiQCap (Zhang et al., 2023), and QDN (Wang et al., 2023a).
5.3. Link Prediction Metrics
For each test quadruplet, we replace either the head or the tail entity with every candidate entity in the TKG and rank the candidates by the scores of the score function. We use Mean Reciprocal Rank (MRR) and Hit@$k$ with $k = 1, 3$, and $10$ as evaluation metrics; higher values indicate better performance. We report results in the filtered setting, which removes from the ranking all corrupted quadruplets that already appear in the TKG.
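The filtered ranking protocol and the two metrics can be sketched as follows. The function names are hypothetical, and ties are broken optimistically here (only strictly higher scores push the target down), which is one common convention but not necessarily the one used in the paper.

```python
def filtered_rank(scores, target, known_true):
    """1-based rank of `target` after filtering other known-true answers.

    scores: candidate entity -> score (higher is better).
    known_true: entities that complete the query correctly anywhere in the TKG
    (train, validation, or test); the target itself is never filtered.
    """
    target_score = scores[target]
    better = sum(
        1
        for e, s in scores.items()
        if e != target and e not in known_true and s > target_score
    )
    return better + 1

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, hits

# Entity 0 outscores the target but is another true answer, so it is filtered out.
rank = filtered_rank({0: 0.9, 1: 0.8, 2: 0.5}, target=2, known_true={0})
```

Without filtering, a model would be penalized for ranking another correct answer above the target, which is why filtered MRR and Hit@k are the standard reporting choice.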
5.4. Parameters Setting
We use grid search to find the best hyper-parameters according to MRR on the validation set. Specifically, we tune the similarity loss weight, the difference loss weight, and the structure loss weight over a candidate grid and select their optimal values separately for ICEWS14, ICEWS05-15, and GDELT. The same embedding dimension is used across all datasets. For the AMP approach, the hidden dimension of the Bi-GRU is set equal to the dimension of the positional encoding, and an MLP projects the Bi-GRU outputs to the pooling weights.
Moreover, the learning rate is tuned on each dataset, and the same final value is used for all datasets. A batch size of 1000 is used for all datasets. All experiments are implemented in PyTorch 1.8.1 and run on a single NVIDIA RTX A6000 GPU.
5.5. Experimental Results and Analysis
The link prediction experimental results are displayed in Table 2, and the experimental analyses are listed as follows:
(1) The proposed model outperforms the state-of-the-art baselines on all three datasets on most metrics. For example, it improves MRR over EvoExplore by 0.094 on ICEWS14 (0.819 vs. 0.725) and by 0.006 on ICEWS05-15 (0.796 vs. 0.790). These results suggest that a single space is insufficient for modeling complex geometric structures concurrently, and that the spatial gap among multi-curvature spaces limits the expressive capacity of TKGC models.
(2) BiQCap (Zhang et al., 2023) and DyERNIE (Han et al., 2020) are two important baselines because both model TKGs in multi-curvature spaces. Nevertheless, our method still improves most metrics on all datasets, which suggests that it effectively reduces the spatial gaps caused by the heterogeneity of different curvature spaces.
(3) QDN (Wang et al., 2023a) is also an essential baseline because it serves as a key component of the multi-curvature embeddings module. When compared to QDN, our proposed method exhibits a substantial improvement in performance across all metrics. This observation underscores the inadequacy of a single Euclidean space for modeling complex geometric structures.
These observations indicate that our proposed method can not only model complex geometric structures but also effectively reduce spatial gaps among different curvature spaces.
5.6. Impact of the Loss Weights
In this experiment, we examine how varying the similarity, difference, and structure loss weights affects MRR. As depicted in Figure 4, the three losses behave differently as their weights increase. Specifically, the curves for the similarity loss and the difference loss are parabolic, each peaking at an intermediate weight, whereas the structure loss shows an overall declining trend, with performance decreasing as its weight increases.
These results suggest that appropriate weights for the similarity and difference losses help the model learn the common and characteristic features of entities, relations, and timestamps across multiple curvature spaces, whereas a higher weight for the structure loss restricts the flexibility of the embeddings across these spaces.
Table 3. Ablation results on ICEWS14.
Model | MRR | Hit@1 | Hit@3 | Hit@10 |
---|---|---|---|---|
DistMult (2015) | 0.439 | 0.323 | – | 0.672 |
TA-DistMult (2018) | 0.477 | 0.363 | – | 0.686 |
IME (-) MP | 0.523 | 0.430 | 0.574 | 0.696 |
IME w/o | 0.740 | 0.693 | 0.765 | 0.824 |
IME w/o | 0.716 | 0.653 | 0.752 | 0.835 |
IME w/o | 0.810 | 0.760 | 0.810 | 0.859 |
IME | 0.819 | 0.790 | 0.835 | 0.872 |
MP denotes Max Pooling.
5.7. Ablation Experiments
In order to investigate the impact of key modules and loss functions on experimental performance, we conducted a series of ablation experiments, and the corresponding link prediction results are presented in Table 3.
i. The three “IME w/o” variants each remove one of the similarity loss, the difference loss, and the structure loss;
ii. “IME (-) MP” replaces Adjustable Multi-curvature Pooling (AMP) with Max Pooling (MP).
(1) In the first category of ablation experiments, the full model outperforms all three loss-ablated variants on every metric on ICEWS14. For example, relative to these variants (in the order listed in Table 3), IME improves Hit@1 by 9.7, 13.7, and 3.0 percentage points, respectively. From these results, we draw the following conclusions:
a) The similarity loss helps the model learn the commonalities across different curvature spaces and mitigate the spatial gaps among them;
b) The difference loss captures the characteristic features specific to each space;
c) The structure loss constrains the embeddings of entities, relations, and timestamps so that their representations in different spaces exhibit comparable geometric structures.
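The three objectives in a) to c) can be illustrated with the sketch below. It is not IME's implementation; as assumptions, the similarity loss is instantiated as the Central Moment Discrepancy of Zellinger et al. (2017) between shared embeddings from two spaces, the difference loss as a soft orthogonality penalty ||SᵀP||²_F between shared and specific embeddings, and the structure loss as the mean squared gap between the pairwise cosine-similarity matrices of two spaces (assuming embeddings have first been mapped to a common tangent, i.e. Euclidean, space). All function names are hypothetical, and embeddings are plain lists of row vectors to keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per entity/relation/timestamp, one column per feature


def _mean(rows: Matrix) -> List[float]:
    """Column-wise mean of a list of row vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _l2(vec: List[float]) -> float:
    return math.sqrt(sum(v * v for v in vec))


def cmd_similarity_loss(x: Matrix, y: Matrix, k_max: int = 5) -> float:
    """Central Moment Discrepancy between two sets of shared embeddings.

    Sums the L2 gap between the means and between central moments 2..k_max.
    The 1/|b-a|^k range scaling of the original CMD is omitted here.
    """
    mx, my = _mean(x), _mean(y)
    loss = _l2([a - b for a, b in zip(mx, my)])
    for k in range(2, k_max + 1):
        cx = _mean([[(v - m) ** k for v, m in zip(row, mx)] for row in x])
        cy = _mean([[(v - m) ** k for v, m in zip(row, my)] for row in y])
        loss += _l2([a - b for a, b in zip(cx, cy)])
    return loss


def orthogonality_difference_loss(shared: Matrix, specific: Matrix) -> float:
    """Soft orthogonality ||S^T P||_F^2; zero when the shared and specific
    feature columns are orthogonal across the batch."""
    total = 0.0
    for i in range(len(shared[0])):
        for j in range(len(specific[0])):
            dot = sum(s[i] * p[j] for s, p in zip(shared, specific))
            total += dot * dot
    return total


def _cosine_matrix(x: Matrix) -> Matrix:
    norms = [_l2(row) or 1.0 for row in x]  # avoid division by zero
    return [[sum(a * b for a, b in zip(r1, r2)) / (n1 * n2)
             for r2, n2 in zip(x, norms)]
            for r1, n1 in zip(x, norms)]


def structure_loss(x: Matrix, y: Matrix) -> float:
    """Mean squared gap between the pairwise cosine-similarity matrices of
    the same items embedded in two spaces."""
    sx, sy = _cosine_matrix(x), _cosine_matrix(y)
    n = len(x)
    return sum((a - b) ** 2 for ra, rb in zip(sx, sy)
               for a, b in zip(ra, rb)) / (n * n)
```

Because the structure term compares only relative (cosine) geometry, it is unaffected by rescaling the embeddings of one space, which matches the goal of "comparable geometric structures" rather than identical coordinates.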
(2) In the second category of ablation experiments, replacing AMP with max pooling lowers MRR on ICEWS14 from 0.819 to 0.523 and Hit@1 from 0.790 to 0.430 (Table 3). This indicates that the adjustable multi-curvature pooling approach effectively emphasizes the information that is important for modeling the current TKG while suppressing less useful information.
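To illustrate what "adjustable" pooling can mean, the sketch below uses rank-weighted pooling in the spirit of the generalized pooling of Chen et al. (2021): for every feature dimension, the values from the different curvature spaces are sorted in descending order and combined with softmax-normalized weights. This is an assumption for illustration, not the AMP formulation itself; `adjustable_pool` and `logits` are hypothetical names, and in a trained model the logits would be learned. Max pooling is the special case in which all the weight sits on the top-ranked value.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjustable_pool(space_embs: Sequence[Sequence[float]],
                    logits: Sequence[float]) -> List[float]:
    """Rank-weighted pooling over curvature spaces.

    space_embs: one embedding per space (e.g. Euclidean, hyperbolic,
    hyperspherical), all of the same dimension.
    logits: one (learnable) score per rank position.
    """
    if len(logits) != len(space_embs):
        raise ValueError("need one logit per space")
    weights = softmax(logits)
    pooled = []
    for values in zip(*space_embs):            # one dimension across all spaces
        ranked = sorted(values, reverse=True)  # largest value first
        pooled.append(sum(w * v for w, v in zip(weights, ranked)))
    return pooled


def max_pool(space_embs: Sequence[Sequence[float]]) -> List[float]:
    """Plain element-wise max pooling across spaces (the MP baseline)."""
    return [max(values) for values in zip(*space_embs)]


if __name__ == "__main__":
    euc, hyp, sph = [0.2, -1.0], [0.5, 0.3], [-0.4, 0.9]
    print(max_pool([euc, hyp, sph]))                          # [0.5, 0.9]
    print(adjustable_pool([euc, hyp, sph], [0.0, 0.0, 0.0]))  # equal weights = mean pooling
    print(adjustable_pool([euc, hyp, sph], [2.0, 0.5, -1.0])) # between max and mean
```

Equal logits recover mean pooling and a dominant first logit approaches max pooling, so a learned weighting can interpolate between the two instead of fixing one rule in advance.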
5.8. Impact of Embedding Dimensions
To empirically investigate the impact of embedding dimensions, we vary the embedding dimension on ICEWS14 over a range of values and analyze the results. As shown in Figure 5, MRR on ICEWS14 first increases and then decreases as the dimension grows, peaking at an intermediate dimension.
This trend suggests that, at low dimensions, the model lacks the capacity to capture intricate relationships in the data, resulting in poorer performance. As the dimension increases, the model represents the data more effectively, leading to better performance. Beyond a certain threshold, however, larger dimensions may cause overfitting and increase model complexity, which in turn degrades performance.
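The complexity argument can be made concrete with a rough count of embedding parameters, which grows linearly with the dimension. This sketch assumes one d-dimensional vector per entity, relation, and timestamp in each of three curvature spaces; that is a simplification, and IME's actual parameterization (for example, separate shared and specific embeddings) may differ. The ICEWS14 sizes used below (7,128 entities, 230 relations, 365 timestamps) are the commonly reported statistics of that dataset, and the function name is hypothetical.

```python
def embedding_parameter_count(num_entities: int, num_relations: int,
                              num_timestamps: int, dim: int,
                              num_spaces: int = 3) -> int:
    """Rough embedding-table size: one dim-sized vector per item per space."""
    return (num_entities + num_relations + num_timestamps) * dim * num_spaces


if __name__ == "__main__":
    # ICEWS14 statistics: 7,128 entities, 230 relations, 365 timestamps.
    for dim in (100, 200, 500, 1000):
        print(dim, embedding_parameter_count(7128, 230, 365, dim))
```

Under these assumptions, doubling the dimension doubles the embedding parameters while the amount of training data stays fixed, which is one reason larger dimensions can overfit.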
6. Conclusion
In this paper, we proposed a novel TKGC method called Integrating Multi-curvature shared and specific Embedding (IME). Specifically, IME models TKGs in multi-curvature spaces to capture their complex geometric structures. IME learns the space-specific property to comprehensively capture characteristic information and the space-shared property to reduce the spatial gaps caused by the heterogeneity of different curvature spaces. Furthermore, IME introduces an Adjustable Multi-curvature Pooling (AMP) approach to better retain important information. Experimental results on several well-established datasets show that IME achieves competitive performance compared with state-of-the-art TKGC methods.
Acknowledgements.
This work was funded by the National Key R&D Program of China (No. 2021ZD0111902), the National Natural Science Foundation of China (No. 92370102, 62272015, U21B2038), and the R&D Program of Beijing Municipal Education Commission (KZ202210005008).
References
- Abboud et al. (2020) Ralph Abboud, Ismail Ceylan, Thomas Lukasiewicz, and Tommaso Salvatori. 2020. BoxE: A box embedding model for knowledge base completion. In Neural Information Processing Systems. 9649–9661.
- Balažević et al. (2019) Ivana Balažević, Carl Allen, and Timothy M Hospedales. 2019. TuckER: Tensor factorization for knowledge graph completion. In Empirical Methods in Natural Language Processing. 5185–5194.
- Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems. 2787–2795.
- Boschee et al. (2015) Elizabeth Boschee, Jennifer Lautenschlager, Sean O’Brien, Steve Shellman, James Starz, and Michael Ward. 2015. ICEWS Coded Event Data. In Harvard Dataverse.
- Cao et al. (2022) Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. 2022. Geometry interaction knowledge graph embeddings. In AAAI Conference on Artificial Intelligence.
- Chami et al. (2020) Ines Chami, Adva Wolf, Da Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. 2020. Low-dimensional hyperbolic knowledge graph embeddings. In Annual Meeting of the Association for Computational Linguistics. 6901–6914.
- Chen et al. (2021) Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, and Changhu Wang. 2021. Learning the best pooling strategy for visual semantic embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15789–15798.
- Dalton et al. (2014) Jeffrey Dalton, Laura Dietz, and James Allan. 2014. Entity query feature expansion using knowledge base links. In ACM SIGIR Conference on Research & Development in Information Retrieval. 365–374.
- Dasgupta et al. (2018) Shib Sankar Dasgupta, Swayambhu Nath Ray, and Partha Talukdar. 2018. HyTE: Hyperplane-based temporally aware knowledge graph embedding. In Empirical Methods in Natural Language Processing. 2001–2011.
- Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In AAAI Conference on Artificial Intelligence. 1811–1818.
- Gao et al. (2022) Qiankun Gao, Chen Zhao, Bernard Ghanem, and Jian Zhang. 2022. R-DFCIL: Relation-guided representation learning for data-free class incremental learning. In European Conference on Computer Vision. 423–439.
- García-Durán et al. (2018) Alberto García-Durán, Sebastijan Dumančić, and Mathias Niepert. 2018. Learning sequence encoders for temporal knowledge graph completion. In Empirical Methods in Natural Language Processing. 4816–4821.
- Goel et al. (2020) Rishab Goel, Seyed Mehran Kazemi, Marcus Brubaker, and Pascal Poupart. 2020. Diachronic embedding for temporal knowledge graph completion. In AAAI Conference on Artificial Intelligence. 3988–3995.
- Guha et al. (2003) Ramanathan Guha, Rob McCool, and Eric Miller. 2003. Semantic search. In Proceedings of the 12th International Conference on World Wide Web. 700–709.
- Guo and Kok (2021) Jia Guo and Stanley Kok. 2021. BiQUE: Biquaternionic Embeddings of Knowledge Graphs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 8338–8351.
- Han et al. (2020) Zhen Han, Yunpu Ma, Peng Chen, and Volker Tresp. 2020. DyERNIE: Dynamic evolution of Riemannian manifold embeddings for temporal knowledge graph completion. In Empirical Methods in Natural Language Processing. 7301–7316.
- Hitchcock (1927) Frank L Hitchcock. 1927. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6, 1-4 (1927), 164–189.
- Kazemi and Poole (2018) Seyed Mehran Kazemi and David Poole. 2018. SimplE embedding for link prediction in knowledge graphs. In Neural Information Processing Systems. 4284–4295.
- Ko et al. (2022) Hyeyoung Ko, Suyeon Lee, Yoonseo Park, and Anna Choi. 2022. A survey of recommendation systems: recommendation models, techniques, and application fields. Electronics 11, 1 (2022), 141.
- Lacroix et al. (2020) Timothée Lacroix, Guillaume Obozinski, and Nicolas Usunier. 2020. Tensor decompositions for temporal knowledge base completion. In International Conference on Learning Representations. 1–12.
- Leblay and Chekol (2018) Julien Leblay and Melisachew Wudage Chekol. 2018. Deriving validity time in knowledge graph. In International Conference on World Wide Web. 1771–1776.
- Leetaru and Schrodt (2013) Kalev Leetaru and Philip A Schrodt. 2013. GDELT: Global data on events, location, and tone. In ISA Annual Convention. 1–49.
- Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI Conference on Artificial Intelligence. 2181–2187.
- Liu et al. (2017) Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. 2017. SphereFace: Deep hypersphere embedding for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition. 212–220.
- Messner et al. (2022) Johannes Messner, Ralph Abboud, and Ismail Ilkan Ceylan. 2022. Temporal knowledge graph completion using box embeddings. In AAAI Conference on Artificial Intelligence. 7779–7787.
- Nickel and Kiela (2017) Maximilian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems.
- Pan et al. (2024) Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. 2024. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Transactions on Knowledge and Data Engineering (2024), 1–20.
- Sadeghian et al. (2021) Ali Sadeghian, Mohammadreza Armandpour, Anthony Colas, and Daisy Zhe Wang. 2021. ChronoR: rotation based temporal knowledge graph embedding. In AAAI Conference on Artificial Intelligence. 6471–6479.
- Sala et al. (2018) Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. 2018. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning. 4460–4469.
- Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
- Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.
- Shao et al. (2022) Pengpeng Shao, Dawei Zhang, Guohua Yang, Jianhua Tao, Feihu Che, and Tong Liu. 2022. Tucker decomposition-based temporal knowledge graph completion. Knowledge-Based Systems 238 (2022), 107841.
- Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations. 1–18.
- Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. 2071–2080.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- Vu et al. (2019) Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung, et al. 2019. A capsule network-based embedding model for knowledge graph completion and search personalization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2180–2189.
- Wang et al. (2024) Jiapu Wang, Boyue Wang, Junbin Gao, Simin Hu, Yongli Hu, and Baocai Yin. 2024. Multi-Level Interaction Based Knowledge Graph Completion. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32 (2024), 386–396.
- Wang et al. (2022) Jiapu Wang, Boyue Wang, Junbin Gao, Yongli Hu, and Baocai Yin. 2022. Multi-concept Representation Learning for Knowledge Graph Completion. ACM Transactions on Knowledge Discovery from Data (2022).
- Wang et al. (2023a) Jiapu Wang, Boyue Wang, Junbin Gao, Xiaoyan Li, Yongli Hu, and Baocai Yin. 2023a. QDN: A Quadruplet Distributor Network for Temporal Knowledge Graph Completion. IEEE Transactions on Neural Networks and Learning Systems (2023).
- Wang et al. (2023b) Jiapu Wang, Boyue Wang, Junbin Gao, Xiaoyan Li, Yongli Hu, and Baocai Yin. 2023b. TDN: Triplet Distributor Network for Knowledge Graph Completion. IEEE Transactions on Knowledge and Data Engineering (2023).
- Wang et al. (2023c) Jiapu Wang, Boyue Wang, Meikang Qiu, Shirui Pan, Bo Xiong, Heng Liu, Linhao Luo, Tengfei Liu, Yongli Hu, Baocai Yin, et al. 2023c. A Survey on Temporal Knowledge Graph Completion: Taxonomy, Progress, and Prospects. arXiv preprint arXiv:2308.02457 (2023).
- Wang et al. (2021) Shen Wang, Xiaokai Wei, Cicero Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew Arnold, Bing Xiang, Philip S Yu, and Isabel F Cruz. 2021. Mixed-curvature multi-relational graph neural network for knowledge graph completion. In International Conference on World Wide Web. 1761–1771.
- Wilson et al. (2014) Richard C Wilson, Edwin R Hancock, Elżbieta Pekalska, and Robert PW Duin. 2014. Spherical and hyperbolic embeddings of data. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 11 (2014), 2255–2269.
- Xu et al. (2021) Chengjin Xu, Yung-Yu Chen, Mojtaba Nayyeri, and Jens Lehmann. 2021. Temporal knowledge graph completion using a linear temporal regularizer and multivector embeddings. In Annual Meeting of the Association for Computational Linguistics. 2569–2578.
- Xu et al. (2020a) Chengjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Shariat Yazdi, and Jens Lehmann. 2020a. Temporal knowledge graph completion based on time series gaussian embedding. In International Semantic Web Conference. 654–671.
- Xu et al. (2020b) Chengjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Shariat Yazdi, and Jens Lehmann. 2020b. TeRo: A time-aware knowledge graph embedding via temporal rotation. In International Conference on Computational Linguistics. 1583–1593.
- Yang et al. (2015) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In International Conference on Learning Representations. 1–13.
- Yue et al. (2023) Lupeng Yue, Yongjian Ren, Yan Zeng, Jilin Zhang, Kaisheng Zeng, and Jian Wan. 2023. Block Decomposition with Multi-granularity Embedding for Temporal Knowledge Graph Completion. In International Conference on Database Systems for Advanced Applications. 706–715.
- Zellinger et al. (2017) Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. 2017. Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning. In International Conference on Learning Representations. 1–13.
- Zhang et al. (2022) Jiasheng Zhang, Shuang Liang, Yongpan Sheng, and Jie Shao. 2022. Temporal knowledge graph representation learning with local and global evolutions. Knowledge-Based Systems 251 (2022), 109234.
- Zhang et al. (2023) Sensen Zhang, Xun Liang, Zhiying Li, Junlan Feng, Xiangping Zheng, and Bo Wu. 2023. BiQCap: A Biquaternion and Capsule Network-Based Embedding Model for Temporal Knowledge Graph Completion. In International Conference on Database Systems for Advanced Applications. 673–688.