IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion

Jiapu Wang, Beijing University of Technology, Beijing, China, jpwang@emails.bjut.edu.cn, 0001-7639-5289
Zheng Cui, Beijing University of Technology, Beijing, China, CuiZ@emails.bjut.edu.cn
Boyue Wang, Beijing University of Technology, Beijing, China, wby@bjut.edu.cn
Shirui Pan, Griffith University, Gold Coast, Australia, s.pan@griffith.edu.au
Junbin Gao, The University of Sydney, Sydney, Australia, junbin.gao@sydney.edu.au
Baocai Yin, Beijing University of Technology, Beijing, China, ybc@bjut.edu.cn
Wen Gao, Peking University, Beijing, China, wgao@pku.edu.cn
(2024)
Abstract.

Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing for a precise capture of the evolution of knowledge and reflecting the dynamic nature of the real world. Typically, TKGs contain complex geometric structures, with various geometric structures interwoven. However, existing Temporal Knowledge Graph Completion (TKGC) methods either model TKGs in a single space or neglect the heterogeneity of different curvature spaces, thus constraining their capacity to capture these intricate geometric structures. In this paper, we propose a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks. Concretely, IME models TKGs into multi-curvature spaces, including hyperspherical, hyperbolic, and Euclidean spaces. Subsequently, IME incorporates two key properties, namely space-shared property and space-specific property. The space-shared property facilitates the learning of commonalities across different curvature spaces and alleviates the spatial gap caused by the heterogeneous nature of multi-curvature spaces, while the space-specific property captures characteristic features. Meanwhile, IME proposes an Adjustable Multi-curvature Pooling (AMP) approach to effectively retain important information. Furthermore, IME innovatively designs similarity, difference, and structure loss functions to attain the stated objective. Experimental results clearly demonstrate the superior performance of IME over existing state-of-the-art TKGC models.

Temporal Knowledge Graph, Knowledge Graph Completion, Multi-curvature Embeddings, Adjustable Pooling
Published in: Proceedings of the ACM Web Conference 2024 (WWW ’24), May 13–17, 2024, Singapore, Singapore. Copyright: ACM licensed. ISBN: 979-8-4007-0171-9/24/05. DOI: 10.1145/3589334.3645361. CCS: Computing methodologies → Reasoning about belief and knowledge.

1. Introduction

Knowledge Graphs (KGs) are structured collections of entities and relations, providing a semantic representation of knowledge. They serve as a powerful tool for organizing and representing real-world information in a way that machines can comprehend. Typically, knowledge in KGs is represented as triplets, where each node represents an entity and each directed edge between nodes represents a relation. For example, in the triplet (Albert Einstein, born_in, Germany), Albert Einstein and Germany are the head and tail entities, and born_in is the relation between them. KGs find applications in a wide array of domains, including recommendation systems (Ko et al., 2022), information retrieval (Dalton et al., 2014), and semantic search (Guha et al., 2003). They enable machines to reason about entities and their relations, uncover patterns, and make informed decisions based on the structured knowledge they encapsulate.

Figure 1. A brief description of IME. Learning multi-curvature representations through space-shared and space-specific properties. These features are later utilized for subsequent predictions by the adjustable multi-curvature pooling.

Acknowledging the ever-changing nature of information, Temporal Knowledge Graphs (TKGs) have arisen as a natural extension of traditional KGs. In contrast to their static counterparts, TKGs introduce the temporal dimension, enabling us to track the evolution of knowledge over time. Specifically, TKGs extend triplets with temporal attributes to form quadruplets such as (Albert Einstein, born_in, Germany, 1879-03-14), with 1879-03-14 serving as the timestamp. The temporal dimension therefore allows for a systematic depiction of the trends and changes in events, facilitating more context-aware and precise knowledge representation.

Despite the presence of TKGs like ICEWS (Boschee et al., 2015) and GDELT (Leetaru and Schrodt, 2013), which encompass millions or even billions of quadruplets, the ongoing evolution of knowledge driven by natural events leaves these TKGs far from comprehensive. The incompleteness of TKGs substantially hinders the efficiency of knowledge-driven systems, which makes "Temporal Knowledge Graph Completion (TKGC)" an essential task. The goal of the TKGC task is to enhance the completeness and accuracy of TKGs by predicting missing relations, entities, or temporal attributes that change over time within the TKGs.

The quality of the embedding representations in TKGs depends on how well the geometric structure of the embedding space matches the structure of the data. As shown in (Wang et al., 2021; Cao et al., 2022), different curvature spaces have different effects when embedding different types of structured data. Specifically, hyperspherical space (Wilson et al., 2014; Liu et al., 2017) excels in capturing ring structures, hyperbolic space (Nickel and Kiela, 2017; Sala et al., 2018) is highly effective in representing hierarchical arrangements, and Euclidean space is well suited to describing chain-like structures. In reality, however, TKGs may exhibit complex and diverse structures, resembling tree shapes in some regions and forming ring structures in others. Nonetheless, the majority of TKGC methods model TKGs within a single space, which makes it difficult to capture the intricate geometric structures inherent in TKGs.

A second challenge is how to effectively integrate information from different curvature spaces. Current TKGC methods (Han et al., 2020; Zhang et al., 2023) typically overlook the spatial gap among different curvature spaces. Despite significant advancements, this spatial gap remains a substantial constraint on their expressive capacity.

The last challenge is feature fusion. Existing methods (Yue et al., 2023; Wang et al., 2023a) predominantly focus on developing sophisticated fusion mechanisms, which incurs high computational complexity. Pooling approaches such as average pooling and max pooling reduce computational complexity effectively, but their fixed pooling strategies make it difficult to preserve important information.

This paper proposes a novel Integrating Multi-curvature shared and specific Embedding (IME) model to address the above challenges. As shown in Figure 1, IME simultaneously models TKGs in hyperspherical, hyperbolic, and Euclidean spaces, introducing the quadruplet distributor (Wang et al., 2023a) within each space to facilitate the aggregation and distribution of information among entities, relations, and timestamps. In addition, IME learns two distinct properties for each space: a space-shared property and a space-specific property. The space-shared property helps mitigate the spatial gap by capturing shared information among entities, relations, and timestamps across the curvature spaces. Conversely, the space-specific property captures the complementary information exclusive to each curvature space. Finally, an Adjustable Multi-curvature Pooling (AMP) approach is proposed, which learns appropriate pooling weights to obtain a better pooling strategy and thereby retains important information more effectively. We utilize AMP to aggregate the space-shared and space-specific representations of entities, relations, and timestamps into a joint vector for downstream predictions.

The main contributions of this paper are summarized as follows:

  • This paper designs a novel Integrating Multi-curvature shared and specific Embedding (IME) model for TKGC tasks, which learns two key properties, namely the space-shared property and the space-specific property. Specifically, the space-shared property learns the commonalities across distinct curvature spaces and mitigates spatial gaps among them, while the space-specific property captures characteristic features;

  • This paper proposes an adjustable multi-curvature pooling module, designed to attain a superior pooling strategy through training for the effective retention of important information;

  • To the best of our knowledge, we are the first to introduce the concept of structure loss into TKGC tasks, ensuring the structural similarity of quadruplets across various curvature spaces;

  • Experimental results on several widely used datasets demonstrate that IME achieves competitive performance compared to state-of-the-art TKGC methods.

Figure 2. The framework of IME. Specifically, IME models the query (Albert Einstein,  Born In, ?, 1879-3-14) in multi-curvature spaces through information aggregation and information distribution. Subsequently, IME explores space-shared and space-specific properties to learn the commonalities and characteristics across different curvature spaces, effectively reducing spatial gaps among them. Finally, these identified features are employed for adjustable multi-curvature pooling in subsequent predictions.

2. Related work

In this section, we provide an overview of KGC methods from two perspectives (Wang et al., 2023c; Pan et al., 2024): Euclidean embedding-based methods and Non-Euclidean embedding-based methods.

2.1. Euclidean Embedding-based Methods

Euclidean embedding-based KGC methods typically model the KGs in the Euclidean space. Depending on the types of knowledge, we can categorize them into static knowledge graph completion for triplets and temporal knowledge graph completion for quadruplets.

Static knowledge graph completion (SKGC) focuses on static knowledge graphs (SKGs), where the information about entities and relations remains unchanged over time. SKGC methods aim to predict missing triplets (e.g., relations between entities) based on known information. Popular SKGC methods include McRL (Wang et al., 2022), TDN (Wang et al., 2023b), and ConvE (Dettmers et al., 2018).

Translation-based methods take the relation as a translation from the head entity to the tail entity, such as TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019). RotatE regards the relation as a rotation from the head entity to the tail entity in the complex space. Based on TransE, TransR (Lin et al., 2015) learns a unified mapping matrix to model the entities and relations into a common space. SimplE (Kazemi and Poole, 2018) improves upon the complex Canonical Polyadic (CP) decomposition (Hitchcock, 1927) by enabling the interdependent learning of the two embeddings for each entity within the complex space. Furthermore, BoxE (Abboud et al., 2020) introduces the box embedding method as a means to model the uncertainty and diversity inherent in knowledge.

Semantic matching-based methods employ a similarity-based scoring function to evaluate the probabilities of triplets, such as DistMult (Yang et al., 2015) and McRL (Wang et al., 2022). DistMult employs matrix multiplication to model the interaction between the entity and relation. ComplEx (Trouillon et al., 2016) operates within the complex space to calculate the score of the triplet. CapsE (Vu et al., 2019) introduces the capsule network to capture the hierarchical relations and semantic information among entities. TuckER (Balažević et al., 2019) applies Tucker decomposition to the SKGC task. In addition, McRL (Wang et al., 2022) captures the complex conceptual information hidden in triplets to acquire accurate representations of entities and relations. MLI (Wang et al., 2024) simultaneously captures coarse-grained and fine-grained information to enhance the information interaction.

Convolutional neural network-based methods use CNNs to capture the inherent correlations within triplets. ConvE (Dettmers et al., 2018) is the first to apply a CNN to the SKGC task. R-GCN (Schlichtkrull et al., 2018) uses a graph neural network to update entity embeddings. Moreover, TDN (Wang et al., 2023b) designs the triplet distributor to facilitate information transmission between entities and relations.

Temporal knowledge graph completion (TKGC) refers to the prediction of unknown quadruplets in TKGs based on known information, including entities, relations, and timestamps. Classic TKGC methods include ChronoR (Sadeghian et al., 2021), TeLM (Xu et al., 2021), and BoxTE (Messner et al., 2022).

TTransE (Leblay and Chekol, 2018) models the pair of the relation and timestamp as the translation between the head entity and the tail entity. TA-TransE and TA-DistMult (García-Durán et al., 2018) integrate timestamps into entities using recurrent neural networks to capture the dynamic evolution of entities. Building upon RotatE, ChronoR represents relation-timestamp pairs as rotations from the head entity to the tail entity. Similarly, TuckERTNT (Shao et al., 2022) extends the 3rd-order tensor to the 4th-order to model quadruplets. More recently, BoxTE (Messner et al., 2022) has been introduced to enable more versatile and flexible knowledge representation.

HyTE (Dasgupta et al., 2018) first explores the dynamic evolution of entities and relations by modeling entities and relations in the timestamp space. TeRo (Xu et al., 2020b) models the temporal evolution of entities as a rotation in complex vector space, and handles time interval facts using dual complex embeddings for relations. TComplEx (Lacroix et al., 2020) is based on ComplEx and expands the 3rd-order tensor into a 4th-order tensor to perform TKGC. DE-SimplE (Goel et al., 2020) designs the diachronic entity embedding function to capture the dynamic evolution of entities over time, subsequently employing SimplE for predicting missing items. ATiSE (Xu et al., 2020a) decomposes timestamps into trend, seasonal, and irregular components to capture the evolution of entities and relations over time. TeLM (Xu et al., 2021) employs multivector embeddings and a linear temporal regularizer to obtain entity and timestamp embeddings, respectively. EvoExplore (Zhang et al., 2022) incorporates two critical factors for comprehending the evolution of TKGs: the local structure describes the formation process of the graph structure in detail, and the global structure reflects the dynamic topology of TKGs. BDME (Yue et al., 2023) leverages the interaction among entities, relations, and timestamps for coarse-grained embeddings and block decomposition for fine-grained embeddings. In particular, QDN (Wang et al., 2023a) extends the triplet distributor (Wang et al., 2023b) into a quadruplet distributor and designs a 4th-order tensor decomposition to facilitate the information interaction among entities, relations, and timestamps.

2.2. Non-Euclidean Embedding-based Methods

Non-Euclidean embedding-based methods typically embed KGs into non-Euclidean space, effectively capturing the complex geometric structure inherent to them. Some classic non-Euclidean embedding methods include ATTH (Chami et al., 2020), MuRMP (Wang et al., 2021), and BiQCap (Zhang et al., 2023).

For SKGC, ATTH models the KG within the hyperbolic space to capture both hierarchical and logical patterns. BiQUE (Guo and Kok, 2021) utilizes biquaternions to incorporate multiple geometric transformations, including Euclidean rotation, which is valuable for modeling patterns like symmetry, and hyperbolic rotation, which proves effective in capturing hierarchical relations. MuRMP and GIE (Cao et al., 2022) simultaneously model the KG within multi-curvature spaces to capture the complex structure.

For TKGC, DyERNIE (Han et al., 2020) embeds TKGs into multi-curvature spaces to explore the dynamic evolution guided by velocity vectors defined in the tangent space. BiQCap (Zhang et al., 2023) simultaneously models each relation in Euclidean and hyperbolic spaces to represent hierarchical semantics and other relation patterns of TKGs.

3. Problem Definition

Temporal knowledge graph $\mathcal{G}=\{\mathcal{E},\ \mathcal{R},\ \mathcal{T},\ \mathcal{Q}\}$ is a collection of entity set $\mathcal{E}$, relation set $\mathcal{R}$ and timestamp set $\mathcal{T}$. Specifically, each quadruplet in $\mathcal{G}$ is denoted as $(\mathbf{s},\ \mathbf{r},\ \mathbf{o},\ \mathbf{t})\in\mathcal{Q}$, where $\mathbf{s},\ \mathbf{o}\in\mathcal{E}$ represent the head and tail entities, $\mathbf{r}\in\mathcal{R}$ denotes the relation and $\mathbf{t}\in\mathcal{T}$ is the timestamp. The primary objective of the TKGC task is to predict the missing tail entity when given a query $(\mathbf{s},\ \mathbf{r},\ ?,\ \mathbf{t})$, or the missing head entity when provided with a query $(?,\ \mathbf{r},\ \mathbf{o},\ \mathbf{t})$.
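To make the two query forms concrete, the following is a minimal sketch of the data layout only, not part of IME. The names `Quadruplet`, `Query`, `tail_query`, `head_query`, and `known_answers` are hypothetical and introduced purely for illustration. An open slot is represented by `None`, and the lookup merely shows what a correct answer looks like; a TKGC model would instead score every candidate entity.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Quadruplet(NamedTuple):
    """A temporal fact (s, r, o, t)."""
    head: str
    relation: str
    tail: str
    timestamp: str


class Query(NamedTuple):
    """A quadruplet with one entity slot left open (None stands for '?')."""
    head: Optional[str]
    relation: str
    tail: Optional[str]
    timestamp: str


def tail_query(fact: Quadruplet) -> Query:
    """Build the query (s, r, ?, t) from a known fact."""
    return Query(fact.head, fact.relation, None, fact.timestamp)


def head_query(fact: Quadruplet) -> Query:
    """Build the query (?, r, o, t) from a known fact."""
    return Query(None, fact.relation, fact.tail, fact.timestamp)


def known_answers(query: Query, facts: Iterable[Quadruplet]) -> Set[str]:
    """Entities that fill the open slot according to the known facts."""
    found: Set[str] = set()
    for f in facts:
        if f.relation != query.relation or f.timestamp != query.timestamp:
            continue
        if query.tail is None and f.head == query.head:
            found.add(f.tail)
        elif query.head is None and f.tail == query.tail:
            found.add(f.head)
    return found


fact = Quadruplet("Albert Einstein", "born_in", "Germany", "1879-03-14")
print(tail_query(fact))
print(known_answers(head_query(fact), [fact]))
```

Filtering on both relation and timestamp reflects that the same (head, relation) pair may have different tails at different times.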

4. Methodology

In this section, we present a detailed description of IME, which can be segmented into three main stages: Multi-curvature Embeddings, Space-shared and -specific Representations, and Adjustable Multi-curvature Pooling. The whole framework of IME is illustrated in Figure 2.

4.1. Multi-curvature Embeddings

TKGs typically encompass intricate geometric structures, including ring, hierarchical, and chain structures. Different geometric spaces differ in their capacity to model each of these structures. We therefore model TKGs simultaneously in multi-curvature spaces to capture the complex structures.

Inspired by QDN (Wang et al., 2023a), we introduce a quadruplet distributor in each curvature space to facilitate information aggregation and distribution among entities, relations, and timestamps. This is necessary because entities, relations, and timestamps within each curvature space typically reside in distinct semantic spaces, which hinders information transmission among them.

Given the entity, relation, timestamp, and the initial zero-tensor of the quadruplet distributor, denoted as $\mathbf{s}$, $\mathbf{r}$, $\mathbf{t}$, and $\mathbf{q}$, we perform the information aggregation and information distribution.

Information Aggregation dynamically aggregates the information of entities, relations, and timestamps into the quadruplet distributor through gating functions,

$$
\begin{aligned}
\mathbf{s_{q1}} &= (\mathbf{s}-\mathbf{q})\odot[\sigma(\mathbf{s}-\mathbf{q})]\\
\mathbf{r_{q1}} &= (\mathbf{r}-\mathbf{q})\odot[\sigma(\mathbf{r}-\mathbf{q})]\\
\mathbf{t_{q1}} &= (\mathbf{t}-\mathbf{q})\odot[\sigma(\mathbf{t}-\mathbf{q})],
\end{aligned}
\tag{1}
$$

where $\sigma$ represents the sigmoid activation function; $\odot$ is the element-wise multiplication.

Subsequently, we employ the residual network to aggregate the information of the entity, relation, and timestamp into the quadruplet distributor,

$$
\mathbf{\check{q}}=\mathbf{q}\oplus\mathbf{s_{q1}}\oplus\mathbf{r_{q1}}\oplus\mathbf{t_{q1}},
\tag{2}
$$

where $\oplus$ is the element-wise sum.

Information Distribution distributes the above aggregated quadruplet distributor $\mathbf{\check{q}}$ to the entity $\mathbf{s}$, relation $\mathbf{r}$ and timestamp $\mathbf{t}$ through gating functions,

$$
\begin{aligned}
\mathbf{s_{q2}} &= (\mathbf{s}-\mathbf{\check{q}})\odot[\sigma(\mathbf{s}-\mathbf{\check{q}})]\\
\mathbf{r_{q2}} &= (\mathbf{r}-\mathbf{\check{q}})\odot[\sigma(\mathbf{r}-\mathbf{\check{q}})]\\
\mathbf{t_{q2}} &= (\mathbf{t}-\mathbf{\check{q}})\odot[\sigma(\mathbf{t}-\mathbf{\check{q}})].
\end{aligned}
\tag{3}
$$

Finally, we distribute the information of quadruplet distributor into the entities, relations, and timestamps,

$$
\begin{aligned}
\mathbf{\check{s}} &= \mathbf{s}\oplus\mathbf{s_{q2}}\\
\mathbf{\check{r}} &= \mathbf{r}\oplus\mathbf{r_{q2}}\\
\mathbf{\check{t}} &= \mathbf{t}\oplus\mathbf{t_{q2}}.
\end{aligned}
\tag{4}
$$

Through the above information aggregation and information distribution process, we can obtain updated representations of entities, relations, and timestamps $\mathbf{\check{s}}$, $\mathbf{\check{r}}$ and $\mathbf{\check{t}}$.

Similarly, we perform the above information aggregation and information distribution in multi-curvature spaces, including Euclidean, hyperbolic, and hyperspherical spaces. For each entity $\mathbf{s}$, relation $\mathbf{r}$, and timestamp $\mathbf{t}$, we can obtain their features in three curvature spaces, namely $\mathbf{\check{s}}_{\mathbb{M}}$, $\mathbf{\check{r}}_{\mathbb{M}}$, and $\mathbf{\check{t}}_{\mathbb{M}}$ ($\mathbb{M}\in\{\mathbb{S},\mathbb{H},\mathbb{E}\}$). Thus, we obtain nine features.
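The aggregation and distribution steps in Eqs. (1) to (4) can be sketched for a single curvature space as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are plain Python lists, all operations are ordinary element-wise arithmetic (any curvature-specific mapping is omitted), and the function names `gate`, `add`, and `quadruplet_distributor` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate(a, b):
    """Gating function (a - b) ⊙ σ(a - b), applied element-wise."""
    return [(x - y) * sigmoid(x - y) for x, y in zip(a, b)]


def add(*vectors):
    """Element-wise sum (the ⊕ operator) of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def quadruplet_distributor(s, r, t):
    """Return the updated (s, r, t) and the aggregated distributor q_check."""
    q = [0.0] * len(s)  # the distributor starts as a zero tensor

    # Eq. (1): gate each input against the current distributor.
    s_q1, r_q1, t_q1 = gate(s, q), gate(r, q), gate(t, q)

    # Eq. (2): aggregate the gated information into the distributor.
    q_check = add(q, s_q1, r_q1, t_q1)

    # Eq. (3): gate each input against the aggregated distributor.
    s_q2, r_q2, t_q2 = gate(s, q_check), gate(r, q_check), gate(t, q_check)

    # Eq. (4): add the distributed information back onto each input.
    return add(s, s_q2), add(r, r_q2), add(t, t_q2), q_check


s_new, r_new, t_new, q_check = quadruplet_distributor(
    [0.2, -0.4], [0.1, 0.3], [-0.5, 0.6]
)
print(s_new, r_new, t_new, q_check)
```

Running the same function once per curvature space, with that space's embeddings, would yield the nine features described above.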

4.2. Space-Shared and -Specific Representations

In order to facilitate the learning of commonalities across different curvature spaces, and comprehensively capture the characteristic features unique to each curvature space, we employ encoding functions to capture both space-shared and space-specific properties. Given the updated representations $\mathbf{\check{h}}_{\mathbb{M}}$ ($\mathbf{h}\in\{\mathbf{s}, \mathbf{r}, \mathbf{t}\}$, $\mathbb{M}\in\{\mathbb{S}, \mathbb{H}, \mathbb{E}\}$) of the entity, relation, and timestamp for different curvature spaces, we use a gate attention mechanism to implement the encoding functions.

Space-shared property focuses on recognizing commonalities across various curvature spaces to reduce spatial gaps among them. Specifically, it shares the parameters $\mathbf{W}_{I}$ in encoding function $E_{I}(\cdot)$ to obtain the space-shared representations. The encoding process can be denoted as,

$$
\begin{aligned}
E_{I}(\mathbf{\check{h}}_{\mathbb{S}}) &= \mathbf{\check{h}}_{\mathbb{S}}\odot\sigma(\mathbf{W}_{I}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}])\\
E_{I}(\mathbf{\check{h}}_{\mathbb{H}}) &= \mathbf{\check{h}}_{\mathbb{H}}\odot\sigma(\mathbf{W}_{I}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}])\\
E_{I}(\mathbf{\check{h}}_{\mathbb{E}}) &= \mathbf{\check{h}}_{\mathbb{E}}\odot\sigma(\mathbf{W}_{I}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}]),
\end{aligned}
\tag{5}
$$

where $\mathbf{W}_{I}$ is the shared parameter across all three curvature spaces, $[\cdot,\ \cdot,\ \cdot]$ represents the feature concatenation operation, $\odot$ is the element-wise multiplication, and $\sigma$ denotes the sigmoid function. Thus, we can generate nine space-shared representations $\mathbf{h}^{I}_{\mathbb{M}}$ ($\mathbf{h}\in\{\mathbf{s}, \mathbf{r}, \mathbf{t}\}$, $\mathbb{M}\in\{\mathbb{S}, \mathbb{H}, \mathbb{E}\}$) through the encoding functions $E_{I}(\mathbf{\check{h}}_{\mathbb{M}})$.

Space-specific property comprehensively captures the characteristic features unique to each curvature space. Similarly, it employs the encoding function ES()subscript𝐸𝑆E_{S}(\cdot)italic_E start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( ⋅ ) to obtain the space-specific representations,

(6) $$\begin{split}
E_{S}(\mathbf{\check{h}}_{\mathbb{S}})&=\mathbf{\check{h}}_{\mathbb{S}}\odot\sigma(\mathbf{W}_{S1}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}])\\
E_{S}(\mathbf{\check{h}}_{\mathbb{H}})&=\mathbf{\check{h}}_{\mathbb{H}}\odot\sigma(\mathbf{W}_{S2}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}])\\
E_{S}(\mathbf{\check{h}}_{\mathbb{E}})&=\mathbf{\check{h}}_{\mathbb{E}}\odot\sigma(\mathbf{W}_{S3}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}]),
\end{split}$$

where $\mathbf{W}_{S1}$, $\mathbf{W}_{S2}$ and $\mathbf{W}_{S3}$ are the specific parameters unique to each curvature space. Similar to the space-shared property, we can generate nine space-specific representations $\mathbf{h}^{S}_{\mathbb{M}}$ ($\mathbf{h}\in\{\mathbf{s},\ \mathbf{r},\ \mathbf{t}\}$, $\mathbb{M}\in\{\mathbb{S},\ \mathbb{H},\ \mathbb{E}\}$) through the encoding functions $E_{S}(\mathbf{\check{h}}_{\mathbb{M}})$.

Through the above encoding functions $E_{I}(\cdot)$ and $E_{S}(\cdot)$, we can generate eighteen space-shared and -specific vectors $\mathbf{h}_{\mathbb{S}/\mathbb{H}/\mathbb{E}}^{I/S}$ ($\mathbf{h}\in\{\mathbf{s},\ \mathbf{r},\ \mathbf{t}\}$).
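As a minimal sketch of the gating in (6), the snippet below computes $\mathbf{\check{h}}\odot\sigma(\mathbf{W}^{\mathsf{T}}[\mathbf{\check{h}}_{\mathbb{S}},\ \mathbf{\check{h}}_{\mathbb{H}},\ \mathbf{\check{h}}_{\mathbb{E}}])$ for a single curvature space in plain Python. The function names `sigmoid` and `space_specific_encode`, the toy dimension, and all numeric values are illustrative assumptions and are not taken from the authors' implementation; $\sigma$ is assumed to be the logistic sigmoid.

```python
import math

def sigmoid(z):
    """Logistic function applied to a scalar."""
    return 1.0 / (1.0 + math.exp(-z))

def space_specific_encode(h_own, h_all, W):
    """Gate h_own element-wise with sigmoid(W^T [h_S, h_H, h_E]).

    h_own: list of d floats (embedding in one curvature space).
    h_all: concatenation [h_S, h_H, h_E], a list of 3*d floats.
    W:     (3*d) x d weight matrix stored as a list of rows (hypothetical values).
    """
    d = len(h_own)
    gate = []
    for j in range(d):
        # j-th entry of W^T h_all: dot product of column j of W with h_all.
        z = sum(W[i][j] * h_all[i] for i in range(len(h_all)))
        gate.append(sigmoid(z))
    return [h * g for h, g in zip(h_own, gate)]

# Toy example with d = 2; every number here is made up for illustration.
h_S, h_H, h_E = [1.0, -2.0], [0.5, 0.5], [0.0, 1.0]
h_all = h_S + h_H + h_E
W_S1 = [[0.0, 0.0] for _ in range(6)]  # zero weights give a gate of 0.5 everywhere
print(space_specific_encode(h_S, h_all, W_S1))  # [0.5, -1.0]
```

Because each space has its own weight matrix but sees the same concatenated input, the three gates can emphasise different dimensions of otherwise comparable embeddings.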

Figure 3. Comparison of different pooling approaches.

4.3. Adjustable Multi-curvature Pooling

After obtaining the space-shared and -specific representations of entities, relations, and timestamps, the pooling approach is employed to aggregate them into a joint vector for downstream predictions. We first introduce two simple pooling approaches: Average Pooling (AP) and Max Pooling (MP). Then we introduce the proposed Adjustable Multi-curvature Pooling (AMP) approach.

As shown in Figure 3, for $n$ input features $\mathbf{X}=\{\mathbf{x}_{1},\ \mathbf{x}_{2},\ \cdots,\ \mathbf{x}_{n}\}$, $\mathbf{x}_{i}\in\mathcal{R}^{d_{x}}$, we first sort each dimension of the $n$ features to extract the significant information, obtaining the sorted features $\mathbf{M}=\{\mathbf{max}_{1},\ \mathbf{max}_{2},\ \cdots,\ \mathbf{max}_{n}\}$, $\mathbf{max}_{i}\in\mathcal{R}^{d_{x}}$. Then we introduce the pooling weights $\Psi=\{\psi_{1},\ \psi_{2},\ \cdots,\ \psi_{n}\}$, $\psi_{i}\in\mathcal{R}^{1}$, which are used to perform a weighted sum over $\mathbf{M}$ to get the pooling feature $\mathbf{x}_{p}\in\mathcal{R}^{d_{x}}$,

(7) $$\mathbf{x}_{p}=\sum_{i=1}^{n}\psi_{i}\cdot\mathbf{max}_{i}.$$

Average Pooling sets all pooling weights $\psi_{i}$ to $\frac{1}{n}$ to get the pooling feature $\mathbf{x}_{ap}\in\mathcal{R}^{d_{x}}$,

(8) $$\mathbf{x}_{ap}=\sum_{i=1}^{n}\frac{1}{n}\cdot\mathbf{max}_{i}.$$

Max Pooling sets the first pooling weight $\psi_{1}$ to $1$ and the others $\psi_{i},\ i\neq 1$ to $0$ to get the pooling feature $\mathbf{x}_{mp}\in\mathcal{R}^{d_{x}}$,

(9) $$\mathbf{x}_{mp}=\mathbf{max}_{1}.$$

However, the aforementioned two pooling approaches rely on fixed pooling strategies, posing a challenge in ensuring the effective retention of important information.
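The shared form of these fixed strategies can be sketched as follows. The snippet treats pooling as a weighted sum over per-dimension sorted features and recovers Average and Max Pooling by choosing the weights. It assumes the sort is in descending order (so that $\mathbf{max}_{1}$ holds the per-dimension maxima, which is what Max Pooling requires); the helper names and the toy input are hypothetical.

```python
def sort_features(X):
    """Sort each dimension across the n features in descending order.

    X is a list of n feature vectors of equal length d.
    Returns M, where M[i][k] is the (i+1)-th largest value of dimension k.
    """
    d = len(X[0])
    columns = [sorted((x[k] for x in X), reverse=True) for k in range(d)]
    return [[columns[k][i] for k in range(d)] for i in range(len(X))]

def weighted_pool(X, weights):
    """Weighted sum over the sorted features: x_p = sum_i psi_i * max_i."""
    M = sort_features(X)
    d = len(X[0])
    return [sum(w * m[k] for w, m in zip(weights, M)) for k in range(d)]

# Four made-up 2-dimensional features.
X = [[1.0, 8.0], [4.0, 2.0], [7.0, 5.0], [0.0, 1.0]]
n = len(X)
average = weighted_pool(X, [1.0 / n] * n)            # all weights equal to 1/n
maximum = weighted_pool(X, [1.0] + [0.0] * (n - 1))  # only psi_1 is non-zero
print(average)  # [3.0, 4.0]
print(maximum)  # [7.0, 8.0]
```

Any other weight vector gives an intermediate strategy, which is the degree of freedom that a learned pooling scheme can exploit.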

Adjustable Multi-curvature Pooling automatically adjusts pooling weights to obtain a superior pooling strategy, effectively retaining important information. To learn appropriate pooling weights $\Psi$ for the different positions of $\mathbf{M}$, i.e. $\mathbf{max}_{i}$, we first utilize the positional encoding strategy in (Vaswani et al., 2017; Chen et al., 2021) to get positional encoding $\mathbf{P}=\{\mathbf{p}_{1},\ \mathbf{p}_{2},\ \cdots,\ \mathbf{p}_{n}\}$, $\mathbf{p}_{i}\in\mathcal{R}^{d_{p}}$. This positional encoding $\mathbf{P}$ contains prior information between position indices, and can be formulated as follows,

(10) $$\begin{split}
\mathbf{p}_{i}(2k)&=\sin\left(\frac{i}{10000^{2k/d_{p}}}\right)\\
\mathbf{p}_{i}(2k+1)&=\cos\left(\frac{i}{10000^{2k/d_{p}}}\right),
\end{split}$$

where $k$ indicates the dimension. Then we regard the sequence of positional encoding $\mathbf{P}$ as input and utilize Bi-GRU (Schuster and Paliwal, 1997) and Multi-Layer Perceptron (MLP) to obtain pooling weights $\Psi$,

(11) $$\Psi=\text{MLP}(\text{Bi-GRU}(\mathbf{P})).$$

Further, $\Psi$ is normalized as follows,

(12) $$\psi_{i}=\frac{\exp(\psi_{i})}{\sum_{j=1}^{n}\exp(\psi_{j})}.$$
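The normalization in (12) is a softmax over the $n$ raw pooling scores. A minimal sketch is given below; subtracting the maximum before exponentiating is a standard numerical-stability step added here as an assumption and does not change the result. The raw scores are made-up stand-ins for the network output.

```python
import math

def softmax(scores):
    """Normalise raw pooling scores into positive weights that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for positions 1..4
psi = softmax(raw)
print(psi)
print(sum(psi))
```

Since the weights are strictly positive, every sorted position contributes to the pooled feature, with larger raw scores receiving larger shares.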

Based on the learned pooling weights, we get the pooling feature $\mathbf{x}_{amp}\in\mathcal{R}^{d_{x}}$,

(13) $$\mathbf{x}_{amp}=\sum_{i=1}^{n}\psi_{i}\cdot\mathbf{max}_{i}.$$

According to (10), (11), (12) and (13), the entire calculation process of AMP can be integrated as follows,

(14) $$\mathbf{x}_{amp}=\text{AMP}(\mathbf{X},\ \theta),$$

where $\theta$ indicates all the learnable parameters.
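The AMP steps can be sketched end to end as follows. This is not the authors' implementation: the Bi-GRU and MLP of (11) are replaced by a caller-supplied `scorer` function (a deliberate stand-in), positions are assumed to run over $i=1,\dots,n$, $k$ is assumed to run over $0,\dots,d_{p}/2-1$ with $d_{p}$ even, and the sort is assumed to be descending.

```python
import math

def positional_encoding(n, d_p):
    """Sinusoidal encoding following (10); d_p is assumed to be even."""
    P = []
    for i in range(1, n + 1):
        p = [0.0] * d_p
        for k in range(d_p // 2):
            angle = i / (10000 ** (2 * k / d_p))
            p[2 * k] = math.sin(angle)
            p[2 * k + 1] = math.cos(angle)
        P.append(p)
    return P

def amp(X, scorer, d_p=32):
    """Score positions, normalise with softmax, then pool the sorted features.

    `scorer` maps one positional encoding to a scalar. It stands in for the
    Bi-GRU + MLP of (11), which is not reproduced in this sketch.
    """
    n, d = len(X), len(X[0])
    P = positional_encoding(n, d_p)
    raw = [scorer(p) for p in P]
    shift = max(raw)
    exps = [math.exp(r - shift) for r in raw]
    total = sum(exps)
    psi = [e / total for e in exps]  # normalisation as in (12)
    cols = [sorted((x[k] for x in X), reverse=True) for k in range(d)]
    # Weighted sum over sorted positions as in (13).
    return [sum(psi[i] * cols[k][i] for i in range(n)) for k in range(d)]

X = [[1.0, 8.0], [4.0, 2.0], [7.0, 5.0], [0.0, 1.0]]
# A constant scorer yields uniform weights, so AMP reduces to average pooling.
print(amp(X, scorer=lambda p: 0.0))  # [3.0, 4.0]
```

Note that the weights depend only on the position in the sorted order, not on the feature values themselves, which matches the description of $\mathbf{P}$ as the sole input in (11).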

Table 1. Statistics of the datasets.
Datasets #Entities #Relations #Timestamps #Time Span #Granularity #Training #Validation #Test
ICEWS14 6,869 230 365 2014 1 day 72,826 8,941 8,963
ICEWS05-15 10,094 251 4,017 2005-2015 1 day 368,962 46,275 46,092
GDELT 500 20 366 2015-2016 1 day 2,735,685 341,961 341,961

Pooling Procedure. We concatenate the space-shared and space-specific representations of the entity, relation and timestamp into $\mathbf{H}=\{\mathbf{h}_{\mathbb{S}}^{I},\ \mathbf{h}_{\mathbb{H}}^{I},\ \mathbf{h}_{\mathbb{E}}^{I},\ \mathbf{h}_{\mathbb{S}}^{S},\ \mathbf{h}_{\mathbb{H}}^{S},\ \mathbf{h}_{\mathbb{E}}^{S}\}$ ($\mathbf{h}\in\{\mathbf{s},\ \mathbf{r},\ \mathbf{t}\}$). Subsequently, we employ the AMP approach to effectively retain important information among entities, relations, and timestamps, and the score function can be defined as follows,

(15) $$f(\mathbf{s},\ \mathbf{r},\ \mathbf{o},\ \mathbf{t})=\langle\text{AMP}(\mathbf{H},\ \theta),\ \mathbf{o}\rangle,$$

where $\langle\cdot,\cdot\rangle$ represents the inner product operation.

4.4. Loss Function

The overall loss of the proposed IME model is defined as follows,

(16) $$\mathcal{L}=\mathcal{L}_{task}+\alpha\mathcal{L}_{sim}+\beta\mathcal{L}_{diff}+\gamma\mathcal{L}_{stru},$$

where $\alpha$, $\beta$ and $\gamma$ are hyper-parameters that weight the auxiliary losses. Each component of the loss is responsible for enforcing one of the desired properties.

Task Loss. Following the strategy in (Xu et al., 2021), we adopt the cross-entropy loss together with the standard data augmentation protocol to handle the multi-class task,

(17) $$\begin{split}
\mathcal{L}_{task}=&-\log\left(\frac{\exp(f(\mathbf{s},\ \mathbf{r},\ \mathbf{o},\ \mathbf{t}))}{\sum_{\mathbf{s}^{\prime}\in\mathcal{E}}\exp(f(\mathbf{s}^{\prime},\ \mathbf{r},\ \mathbf{o},\ \mathbf{t}))}\right)\\
&-\log\left(\frac{\exp(f(\mathbf{o},\ \mathbf{r}^{-1},\ \mathbf{s},\ \mathbf{t}))}{\sum_{\mathbf{o}^{\prime}\in\mathcal{E}}\exp(f(\mathbf{o}^{\prime},\ \mathbf{r}^{-1},\ \mathbf{s},\ \mathbf{t}))}\right).
\end{split}$$
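For a single quadruplet, (17) is the sum of two softmax cross-entropy terms, one per query direction. The sketch below assumes the scores of all candidate entities have already been computed and are passed in as plain lists; the function names and the toy scores are hypothetical.

```python
import math

def log_softmax_at(scores, target):
    """Return log(exp(scores[target]) / sum_j exp(scores[j])), computed stably."""
    shift = max(scores)
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return scores[target] - log_norm

def task_loss(scores_first, true_first, scores_second, true_second):
    """Two-term cross-entropy for one quadruplet.

    scores_first:  f(e, r, o, t) for every candidate entity e (first term).
    scores_second: f(e, r^{-1}, s, t) for every candidate entity e (second term).
    true_first / true_second: index of the correct entity in each list.
    """
    return (-log_softmax_at(scores_first, true_first)
            - log_softmax_at(scores_second, true_second))

# Three candidate entities with made-up scores.
loss = task_loss([2.0, 0.5, -1.0], 0, [0.0, 0.0, 0.0], 1)
print(loss)
```

With uniform scores over $|\mathcal{E}|$ candidates each term equals $\log|\mathcal{E}|$, which gives a convenient sanity check for an implementation.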

Similarity Loss. The purpose of the similarity loss is to minimize the disparities among shared features across different curvature spaces, aiming to bridge spatial gaps among them. Specifically, Central Moment Discrepancy (CMD) (Zellinger et al., 2017) is a distance metric employed to evaluate the similarity between two distributions by quantifying the discrepancy in their central moments. A smaller CMD value indicates a higher similarity between the two distributions. Let $X$ and $Y$ be bounded independent and identically distributed random vectors from two probability distributions, $p$ and $q$, defined on the interval $[a,b]$. The CMD can be defined as,

(18) $$\begin{aligned}
\text{CMD}(X,Y)&=\frac{1}{|b-a|}\parallel\mathbf{E}(X)-\mathbf{E}(Y)\parallel_{2}\\
&+\sum_{k=2}^{\infty}\frac{1}{|b-a|^{k}}\parallel c_{k}(X)-c_{k}(Y)\parallel_{2},
\end{aligned}$$

where $\mathbf{E}(X)$ is the expectation of $X$, and $c_{k}(X)=\mathbf{E}((X-\mathbf{E}(X))^{k})$ is the central moment vector of order $k$.

In our case, we calculate the similarity loss through CMD between the space-shared representations of each pair of curvature spaces,

(19) $$\mathcal{L}_{sim}=\frac{1}{3}\sum_{(\mathbb{M}_{1},\mathbb{M}_{2})}\text{CMD}(\mathbf{h}_{\mathbb{M}_{1}}^{I},\mathbf{h}_{\mathbb{M}_{2}}^{I}),$$

where $(\mathbb{M}_{1},\mathbb{M}_{2})\in\{(\mathbb{E},\mathbb{H}),(\mathbb{E},\mathbb{S}),(\mathbb{H},\mathbb{S})\}$.
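A sample-based version of the CMD and of the pairwise similarity loss can be sketched as follows. The infinite sum is truncated at order `K`, the expectations are replaced by sample means, and the bounds `a` and `b` are passed in explicitly; the value `K=5`, the default bounds and all function names are assumptions made for this illustration.

```python
import math

def mean_vector(samples):
    """Per-dimension sample mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(x[k] for x in samples) / n for k in range(len(samples[0]))]

def central_moment(samples, order):
    """Per-dimension central moment E[(X - E[X])^order] estimated from samples."""
    mu = mean_vector(samples)
    n = len(samples)
    return [sum((x[k] - mu[k]) ** order for x in samples) / n for k in range(len(mu))]

def l2_distance(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def cmd(X, Y, a=0.0, b=1.0, K=5):
    """Empirical CMD truncated at moment order K (K is an assumed choice)."""
    span = abs(b - a)
    total = l2_distance(mean_vector(X), mean_vector(Y)) / span
    for k in range(2, K + 1):
        total += l2_distance(central_moment(X, k), central_moment(Y, k)) / span ** k
    return total

def similarity_loss(shared):
    """Average CMD over the three space pairs; `shared` maps "S", "H", "E" to samples."""
    pairs = (("E", "H"), ("E", "S"), ("H", "S"))
    return sum(cmd(shared[m1], shared[m2]) for m1, m2 in pairs) / 3.0

X = [[0.0, 1.0], [1.0, 0.0]]
Y = [[0.5, 0.5], [0.5, 0.5]]
print(cmd(X, X))  # 0.0
print(cmd(X, Y) > 0.0)  # True: equal means, but different second moments
print(similarity_loss({"E": X, "H": X, "S": Y}))
```

The second example shows why moments beyond the mean matter: the two sample sets share the same mean and differ only in their spread.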

Difference Loss. The difference loss is designed to capture the characteristic features of different curvature spaces. Specifically, we impose a soft orthogonality constraint not only between the shared and specific features of each space but also between the space-specific features of different spaces. The difference loss is calculated as:

(20) $$\mathcal{L}_{diff}=\sum_{\mathbb{M}}\parallel(\mathbf{h}_{\mathbb{M}}^{S})^{T}\mathbf{h}_{\mathbb{M}}^{I}\parallel_{F}^{2}+\sum_{(\mathbb{M}_{1},\mathbb{M}_{2})}\parallel(\mathbf{h}_{\mathbb{M}_{1}}^{S})^{T}\mathbf{h}_{\mathbb{M}_{2}}^{S}\parallel_{F}^{2},$$

where $\mathbb{M}\in\{\mathbb{S},\mathbb{H},\mathbb{E}\}$, $(\mathbb{M}_{1},\mathbb{M}_{2})\in\{(\mathbb{E},\mathbb{H}),(\mathbb{E},\mathbb{S}),(\mathbb{H},\mathbb{S})\}$, and $\parallel\cdot\parallel_{F}^{2}$ is the squared Frobenius norm.
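The soft orthogonality penalty in (20) can be sketched as below. Each representation is assumed to be a batch matrix whose rows are samples, so that $(\mathbf{h}^{S})^{T}\mathbf{h}^{I}$ is a matrix and the Frobenius norm applies; this batch layout, the function names and the toy data are assumptions of this sketch.

```python
def squared_frobenius_of_product(A, B):
    """Return ||A^T B||_F^2 for matrices stored as lists of rows (same row count)."""
    rows, da, db = len(A), len(A[0]), len(B[0])
    total = 0.0
    for i in range(da):
        for j in range(db):
            entry = sum(A[r][i] * B[r][j] for r in range(rows))
            total += entry * entry
    return total

def difference_loss(shared, specific):
    """Soft-orthogonality penalty over the three curvature spaces.

    shared / specific: dicts keyed by "S", "H", "E"; each value is a batch
    matrix (list of rows) of the corresponding representations.
    """
    # First sum: specific vs. shared features within each space.
    loss = sum(squared_frobenius_of_product(specific[m], shared[m])
               for m in ("S", "H", "E"))
    # Second sum: specific features of different spaces.
    for m1, m2 in (("E", "H"), ("E", "S"), ("H", "S")):
        loss += squared_frobenius_of_product(specific[m1], specific[m2])
    return loss

# Toy batch of 4 samples with 1-dimensional features, mutually orthogonal columns.
specific = {
    "S": [[1.0], [0.0], [0.0], [0.0]],
    "H": [[0.0], [1.0], [0.0], [0.0]],
    "E": [[0.0], [0.0], [1.0], [0.0]],
}
shared = {m: [[0.0], [0.0], [0.0], [1.0]] for m in ("S", "H", "E")}
print(difference_loss(shared, specific))    # 0.0
print(difference_loss(specific, specific))  # 3.0
```

The penalty vanishes exactly when the relevant feature columns are orthogonal and grows as shared and specific features start to encode the same directions.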

Structure Loss. The structure loss (Gao et al., 2022) aims to ensure the structural similarity of quadruplets across various curvature spaces. Specifically, we define the relation on a triplet of samples $(\mathbf{r}_{a},\mathbf{r}_{b},\mathbf{r}_{c})$ as the following cosine value:

(21) $$\text{cos}\angle\mathbf{r}_{a}\mathbf{r}_{b}\mathbf{r}_{c}=\langle\mathbf{e}^{ab},\ \mathbf{e}^{cb}\rangle\quad\text{where}\quad\mathbf{e}^{ij}=\frac{\mathbf{r}_{i}-\mathbf{r}_{j}}{\parallel\mathbf{r}_{i}-\mathbf{r}_{j}\parallel_{2}},$$

where $\mathbf{r}_{*}$ denotes a sample. Thus, the structure loss can be calculated as,

(22) $$\mathcal{L}_{stru}=\frac{1}{3}\sum_{(\mathbb{M}_{1},\mathbb{M}_{2})}\parallel\text{cos}\angle\mathbb{M}_{1_{a}}\mathbb{M}_{1_{b}}\mathbb{M}_{1_{c}}-\text{cos}\angle\mathbb{M}_{2_{a}}\mathbb{M}_{2_{b}}\mathbb{M}_{2_{c}}\parallel_{1},$$

where $(\mathbb{M}_{1},\mathbb{M}_{2})\in\{(\mathbb{E},\mathbb{H}),(\mathbb{E},\mathbb{S}),(\mathbb{H},\mathbb{S})\}$.
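The angle-based comparison can be sketched for a single triplet as follows. The snippet computes the cosine in (21) with ordinary vector differences, exactly as the formula is written, and then averages the absolute cosine gaps over the three space pairs as in (22). The dictionary layout, the function names and the toy coordinates are assumptions.

```python
import math

def unit_difference(u, v):
    """e^{ij} = (r_i - r_j) / ||r_i - r_j||_2."""
    diff = [p - q for p, q in zip(u, v)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def cos_angle(r_a, r_b, r_c):
    """Cosine of the angle at r_b formed by r_a and r_c."""
    e_ab = unit_difference(r_a, r_b)
    e_cb = unit_difference(r_c, r_b)
    return sum(x * y for x, y in zip(e_ab, e_cb))

def structure_loss(triplets_by_space):
    """Mean absolute gap between angle cosines over the three space pairs.

    triplets_by_space maps "S", "H", "E" to a triplet (r_a, r_b, r_c) holding
    the same three samples embedded in that space.
    """
    cos = {m: cos_angle(*t) for m, t in triplets_by_space.items()}
    pairs = (("E", "H"), ("E", "S"), ("H", "S"))
    return sum(abs(cos[m1] - cos[m2]) for m1, m2 in pairs) / 3.0

triplets = {
    "E": ([1.0, 0.0], [0.0, 0.0], [0.0, 1.0]),  # right angle, cosine 0
    "H": ([2.0, 0.0], [0.0, 0.0], [0.0, 2.0]),  # same shape at a larger scale
    "S": ([1.0, 0.0], [0.0, 0.0], [3.0, 0.0]),  # collinear points, cosine 1
}
print(structure_loss(triplets))
```

Because only angles are compared, the loss is insensitive to the scale of each space, as the "E" and "H" triplets in the example illustrate.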

Table 2. Link prediction results on ICEWS14, ICEWS05-15, and GDELT datasets. The best results are in bold and the second-best results are underlined. - means the result is unavailable.
Datasets ICEWS14 ICEWS05-15 GDELT
Metrics MRR Hit@1 Hit@3 Hit@10 MRR Hit@1 Hit@3 Hit@10 MRR Hit@1 Hit@3 Hit@10
TransE (2013) 0.280 0.094 - 0.637 0.294 0.090 - 0.663 0.113 0.0 0.158 0.312
DistMult (2015) 0.439 0.323 - 0.672 0.456 0.337 - 0.691 0.196 0.117 0.208 0.348
SimplE (2018) 0.458 0.341 0.516 0.687 0.478 0.359 0.539 0.708 0.206 0.124 0.220 0.366
RotatE (2019) 0.418 0.291 0.478 0.690 0.304 0.164 0.355 0.595 - - - -
TA-DistMult (2018) 0.477 0.363 - 0.686 0.474 0.346 - 0.728 0.206 0.124 0.219 0.365
ATiSE (2019) 0.550 0.436 0.629 0.750 0.519 0.378 0.606 0.794 - - - -
TeRo (2020) 0.562 0.468 0.621 0.732 0.586 0.469 0.668 0.795 0.245 0.154 0.264 0.420
ChronoR (2021) 0.625 0.547 0.669 0.773 0.675 0.596 0.723 0.820 - - - -
TeLM (2021) 0.625 0.545 0.673 0.774 0.678 0.599 0.728 0.823 - - - -
TuckERTNT (2022) 0.604 0.521 0.655 0.753 0.638 0.559 0.686 0.783 0.381 0.283 0.418 0.576
BoxTE (2022) 0.613 0.528 0.664 0.763 0.667 0.582 0.719 0.820 0.352 0.269 0.377 0.511
EvoExplore (2022) 0.725 0.653 0.778 0.852 0.790 0.719 0.843 0.915 0.514 0.353 0.602 0.748
BDME (2023) 0.635 0.555 0.683 0.778 - - - - 0.278 0.191 0.299 0.448
QDN (2023) 0.643 0.567 0.688 0.784 0.692 0.611 0.743 0.838 0.545 0.481 0.576 0.668
DyERNIE (2020) 0.669 0.599 0.714 0.797 0.739 0.679 0.773 0.855 0.457 0.390 0.479 0.589
BiQCap (2023) 0.643 0.563 0.687 0.798 0.691 0.621 0.738 0.837 0.273 0.183 0.308 0.469
IME 0.819 0.790 0.835 0.872 0.796 0.750 0.821 0.875 0.624 0.485 0.754 0.791

5. Experiment

In this section, we provide detailed information about the datasets, describe the experimental setups, present experimental results, and conduct a comprehensive analysis of experimental results.

5.1. Datasets

We evaluate on three commonly used TKG datasets, whose key statistics are summarized in Table 1. ICEWS14 and ICEWS05-15 (García-Durán et al., 2018) are subsets of the Integrated Crisis Early Warning System (ICEWS), which encompass various political events along with their respective timestamps. GDELT (Leetaru and Schrodt, 2013) is a subset of the larger Global Database of Events, Language, and Tone (GDELT) that includes data on human social relationships.

5.2. Baselines

The proposed model is compared with some classic KGC methods, including SKGC and TKGC methods.

  • SKGC methods: TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), SimplE (Kazemi and Poole, 2018), RotatE (Sun et al., 2019);

  • TKGC methods: TA-DistMult (García-Durán et al., 2018), TeRo (Xu et al., 2020b), ChronoR (Sadeghian et al., 2021), ATiSE (Xu et al., 2020a), TeLM (Xu et al., 2021), TuckERTNT (Shao et al., 2022), BoxTE (Messner et al., 2022), BDME (Yue et al., 2023), EvoExplore (Zhang et al., 2022), DyERNIE (Han et al., 2020), BiQCap (Zhang et al., 2023), and QDN (Wang et al., 2023a).

5.3. Link Prediction Metrics

We substitute either the head or tail entity in each test quadruplet $(\mathbf{s},\mathbf{r},\mathbf{o},\mathbf{t})$ with all feasible entities sampled from the TKG. Subsequently, we rank the scores calculated by the score function. We employ Mean Reciprocal Rank (MRR) and Hit@$N$ as evaluation metrics, with $N=1$, $3$ and $10$. Higher values indicate better performance. Finally, we report the filtered results, which exclude all corrupted quadruplets that already exist in the TKG.
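The filtered ranking protocol and the two metrics can be sketched as follows. The snippet assumes one score per candidate entity for a given query and a set of other entities known to complete the same query correctly; ties are resolved optimistically (only strictly higher scores count), which is an assumption since the tie-breaking rule is not stated. All names and numbers are hypothetical.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` after removing other known-true entities from the candidates.

    scores: dict mapping entity -> score for one query (higher is better).
    known_true: entities that also form a valid quadruplet for this query.
    """
    target_score = scores[target]
    better = sum(
        1
        for entity, score in scores.items()
        if entity != target and entity not in known_true and score > target_score
    )
    return better + 1

def mrr_and_hits(ranks, ns=(1, 3, 10)):
    """Mean Reciprocal Rank and Hit@N computed from a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {n: sum(r <= n for r in ranks) / len(ranks) for n in ns}
    return mrr, hits

scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
rank = filtered_rank(scores, target="c", known_true={"a"})
print(rank)  # 2: "a" is filtered out, only "b" outranks "c"
mrr, hits = mrr_and_hits([1, 2, 4, 20])
print(mrr, hits)
```

Without filtering, the same query would yield rank 3, which would penalise the model for ranking another correct answer above the target.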

5.4. Parameters Setting

We use a grid search to find the best hyper-parameters based on the MRR performance on the validation dataset. Specifically, we tune the similarity loss weight $\alpha$, the difference loss weight $\beta$, and the structure loss weight $\gamma$, choosing from $\{0.1,\ 0.2,\ \cdots,\ 0.9\}$. The optimal $\alpha$, $\beta$ and $\gamma$ on different datasets are set as follows: $\alpha=0.4$, $\beta=0.4$ and $\gamma=0.1$ for ICEWS14; $\alpha=0.9$, $\beta=0.3$ and $\gamma=0.1$ for ICEWS05-15; $\alpha=1$, $\beta=0.3$ and $\gamma=0.1$ for GDELT. We set the optimal embedding dimension $D$ to $500$ across all datasets. For the AMP approach, the dimension of positional encoding is set to $32$, i.e., $d_{p}$ is set to $32$. The dimension of Bi-GRU is also set to $32$ and MLP is used to project features from $32$ dimensions to $1$.

Moreover, the learning rate is tuned over $\{0.1,\ 0.05,\ 0.01,\ 0.005,\ 0.001\}$ on different datasets, ultimately being set to $0.1$ for all datasets. A batch size of 1000 is used for all datasets. All experiments are implemented in PyTorch 1.8.1 and conducted on a single NVIDIA RTX A6000 GPU.

5.5. Experimental Results and Analysis

The link prediction experimental results are displayed in Table 2, and the experimental analyses are listed as follows:

(1) The proposed model outperforms state-of-the-art baselines on three datasets, showing clear superiority in most metrics. For example, the proposed model obtains absolute MRR improvements of $9.4\%$ and $0.6\%$ over EvoExplore on ICEWS14 and ICEWS05-15, respectively. This phenomenon indicates that a single space is insufficient for modeling complex geometric structures concurrently, and the spatial gap in multi-curvature spaces severely limits the expressive capacity of TKGC models.

(2) BiQCap (Zhang et al., 2023) and DyERNIE (Han et al., 2020) are two important baselines because they both model TKGs in multi-curvature spaces. However, our proposed method still improves most metrics for all datasets. This phenomenon reflects that our proposed method can effectively reduce spatial gaps caused by the heterogeneity of different curvature spaces.

(3) QDN (Wang et al., 2023a) is also an essential baseline because it serves as a key component of the multi-curvature embeddings module. When compared to QDN, our proposed method exhibits a substantial improvement in performance across all metrics. This observation underscores the inadequacy of a single Euclidean space for modeling complex geometric structures.

These observations indicate that our proposed method can not only model complex geometric structures but also effectively reduce spatial gaps among different curvature spaces.
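For readers less familiar with the link prediction metrics used in this comparison, the sketch below computes MRR and Hit@k from a list of ranks under their standard definitions. The function names and the toy ranks are illustrative assumptions and do not come from the paper's experiments.

```python
def mrr(ranks):
    """Mean reciprocal rank of the correct entities (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy ranks of the correct entity for five test queries (illustrative only).
ranks = [1, 2, 1, 10, 4]
print("MRR:", mrr(ranks))
for k in (1, 3, 10):
    print(f"Hit@{k}:", hits_at_k(ranks, k))
```

Both metrics lie in [0, 1] and higher is better. MRR rewards placing the correct entity near the top of the ranking, while Hit@k only checks whether it appears within the first k candidates.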

Figure 4. H@1 with varying loss weights $\alpha$, $\beta$ and $\gamma$ on ICEWS14.

5.6. Impact of Loss Weights $\alpha$, $\beta$, and $\gamma$

In this experiment, we explore the influence of changing the loss weights $\alpha$, $\beta$, and $\gamma$ on MRR. As depicted in Figure 4, the three loss weights affect performance in noticeably different ways as they increase. Specifically, performance as a function of the similarity loss weight $\alpha$ and of the difference loss weight $\beta$ follows a roughly parabolic shape, peaking at 0.4 in both cases. In contrast, performance shows an overall declining trend as the structure loss weight $\gamma$ increases.

These phenomena illustrate that appropriate weights for the similarity and difference losses effectively facilitate the learning of common and characteristic features of entities, relations, and timestamps across multiple curvature spaces. Conversely, a higher weight for the structure loss restricts the flexibility of their embeddings across these multiple curvature spaces.
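To make the role of the three weights concrete, the sketch below assumes that the overall objective is a task loss plus the three auxiliary losses scaled by $\alpha$, $\beta$, and $\gamma$. This additive form and the names `total_loss`, `task`, `sim`, `diff`, and `stru` are assumptions for illustration. The exact objective is the one defined in the method section.

```python
def total_loss(task, sim, diff, stru, alpha=0.4, beta=0.4, gamma=0.1):
    """Weighted sum of the loss terms (assumed additive form).

    The default weights are the ICEWS14 values reported in the text.
    """
    return task + alpha * sim + beta * diff + gamma * stru


# Toy loss values (illustrative only). With identical raw losses, raising
# gamma makes the structure term a larger share of the objective.
low_gamma = total_loss(1.0, 0.5, 0.5, 0.5, gamma=0.1)
high_gamma = total_loss(1.0, 0.5, 0.5, 0.5, gamma=0.9)
print(low_gamma, high_gamma)
```

Under this assumed form, a large $\gamma$ lets the structure constraint dominate the auxiliary part of the objective, which is one way to read the declining trend reported for the structure loss weight.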

Table 3. The ablation experiment results on ICEWS14. “w/o” represents removal of the mentioned factor; “(-)” denotes replacing Adjustable Multi-curvature Pooling (AMP) with the mentioned factor. The best results are marked in bold.
Datasets                            ICEWS14
Metrics                             MRR    Hit@1  Hit@3  Hit@10
DistMult (2015)                     0.439  0.323  -      0.672
TA-DistMult (2018)                  0.477  0.363  -      0.686
IME (-) MP                          0.523  0.430  0.574  0.696
IME w/o $\mathcal{L}_{sim}$         0.740  0.693  0.765  0.824
IME w/o $\mathcal{L}_{diff}$        0.716  0.653  0.752  0.835
IME w/o $\mathcal{L}_{stru}$        0.810  0.760  0.810  0.859
IME                                 0.819  0.790  0.835  0.872
  1 MP represents Max Pooling.

5.7. Ablation Experiments

In order to investigate the impact of key modules and loss functions on experimental performance, we conducted a series of ablation experiments, and the corresponding link prediction results are presented in Table 3.

  i. “IME w/o $\mathcal{L}_{sim}$”, “IME w/o $\mathcal{L}_{diff}$”, and “IME w/o $\mathcal{L}_{stru}$” mean removing the similarity loss $\mathcal{L}_{sim}$, difference loss $\mathcal{L}_{diff}$, and structure loss $\mathcal{L}_{stru}$;

  ii. “IME (-) MP” represents replacing Adjustable Multi-curvature Pooling (AMP) with the Max Pooling (MP).

(1) In the first category of ablation experiments, the proposed model achieves a significant improvement on ICEWS14. For example, compared to “IME w/o $\mathcal{L}_{sim}$”, “IME w/o $\mathcal{L}_{diff}$”, and “IME w/o $\mathcal{L}_{stru}$”, the proposed model achieves absolute improvements of 7.9%, 10.3%, and 0.9% on MRR, respectively (0.819 versus 0.740, 0.716, and 0.810 in Table 3). Thus, we draw the following conclusions:

  a) Similarity loss can effectively learn the commonalities across distinct curvature spaces and mitigate spatial gaps among them;

  b) Difference loss can capture characteristic features specific to each space;

  c) Structure loss serves to constrain the embeddings of entities, relations, and timestamps by ensuring that information in distinct spaces exhibits comparable geometric structures.
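The gaps between the full model and each ablated variant can be recomputed directly from Table 3. The short sketch below does so for every metric. The dictionary layout and the helper `gain_over` are illustrative, while the numbers are copied from the table. The 7.9, 10.3, and 0.9 point gaps correspond to the MRR column.

```python
# Table 3 values on ICEWS14, in the order (MRR, Hit@1, Hit@3, Hit@10).
TABLE3 = {
    "IME (-) MP": (0.523, 0.430, 0.574, 0.696),
    "IME w/o L_sim": (0.740, 0.693, 0.765, 0.824),
    "IME w/o L_diff": (0.716, 0.653, 0.752, 0.835),
    "IME w/o L_stru": (0.810, 0.760, 0.810, 0.859),
    "IME": (0.819, 0.790, 0.835, 0.872),
}
METRICS = ("MRR", "Hit@1", "Hit@3", "Hit@10")


def gain_over(variant):
    """Absolute gain of the full model over a variant, in percentage points."""
    full = TABLE3["IME"]
    return {
        metric: round((f - v) * 100, 1)
        for metric, f, v in zip(METRICS, full, TABLE3[variant])
    }


for name in TABLE3:
    if name != "IME":
        print(name, gain_over(name))
```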

(2) In the second category of ablation experiments, replacing AMP with max pooling lowers MRR on ICEWS14 from 0.819 to 0.523 (Table 3). This result demonstrates that the adjustable multi-curvature pooling approach can effectively strengthen the information that is important for modeling the current TKG while weakening undesirable information.
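The contrast between a fixed max pooling and an adjustable pooling can be illustrated with a small example. The sketch below is not the AMP module itself. It assumes a pooling that sorts the values of each coordinate across spaces and combines them with softmax weights over the sorted positions. The scores are hand-picked placeholders, whereas the implementation details above indicate that AMP relies on positional encodings, a Bi-GRU, and an MLP. All function names and toy values are hypothetical.

```python
import math


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def weighted_sorted_pool(vectors, scores):
    """Sort each coordinate in descending order across the input vectors and
    return a softmax-weighted sum over the sorted positions.

    scores holds one raw score per sorted position (placeholder values; a
    learned pooling operator would produce them with a small network).
    """
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = []
    for column in zip(*vectors):
        ordered = sorted(column, reverse=True)
        pooled.append(sum(w * v for w, v in zip(weights, ordered)))
    return pooled


# Three toy embeddings standing in for the Euclidean, hyperbolic, and
# hyperspherical views of one entity (illustrative values only).
views = [[0.2, 0.9], [0.5, 0.1], [0.8, 0.4]]
print(max_pool(views))                                # [0.8, 0.9]
print(weighted_sorted_pool(views, [50.0, 0.0, 0.0]))  # approaches max pooling
print(weighted_sorted_pool(views, [0.0, 0.0, 0.0]))   # equals mean pooling
```

With all weight on the first sorted position the operator reduces to max pooling, and with uniform weights it reduces to mean pooling. A pooling with adjustable weights can therefore sit anywhere between these two fixed choices.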

Figure 5. Comparison of MRR performance with different embedding dimensions on ICEWS14.

5.8. Impact of Embedding Dimensions $D$

To empirically investigate the impact of the embedding dimension on ICEWS14, we vary the dimension $D$ over $\{200, 500, 800, 1000, 1500, 2000\}$ and analyze the experimental results. As shown in Figure 5, the MRR on ICEWS14 first increases and then decreases as the dimension grows, peaking at $D=500$.

This phenomenon implies that the proposed model faces challenges in capturing intricate data relationships at lower dimensions, resulting in poorer performance. As the dimension increases, the model becomes more capable of effectively representing data, leading to enhanced performance. Nevertheless, beyond a certain threshold, a larger dimension may introduce issues such as overfitting or heightened complexity, consequently causing a decline in performance.

6. Conclusion

In this paper, we proposed a novel TKGC method called Integrating Multi-curvature shared and specific Embedding (IME). Specifically, IME models TKGs in multi-curvature spaces to capture complex geometric structures. Meanwhile, IME learns the space-specific property to comprehensively capture characteristic information, and the space-shared property to reduce spatial gaps caused by the heterogeneity of different curvature spaces. Furthermore, IME proposes an Adjustable Multi-curvature Pooling (AMP) approach to effectively strengthen the retention of important information. Experimental results on several well-established datasets show that IME achieves competitive performance when compared to state-of-the-art TKGC methods.

Acknowledgements.
This work was funded by the National Key R&D Program of China (No. 2021ZD0111902), National Natural Science Foundation of China (No. 92370102, 62272015, U21B2038), R&D Program of Beijing Municipal Education Commission (KZ202210005008).

References

  • Abboud et al. (2020) Ralph Abboud, Ismail Ceylan, Thomas Lukasiewicz, and Tommaso Salvatori. 2020. BoxE: A box embedding model for knowledge base completion. In Neural Information Processing Systems. 9649–9661.
  • Balažević et al. (2019) Ivana Balažević, Carl Allen, and Timothy M Hospedales. 2019. Tucker: Tensor factorization for knowledge graph completion. In Empirical Methods in Natural Language Processing. 5185–5194.
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems. 2787–2795.
  • Boschee et al. (2015) Elizabeth Boschee, Jennifer Lautenschlager, Sean O’Brien, Steve Shellman, James Starz, and Michael Ward. 2015. ICEWS Coded Event Data. In Harvard Dataverse.
  • Cao et al. (2022) Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. 2022. Geometry interaction knowledge graph embeddings. In AAAI Conference on Artificial Intelligence.
  • Chami et al. (2020) Ines Chami, Adva Wolf, Da Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. 2020. Low-dimensional hyperbolic knowledge graph embeddings. In Annual Meeting of the Association for Computational Linguistics. 6901–6914.
  • Chen et al. (2021) Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, and Changhu Wang. 2021. Learning the best pooling strategy for visual semantic embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 15789–15798.
  • Dalton et al. (2014) Jeffrey Dalton, Laura Dietz, and James Allan. 2014. Entity query feature expansion using knowledge base links. In ACM SIGIR Conference on Research & Development in Information Retrieval. 365–374.
  • Dasgupta et al. (2018) Shib Sankar Dasgupta, Swayambhu Nath Ray, and Partha Talukdar. 2018. HyTE: Hyperplane-based temporally aware knowledge graph embedding. In Empirical Methods in Natural Language Processing. 2001–2011.
  • Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In AAAI Conference on Artificial Intelligence. 1811–1818.
  • Gao et al. (2022) Qiankun Gao, Chen Zhao, Bernard Ghanem, and Jian Zhang. 2022. R-DFCIL: Relation-guided representation learning for data-free class incremental learning. In European Conference on Computer Vision. 423–439.
  • García-Durán et al. (2018) Alberto García-Durán, Sebastijan Dumančić, and Mathias Niepert. 2018. Learning sequence encoders for temporal knowledge graph completion. In Empirical Methods in Natural Language Processing. 4816–4821.
  • Goel et al. (2020) Rishab Goel, Seyed Mehran Kazemi, Marcus Brubaker, and Pascal Poupart. 2020. Diachronic embedding for temporal knowledge graph completion. In AAAI Conference on Artificial Intelligence. 3988–3995.
  • Guha et al. (2003) Ramanathan Guha, Rob McCool, and Eric Miller. 2003. Semantic search. In Proceedings of the 12th international conference on World Wide Web. 700–709.
  • Guo and Kok (2021) Jia Guo and Stanley Kok. 2021. BiQUE: Biquaternionic Embeddings of Knowledge Graphs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 8338–8351.
  • Han et al. (2020) Zhen Han, Yunpu Ma, Peng Chen, and Volker Tresp. 2020. DyERNIE: Dynamic evolution of riemannian manifold embeddings for temporal knowledge graph completion. In Empirical Methods in Natural Language Processing. 7301–7316.
  • Hitchcock (1927) Frank L Hitchcock. 1927. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6, 1-4 (1927), 164–189.
  • Kazemi and Poole (2018) Seyed Mehran Kazemi and David Poole. 2018. SimplE embedding for link prediction in knowledge graphs. In Neural Information Processing Systems. 4284–4295.
  • Ko et al. (2022) Hyeyoung Ko, Suyeon Lee, Yoonseo Park, and Anna Choi. 2022. A survey of recommendation systems: recommendation models, techniques, and application fields. Electronics 11, 1 (2022), 141.
  • Lacroix et al. (2020) Timothée Lacroix, Guillaume Obozinski, and Nicolas Usunier. 2020. Tensor decompositions for temporal knowledge base completion. In International Conference on Learning Representations. 1–12.
  • Leblay and Chekol (2018) Julien Leblay and Melisachew Wudage Chekol. 2018. Deriving validity time in knowledge graph. In International Conference on World Wide Web. 1771–1776.
  • Leetaru and Schrodt (2013) Kalev Leetaru and Philip A Schrodt. 2013. GDELT: Global data on events, location, and tone. In ISA Annual Convention. 1–49.
  • Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI Conference on Artificial Intelligence. 2181–2187.
  • Liu et al. (2017) Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. 2017. Sphereface: Deep hypersphere embedding for face recognition. In the Conference on Computer Vision and Pattern Recognition. 212–220.
  • Messner et al. (2022) Johannes Messner, Ralph Abboud, and Ismail Ilkan Ceylan. 2022. Temporal knowledge graph completion using box embeddings. In AAAI Conference on Artificial Intelligence. 7779–7787.
  • Nickel and Kiela (2017) Maximillian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems.
  • Pan et al. (2024) Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. 2024. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Transactions on Knowledge and Data Engineering (2024), 1–20.
  • Sadeghian et al. (2021) Ali Sadeghian, Mohammadreza Armandpour, Anthony Colas, and Daisy Zhe Wang. 2021. ChronoR: rotation based temporal knowledge graph embedding. In AAAI Conference on Artificial Intelligence. 6471–6479.
  • Sala et al. (2018) Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. 2018. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning. 4460–4469.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
  • Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE transactions on Signal Processing 45, 11 (1997), 2673–2681.
  • Shao et al. (2022) Pengpeng Shao, Dawei Zhang, Guohua Yang, Jianhua Tao, Feihu Che, and Tong Liu. 2022. Tucker decomposition-based temporal knowledge graph completion. Knowledge-Based Systems 238 (2022), 107841.
  • Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations. 1–18.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. 2071–2080.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).
  • Vu et al. (2019) Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung, et al. 2019. A capsule network-based embedding model for knowledge graph completion and search personalization. In Proceedings of the Conference of the Association for Computational Linguistics: Human Language Technologies. 2180–2189.
  • Wang et al. (2024) Jiapu Wang, Boyue Wang, Junbin Gao, Simin Hu, Yongli Hu, and Baocai Yin. 2024. Multi-Level Interaction Based Knowledge Graph Completion. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32 (2024), 386–396.
  • Wang et al. (2022) Jiapu Wang, Boyue Wang, Junbin Gao, Yongli Hu, and Baocai Yin. 2022. Multi-concept Representation Learning for Knowledge Graph Completion. ACM Transactions on Knowledge Discovery from Data (2022).
  • Wang et al. (2023a) Jiapu Wang, Boyue Wang, Junbin Gao, Xiaoyan Li, Yongli Hu, and Baocai Yin. 2023a. QDN: A Quadruplet Distributor Network for Temporal Knowledge Graph Completion. IEEE Transactions on Neural Networks and Learning Systems (2023).
  • Wang et al. (2023b) Jiapu Wang, Boyue Wang, Junbin Gao, Xiaoyan Li, Yongli Hu, and Baocai Yin. 2023b. TDN: Triplet Distributor Network for Knowledge Graph Completion. IEEE Transactions on Knowledge and Data Engineering (2023).
  • Wang et al. (2023c) Jiapu Wang, Boyue Wang, Meikang Qiu, Shirui Pan, Bo Xiong, Heng Liu, Linhao Luo, Tengfei Liu, Yongli Hu, Baocai Yin, et al. 2023c. A Survey on Temporal Knowledge Graph Completion: Taxonomy, Progress, and Prospects. arXiv preprint arXiv:2308.02457 (2023).
  • Wang et al. (2021) Shen Wang, Xiaokai Wei, Cicero Nogueira Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew Arnold, Bing Xiang, Philip S Yu, and Isabel F Cruz. 2021. Mixed-curvature multi-relational graph neural network for knowledge graph completion. In International Conference on World Wide Web. 1761–1771.
  • Wilson et al. (2014) Richard C Wilson, Edwin R Hancock, Elżbieta Pekalska, and Robert PW Duin. 2014. Spherical and hyperbolic embeddings of data. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 11 (2014), 2255–2269.
  • Xu et al. (2021) Chengjin Xu, Yung-Yu Chen, Mojtaba Nayyeri, and Jens Lehmann. 2021. Temporal knowledge graph completion using a linear temporal regularizer and multivector embeddings. In Annual Meeting of the Association for Computational Linguistics. 2569–2578.
  • Xu et al. (2020a) Chenjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Yazdi, and Jens Lehmann. 2020a. Temporal knowledge graph completion based on time series gaussian embedding. In International Semantic Web Conference. 654–671.
  • Xu et al. (2020b) Chengjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Shariat Yazdi, and Jens Lehmann. 2020b. TeRo: A time-aware knowledge graph embedding via temporal rotation. In International Conference on Computational Linguistics. 1583–1593.
  • Yang et al. (2015) Bishan Yang, Wen Tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In International Conference on Learning Representations. 1–13.
  • Yue et al. (2023) Lupeng Yue, Yongjian Ren, Yan Zeng, Jilin Zhang, Kaisheng Zeng, and Jian Wan. 2023. Block Decomposition with Multi-granularity Embedding for Temporal Knowledge Graph Completion. In International Conference on Database Systems for Advanced Applications. 706–715.
  • Zellinger et al. (2017) Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. 2017. Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning. In International Conference on Learning Representations. 1–13.
  • Zhang et al. (2022) Jiasheng Zhang, Shuang Liang, Yongpan Sheng, and Jie Shao. 2022. Temporal knowledge graph representation learning with local and global evolutions. Knowledge-Based Systems 251 (2022), 109234.
  • Zhang et al. (2023) Sensen Zhang, Xun Liang, Zhiying Li, Junlan Feng, Xiangping Zheng, and Bo Wu. 2023. BiQCap: A Biquaternion and Capsule Network-Based Embedding Model for Temporal Knowledge Graph Completion. In International Conference on Database Systems for Advanced Applications. 673–688.