
LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs

Siqing Li
University of New South Wales
siqing.li@unsw.edu.au
Jin-Duk Park
Yonsei University
jindeok6@yonsei.ac.kr
Wei Huang
RIKEN AIP
wei.huang.vr@riken.jp
Xin Cao*
University of New South Wales
xin.cao@unsw.edu.au
Won-Yong Shin
Yonsei University
wy.shin@yonsei.ac.kr
Zhiqiang Xu
Mohamed bin Zayed University of Artificial Intelligence
zhiqiangxu2001@gmail.com
*Corresponding author.
Abstract

Heterogeneous graph neural networks (HGNNs) have significantly propelled the information retrieval (IR) field. Still, the effectiveness of HGNNs heavily relies on high-quality labels, which are often expensive to acquire. This challenge has shifted attention towards Heterogeneous Graph Contrastive Learning (HGCL), which usually requires pre-defined meta-paths. However, our findings reveal that meta-path combinations significantly affect performance in unsupervised settings, an aspect often overlooked in the current literature. Existing HGCL methods show considerable variability in outcomes across different meta-path combinations, which makes consistently high performance difficult to achieve. In response, we introduce LAMP (LearnAble Meta-Path), a novel adversarial contrastive learning approach that integrates various meta-path sub-graphs into a unified and stable structure by leveraging the overlap among these sub-graphs. To address the density of this integrated sub-graph, we propose an adversarial training strategy for edge pruning that maintains sparsity to enhance model performance and robustness. LAMP aims to maximize the difference between the meta-path and network schema views, guiding contrastive learning to capture the most meaningful information. Our extensive experiments on four diverse datasets from the Heterogeneous Graph Benchmark (HGB) demonstrate that LAMP significantly outperforms existing state-of-the-art unsupervised models in terms of accuracy and robustness.

1 Introduction

Heterogeneous graphs, characterized by diverse node and edge types, are ubiquitous across domains including social, academic, and user interaction networks. The use of heterogeneous graph neural networks (HGNNs) has surged in IR applications, ranging from search engines [2, 12, 50] to recommendation systems [1, 34, 39, 22, 28] and question answering systems [7, 9, 4].

HGNNs fall into two categories: meta-path based models [46, 54, 8, 49], which convert heterogeneous information networks (HINs) into homogeneous sub-graphs via predefined meta-paths, and meta-path free models [53, 58, 17, 16, 31, 51], which propagate information distinctly along different relation types. These models have shown promising results but often require extensive labeling, which is challenging for large-scale IR tasks. Consequently, attention has shifted towards self-supervised learning (SSL) approaches, particularly Heterogeneous Graph Contrastive Learning (HGCL) [35, 25, 47, 24, 3, 57, 60].

In HGCL, a widely adopted approach generates multiple graph views via diverse data augmentation techniques and then refines node representations through contrastive learning. HGCL augmentations fall into two principal categories: (1) the meta-path view [35, 24, 25], which converts heterogeneous graphs into homogeneous sub-graphs according to selected meta-paths, and (2) the network schema view [47, 36], in which target nodes aggregate information from one-hop neighbors of different node types. The network schema view thus offers a local perspective, whereas the meta-path view offers a broader, higher-order perspective by connecting target nodes through meta-path instances that span multiple hops. However, recent studies have shown that manually crafted augmentations, including the prevalent meta-path view, often fall short of optimal results [61, 59, 20]. This points to a strong dependence on the specific combination of meta-paths chosen, which in turn greatly affects overall model performance.

In this study, we explore the relationship between meta-path set selection and HGNN performance on node classification, as detailed in Section 3. As Figure 1 shows, the selected set of meta-paths crucially affects model performance: all models show at least a 5% deviation across different combinations, and the effect is especially pronounced for SSL models. Identifying the optimal meta-path combination is therefore crucial, yet it presents considerable challenges due to:

(1) No Universal Meta-Path Combination: Our results indicate that no meta-path combination is universally optimal across models, and effectiveness varies significantly (see Figure 3). The optimal set for supervised models often underperforms in unsupervised scenarios, highlighting the inherent complexity of SSL.

(2) No Benefit from Simply Adding More Meta-paths: Surprisingly, adding more meta-paths does not consistently improve performance. Although this strategy is effective in supervised learning, as evidenced by SOTA methods [49, 6], it does not translate as effectively to SSL. Consequently, a straightforward greedy search for the optimal meta-path combination is inadequate for SSL.

(3) No Downstream Task Labels in SSL: SSL methods cannot use downstream task performance to select the most effective meta-path combination, because labels for these tasks are unavailable in unsupervised settings.

To address this issue in an unsupervised framework, our solution is to make HGCL models robust to diverse meta-path combinations. Existing models lack robustness primarily because each meta-path is treated as an independent channel, so changes to these channels can harm model stability. To address this overlooked sensitivity to meta-paths, we present LAMP, a LearnAble Meta-Path guided adversarial contrastive learning model that builds a stable meta-path view. LAMP reduces dependence on specific meta-path combinations, achieves consistent performance, and simplifies the integration of a wide range of meta-paths. Furthermore, we enhance LAMP with adversarial training, a technique known to improve contrastive learning performance on homogeneous graphs.

LAMP offers a new perspective on meta-path view construction by merging different meta-path sub-graphs into a unified structure: a single sub-graph that integrates the nodes and edges of the individual meta-path sub-graphs. In this integrated sub-graph, each edge carries a multi-hot encoding that records which meta-paths produce it, preserving the semantics of the original sub-graphs. Because it exploits the overlaps between meta-path sub-graphs, this unified form remains stable across meta-path combinations. For instance, when the sub-graphs of PAP, PSP, and PAPAP (Figure 2 (b)) are combined into one integrated sub-graph (Figure 2 (c)), modifying the combination, such as removing a meta-path, largely preserves the topology and mainly changes the edge encodings. This stability stems from the shared edges commonly found in heterogeneous graphs, as detailed in Section 3, and it significantly reduces the variability between combinations, enhancing the model's robustness.

Nevertheless, as the number of meta-paths in the integrated sub-graph increases, so does its density, which may hinder performance since Graph Contrastive Learning (GCL) generally performs better on sparser structures [61]. In extreme cases, the integrated sub-graph may become so dense that it resembles a complete graph, incurring high computational cost. To address this, we apply an adversarial training method named LMA (Learnable Meta-path Guided Augmentation). LMA first sparsifies the graph by randomly removing edges and then applies a learned edge-pruning strategy, guided by node features and semantic information, to refine the graph structure. This process improves both the model's efficiency and its robustness. The edge encoding is combined with a learnable weight vector that represents the importance of different meta-paths. LMA aims to create a significant distinction between the network schema and meta-path views, allowing the HGCL framework to extract the most meaningful knowledge. This follows the adversarial training paradigm common in graph contrastive learning. Comprehensive experiments on four real-world HGB [31] datasets demonstrate that LAMP not only outperforms current SOTA baselines but also greatly improves robustness.

Figure 1: Comparison of performance variability in node classification, showing the standard deviation and min-max gap across HGNN models (supervised HAN with 1/2 layers; unsupervised X-GOAL, HeCo, and DMGI; and our LAMP) under varied meta-path combinations.

2 Preliminary

Figure 2: A toy example derived from the ACM dataset: (a) a heterogeneous graph; (b) three meta-path sub-graphs for the meta-paths PAP, PSP, and PAPAP; (c) an integrated meta-path sub-graph that aggregates all the meta-path sub-graphs, where each edge's type embedding indicates which meta-paths produce that edge.

Definition 1. Heterogeneous Information Network (HIN). A HIN is a network $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A},\mathcal{R},\theta,\phi)$, where $\mathcal{V}$ and $\mathcal{E}$ represent the sets of nodes and edges, respectively. The network is associated with a node type mapping function $\theta:\mathcal{V}\rightarrow\mathcal{A}$ and an edge type mapping function $\phi:\mathcal{E}\rightarrow\mathcal{R}$. Here, $\mathcal{A}$ and $\mathcal{R}$ represent the sets of object and link types, respectively, with the constraint $|\mathcal{A}|+|\mathcal{R}|>2$.
Definition 2. Meta-path. A meta-path $\mathcal{P}$ is a structural pattern connecting different node types, represented as

$$A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} A_3 \cdots \xrightarrow{R_l} A_{l+1}$$

(abbreviated as $A_1A_2\dots A_{l+1}$), which describes a composite relation $R=R_1\circ R_2\circ\dots\circ R_l$ between node types $A_1$ and $A_{l+1}$, where $\circ$ denotes the composition operator on relations. Paths in $\mathcal{G}$ that follow the pattern of $\mathcal{P}$ are termed meta-path instances.
Definition 3. Meta-path Sub-Graph. Given a meta-path $\mathcal{P}$, the nodes in $\mathcal{G}$ can be re-connected to form a meta-path sub-graph $\mathcal{G}_{\mathcal{P}}$. An edge $u\rightarrow v$ exists in $\mathcal{G}_{\mathcal{P}}$ if and only if there is at least one path (a meta-path instance) between $u$ and $v$ following the meta-path $\mathcal{P}$ in the original graph $\mathcal{G}$. For instance, Figure 2 (b) illustrates three meta-path sub-graphs derived from the HIN in Figure 2 (a): PAP connects two papers written by the same author, while PSP connects two papers on the same subject. Because meta-paths compose multiple relations, meta-path sub-graphs encapsulate high-order structures.
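As a concrete illustration of Definition 3, a meta-path sub-graph can be obtained by chaining the relation (biadjacency) matrices along the meta-path and marking every non-zero entry as an edge. The sketch below assumes 0/1 NumPy matrices; the helper name `metapath_subgraph` and the choice to drop self-loops are our own assumptions, not details taken from the paper.

```python
import numpy as np

def metapath_subgraph(*relations: np.ndarray, drop_self_loops: bool = True) -> np.ndarray:
    """Boolean adjacency of the meta-path sub-graph for A_1 -R_1-> ... -R_l-> A_{l+1}.

    relations[i] is the 0/1 matrix of relation R_{i+1}, shape (|A_{i+1}|, |A_{i+2}|).
    Entry (u, v) is True iff at least one meta-path instance links u and v.
    """
    counts = relations[0].astype(np.int64)
    for rel in relations[1:]:
        # (counts @ rel)[u, v] = number of partial meta-path instances from u to v.
        counts = counts @ rel.astype(np.int64)
    adj = counts > 0
    if drop_self_loops and adj.shape[0] == adj.shape[1]:
        np.fill_diagonal(adj, False)  # assumption: trivial u -> u instances are ignored
    return adj

# Toy paper-author matrix: papers 0 and 1 share author 0; papers 1 and 2 share author 1.
A_pa = np.array([[1, 0], [1, 1], [0, 1]])
pap = metapath_subgraph(A_pa, A_pa.T)  # P -> A -> P
```

Under the same assumptions, a longer meta-path such as PAPAP is `metapath_subgraph(A_pa, A_pa.T, A_pa, A_pa.T)`, which in this toy also links papers 0 and 2 through the shared paper 1.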

3 Empirical Observations

Figure 3: We generated a total of 26 distinct meta-path combinations from five predefined meta-paths: PAP, PSP, PTP, PPSP, and -PPSP. A flag "1" indicates that a meta-path is included in the combination, and a flag "0" indicates that it is absent. Each column on the right side of the table ranks the performance of these combinations for a different model.

To explore the influence of meta-path combinations on HGNN performance, we conducted a detailed empirical study on the ACM dataset. We generated 26 distinct combinations from 5 predefined meta-paths and measured the resulting performance variation of HGNNs by the standard deviation and min-max gap. The key findings, depicted in Figures 1 and 3, are summarized below:
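The count of 26 matches taking every subset of at least two of the five meta-paths (10 + 10 + 5 + 1 = 26). The paper does not state its enumeration rule, so the `min_size=2` default below is an assumption, as is the use of the population standard deviation in the hypothetical `variability` helper.

```python
from itertools import combinations
import statistics

META_PATHS = ["PAP", "PSP", "PTP", "PPSP", "-PPSP"]  # names as listed in Figure 3

def enumerate_combinations(paths, min_size=2):
    """All meta-path subsets with at least `min_size` members, as 0/1 flag tuples."""
    combos = []
    for k in range(min_size, len(paths) + 1):
        for subset in combinations(paths, k):
            combos.append(tuple(int(p in subset) for p in paths))
    return combos

def variability(scores):
    """Standard deviation and min-max gap of one model's scores across combinations."""
    # Assumption: population standard deviation; the paper does not specify which.
    return statistics.pstdev(scores), max(scores) - min(scores)
```

Each tuple mirrors one row of flags in Figure 3; feeding one model's per-combination accuracies to `variability` yields the two statistics plotted in Figure 1.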

(1) Sensitivity to Meta-path Combinations. Meta-path combinations critically affect HGNN performance. Changing the combination alters the structure of the meta-path sub-graphs, which significantly influences model outcomes, as evidenced by the substantial standard deviation and min-max gap shown in Figure 1. In extreme cases, improper combinations can lead to model failure. This challenge is more acute for SSL models, which lack downstream task feedback. Even proven meta-paths can cause dramatic performance deterioration if combined inappropriately. The sensitivity of HGNNs to these combinations stems partly from their responsiveness to topological changes and is further compounded by the low homophily ratios of meta-path sub-graphs [13] (see Table 1), an issue that is exacerbated in denser sub-graphs.

(2) Absence of Universal Optimal Combinations. Our study reveals that no single meta-path combination is optimal for all models. The absence of a universal ‘best’ combination is a formidable challenge in SSL, where the lack of direct feedback from downstream tasks makes finding the ideal combination through exhaustive search impractical. The disparity between the effective combinations for supervised and unsupervised models further complicates the issue, suggesting that strategies successful in supervised learning may not translate directly into superior performance in unsupervised settings.

(3) Naively Adding More Meta-paths Does Not Guarantee Better Performance. Contrary to expectations, simply adding more meta-paths does not consistently improve HGNN performance. While certain meta-paths are essential, their impact varies across models. In some instances, such as the comparison between ‘comb26’ and the optimal ‘comb21’ for X-GOAL, adding an extra meta-path decreased performance. Our analysis, illustrated in Figure 4, shows significant edge overlaps among meta-path sub-graphs. For example, ‘-PPSP’ overlaps with over 50% of every other meta-path sub-graph. Such overlaps accumulate redundant information that overshadows valuable signals from less common structures. Moreover, current semantic-level aggregation methods struggle to filter out this redundancy, so increasing the number of meta-paths is not a straightforward route to better performance and can even be counterproductive.
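The overlap statistics behind Figure 4 can be computed directly on edge sets. The sketch below treats each sub-graph as a set of undirected edges; Jaccard similarity is standard, while defining the coverage ratio of sub-graph A by sub-graph B as |A ∩ B| / |A| is our assumption, since the paper does not spell out the formula.

```python
def undirected(edges):
    """Normalise (u, v) pairs so that (u, v) and (v, u) count as the same edge."""
    return {tuple(sorted(e)) for e in edges}

def jaccard(edges_a, edges_b):
    """|A & B| / |A | B| over meta-path instances (edges)."""
    a, b = undirected(edges_a), undirected(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def coverage(edges_a, edges_b):
    """Share of sub-graph A's edges that also appear in sub-graph B (assumed definition)."""
    a, b = undirected(edges_a), undirected(edges_b)
    return len(a & b) / len(a) if a else 0.0
```

Because coverage is asymmetric, a dense sub-graph can cover most of a sparse one while the reverse ratio stays small, which is why the two statistics are reported separately.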

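| Dataset | Meta-path | HR (%) | ACC (%) | Edges |
|---|---|---|---|---|
| ACM | PAP | 81.45 | 87.33 ± 0.56 | 29767 |
| ACM | PSP | 64.03 | 66.72 ± 0.49 | 2217089 |
| ACM | PTP | 33.38 | 68.21 ± 0.14 | 9150595 |
| ACM | PcPSP | 60.62 | 68.21 ± 1.08 | 1933761 |
| ACM | PrPSP | 61.41 | 68.16 ± 1.28 | 1440299 |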
Table 1: Homophily ratio (HR) of different meta-path sub-graphs in the ACM dataset. ACC denotes the node classification accuracy of a 2-layer GCN with ReLU activation on each meta-path sub-graph.
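The HR column measures how often connected nodes share a label. The paper cites [13] without giving the formula, so the sketch below assumes the common edge homophily ratio (the fraction of edges whose endpoints have the same class) and an edge list stored as a 2 × |E| array; multiply by 100 to match the percentages in Table 1.

```python
import numpy as np

def edge_homophily(edge_index: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of edges (u, v) with labels[u] == labels[v]; edge_index has shape (2, |E|)."""
    src, dst = edge_index
    if src.size == 0:
        return float("nan")  # undefined for a sub-graph without edges
    return float(np.mean(labels[src] == labels[dst]))
```

Other homophily variants (for example, averaging per node rather than per edge) can give noticeably different values on the same sub-graph.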

These observations underscore the need for a method that builds a robust meta-path view while mitigating the redundancy that arises from combining multiple meta-paths.

4 The Proposed Model: LAMP

In this section, we introduce LAMP, a Learnable Meta-Path Guided Adversarial Contrastive Learning method, illustrated in Figure 5. LAMP uses two views: a meta-path view rich in high-order information, processed by LMA, and a locally focused network schema view. At its core, LAMP integrates diverse meta-path sub-graphs into a single, comprehensive meta-path sub-graph. To manage the inherent density of this integrated view, it employs LMA, a meta-path guided learnable edge-pruning strategy. LAMP aims to retain the sparsity that contrastive learning needs and to reduce redundant information across the network schema and integrated meta-path views, enhancing node consistency across these views through adversarial training.

Figure 4: We calculated Jaccard Similarity and coverage ratio based on meta-path instances (edges) in meta-path sub-graphs.
Figure 5: Overall architecture of the proposed LAMP model. LAMP processes the network schema view $G$ and the meta-path view $t(\hat{G})$, which supply local and high-order information, respectively. An adversarial training mechanism is applied to enhance the robustness of the meta-path view, alongside a contrastive optimization strategy that minimizes the discrepancy between the two views.

4.1 Problem Formulation

Given a HIN $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A},\mathcal{R})$, denoted $G$ for short, and a set of meta-paths $\{\mathcal{P}\}$ with $|\{\mathcal{P}\}|=n$, we define $\{\mathcal{G}_{\mathcal{P}_1},\dots,\mathcal{G}_{\mathcal{P}_n}\}$ as the meta-path sub-graphs and $\hat{\mathcal{G}}_{\mathcal{P}}$, denoted $\hat{G}$, as the integrated meta-path sub-graph. We denote the encoding function with parameters $\theta$ by $f(\cdot)$ and the augmentation function with parameters $\phi$ by $t(\cdot)$. For simplicity, we denote the network schema view by $f_{\theta}(G)$ and the meta-path view by $f_{\theta}(\hat{G})$. The primary objective of contrastive learning is:

$$\arg\max_{\theta}\, I\big(f_{\theta}(G),\, f_{\theta}(t_{\phi}(\hat{G}))\big), \tag{1}$$

For adversarial training, which makes it harder for the two views to reach agreement in contrastive learning, the objective is:

$$\arg\min_{\phi}\, I\big(f_{\theta}(G),\, f_{\theta}(t_{\phi}(\hat{G}))\big), \tag{2}$$

where $I(X_1;X_2)$ denotes the mutual information between random variables $X_1$ and $X_2$. The graph $t_{\phi}(\hat{G})$ retains the node set $\hat{V}$ of $\hat{G}$, but its edge set is a subset of $\hat{E}$. The intuition is to make the contrast as hard as possible while still allowing the two views to reach agreement, a strategy that has been shown to be effective in contrastive learning. To bridge the min-max procedure and address potential biases, we incorporate a learnable meta-path importance parameter $\gamma\in\mathbb{R}^{1\times|\mathcal{P}|}$, which is shared by $f(\cdot)$ and $t(\cdot)$. Putting all the objectives together, the refined objective is:

$$\arg\max_{\theta}\min_{\phi}\, I\big(f_{\theta}(G),\, f_{\theta,\gamma}(t_{\phi,\gamma}(\hat{G}))\big). \tag{3}$$
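Min-max objectives of this form are commonly optimized by alternating gradient steps. The updates below sketch that generic scheme and are not the paper's stated schedule: $\hat{I}$ stands for the estimate of $I$, $\eta$ is a hypothetical step size, and $\gamma$ is assumed to receive gradients in both phases because it is shared by $f(\cdot)$ and $t(\cdot)$.

```latex
\begin{aligned}
\text{(min step, augmenter)}\quad & (\phi,\gamma) \leftarrow (\phi,\gamma) - \eta\,\nabla_{\phi,\gamma}\,\hat{I}\big(f_{\theta}(G), f_{\theta,\gamma}(t_{\phi,\gamma}(\hat{G}))\big),\\
\text{(max step, encoder)}\quad & (\theta,\gamma) \leftarrow (\theta,\gamma) + \eta\,\nabla_{\theta,\gamma}\,\hat{I}\big(f_{\theta}(G), f_{\theta,\gamma}(t_{\phi,\gamma}(\hat{G}))\big).
\end{aligned}
```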

Consistent with prior research [40, 11], we employ InfoNCE [33] to approximate $I(X_1;X_2)$, as detailed further in Section 4.6. Regarding $\gamma$, the intuition is that it comes to prioritize longer meta-paths during minimization, because the most straightforward way for $t_{\phi}(\cdot)$ to reduce the similarity between the two views is to preserve long meta-path instances in $\hat{G}$. Conversely, during the maximization phase, shorter meta-paths become more influential. This balance allows LAMP to exploit rich high-order information while discerning the value of different meta-paths.
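For reference, a standard single-positive InfoNCE estimator is sketched below, with node i's embeddings in the two views forming the positive pair and all other nodes serving as negatives. LAMP's exact variant is given in Section 4.6 and may differ (for example in positive selection or symmetrization); the cosine similarity and temperature `tau` are assumptions of this sketch. Minimizing the loss maximizes the lower bound $I \ge \log N - \mathcal{L}_{\text{InfoNCE}}$.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE loss between two views of N nodes; row i of z1 and z2 is a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                               # scaled cosine similarities (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)    # stabilise the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))              # positives lie on the diagonal

def mi_lower_bound(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """log N - InfoNCE, a lower bound on the mutual information between the views."""
    return float(np.log(len(z1))) - info_nce(z1, z2, tau)
```

With perfectly aligned views the loss approaches zero and the bound approaches log N; mismatched views drive the loss up and the bound down, which is exactly the quantity the two players push in opposite directions.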

4.2 Integrated Sub-graph Based Meta-path View

Given a batch of meta-path sub-graphs $\{\mathcal{G}_{\mathcal{P}_i}\}=\{(\mathcal{V}_{\mathcal{P}_i},\mathcal{E}_{\mathcal{P}_i})\}$ with $i=1,\dots,|\mathcal{P}|$, we merge them into a single sub-graph $\hat{G}=(\hat{V},\hat{E})$ to create the meta-path view, where $\hat{V}=\cup_i\mathcal{V}_{\mathcal{P}_i}$ and $\hat{E}=\cup_i\mathcal{E}_{\mathcal{P}_i}$. As an illustration, Figure 2(c) depicts an integrated sub-graph derived from three meta-path sub-graphs: PAP, PSP, and PAPAP. The edge $e_{12}$ is encoded as $(0,1,1)$ because it is absent from PAP but present in both PSP and PAPAP.
For every edge $(u,v)\in\hat{E}$, we assign a vector $e_{uv}=(x_1,x_2,\dots,x_{|\mathcal{P}|})$ to represent its semantic information, where $x_i=1$ if $(u,v)\in\mathcal{E}_{\mathcal{P}_i}$ and $x_i=0$ otherwise. To further capture semantic-level information, we assign a learnable vector $\gamma\in\mathbb{R}^{1\times|\mathcal{P}|}$ that quantifies the importance of each meta-path. In the message-passing phase, we use $\hat{e}_{uv}=\gamma\times e_{uv}$ as the edge embedding in the meta-path view. Because an edge shared by several meta-path sub-graphs appears only once in $\hat{G}$, overlaps no longer produce duplicate edges, which curtails redundant message passing and makes the meta-path view more robust than in prior methods. Nevertheless, this construction can lead to a dense meta-path view, which we address with the proposed LMA, detailed in the next section.
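A minimal sketch of this construction follows, assuming each meta-path sub-graph is given as a set of undirected node pairs. The helper names are hypothetical, and reading $\gamma\times e_{uv}$ as element-wise scaling of the multi-hot code is our interpretation rather than a detail stated in the paper.

```python
import numpy as np

def integrate_subgraphs(edge_sets):
    """Merge meta-path sub-graphs into one edge list plus multi-hot edge codes.

    edge_sets[i] holds the edges of the i-th meta-path sub-graph E_{P_i}.
    Returns (edges, enc) with enc[k, i] = 1.0 iff edges[k] belongs to E_{P_i}.
    """
    norm = [{tuple(sorted(e)) for e in es} for es in edge_sets]
    edges = sorted(set().union(*norm))  # E_hat: each shared edge appears only once
    enc = np.array([[float(e in es) for es in norm] for e in edges])
    return edges, enc

def edge_type_embedding(enc, gamma):
    """e_hat_uv = gamma x e_uv, assumed element-wise; gamma has shape (|P|,)."""
    return enc * gamma  # broadcasts the meta-path importances over all edges
```

In a toy case where PAP = {(0, 1)}, PSP = {(0, 1), (1, 2)}, and PAPAP = {(1, 2)}, the edge (1, 2) receives the code (0, 1, 1), and dropping PAPAP from the input leaves the edge list unchanged while only shortening the codes, which mirrors the stability argument in the introduction.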

Figure 6: Overall architecture of LMA. To generate $t(\hat{G})$, LMA first takes the integrated sub-graph $\hat{G}$, computes node embeddings of $\hat{G}$ with a two-layer GCN, and combines them with the edge type embedding $\hat{e}_{u,v}$ to form edge embeddings. These are fed into an MLP to determine Bernoulli parameters, which are then converted to dropout probabilities using the Gumbel-Max reparametrization trick.

4.3 Learnable Meta-Path Guided Augmentation

The dense links of the integrated sub-graph, while capturing all given meta-path sub-graphs, can pose challenges for graph contrastive learning, as sparser graphs tend to yield more favorable results [61]. To address this issue, we introduce Learnable Meta-Path Augmentation (LMA), an adversarial-training-based method that learns an optimized edge-pruning strategy. This yields a sparser meta-path view while avoiding manually induced biases. LMA first applies random edge dropping and then a learned, GCN-based edge-pruning strategy driven by node features and semantic information. In the first stage, random dropping effectively reduces graph complexity, which is necessary because the integrated sub-graph becomes denser as more meta-paths are involved and eventually approximates a complete graph, which is undesirable. In addition, random dropping presents LMA with a different integrated sub-graph in each epoch, so that LMA learns a more powerful edge-pruning strategy instead of settling on a sub-optimal solution specialized to a fixed input. To learn the edge-pruning strategy, each edge $e\in\hat{E}$ is associated with a Bernoulli random variable $p_e\sim\mathrm{Bernoulli}(\omega_e)$. An edge is present in $t_{\phi}(\hat{G})$ if $p_e=1$ and excluded otherwise.
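The first, random stage of LMA can be sketched as uniform edge dropping that is re-sampled every epoch. The function name and the `drop_rate` parameter are hypothetical; the dropping rate LAMP uses is not given in this section.

```python
import numpy as np

def random_edge_drop(edges, enc, drop_rate, rng):
    """Keep each integrated edge independently with probability 1 - drop_rate.

    Call once per epoch with a shared Generator so the learned pruner sees a
    different sparsified integrated sub-graph each time.
    """
    keep = rng.random(len(edges)) >= drop_rate
    kept_edges = [e for e, k in zip(edges, keep) if k]
    return kept_edges, enc[keep]  # edge codes stay aligned with surviving edges
```

Passing the same `np.random.default_rng(seed)` object across epochs keeps runs reproducible while still giving a fresh subset each call.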
To prune edges based on both node features and semantic information, the weights $\omega_e$ of the Bernoulli distribution are parameterized by an MLP whose input is the concatenation of the node embeddings produced by a $K$-layer HGNN augmenter on $\hat{G}$ and the edge type embedding $\hat{e}_{uv}$. For an edge $e=(u,v)$, this gives:

$$\omega_e=\mathrm{MLP}\big([h_u^{K}; h_v^{K}; \hat{e}_{uv}]\big). \tag{4}$$
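Eq. (4) amounts to gathering both endpoint embeddings of every edge, appending the edge type embedding, and scoring the result with an MLP. The two-layer ReLU MLP and its weight shapes below are assumptions for illustration; in LAMP these weights belong to the augmenter parameters $\phi$ and are trained adversarially rather than fixed.

```python
import numpy as np

def edge_logits(h, edges, e_hat, W1, b1, W2, b2):
    """omega_e = MLP([h_u ; h_v ; e_hat_uv]) for every edge e = (u, v).

    h: (N, d) node embeddings from the augmenter; e_hat: (|E|, |P|) edge type
    embeddings; W1: (2d + |P|, hidden); W2: (hidden, 1). Returns shape (|E|,).
    """
    src = np.array([u for u, _ in edges])
    dst = np.array([v for _, v in edges])
    x = np.concatenate([h[src], h[dst], e_hat], axis=1)  # (|E|, 2d + |P|)
    hidden = np.maximum(0.0, x @ W1 + b1)                 # ReLU hidden layer
    return (hidden @ W2 + b2).ravel()                     # one score per edge
```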

For seamless end-to-end training of $t_{\phi}(\hat{G})$, the binary variable $p_e$ is relaxed into a continuous variable in $[0,1]$ using the Gumbel-Max reparametrization trick [32, 21]. Specifically:

$$p_e=\mathrm{Sigmoid}\left(\frac{\log(\delta)-\log(1-\delta)+\omega_e}{\tau}\right), \tag{5}$$

where $\delta\sim\mathrm{Uniform}(0,1)$. As $\tau$ approaches zero, $p_e$ approaches binary values, while for any $\tau>0$ the relaxation remains differentiable with respect to $\omega_e$, so gradients are well-defined. Notably, this kind of edge pruning, underpinned by a stochastic graph model, has also been used to provide parameterized explanations of GNNs [30].
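Eq. (5) can be sampled in a few lines. The sketch below evaluates the sigmoid through the identity sigmoid(x) = (1 + tanh(x/2)) / 2 to avoid overflow at small τ and clips δ away from 0 and 1; both are implementation choices of ours, and the function name is hypothetical. In an autograd framework the same expression lets gradients flow to $\omega_e$.

```python
import numpy as np

def relaxed_edge_mask(omega, tau, rng, eps=1e-7):
    """p_e = Sigmoid((log(delta) - log(1 - delta) + omega_e) / tau), delta ~ U(0, 1)."""
    delta = np.clip(rng.random(np.shape(omega)), eps, 1.0 - eps)  # keep the logs finite
    logits = (np.log(delta) - np.log(1.0 - delta) + omega) / tau
    return 0.5 * (1.0 + np.tanh(0.5 * logits))  # numerically stable sigmoid
```

Lowering τ makes the mask closer to hard 0/1 decisions, at the cost of higher-variance gradients.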

To curb LMA's tendency toward aggressive edge pruning, we add a regularization term $\lambda_{reg}\frac{\sum_{e\in\hat{E}}\omega_e}{|\hat{E}|}$ to the objective, where the hyper-parameter $\lambda_{reg}$ controls the number of retained edges. Without this regularization, LMA could adopt the extreme strategy of removing all edges to minimize the mutual information between $G$ and $t_{\phi}(\hat{G})$, which is counterproductive. The regularization keeps a sufficient edge ratio in $t_{\phi}(\hat{G})$, so that enough information remains for contrastive learning. The refined objective is:

$$\arg\max_{\theta}\min_{\phi}\left(I\big(f_{\theta}(G), f_{\theta,\gamma}(t_{\phi,\gamma}(\hat{G}))\big) - \lambda_{reg}\frac{\sum_{e\in\hat{E}}\omega_e}{|\hat{E}|}\right). \tag{6}$$

Notably, the meta-path importance $\gamma$ provides a shared, holistic perspective for both $f_{\theta,\gamma}(\cdot)$ and $t_{\phi,\gamma}(\cdot)$. Because $t_{\phi,\gamma}(\cdot)$ seeks to maximize divergence from the network schema view, which mainly carries single-hop information, it places more weight on longer meta-paths. Conversely, during the agreement-maximization phase, $f_{\theta,\gamma}(\cdot)$ makes shorter meta-paths the more salient contributors.

4.4 Network Schema View

In the network schema view, for a given node $i$, we first apply a type-specific multilayer perceptron (MLP), denoted $\mathrm{MLP}^{\mathcal{A}(i)}$, to transform the features $x_i$ of node $i$ into a unified feature space:

$$h_i^{(0)} = \mathrm{MLP}^{\mathcal{A}(i)}(x_i). \tag{7}$$

Here, $\mathcal{A}(i)$ denotes the type of node $i$. We then use one-hot encodings to represent the semantic information of the different relations. These encodings, together with the node features, are fed into a unified HGNN encoder that operates irrespective of node types while preserving edge-type embeddings; it is described in the following section. Both the network schema view and the meta-path view use this same HGNN encoder, denoted $f_{\theta,\gamma}$. The key difference lies in the treatment of the parameter $\gamma$: it is frozen in the network schema view, whereas gradients flow to $\gamma$ in the meta-path view, allowing it to adapt to the different types of information being encoded.
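A minimal sketch of Eq. (7) and the one-hot relation encoding, using plain Python lists as vectors. For brevity, each type-specific MLP is reduced to a single linear layer, and the relation vocabulary `RELATIONS`, the node types, and the parameter values in `TYPE_PARAMS` are all hypothetical.

```python
# Hypothetical relation vocabulary for an ACM-like schema (illustrative only).
RELATIONS = ["paper-author", "author-paper", "paper-subject", "subject-paper"]

# Hypothetical per-type parameters (weight rows, bias): features of different
# sizes are all mapped into the same 2-d space.
TYPE_PARAMS = {
    "author": ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),  # 3-d -> 2-d
    "paper": ([[0.5, 0.5], [1.0, -1.0]], [0.1, 0.0]),  # 2-d -> 2-d
}


def linear(weight, bias, x):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def project_node(node_type, features):
    """h_i^(0) = MLP^{A(i)}(x_i) from Eq. (7), simplified to one linear layer per type."""
    weight, bias = TYPE_PARAMS[node_type]
    return linear(weight, bias, features)


def one_hot(relation, relations=RELATIONS):
    """One-hot encoding of a relation type, used as r_psi in the network schema view."""
    vec = [0.0] * len(relations)
    vec[relations.index(relation)] = 1.0
    return vec


print(project_node("author", [1.0, 2.0, 3.0]))  # [1.0, 2.0]
print(one_hot("author-paper"))  # [0.0, 1.0, 0.0, 0.0]
```

The point of the per-type map is that nodes with feature vectors of different sizes end up in one shared space, so a single encoder can process them.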

4.5 Unified HGNN Encoder

As Eq. (6) shows, LAMP requires a unified HGNN that can handle both the network schema view (a heterogeneous graph) and the meta-path view (a homogeneous graph). One could instead use two distinct HGNN encoders, one per view, but such an architecture is ill-suited to LAMP. Separate encoders would produce node embeddings governed by entirely different parameter sets, making it very hard for LAMP to meaningfully minimize similarity based on topological information. A unified HGNN encoder links the two views, so that differences between their embeddings reflect inherent structural divergence rather than encoder bias. The encoder builds on the HGB attention mechanism [31], which incorporates edge-type embeddings into the attention score:

$$\hat{\alpha}_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\|\, W h_j \,\|\, W_r r_{\psi}(\langle i,j\rangle)]\right)\right)}{\sum_{k\in N_i}\exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\|\, W h_k \,\|\, W_r r_{\psi}(\langle i,k\rangle)]\right)\right)}. \tag{8}$$
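Eq. (8) can be illustrated with the following framework-free sketch, which computes $\hat{\alpha}_{ij}$ for the neighbors of one node. The LeakyReLU slope of 0.2 and all argument names are assumptions; a practical implementation would vectorize this over all edges.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leaky_relu(x, slope=0.2):
    # The 0.2 slope is an assumption (a common GAT default), not stated in the text.
    return x if x >= 0 else slope * x


def edge_type_attention(i, h, neighbors, rel, W, W_r, a):
    """Return {j: alpha_hat_ij} for all j in N_i, following Eq. (8).

    h: node -> feature vector; neighbors: node -> list of neighbor ids;
    rel: (i, j) -> relation vector r_psi(<i, j>); W, W_r: weight matrices; a: attention vector.
    """
    wh_i = matvec(W, h[i])
    logits = {}
    for j in neighbors[i]:
        # List concatenation builds [W h_i || W h_j || W_r r_psi(<i, j>)].
        concat = wh_i + matvec(W, h[j]) + matvec(W_r, rel[(i, j)])
        logits[j] = leaky_relu(sum(a_k * c_k for a_k, c_k in zip(a, concat)))
    # Softmax over N_i; subtracting the maximum leaves the result unchanged but avoids overflow.
    top = max(logits.values())
    exps = {j: math.exp(s - top) for j, s in logits.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


# Toy example: node 0 has neighbors 1 and 2, reached through two different relations.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2]}
rel = {(0, 1): [1.0, 0.0], (0, 2): [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # only the W h_j block contributes here
alpha = edge_type_attention(0, h, neighbors, rel, identity, identity, a)
```

Because the relation vector enters the score through its own block of $a$, two neighbors with identical features can still receive different weights when they are reached through different relations.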

4.5.1 Node Residual:

We introduce pre-activation residual connections for nodes:

$$h_i^{(l)} = \sigma\left(\sum_{j\in N_i}\alpha_{ij}^{(l)} W^{(l)} h_j^{(l-1)} + W_{res}^{(l)} h_i^{(l-1)}\right). \tag{9}$$

4.5.2 Edge Residual:

Following RealFormer [15], we add residual connections to the attention scores:

$$\alpha_{ij}^{(l)} = (1-\beta)\,\alpha_{ij}^{(l)} + \beta\,\alpha_{ij}^{(l-1)}, \tag{10}$$

where $\beta \in [0,1]$ is a scaling factor. In our framework, the representation of the relation between two end nodes depends on the view. In the network schema view, $r_{\psi}(\langle u,v\rangle)$ is a one-hot vector encoding the relation between the nodes. In the meta-path view, the relation is captured by $r_{\psi}(\langle u,v\rangle) = \hat{e}_{uv}$, which carries the embedded semantic information. The transformation matrix $W_r$ aligns the dimension of the edge embedding with that of the node embedding. $W_r$ is the only parameter of the HGNN encoder that is not shared between the network schema and meta-path views.
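The two residual mechanisms in Eqs. (9) and (10) can be sketched as follows. The activation `act`, the dictionary-based edge representation, and the first-layer fallback in `edge_residual` are our assumptions; the text does not specify them.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def edge_residual(alpha_curr, alpha_prev, beta):
    """Eq. (10): blend current-layer attention with the previous layer's, beta in [0, 1].

    Edges missing from alpha_prev (e.g. in the first layer) keep their current score;
    this fallback is our assumption.
    """
    return {e: (1.0 - beta) * a + beta * alpha_prev.get(e, a) for e, a in alpha_curr.items()}


def node_residual_update(i, h_prev, neighbors, alpha, W, W_res, act):
    """Eq. (9): h_i^(l) = act(sum_j alpha_ij W h_j^(l-1) + W_res h_i^(l-1))."""
    out = matvec(W_res, h_prev[i])  # residual term, added before the activation
    for j in neighbors[i]:
        msg = matvec(W, h_prev[j])
        out = [o + alpha[(i, j)] * m for o, m in zip(out, msg)]
    return [act(o) for o in out]


mixed = edge_residual({(0, 1): 0.8, (0, 2): 0.2}, {(0, 1): 0.4, (0, 2): 0.6}, beta=0.5)
identity = [[1.0, 0.0], [0.0, 1.0]]
h_new = node_residual_update(
    0, {0: [1.0, 0.0], 1: [0.0, 2.0]}, {0: [1]}, {(0, 1): 1.0}, identity, identity, act=lambda x: x
)
```

If both attention maps are normalized over the same neighbor set, the convex combination in `edge_residual` stays normalized, which is one reason to mix scores rather than add them.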

4.6 Contrastive Optimization

Our approach uses the network schema view $G$ and the meta-path view $\hat{G}$ for contrastive learning. Both graphs are fed into the HGNN encoder followed by an MLP with a single hidden layer, which maps the resulting node embeddings $z_i^{G}$ and $z_i^{\hat{G}}$ into the space where the contrastive loss is computed:

$$z_i^{G,\mathrm{proj}} = W^{(2)}\sigma\left(W^{(1)} z_i^{G} + b^{(1)}\right) + b^{(2)}, \tag{11}$$
$$z_i^{\hat{G},\mathrm{proj}} = W^{(2)}\sigma\left(W^{(1)} z_i^{\hat{G}} + b^{(1)}\right) + b^{(2)}, \tag{12}$$

where $\sigma$ denotes the LeakyReLU function. The projection parameters $W^{(1)}$, $W^{(2)}$, $b^{(1)}$, and $b^{(2)}$ are shared between the two views.
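A sketch of the shared projection head in Eqs. (11)-(12), assuming a LeakyReLU slope of 0.01 and hypothetical 2-dimensional parameters. What it illustrates is that one parameter tuple, `shared`, is applied to the embeddings of both views.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leaky_relu(x, slope=0.01):
    # The 0.01 slope is an assumption; the text only names LeakyReLU.
    return x if x >= 0 else slope * x


def project(z, params):
    """z^proj = W2 sigma(W1 z + b1) + b2, as in Eqs. (11)-(12)."""
    W1, b1, W2, b2 = params
    hidden = [leaky_relu(v + b) for v, b in zip(matvec(W1, z), b1)]
    return [v + b for v, b in zip(matvec(W2, hidden), b2)]


# Hypothetical 2 -> 2 -> 2 head; the same tuple is used for both views.
shared = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.5])
z_schema_proj = project([1.0, -2.0], shared)   # node i in the network schema view
z_metapath_proj = project([0.5, 0.5], shared)  # node i in the meta-path view
```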

Adopting the strategy introduced in HeCo [47], we construct high-quality positive and negative pairs. We define a connectivity vector $C_i$ whose entry $C_i(j)$ counts the number of meta-paths under which node $j$ is a meta-path-based neighbor of node $i$:

$$C_i(j) = \sum_{n=1}^{|\mathcal{P}|}\mathbbm{1}\left(j \in N_i^{\mathcal{P}_n}\right), \tag{13}$$

where $\mathbbm{1}(\cdot)$ is the indicator function and $N_i^{\mathcal{P}_n}$ is the set of neighbors of node $i$ under meta-path $\mathcal{P}_n$. We then sort the nodes in descending order of $C_i(j)$ and apply the threshold $T_{pos}$ to select the positive set $Pos_i$; the remaining nodes form the negative set $Neg_i$. The intuition is that node pairs with higher connectivity are more likely to belong to the same class. The contrastive loss for node $i$ is defined as:

$$\mathcal{L}_i = -\log\frac{\sum_{j\in Pos_i}\exp\left(sim(z_i^{G,\mathrm{proj}}, z_j^{\hat{G},\mathrm{proj}})/\tau\right)}{\sum_{k\in Pos_i\cup Neg_i}\exp\left(sim(z_i^{G,\mathrm{proj}}, z_k^{\hat{G},\mathrm{proj}})/\tau\right)}, \tag{14}$$

where $sim(u,v)$ is the cosine similarity between vectors $u$ and $v$, and $\tau$ is the temperature parameter. The final objective averages the contrastive losses over all nodes:

$$\mathcal{J} = \frac{1}{|V|}\sum_{i\in V}\mathcal{L}_i. \tag{15}$$
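The pair construction of Eq. (13) and the losses of Eqs. (14)-(15) can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: neighbor sets per meta-path are given as dictionaries, node $i$ is counted as its own positive, ties in $C_i(j)$ are broken by node id, and $T_{pos}$ caps the size of $Pos_i$. The loss is evaluated in log space, which equals Eq. (14) mathematically but avoids overflow for small $\tau$.

```python
import math


def connectivity(i, metapath_neighbors):
    """C_i(j) from Eq. (13): the number of meta-paths P_n with j in N_i^{P_n}."""
    counts = {}
    for neigh in metapath_neighbors:  # one dict per meta-path: node -> set of neighbors
        for j in neigh.get(i, set()):
            counts[j] = counts.get(j, 0) + 1
    return counts


def split_pos_neg(i, nodes, metapath_neighbors, t_pos):
    """Pos_i: node i plus its highest-connectivity nodes (at most t_pos in total); Neg_i: the rest."""
    counts = connectivity(i, metapath_neighbors)
    # Sort by C_i(j) in descending order; ties are broken by node id (assumption).
    ranked = sorted((j for j in counts if j != i), key=lambda j: (-counts[j], j))
    pos = [i] + ranked[: max(t_pos - 1, 0)]
    pos_set = set(pos)
    neg = [j for j in nodes if j not in pos_set]
    return pos, neg


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def node_loss(i, z_schema, z_metapath, pos, neg, tau):
    """L_i from Eq. (14): log(sum over Pos_i and Neg_i) minus log(sum over Pos_i)."""
    logits = {k: cosine(z_schema[i], z_metapath[k]) / tau for k in set(pos) | set(neg)}
    return logsumexp(list(logits.values())) - logsumexp([logits[j] for j in pos])


def objective(z_schema, z_metapath, pos_sets, neg_sets, tau):
    """J from Eq. (15): the mean of L_i over all nodes."""
    n = len(z_schema)
    total = sum(node_loss(i, z_schema, z_metapath, pos_sets[i], neg_sets[i], tau) for i in range(n))
    return total / n


mp = [{0: {1, 2}}, {0: {1, 3}}]  # hypothetical neighbors of node 0 under two meta-paths
pos0, neg0 = split_pos_neg(0, range(5), mp, t_pos=2)  # ([0, 1], [2, 3, 4])

z = [[1.0, 0.0], [0.0, 1.0]]  # toy projected embeddings, identical in both views
loss = objective(z, z, pos_sets=[[0], [1]], neg_sets=[[1], [0]], tau=0.5)
```

Node 1 is selected as a positive for node 0 because it is a neighbor under both meta-paths ($C_0(1) = 2$), whereas nodes 2 and 3 are reached through only one.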

For downstream tasks, we use the embeddings $z^{\hat{G}}$ from the meta-path view. Training alternates between two steps in every epoch. In the first step, the parameters of LMA are frozen and the HGNN is trained by minimizing the contrastive loss. In the second step, the HGNN parameters are frozen and LMA is trained to maximize the contrastive loss.
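The two-step schedule can be sketched as gradient descent-ascent on a toy problem. The quadratic loss and the scalar parameters `theta` (encoder) and `phi` (LMA) are purely illustrative stand-ins for the contrastive loss and the real parameter sets; only the freezing pattern mirrors the text.

```python
def train_alternating(theta, phi, grad_theta, grad_phi, epochs, lr):
    """Run `epochs` two-step epochs of gradient descent-ascent."""
    for _ in range(epochs):
        # Step 1: LMA parameters (phi) are frozen; the encoder (theta) descends on the loss.
        theta = theta - lr * grad_theta(theta, phi)
        # Step 2: encoder parameters (theta) are frozen; LMA (phi) ascends on the loss.
        phi = phi + lr * grad_phi(theta, phi)
    return theta, phi


# Toy stand-in for the contrastive loss: L(theta, phi) = (theta - 1)^2 - (phi - 2)^2.
def toy_grad_theta(theta, phi):
    return 2.0 * (theta - 1.0)  # dL/dtheta


def toy_grad_phi(theta, phi):
    return -2.0 * (phi - 2.0)  # dL/dphi


theta, phi = train_alternating(0.0, 0.0, toy_grad_theta, toy_grad_phi, epochs=200, lr=0.1)
```

With `lr=0.1`, each step shrinks the distance to the saddle point $(\theta, \phi) = (1, 2)$ by a factor of 0.8, so both parameters converge.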

Dataset Nodes NodeTypes Edges EdgeTypes Target Classes
DBLP 26,128 4 239,566 6 author 4
IMDB 21,420 4 86,642 6 movie 5
ACM 10,942 4 547,872 8 paper 3
Freebase 180,098 8 1,057,688 36 book 7
Table 2: Statistics of the datasets.

5 Experimental Evaluation

Dataset DBLP IMDB ACM FreeBase
Methods Training Data Micro-F1 Macro-F1 Micro-F1 Macro-F1 Micro-F1 Macro-F1 Micro-F1 Macro-F1
GCN X,A,P,Y 90.84±0.32 91.47±0.34 57.88±1.18 64.82±0.64 92.17±0.24 92.12±0.23 27.84±3.13 60.23±0.92
RGCN X,A,Y 91.52±0.50 92.07±0.50 58.85±0.26 62.05±0.15 91.55±0.74 91.41±0.77 46.78±0.77 58.33±1.57
HAN X,A,P,Y 91.67±0.49 92.05±0.62 57.74±0.96 64.63±0.58 90.89±0.43 60.79±0.43 21.31±1.68 54.77±1.4
GTN X,A,Y 93.52±0.55 93.97±0.54 60.47±0.98 65.14±0.45 91.31±0.70 91.20±0.71 OOM OOM
HGT X,A,Y 93.01±0.23 93.49±0.25 63.00±1.19 67.20±0.57 91.12±0.76 91.00±0.76 29.28±2.52 60.51±1.16
GAT X,A,P,Y 93.83±0.27 93.39±0.30 58.94±1.35 64.86±0.43 92.26±0.94 92.19±0.93 40.73±2.58 65.26±0.45
Mp2vec A,P 90.25±0.10 91.17±0.10 41.45±1.60 42.46±1.70 61.13±0.40 62.72±0.30 55.94±0.7 58.74±0.80
DGI X,A,P 89.19±0.90 90.35±0.80 46.13±0.30 47.21±0.90 80.03±3.30 80.15±3.20 53.81±1.10 57.96±0.70
DMGI X,A,P 89.46±0.60 90.66±0.50 47.49±1.40 61.97±1.30 87.97±0.40 87.82±0.50 52.10±0.70 56.69±1.20
X-GOAL X,A,P 83.00±0.25 91.90±0.22 57.43±0.50 58.14±0.62 91.22±0.10 91.26±0.17 58.44±1.10 57.91±1.10
HeCo X,A,P 90.64±0.30 91.59±0.20 58.07±0.50 59.13±0.60 89.04±0.50 88.71±0.50 60.13±1.30 62.24±1.60
LAMP X,A 92.44±0.32 92.22±0.30 61.85±0.39 62.19±0.50 91.35±0.50 91.27±0.50 61.32±1.20 64.13±1.20
Table 3: Quantitative results on node classification: Micro-F1 and Macro-F1 scores (%) with standard deviations. The second column specifies the training data available to each method, where $X$, $A$, $P$, and $Y$ denote node features, the adjacency matrix, the optimal meta-path combination, and labels, respectively. The best and second-best results among unsupervised models are highlighted in boldface and underline, respectively. "OOM" marks runs that exceeded the 200 GB CPU memory limit.

5.1 Experimental Setup

5.1.1 Datasets:

We use the HGB benchmark [31], which includes the four HIN datasets detailed in Table 2. The DBLP dataset [8] is drawn from the DBLP bibliography website and covers a subset of computer science publications, with nodes such as authors, papers, terms, and venues. The ACM dataset [56] is also a citation network in the computer science domain. For Freebase [29], we use a subgraph of the knowledge graph with around 1,000,000 edges over eight entity types, following previous work [48]. Finally, the IMDB dataset is drawn from the IMDB movie database and covers movie genres such as Action, Comedy, Drama, Romance, and Thriller.

5.1.2 Baselines and Implementation Details:

We compare LAMP with a diverse set of methods: five unsupervised techniques, Mp2vec [5], DGI [42], DMGI [35], X-GOAL [24], and HeCo [47], and six (semi-)supervised ones, GAT [41, 31], GCN [27, 31], RGCN [37], HAN [46], GTN [53], and HGT [17]. For Mp2vec, we use 40 walks per node, a walk length of 100, and a window size of 5. Regarding meta-path selection, for Mp2vec and DGI we evaluate every meta-path and report the best result; for all other meta-path-based methods, we report the best performance obtained with their optimal meta-path combination. Unless stated otherwise, default parameters are taken from the original papers. For GCN and GAT, we follow the approach of [31] and enrich the original HIN with additional meta-path instances based on the selected meta-paths; for GAT, we additionally use the same edge-type embedding in the attention mechanism as HGB. For LAMP, we do not select an optimal combination; instead, all pre-defined meta-paths are used to construct the integrated meta-path sub-graph. We use Glorot initialization [10] and the Adam optimizer [26]. The learning rate ranges from $1\times10^{-4}$ to $5\times10^{-2}$, the early-stopping patience from 5 to 200, and the dropout rate from 0.1 to 0.5 in increments of 0.05. LMA uses a two-layer GCN, and LAMP uses a two-layer HGB encoder for node embedding within its contrastive learning framework. For random edge dropping, we search the dropping rate between 0.3 and 0.8. The embedding dimension is fixed at 64 for all methods. Each experiment is repeated 10 times with different random seeds, and average results are reported. For datasets lacking node attributes, nodes are assigned one-hot ID vectors.

5.2 Node Classification

For node classification, we use the learned node embeddings to train a linear classifier in a transductive setting, with all edges available during training. The labeled nodes of every dataset are split into 24% for training, 6% for validation, and 70% for testing. Performance is measured with Macro-F1 and Micro-F1, and we report test-set results at the best validation performance (Table 3). For all baselines, we report the best performance obtained with their optimal meta-path combinations, whereas for LAMP we report the performance using all meta-paths, to demonstrate its robustness. LAMP consistently outperforms the other unsupervised methods and is competitive with supervised models, particularly on sparser datasets such as IMDB and Freebase; on Freebase, it achieves the highest Micro-F1 of all methods in Table 3. Crucially, unlike the other methods, LAMP does not rely on an optimal meta-path combination. We also examine LAMP's sensitivity to meta-path combinations (Figure 1), which shows that it is more stable and robust than the alternatives, including supervised approaches.

5.3 Sensitivity of Meta-Paths

To examine sensitivity to meta-path combinations, we ran experiments on the ACM dataset, measuring the variation and the min-max gap of Micro-F1 across all meta-path combinations. The candidate meta-paths are "PAP", "PSP", "PTP", "PPSP", and "-PPSP", which form 26 distinct meta-path combinations, as illustrated in Figure 3. Mp2vec and DGI were excluded from these experiments because their design does not support arbitrary meta-path combinations and they do not achieve state-of-the-art (SOTA) performance. The results are presented in Table 4. LAMP clearly outperformed the existing unsupervised methods and even surpassed some supervised methods in Micro-F1, while showing by far the smallest variation (a standard deviation of 2.07% and a min-max gap of 6.08%). In contrast, current state-of-the-art methods such as HeCo and X-GOAL are highly sensitive to the choice of meta-path combination, with standard deviations of 11.70% and 7.01%, respectively. This finding underscores the importance of robust meta-path handling, especially in self-supervised learning, and highlights the effectiveness of LAMP in addressing this challenge.
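The count of 26 is consistent with taking every subset of at least two of the five candidate meta-paths (10 + 10 + 5 + 1 = 26); this reading is our inference. The sketch below enumerates such subsets and computes the two statistics reported in Table 4 from a score dictionary. The text does not state whether the population or the sample standard deviation is used, so `statistics.pstdev` is an assumption.

```python
import statistics
from itertools import combinations

# Candidate meta-paths on ACM, as listed in the text.
CANDIDATES = ["PAP", "PSP", "PTP", "PPSP", "-PPSP"]


def metapath_combinations(paths, min_size=2):
    """All subsets with at least `min_size` meta-paths."""
    return [c for r in range(min_size, len(paths) + 1) for c in combinations(paths, r)]


def sensitivity(scores):
    """Standard deviation and min-max gap of Micro-F1 over combinations.

    scores: combination -> Micro-F1 (%).
    """
    values = list(scores.values())
    return statistics.pstdev(values), max(values) - min(values)


combos = metapath_combinations(CANDIDATES)
print(len(combos))  # 26
```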

Methods Standard Deviation (%) Min-Max Gap (%)
DMGI 5.46 25.26
X-GOAL 7.01 24.89
HeCo 11.70 36.69
HAN-1Layer 3.95 11.16
HAN-2Layer 4.49 20.82
LAMP 2.07 6.08
Table 4: Sensitivity of Micro-F1 to meta-path combinations on ACM.

5.4 Node Clustering

We apply the K-means algorithm to the learned node embeddings and evaluate clustering with two standard metrics: normalized mutual information (NMI) and adjusted Rand index (ARI). Because K-means is sensitive to initialization, we run the clustering ten times independently and report the averaged results in Table 5. The IMDB dataset is excluded from this evaluation because its labels are multi-label in HGB. We also omit comparisons with supervised methods, since these models have access to label information during training and are tuned on validation metrics. LAMP achieves the best NMI and ARI on all three datasets, confirming its effectiveness for clustering.
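For reference, the two clustering metrics can be computed from scratch as follows. This sketch assumes hashable cluster labels and uses the arithmetic-mean normalization for NMI, which is one common variant; the text does not say which normalization was used. K-means itself is omitted.

```python
import math
from collections import Counter


def _entropy(counter, n):
    return -sum((c / n) * math.log(c / n) for c in counter.values())


def nmi(labels_true, labels_pred):
    """Normalized mutual information with arithmetic-mean normalization (assumed variant)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    a, b = Counter(labels_true), Counter(labels_pred)
    mi = sum((c / n) * math.log(c * n / (a[t] * b[p])) for (t, p), c in joint.items())
    denom = (_entropy(a, n) + _entropy(b, n)) / 2
    return 1.0 if denom == 0 else mi / denom


def _pairs(x):
    return x * (x - 1) / 2


def ari(labels_true, labels_pred):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    n = len(labels_true)
    sum_joint = sum(_pairs(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / _pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Relabeled but identical partitions score 1; unrelated partitions score around 0 or below.
perfect = (nmi([0, 0, 1, 1], [1, 1, 0, 0]), ari([0, 0, 1, 1], [1, 1, 0, 0]))
unrelated = (nmi([0, 0, 1, 1], [0, 1, 0, 1]), ari([0, 0, 1, 1], [0, 1, 0, 1]))
```

Both metrics are invariant to permuting cluster ids, which is what makes them suitable for comparing K-means output with ground-truth classes.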

Datasets DBLP ACM Freebase
Metrics NMI ARI NMI ARI NMI ARI
Mp2vec 73.55 77.70 48.43 34.65 16.47 17.32
DGI 59.23 61.85 51.73 41.16 18.34 11.29
DMGI 70.06 75.46 51.66 46.64 16.98 16.91
X-GOAL 61.53 78.91 56.77 43.67 18.67 17.44
HeCo 74.51 80.17 56.87 56.94 20.38 20.98
LAMP 77.13 82.73 58.45 59.12 23.44 24.38
Table 5: Quantitative results on node clustering.

5.5 Ablation Study

This section evaluates two variants: $\text{LAMP}_{\text{w.o.mp}}$ (denoted $\text{LAMP}_{var1}$) and $\text{LAMP}_{\text{w.o.unifiedHGNN}}$ (denoted $\text{LAMP}_{var2}$). In $\text{LAMP}_{var1}$, we freeze the parameter $\gamma$ to cancel the effect of meta-path importance during LMA learning, in order to examine the role of meta-path importance in bridging local and high-order information. $\text{LAMP}_{var2}$ replaces the unified HGB encoder with the meta-path and network-schema encoders from HeCo: the meta-path view is processed with the HAN [46] attention mechanism, while a standard GCN handles the original HIN. In the meta-path view, the LMA edge-pruning technique is applied to each individual meta-path sub-graph.

Table 6 shows that both $\text{LAMP}_{var1}$ and $\text{LAMP}_{var2}$ suffer a considerable drop in performance. (1) Without the meta-path importance $\gamma$, $\text{LAMP}_{var1}$ struggles to capture sufficient global structural information and focuses mainly on local details derived from node attributes. Likewise, without the guidance of $\gamma$, LMA tends to favor long meta-paths and neglect potentially valuable shorter ones. This weakens $\text{LAMP}_{var1}$'s ability to bridge local and high-order information, showing that the guidance from meta-path importance is crucial for LAMP. (2) For $\text{LAMP}_{var2}$, separate HGNN encoders for the two views may be effective in HeCo, but they do not work for LAMP: $\text{LAMP}_{var2}$ lags behind LAMP on all datasets. Disparate HGNN encoders inherently amplify the differences between the embeddings of the two views, even when the target node attributes are identical in both. This creates a dilemma for LMA: because the two views already appear distinct, it becomes hard to decide which edges to prune. The resulting inconsistency can destabilize the model and increase the risk of training collapse.

Dataset DBLP IMDB ACM FreeBase
Methods Micro-F1 Macro-F1 Micro-F1 Macro-F1 Micro-F1 Macro-F1 Micro-F1 Macro-F1
$\text{LAMP}_{var1}$ 71.05 71.12 34.98 34.05 73.85 73.06 32.06 31.12
$\text{LAMP}_{var2}$ 86.33 87.27 53.40 54.20 84.54 84.75 49.51 50.04
LAMP 92.44 92.22 61.85 62.19 91.35 91.27 61.32 64.13
Table 6: Quantitative results with two LAMP variants.

5.6 Analysis of Hyper-parameters

In this section, we examine our model's sensitivity to two critical hyper-parameters: the positive-sample threshold $T_{pos}$ and the regularization coefficient $\lambda_{reg}$, which determines the proportion of edges retained by LMA. We evaluate node classification on the ACM and DBLP datasets and report both Macro-F1 and Micro-F1 scores.

5.6.1 Analysis of $T_{pos}$

The threshold $T_{pos}$ controls the number of positive samples. We vary its value and observe the impact on performance, as shown in Figure 7(a) and Figure 7(b). As $T_{pos}$ increases, performance first improves and then declines. The optimal thresholds are 7 for DBLP and 8 for ACM, and the trends are consistent across both datasets.

5.6.2 Analysis of $\lambda_{reg}$

We also examine the effect of adjusting $\lambda_{reg}$, which governs the fraction of edges retained by LMA. Results are shown in Figure 7(c) and Figure 7(d). On both DBLP and ACM, $\lambda_{reg} = 0.3$ yields peak performance, preserving approximately half of the meta-path view edges. Raising $\lambda_{reg}$ beyond 0.5 preserves 70%-80% of the edges; this excessive retention introduces redundant information into the model and reduces performance.

Figure 7: Impact of $T_{pos}$ and $\lambda_{reg}$ on performance.

6 Related Work

6.1 Heterogeneous Graph Contrastive Learning

HGCL has evolved rapidly, adapting contrastive learning techniques to heterogeneous graphs [35, 25, 47, 24, 3, 57, 60]. Standard HGCL approaches create multiple graph views via meta-path- or network-schema-based augmentations and then learn representations by contrasting positive and negative samples. DMGI [35], for instance, contrasts the original network with a corrupted counterpart for each meta-path view and adds a consensus regularization for meta-path fusion. HeCo [47] introduces two views, the meta-path sub-graph view and the network schema view, and contrasts them with a personalized pairwise InfoNCE loss. HDMI [25] and X-GOAL [24] build on DMGI: HDMI improves semantic attention via high-order mutual information, while X-GOAL proposes a stronger strategy for generating positive and negative samples and obtains node embeddings by simple average pooling over layer-specific embeddings. CPT-HG [23] presents a contrastive pre-training model in which sub-graphs derived from positive samples incorporate randomly swapped nodes from the negative set.

6.2 HGNN Applications in IR

In recent years, heterogeneous graph neural networks (HGNNs), which generalize graph neural networks designed for homogeneous graphs [19, 43, 18, 45, 55, 44, 14], have become a pivotal tool in information retrieval (IR), as they extract rich structural and semantic information from heterogeneous graphs. This capability has led to their widespread application across IR domains, including search engines, recommendation systems, and question-answering systems. In the context of search engines and matching, Chen et al. [2] proposed a cross-modal retrieval method based on heterogeneous graph embeddings, which preserves cross-modal information and overcomes a limitation of traditional approaches that often lose modality-specific details. Similarly, Guan et al. [12] addressed fashion compatibility modeling by integrating user preferences and attribute entities within a meta-path-guided HGNN framework. Additionally, Yuan et al. [52] introduced the Spatio-Temporal Dual Graph Attention Network (STDGAT) for intelligent query-Point of Interest (POI) matching in location-based services. By leveraging semantic representation, dual graph attention, and spatiotemporal factors, STDGAT improves matching accuracy even when only partial query keywords are given.

The domain of recommendation systems has also advanced significantly through the application of HGNNs. Cai et al. [1] proposed an inductive heterogeneous graph neural network (IHGNN) model tailored to cold-start recommendation, addressing the challenge of sparse user attribute data. Pang et al. [34] developed a personalized session-based recommendation method using heterogeneous global graph neural networks (HG-GNN), which captures user preferences from both current and historical sessions. Moreover, Song et al. [38] presented a self-supervised, calorie-aware heterogeneous graph network (SCHGN) for food recommendation, integrating user preferences and ingredient relationships to improve recommendation quality.

HGNNs have also garnered considerable attention in question-answering systems. Feng et al. [7] proposed a document-entity heterogeneous graph network (DEHG) that integrates structured and unstructured information sources for multi-hop reasoning in open-domain question answering. Furthermore, Gao et al. [9] introduced HeteroQA, which employs a question-aware heterogeneous graph transformer to assimilate multiple information sources from user communities and thereby enrich the question-answering process.

7 Conclusion

Our study reveals that existing unsupervised heterogeneous graph learning methods are sensitive to the choice of meta-path combinations. To address this challenge, we introduce LAMP, a meta-path-guided adversarial approach for Heterogeneous Graph Contrastive Learning (HGCL). By combining dual views and Learnable Meta-Path guided augmentation (LMA) with an HGNN, LAMP captures both local and high-order structural information. Experiments across various datasets show that LAMP outperforms existing unsupervised models and remains competitive even with supervised models. We believe LAMP offers a promising foundation for future research on heterogeneous graph contrastive learning.

References

  • [1] Desheng Cai, Shengsheng Qian, Quan Fang, Jun Hu, and Changsheng Xu. User cold-start recommendation via inductive heterogeneous graph neural network. ACM Transactions on Information Systems, 41(3):1–27, 2023.
  • [2] Dapeng Chen, Min Wang, Haobin Chen, Lin Wu, Jing Qin, and Wei Peng. Cross-modal retrieval with heterogeneous graph embedding. In João Magalhães, Alberto Del Bimbo, Shin’ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda, Qin Jin, Vincent Oria, and Laura Toni, editors, MM ’22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022, pages 3291–3300. ACM, 2022.
  • [3] Mengru Chen, Chao Huang, Lianghao Xia, Wei Wei, Yong Xu, and Ronghua Luo. Heterogeneous graph contrastive learning for recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 544–552. ACM, 2023.
  • [4] Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum. Explainable conversational question answering over heterogeneous sources via iterative graph neural networks. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, and Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 643–653. ACM, 2023.
  • [5] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 135–144, 2017.
  • [6] Chenguang Du, Kaichun Yao, Hengshu Zhu, Deqing Wang, Fuzhen Zhuang, and Hui Xiong. Seq-HGNN: Learning sequential node representation on heterogeneous graph. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’23, pages 1721–1730, New York, NY, USA, 2023. Association for Computing Machinery.
  • [7] Yue Feng, Zhen Han, Mingming Sun, and Ping Li. Multi-hop open-domain question answering over structured and unstructured knowledge. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 151–156, 2022.
  • [8] Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. MAGNN: Metapath aggregated graph neural network for heterogeneous graph embedding. In Proceedings of The Web Conference 2020, pages 2331–2341, 2020.
  • [9] Shen Gao, Yuchi Zhang, Yongliang Wang, Yang Dong, Xiuying Chen, Dongyan Zhao, and Rui Yan. HeteroQA: Learning towards question-and-answering through multiple information sources via heterogeneous graph modeling. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pages 307–315, 2022.
  • [10] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010.
  • [11] Shannan Guan, Xin Yu, Wei Huang, Gengfa Fang, and Haiyan Lu. DMMG: Dual min-max games for self-supervised skeleton-based action recognition. IEEE Transactions on Image Processing, 2023.
  • [12] Weili Guan, Fangkai Jiao, Xuemeng Song, Haokun Wen, Chung-Hsing Yeh, and Xiaojun Chang. Personalized fashion compatibility modeling via metapath-guided heterogeneous graph learning. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pages 482–491, 2022.
  • [13] Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, and Yan Zhang. Homophily-oriented heterogeneous graph rewiring. In Proceedings of the ACM Web Conference 2023, pages 511–522, 2023.
  • [14] Haoyu Han, Juanhui Li, Wei Huang, Xianfeng Tang, Hanqing Lu, Chen Luo, Hui Liu, and Jiliang Tang. Node-wise filtering in graph neural networks: A mixture of experts approach. arXiv preprint arXiv:2406.03464, 2024.
  • [15] Ruining He, Anirudh Ravula, Bhargav Kanagal, and Joshua Ainslie. RealFormer: Transformer likes residual attention. arXiv preprint arXiv:2012.11747, 2020.
  • [16] Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, and Jieping Ye. An attention-based graph neural network for heterogeneous structural learning. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 4132–4139, 2020.
  • [17] Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In Proceedings of the web conference 2020, pages 2704–2710, 2020.
  • [18] Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, and Taiji Suzuki. Graph neural networks provably benefit from structural information: A feature learning perspective. arXiv preprint arXiv:2306.13926, 2023.
  • [19] Wei Huang, Yayong Li, Weitao Du, Jie Yin, Richard Yi Da Xu, Ling Chen, and Miao Zhang. Towards deepening graph neural networks: A GNTK-based optimization perspective. arXiv preprint arXiv:2103.03113, 2021.
  • [20] Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. Are meta-paths necessary? revisiting heterogeneous graph embeddings. In Proceedings of the 27th ACM international conference on information and knowledge management, pages 437–446, 2018.
  • [21] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016.
  • [22] Hao Jiang, Chuanzhen Li, Juanjuan Cai, and Jingling Wang. RCENR: A reinforced and contrastive heterogeneous network reasoning model for explainable news recommendation. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, and Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 1710–1720. ACM, 2023.
  • [23] Xunqiang Jiang, Yuanfu Lu, Yuan Fang, and Chuan Shi. Contrastive pre-training of GNNs on heterogeneous graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 803–812, 2021.
  • [24] Baoyu Jing, Shengyu Feng, Yuejia Xiang, Xi Chen, Yu Chen, and Hanghang Tong. X-GOAL: Multiplex heterogeneous graph prototypical contrastive learning. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 894–904, 2022.
  • [25] Baoyu Jing, Chanyoung Park, and Hanghang Tong. HDMI: High-order deep multiplex infomax. In Proceedings of the Web Conference 2021, pages 2414–2424, 2021.
  • [26] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [27] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • [28] Ang Li, Jian Hu, Ke Ding, Xiaolu Zhang, Jun Zhou, Yong He, and Xu Min. Uncertainty-based heterogeneous privileged knowledge distillation for recommendation system. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, and Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 2471–2475. ACM, 2023.
  • [29] Xiang Li, Danhao Ding, Ben Kao, Yizhou Sun, and Nikos Mamoulis. Leveraging meta-path contexts for classification in heterogeneous information networks. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 912–923. IEEE, 2021.
  • [30] Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. Parameterized explainer for graph neural network. Advances in Neural Information Processing Systems, 33:19620–19631, 2020.
  • [31] Qingsong Lv, Ming Ding, Qiang Liu, Yuxiang Chen, Wenzheng Feng, Siming He, Chang Zhou, Jianguo Jiang, Yuxiao Dong, and Jie Tang. Are we really making much progress? revisiting, benchmarking and refining heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 1150–1160, 2021.
  • [32] Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016.
  • [33] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  • [34] Yitong Pang, Lingfei Wu, Qi Shen, Yiming Zhang, Zhihua Wei, Fangli Xu, Ethan Chang, Bo Long, and Jian Pei. Heterogeneous global graph neural networks for personalized session-based recommendation. In Proceedings of the fifteenth ACM international conference on web search and data mining, pages 775–783, 2022.
  • [35] Chanyoung Park, Donghyun Kim, Jiawei Han, and Hwanjo Yu. Unsupervised attributed multiplex network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5371–5378, 2020.
  • [36] Minjae Park. Cross-view self-supervised learning on heterogeneous graph neural network via bootstrapping, 2022.
  • [37] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, pages 593–607. Springer, 2018.
  • [38] Yaguang Song, Xiaoshan Yang, and Changsheng Xu. Self-supervised calorie-aware heterogeneous graph networks for food recommendation. ACM Trans. Multimedia Comput. Commun. Appl., 19(1s), February 2023.
  • [39] Ke Sun, Zhouchen Lin, and Zhanxing Zhu. Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 5892–5899, 2020.
  • [40] Susheel Suresh, Pan Li, Cong Hao, and Jennifer Neville. Adversarial graph augmentation to improve graph contrastive learning. Advances in Neural Information Processing Systems, 34:15920–15933, 2021.
  • [41] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • [42] Petar Velickovic, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. In International Conference on Learning Representations (ICLR), 2019.
  • [43] Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, and Xiaokui Xiao. Single-pass contrastive learning can work for both homophilic and heterophilic graph. arXiv preprint arXiv:2211.10890, 2022.
  • [44] Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, and Yuxuan Liang. The heterophilic snowflake hypothesis: Training and empowering gnns for heterophilic graphs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3164–3175, 2024.
  • [45] Li Wang, Wei Huang, Miao Zhang, Shirui Pan, Xiaojun Chang, and Steven Weidong Su. Pruning graph neural networks by evaluating edge properties. Knowledge-Based Systems, 256:109847, 2022.
  • [46] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The world wide web conference, pages 2022–2032, 2019.
  • [47] Xiao Wang, Nian Liu, Hui Han, and Chuan Shi. Self-supervised heterogeneous graph neural network with co-contrastive learning. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 1726–1736, 2021.
  • [48] Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, and Jiawei Han. Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Transactions on Knowledge and Data Engineering, 34(10):4854–4873, 2020.
  • [49] Xiaocheng Yang, Mingyu Yan, Shirui Pan, Xiaochun Ye, and Dongrui Fan. Simple and efficient heterogeneous graph neural network. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10816–10824, 2023.
  • [50] Zuoxi Yang. Biomedical information retrieval incorporating knowledge graph for explainable precision medicine. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2486–2486, 2020.
  • [51] Pengyang Yu, Chaofan Fu, Yanwei Yu, Chao Huang, Zhongying Zhao, and Junyu Dong. Multiplex heterogeneous graph convolutional network. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2377–2387, 2022.
  • [52] Zixuan Yuan, Hao Liu, Yanchi Liu, Denghui Zhang, Fei Yi, Nengjun Zhu, and Hui Xiong. Spatio-temporal dual graph attention network for query-POI matching. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 629–638, New York, NY, USA, 2020. Association for Computing Machinery.
  • [53] Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. Graph transformer networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • [54] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V Chawla. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 793–803, 2019.
  • [55] Guibin Zhang, Kun Wang, Wei Huang, Yanwei Yue, Yang Wang, Roger Zimmermann, Aojun Zhou, Dawei Cheng, Jin Zeng, and Yuxuan Liang. Graph lottery ticket automated. In The Twelfth International Conference on Learning Representations, 2024.
  • [56] Jianan Zhao, Xiao Wang, Chuan Shi, Zekuan Liu, and Yanfang Ye. Network schema preserving heterogeneous information network embedding. In International Joint Conference on Artificial Intelligence (IJCAI), 2020.
  • [57] Lecheng Zheng, Jinjun Xiong, Yada Zhu, and Jingrui He. Contrastive learning with complex heterogeneity. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2594–2604, 2022.
  • [58] Shichao Zhu, Chuan Zhou, Shirui Pan, Xingquan Zhu, and Bin Wang. Relation structure-aware heterogeneous graph neural network. In 2019 IEEE international conference on data mining (ICDM), pages 1534–1539. IEEE, 2019.
  • [59] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, and Liang Wang. Deep graph structure learning for robust representations: A survey. arXiv preprint arXiv:2103.03036, 2021.
  • [60] Yanqiao Zhu, Yichen Xu, Hejie Cui, Carl Yang, Qiang Liu, and Shu Wu. Structure-enhanced heterogeneous graph contrastive learning. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), pages 82–90. SIAM, 2022.
  • [61] Yanqiao Zhu, Yichen Xu, Qiang Liu, and Shu Wu. An empirical study of graph contrastive learning. arXiv preprint arXiv:2109.01116, 2021.