
LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs

Siqing Li
University of New South Wales
siqing.li@unsw.edu.au
&Jin-Duk Park
Yonsei University
jindeok6@yonsei.ac.kr
&Wei Huang
RIKEN AIP
wei.huang.vr@riken.jp
&Xin Cao*
University of New South Wales
xin.cao@unsw.edu.au
&Won-Yong Shin
Yonsei University
wy.shin@yonsei.ac.kr
&Zhiqiang Xu
Mohamed bin Zayed University of Artificial Intelligence
zhiqiangxu2001@gmail.com
*Corresponding author.

Abstract

Heterogeneous graph neural networks (HGNNs) have significantly propelled the information retrieval (IR) field. Still, the effectiveness of HGNNs heavily relies on high-quality labels, which are often expensive to acquire. This challenge has shifted attention towards Heterogeneous Graph Contrastive Learning (HGCL), which usually requires pre-defined meta-paths. However, our findings reveal that meta-path combinations significantly affect performance in unsupervised settings, an aspect often overlooked in current literature. Existing HGCL methods have considerable variability in outcomes across different meta-path combinations, thereby challenging the optimization process to achieve consistent and high performance. In response, we introduce LAMP (LearnAble Meta-Path), a novel adversarial contrastive learning approach that integrates various meta-path sub-graphs into a unified and stable structure, leveraging the overlap among these sub-graphs. To address the denseness of this integrated sub-graph, we propose an adversarial training strategy for edge pruning, maintaining sparsity to enhance model performance and robustness. LAMP aims to maximize the difference between meta-path and network schema views for guiding contrastive learning to capture the most meaningful information. Our extensive experimental study conducted on four diverse datasets from the Heterogeneous Graph Benchmark (HGB) demonstrates that LAMP significantly outperforms existing state-of-the-art unsupervised models in terms of accuracy and robustness.

1 Introduction

Heterogeneous graphs characterized by diverse node and edge types are ubiquitous across various domains including social, academic, and user interaction networks. The use of heterogeneous graph neural networks (HGNNs) has surged in IR applications, ranging from search engines [2, 12, 50] to recommendation systems [1, 34, 39, 22, 28] and question answering systems [7, 9, 4].

HGNNs fall into two categories: meta-path based models [46, 54, 8, 49], which convert heterogeneous information networks (HINs) into homogeneous sub-graphs via predefined meta-paths, and meta-path free models [53, 58, 17, 16, 31, 51], which propagate information separately along different relations. These models have shown promising results but often require extensive labeling, posing challenges for large-scale IR tasks. Consequently, there has been a shift towards self-supervised learning (SSL) approaches, particularly Heterogeneous Graph Contrastive Learning (HGCL) [35, 25, 47, 24, 3, 57, 60].

In HGCL, a widely adopted approach involves the generation of multiple graph views via diverse data augmentation techniques, subsequently refining node representations through contrastive learning. Two principal categories of HGCL augmentations emerge: (1) the meta-path view [35, 24, 25], which converts heterogeneous graphs into homogeneous sub-graphs according to selected meta-paths, and (2) the network schema view [47, 36], wherein the target nodes aggregate information from one-hop neighbors of varying node types. Distinctively, the network schema view imparts a localized perspective, while the meta-path view delivers a more expansive, higher-order perspective, connecting target nodes through meta-path instances that span multiple hops. However, recent studies have revealed that manually crafted augmentations, including the prevalent meta-path view, often fall short of achieving optimal results [61, 59, 20]. This demonstrates a significant reliance on the specific combination of meta-paths chosen, which in turn, greatly affects the overall model performance.

In this study, we explore the relationship between meta-path set selection and HGNN model performance on node classification, detailed in Section 3. Our findings, illustrated in Figure 1, reveal that the selected set of meta-paths crucially affects model performance, with all models showing at least a 5% deviation across different combinations, an effect that is especially pronounced in SSL models. Clearly, identifying the optimal meta-path combination is crucial, yet it presents considerable challenges due to:

(1) No Universal Meta-Path Combination: Our research indicates the absence of a universally optimal meta-path combination among models, with effectiveness varying significantly (see Figure 3). The optimal set for supervised models often underperforms in unsupervised scenarios, highlighting SSL’s inherent complexity.

(2) No Use in Adding More Meta-paths: Surprisingly, adding more meta-paths does not consistently lead to better performance. Although effective in supervised learning contexts, as evidenced by SOTA methods [49, 6], this approach does not translate as effectively into SSL scenarios. Consequently, a straightforward greedy search for the optimal meta-path combination is inadequate in the SSL landscape.

(3) No Downstream Task Labels in SSL: SSL methods face a unique challenge in that they cannot employ downstream tasks to determine the most effective meta-path combinations, as these tasks are not applicable in unsupervised contexts.

Addressing the issue in an unsupervised framework, our solution is to increase the robustness of HGCL models against diverse meta-path combinations. The existing models lack robustness primarily because each meta-path is treated as an independent channel, making changes in these channels potentially harmful to model stability. To address the overlooked issue of meta-path sensitivity, we present LAMP — a LearnAble Meta-Path guided adversarial contrastive learning model which aims at creating a stable meta-path view. It reduces dependency on specific meta-path combinations and achieves consistent performance, also simplifying the integration of a wide range of meta-paths. Furthermore, we enhance LAMP with adversarial training, a technique known to improve contrastive learning performance in homogeneous graphs.

LAMP proposes a new perspective in meta-path view construction by merging different meta-path sub-graphs into a unified structure. This results in a single sub-graph that integrates nodes and edges from various meta-path sub-graphs. In this integrated sub-graph, each edge carries a one-hot-like encoding based on its meta-path instance, maintaining the semantic integrity of the original sub-graphs. This unified form ensures stability across various meta-path combinations, utilizing the overlaps between them. For instance, combining the sub-graphs of PAP, PSP, and PAPAP (refer to Figure 2 (b)) into one integrated sub-graph (Figure 2 (c)) retains the topological structure when the combination is modified, such as by removing a meta-path; only the edge encoding changes. This stability stems from the shared edges commonly found in heterogeneous graphs, as detailed in Section 3, thereby significantly reducing variability between combinations and enhancing the model’s robustness.

Nevertheless, as the number of meta-paths in the integrated sub-graph increases, so does its density, which may hinder performance since Graph Contrastive Learning (GCL) generally performs better with sparser structures [61]. In extreme cases, the integrated sub-graph might become too dense, resembling a complete graph, and lead to high computational cost. To address this, we apply an adversarial training method named LMA (Learnable Meta-path Guided Augmentation). Initially, LMA simplifies the graph by randomly removing edges. It then employs a learned edge-pruning approach, guided by node features and semantic information, to refine the graph’s structure. This process enhances both the model’s efficiency and its robustness. The edge encoding is combined with a learnable weight vector to represent the importance of different meta-paths. LMA’s goal is to create a significant distinction between the network schema and meta-path views, allowing the HGCL framework to effectively extract the most meaningful knowledge. This approach follows the adversarial training model common in graph contrastive learning. Our comprehensive experiments on four HGB [31] real-world datasets demonstrate that LAMP not only outperforms current SOTA baselines but also greatly improves robustness.

Figure 1: Performance variability in node classification, showing the standard deviation and min-max gap across HGNN models (supervised HAN with 1/2 layers, unsupervised X-GOAL, HeCo, DMGI, and our LAMP) under varied meta-path combinations.

2 Preliminary

Definition 1. Heterogeneous Information Network (HIN). A HIN is a network $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A},\mathcal{R},\theta,\phi)$, where $\mathcal{V}$ and $\mathcal{E}$ represent the sets of nodes and edges, respectively. The network is associated with a node type mapping function $\theta:\mathcal{V}\rightarrow\mathcal{A}$ and an edge type mapping function $\phi:\mathcal{E}\rightarrow\mathcal{R}$. Here, $\mathcal{A}$ and $\mathcal{R}$ represent the sets of object and link types, respectively, with the constraint $|\mathcal{A}|+|\mathcal{R}|>2$.
Definition 2. Meta-path. A meta-path $\mathcal{P}$ is a structural pattern connecting different node types, represented as

A_{1}\xrightarrow{R_{1}}A_{2}\xrightarrow{R_{2}}A_{3}\cdots\xrightarrow{R_{l}}A_{l+1}

(abbreviated as $A_{1}A_{2}\dots A_{l+1}$ ), which describes a composite relation $R=R_{1}\circ R_{2}\circ\dots\circ R_{l}$ between node types $A_{1}$ and $A_{l+1}$ , where $\circ$ represents the composition operator on relations. Paths in $\mathcal{G}$ that follow the pattern of $\mathcal{P}$ are termed as meta-path instances.
Definition 3. Meta-path Sub-Graph. Given a meta-path $\mathcal{P}$, the nodes in $\mathcal{G}$ can be re-connected to form a meta-path sub-graph $\mathcal{G}_{\mathcal{P}}$. An edge $u\rightarrow v$ exists in $\mathcal{G}_{\mathcal{P}}$ if and only if there is at least one path (a meta-path instance) between $u$ and $v$ following the meta-path $\mathcal{P}$ in the original graph $\mathcal{G}$. For instance, Figure 2 (b) illustrates three meta-path sub-graphs derived from the HIN in Figure 2 (a). PAP indicates two papers authored by the same individual, while PSP signifies two papers related to the same subject. As meta-paths combine multiple relations, meta-path sub-graphs encapsulate high-order structures.

3 Empirical Observations

To explore the influence of meta-path combinations on HGNN performance, we conducted a detailed empirical study using the ACM dataset. We generated 26 distinct combinations from 5 predefined meta-paths and assessed the performance variations in HGNNs, evidenced by the standard deviation and min-max gap. The key findings, depicted in Figures 1 and 3, are summarized below:

(1) Sensitivity to Meta-path Combinations. Meta-path combinations critically affect HGNN performance. Variations in these combinations impact the structural configuration of meta-path sub-graphs, significantly influencing model outcomes, as evidenced by the substantial standard deviation and min-max gap shown in Figure 1. In extreme cases, improper combinations can lead to model failure. This challenge is more acute in SSL models due to the lack of downstream task feedback. Even proven meta-paths can cause dramatic performance deterioration if combined inappropriately. The sensitivity of HGNNs to these combinations is partly due to their responsiveness to topological changes and is further compounded by the low homophily ratios in meta-path sub-graphs [13] (referenced in Table 1), which exacerbates the issue in denser sub-graph structures.

(2) Absence of Universal Optimal Combinations. Our study reveals that no single meta-path combination is optimal for all models. This absence of a universal ‘best’ combination becomes a formidable challenge in SSL, where the lack of direct feedback from downstream tasks makes finding the ideal combination through exhaustive search impractical. The disparity between the effective combinations in supervised and unsupervised models further complicates this issue. This gap suggests that strategies successful in supervised learning may not directly translate to superior performance in unsupervised settings.

(3) Naively Adding More Meta-paths Does Not Guarantee the Best Performance. Contrary to expectations, simply adding more meta-paths does not steadily improve HGNN performance. While certain meta-paths are essential, their impact varies across different models. In some instances, such as the comparison between ‘comb26’ and the optimal ‘comb21’ for X-GOAL, adding an extra meta-path resulted in decreased performance. Our analysis, illustrated in Figure 4, shows significant edge overlaps among meta-path sub-graphs. For example, ‘-PPSP’ overlaps with over 50% of every other meta-path sub-graph. Such overlaps cause an accumulation of redundant information, overshadowing valuable insights from less common structures. Worse, current semantic-level aggregation methods struggle to filter out this redundancy, indicating that increasing the meta-path count is not a straightforward solution for performance enhancement. In some cases, it can even be counterproductive.

Dataset	Meta-path	HR(%)	ACC(%)	Edges
ACM	PAP	81.45	87.33 ± 0.56	29767
ACM	PSP	64.03	66.72 ± 0.49	2217089
ACM	PTP	33.38	68.21 ± 0.14	9150595
ACM	PcPSP	60.62	68.21 ± 1.08	1933761
ACM	PrPSP	61.41	68.16 ± 1.28	1440299

Table 1: Homophily ratio (HR) of the different meta-path sub-graphs of the ACM dataset. ACC is the node classification accuracy of a 2-layer GCN with ReLU activation.
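As a minimal sketch of how a quantity such as HR in Table 1 can be computed, the snippet below assumes the common edge-level definition of homophily (the fraction of edges whose endpoints share a class label). The precise definition used here follows [13] and may differ; the function name and the toy graph are illustrative only.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label.

    Assumption: edge-level homophily, used here only for illustration.
    """
    edges = list(edges)
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy meta-path sub-graph over four papers with hypothetical class labels.
toy_labels = {0: "db", 1: "db", 2: "ml", 3: "ml"}
toy_edges = [(0, 1), (2, 3), (1, 2), (0, 3)]
toy_hr = edge_homophily(toy_edges, toy_labels)  # 2 of the 4 edges are intra-class
```

Under this reading, a dense sub-graph such as PTP can add many edges while lowering the share of intra-class edges, which is the effect the HR column reports.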

This study’s insights emphasize the critical need for a methodological strategy capable of forging a robust meta-path perspective, while simultaneously mitigating the redundancies that emerge from the amalgamation of various meta-paths.

4 The Proposed Model: LAMP

In this section, we introduce LAMP, a Learnable Meta-Path Guided Adversarial Contrastive Learning method, detailed in Figure 5. LAMP leverages a dual-view approach: a high-order information-rich meta-path view, processed by LMA, and a locally-focused network schema view. The essence of LAMP lies in its integration of diverse meta-path sub-graphs into a single, comprehensive meta-path sub-graph. To manage the inherent density of this integrated view, LMA – a meta-path guided learnable edge-pruning strategy – is employed. LAMP’s aim is to effectively retain essential sparsity for contrastive learning and reduce redundant information across the network schema and integrated meta-path views, enhancing node consistency across these views via an advanced adversarial training regime.

4.1 Problem Formulation

Given a HIN $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A},\mathcal{R})$, denoted as $G$ for short, and a set of meta-paths $\{\mathcal{P}\}$ with $|\{\mathcal{P}\}|=n$, we define $\{\mathcal{G}_{\mathcal{P}_{1}}\dots\mathcal{G}_{\mathcal{P}_{n}}\}$ as the meta-path sub-graphs and ${\mathcal{\hat{G}_{P}}}$, denoted as $\hat{G}$, as the integrated meta-path sub-graph. We represent the encoding function with parameter $\theta$ as $f(\cdot)$ and the augmentation function with parameter $\phi$ as $t(\cdot)$. For simplicity, we denote the network schema view by $f_{\theta}(G)$ and the meta-path view by $f_{\theta}(\hat{G})$. The primary objective for contrastive learning is:

\arg\max_{\theta}I(f_{\theta}(G),f_{\theta}(t_{\phi}(\hat{G}))),

(1)

Then for the adversarial training which tries to increase the difficulty of getting agreement in contrastive learning, the objective is:

\arg\min_{\phi}I(f_{\theta}(G),f_{\theta}(t_{\phi}(\hat{G}))),

(2)

where $I(X_{1};X_{2})$ represents the mutual information between random variables $X_{1}$ and $X_{2}$. The graph $t_{\phi}(\hat{G})=(\hat{V},E)$ retains the nodes of $\hat{G}$, but its edge set $E$ is a subset of $\hat{E}$. The insight is to make the contrast as strong as possible while the two views can still reach an agreement, which has been shown to be an effective optimization strategy in contrastive learning. To bridge the min-max procedure and address potential biases, we incorporate a learnable meta-path importance parameter $\gamma\in\mathbb{R}^{1\times|\mathcal{P}|}$, which is shared by $f(\cdot)$ and $t(\cdot)$. Putting the objectives together, the refined version is:

\arg\max_{\theta}\min_{\phi}I(f_{\theta}(G),f_{\theta,\gamma}(t_{\phi,\gamma}(\hat{G}))).

(3)

Consistent with prior research [40, 11], we employ InfoNCE [33] to approximate $I(X_{1};X_{2})$ , detailed further in Section 4.6. Regarding $\gamma$ , the insight is that $\gamma$ prioritizes longer meta-paths because the most straightforward strategy for $t_{\phi}(\cdot)$ to diminish the similarity between the two views is by preserving long meta-path instances in $\hat{G}$ . Conversely, during the maximization phase, shorter meta-paths become more influential. This balanced strategy empowers LAMP to harness rich high-order information while discerning the value of different meta-paths.

4.2 Integrated Sub-graph based meta-path view

Given a batch of meta-path sub-graphs $\{\mathcal{G}_{\mathcal{P}_{i}}\}=\{(\mathcal{V}_{\mathcal{P}_{i}},\mathcal{E}_{\mathcal{P}_{i}})\}$ with $i=1,\cdots,|\mathcal{P}|$, we merge all of them into a single sub-graph denoted as $\hat{G}=(\hat{V},\hat{E})$ to create the meta-path view. Here $\hat{V}=\cup_{i}\mathcal{V}_{\mathcal{P}_{i}}$ and $\hat{E}=\cup_{i}\mathcal{E}_{\mathcal{P}_{i}}$. As an illustration, Figure 2(c) depicts an integrated sub-graph derived from three meta-path sub-graphs: PAP, PSP, and PAPAP. The edge encoding $e_{12}=(0,1,1)$ arises because the edge is absent in PAP but present in both PSP and PAPAP. For every edge $(u,v)\in\hat{E}$, we assign a vector $e_{uv}=(x_{1},x_{2},\cdots,x_{|\mathcal{P}|})$ to represent its semantic information, where $x_{i}$ is set to $1$ if $(u,v)\in\mathcal{E}_{\mathcal{P}_{i}}$ and to $0$ otherwise. To further capture the semantic-level information, we assign a learnable vector $\gamma\in\mathbb{R}^{1\times|\mathcal{P}|}$ that quantifies the importance of each meta-path. In the message-passing phase, we utilize $\hat{e}_{uv}=\gamma\times e_{uv}$ as the edge embedding within the meta-path view. Because an edge shared by several meta-path sub-graphs appears only once in $\hat{G}$, this approach mitigates the overlaps between meta-path sub-graphs, thereby curtailing redundant message passing and rendering the meta-path view more robust than in prior methods. Nevertheless, this method can lead to a dense meta-path view, an aspect we address through the proposed LMA, detailed in the subsequent section.
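The construction above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the toy edge sets are hypothetical, the values of `gamma` are made up (the model learns them), and the product $\gamma\times e_{uv}$ is assumed to be element-wise.

```python
def integrate_subgraphs(edge_sets):
    """Merge meta-path sub-graphs into one edge set with multi-hot codes.

    edge_sets: list of sets of (u, v) pairs, one set per meta-path.
    Returns a dict mapping each edge of the union to a tuple
    (x_1, ..., x_n) with x_i = 1 iff the edge is in the i-th sub-graph.
    """
    union = set().union(*edge_sets)
    return {e: tuple(int(e in s) for s in edge_sets) for e in sorted(union)}


def weight_encoding(code, gamma):
    """Element-wise product of the importance vector and the edge code
    (assumed interpretation of gamma x e_uv)."""
    return tuple(g * x for g, x in zip(gamma, code))


# Hypothetical toy sub-graphs for PAP, PSP, PAPAP (each undirected edge stored once).
pap = {(1, 3)}
psp = {(1, 2), (2, 3)}
papap = {(1, 2), (1, 3)}
integrated = integrate_subgraphs([pap, psp, papap])

gamma = (0.5, 0.2, 0.3)  # illustrative values only
e12_hat = weight_encoding(integrated[(1, 2)], gamma)
```

In this toy example the edge (1, 2) receives the code (0, 1, 1), as in the $e_{12}$ example, and the five meta-path edges collapse into three integrated edges. Removing a meta-path changes the codes but leaves any edge covered by another meta-path in place.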

4.3 Learnable Meta-Path Guided Augmentation

The dense links of the integrated sub-graph, while capturing all given meta-path sub-graphs, can pose challenges for graph contrastive learning, as sparser graphs tend to yield more favorable results [61]. To address this issue, we introduce the Learnable Meta-Path Guided Augmentation (LMA), an adversarial-training-based method aimed at learning an optimized edge-pruning strategy. This ensures a sparser meta-path view while overcoming manually induced biases. LMA first applies random edge dropping and then a learned, GCN-based edge-pruning strategy guided by node features and semantic information. In the first stage, random dropping effectively decreases the graph complexity: as the number of engaged meta-paths grows, the integrated sub-graph becomes denser and eventually approximates a complete graph, which is not desirable. In addition, random dropping provides LMA with a different integrated sub-graph in each epoch, so that LMA can learn a more powerful edge-pruning strategy rather than fall into a sub-optimal solution specialized for a fixed input. To learn the edge-pruning strategy, each edge $e\in\hat{E}$ is associated with a Bernoulli random variable $p_{e}\sim\textrm{Bernoulli}(\omega_{e})$. An edge is present in $t_{\phi}(\hat{G})=(\hat{V},E)$ if $p_{e}=1$ and excluded otherwise. To prune edges based on not only node features but also semantic information, the weights $\omega_{e}$ of the Bernoulli distribution are parameterized using an MLP that takes as input the concatenation of the node embeddings obtained from a $K$-layer HGNN augmenter on $\hat{G}$ and the edge type embedding $\hat{e}_{uv}$. Thus, for an edge $e=(u,v)$, the edge weight can be expressed as:

\omega_{e}=\text{MLP}\left([h_{u}^{K};h_{v}^{K};\hat{e}_{uv}]\right).

(4)

For seamless end-to-end training of $t_{\phi}(\hat{G})$, the binary variable $p_{e}$ is relaxed into a continuous variable in $[0,1]$ using the Gumbel-Max reparametrization trick [32, 21]. Specifically:

p_{e}=\mathrm{Sigmoid}\left(\frac{\log(\delta)-\log(1-\delta)+\omega_{e}}{\tau}\right),

(5)

where $\delta\sim\text{Uniform}(0,1)$. As $\tau$ approaches zero, $p_{e}$ approaches a binary value, while for any $\tau>0$ the gradient $\partial p_{e}/\partial\omega_{e}$ remains smooth and well-defined. Notably, this kind of edge pruning, underpinned by a stochastic graph model, has also been utilized to provide parameterized explanations of GNNs [30].
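To make the relaxation in Eq. (5) concrete, the following sketch draws relaxed edge indicators with the standard library only. It is an illustration under stated assumptions: the function name, the clamping constant, and the chosen temperature are not taken from the paper.

```python
import math
import random


def relaxed_edge_sample(omega, tau, rng):
    """Draw a relaxed edge indicator in [0, 1] following the form of Eq. (5)."""
    delta = rng.random()
    # Clamp delta away from 0 and 1 so that both logarithms are finite
    # (the clamping constant is an implementation choice of this sketch).
    delta = min(max(delta, 1e-9), 1.0 - 1e-9)
    logit = (math.log(delta) - math.log(1.0 - delta) + omega) / tau
    # Numerically stable sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


rng = random.Random(0)
samples = [relaxed_edge_sample(0.0, 0.05, rng) for _ in range(1000)]
# Share of samples that are within 0.01 of either 0 or 1.
near_binary = sum(p < 0.01 or p > 0.99 for p in samples) / len(samples)
```

With a small temperature such as 0.05, most samples land close to 0 or 1, which illustrates the near-binary behaviour as $\tau$ shrinks. A larger $\omega_{e}$ shifts the samples towards 1, so the edge is more likely to be kept.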

To curb LMA’s tendency for aggressive edge pruning, a regularization term $\lambda_{reg}\frac{\sum_{e\in\hat{E}}\omega_{e}}{|\hat{E}|}$ is incorporated into the objective function. The hyper-parameter $\lambda_{reg}$ controls the quantity of retained edges. Without this regularization, LMA might opt for an extreme strategy of eliminating all edges to minimize the mutual information between $G$ and $t_{\phi}(\hat{G})$, which is counterproductive. The regularization ensures that a sufficient ratio of edges is maintained in $t_{\phi}(\hat{G})$ to keep enough information for contrastive learning. The refined objective is:

\arg\max_{\theta}\min_{\phi}\left(I(f_{\theta}(G),f_{\theta,\gamma}(t_{\phi,\gamma}(\hat{G})))-\lambda_{reg}\frac{\sum_{e\in\hat{E}}\omega_{e}}{|\hat{E}|}\right).

(6)

Of note, the meta-path importance $\gamma$ offers a holistic perspective for both $f_{\theta,\gamma}(\cdot)$ and $t_{\phi,\gamma}(\cdot)$. While $t_{\phi,\gamma}(\cdot)$ strives to maximize divergence from the network schema view, it places a premium on longer meta-paths. This is because, in contrast to the meta-path view, the network schema view primarily carries single-hop information. Conversely, during the agreement maximization phase, shorter meta-paths become more salient contributors to $f_{\theta,\gamma}(\cdot)$.

4.4 Network Schema view

In the network schema view, for a given node $i$ , we initiate the process by employing a type-specific multilayer perceptron (MLP), denoted as $MLP^{\mathcal{A}(i)}$ , to transform the features $x_{i}$ of node $i$ into a unified feature space. This transformation is represented as follows:

h_{i}^{(0)}=MLP^{\mathcal{A}(i)}(x_{i}).

(7)

Here, $\mathcal{A}(i)$ denotes the type of node $i$. Subsequently, we incorporate one-hot encoding to represent the semantic information of various relations. This encoded information, along with the node features, is input into a unified HGNN encoder. The specifics of this unified HGNN encoder, which operates irrespective of node types while preserving edge type embeddings, are elaborated in the following section. It is important to note that both the network schema view and the meta-path view utilize the same HGNN encoder, denoted as $f_{\theta,\gamma}$. However, a key distinction lies in the treatment of the parameter $\gamma$: it remains frozen in the network schema view, whereas gradients are enabled for $\gamma$ in the meta-path view, allowing for adaptability in encoding different types of information.

4.5 Unified HGNN Encoder

In the context of LAMP, as outlined in Eq. (6), it is crucial to employ a unified HGNN that can efficiently handle both the network schema view (heterogeneous graph) and the meta-path view (homogeneous graph). While an approach could involve two distinct HGNN encoders tailored for each view, such an architecture may be inappropriate for LAMP. The core concern is that distinct encoders might produce node embeddings governed by entirely different parameter sets, making it extremely hard for LAMP to meaningfully minimize similarity based on topological information. Essentially, a unified HGNN encoder fosters a harmonious link between the two views, ensuring that the embeddings reflect inherent structural divergence rather than encoder bias. The attention coefficient between node $i$ and its neighbor $j$ incorporates the edge embedding $r_{\psi}(\langle i,j\rangle)$:

\hat{\alpha}_{ij}=\frac{\exp(\text{LeakyReLU}(a^{T}[Wh_{i}\|Wh_{j}\|W_{r}r_{\psi}(\langle i,j\rangle)]))}{\sum_{k\in N_{i}}\exp(\text{LeakyReLU}(a^{T}[Wh_{i}\|Wh_{k}\|W_{r}r_{\psi}(\langle i,k\rangle)]))}.

(8)

4.5.1 Node Residual:

Introducing pre-activation residual connections for nodes:

h_{i}^{(l)}=\sigma\left(\sum_{j\in N_{i}}\alpha_{ij}^{(l)}W^{(l)}h_{j}^{(l-1)}+W_{res}^{(l)}h_{i}^{(l-1)}\right).

(9)

4.5.2 Edge Residual:

Following the insights from RealFormer [15], we add residuals to the attention scores, where $\hat{\alpha}_{ij}^{(l)}$ is the attention computed by Eq. (8) at layer $l$:

\alpha_{ij}^{(l)}=(1-\beta)\hat{\alpha}_{ij}^{(l)}+\beta\alpha_{ij}^{(l-1)},

(10)

with $\beta\in[0,1]$ serving as a scaling factor. In our framework, the representation of relationships between end nodes varies based on the view. For the network schema view, the function $r_{\psi}(\langle u,v\rangle)$ yields a one-hot vector encapsulating the relation between the nodes. Conversely, in the meta-path view, the relationship is captured by $r_{\psi}(\langle u,v\rangle)=\hat{e}_{uv}$ leveraging the embedded semantic information. The transformation matrix $W_{r}$ is designed to align the dimension of edge embedding with that of node embedding. Uniquely within the HGNN encoder, $W_{r}$ is the sole parameter not shared across both the network schema and meta-path views.
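The attention normalization of Eq. (8) and the edge residual of Eq. (10) can be sketched as follows. The sketch assumes the pre-activation scores $a^{T}[Wh_{i}\|Wh_{j}\|W_{r}r_{\psi}(\langle i,j\rangle)]$ have already been computed; the LeakyReLU slope and the default value of `beta` are illustrative assumptions.

```python
import math


def leaky_relu(x, slope=0.2):
    # The slope value is an assumption of this sketch.
    return x if x > 0 else slope * x


def attention_with_edge_residual(raw_scores, prev_alpha=None, beta=0.05):
    """Softmax over LeakyReLU-activated neighbour scores, then the edge residual.

    raw_scores: dict neighbour -> precomputed score a^T [W h_i || W h_j || W_r r(i, j)].
    prev_alpha: attention of the previous layer, or None at the first layer.
    """
    acts = {j: leaky_relu(s) for j, s in raw_scores.items()}
    m = max(acts.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(a - m) for j, a in acts.items()}
    total = sum(exps.values())
    alpha_hat = {j: e / total for j, e in exps.items()}
    if prev_alpha is None:
        return alpha_hat
    # Blend the freshly computed attention with the previous layer's attention.
    return {j: (1 - beta) * alpha_hat[j] + beta * prev_alpha[j] for j in alpha_hat}


layer1 = attention_with_edge_residual({1: 0.3, 2: -1.0, 3: 2.0})
layer2 = attention_with_edge_residual({1: 1.5, 2: 0.1, 3: -0.4},
                                      prev_alpha=layer1, beta=0.5)
```

Because both terms of the blend are normalized over the same neighbourhood, the blended attention still sums to one for any $\beta\in[0,1]$.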

4.6 Contrastive Optimization

The core of our approach involves utilizing the network schema view $G$ and meta-path view $\hat{G}$ for the contrastive learning mechanism. Both graphs are fed into an HGNN followed by an MLP with a single hidden layer, mapping them into a space where the contrastive loss is computed:

	$\displaystyle z_{i}^{G,\textrm{proj}}$	$\displaystyle=W^{(2)}\sigma(W^{(1)}z_{i}^{G}+b^{(1)})+b^{(2)},$		(11)
	$\displaystyle z_{i}^{\hat{G},\textrm{proj}}$	$\displaystyle=W^{(2)}\sigma(W^{(1)}z_{i}^{\hat{G}}+b^{(1)})+b^{(2)},$		(12)

where $\sigma$ denotes the LeakyReLU function. The parameters are shared between the embeddings of the two views.

Adopting the strategy introduced in HeCo, we generate high-quality positive and negative pairs. We introduce a connectivity vector $C_{i}(j)$ , which represents the connectivity between nodes based on the number of meta-path instances connecting them.

C_{i}(j)=\sum_{n=1}^{|\mathcal{P}|}\mathbbm{1}(j\in N_{i}^{\mathcal{P}_{n}}),

(13)

where $\mathbbm{1}(\cdot)$ represents the indicator function. Following this, we establish positive and negative samples by applying a threshold to the sorted node connectivity using $T_{pos}$ . The intuition here is that node pairs with higher connectivity are more likely to belong to the same class. The contrastive loss for node $i$ can be defined as follows:

\mathcal{L}_{i}=-\log\frac{\sum_{j\in Pos_{i}}\exp(sim(z_{i}^{G,\textrm{proj}},z_{j}^{\hat{G},\textrm{proj}})/\tau)}{\sum_{k\in Pos_{i}\cup Neg_{i}}\exp(sim(z_{i}^{G,\textrm{proj}},z_{k}^{\hat{G},\textrm{proj}})/\tau)},

(14)

where $sim(u,v)$ represents the cosine similarity between vectors $u$ and $v$ , and $\tau$ is the temperature parameter. The final objective aggregates the contrastive losses for all nodes:

\mathcal{J}=\frac{1}{|V|}\sum_{i\in V}\mathcal{L}_{i}.

(15)
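Eqs. (13) and (14) can be illustrated with a small standard-library sketch. Several details are assumptions made for illustration rather than statements about the authors' code: node $i$ is counted among its own positives, $T_{pos}$ is treated as a top-$k$ cut-off, every node outside $Pos_{i}$ serves as a negative, and the temperature default is arbitrary.

```python
import math


def connectivity(i, neighbor_sets):
    """C_i(j): number of meta-paths under which j is a neighbour of i (Eq. 13).

    neighbor_sets: list (one entry per meta-path) of dicts node -> set of neighbours.
    """
    counts = {}
    for nbrs in neighbor_sets:
        for j in nbrs.get(i, ()):
            counts[j] = counts.get(j, 0) + 1
    return counts


def select_positives(i, neighbor_sets, t_pos):
    """Node i plus its t_pos most connected nodes (simplifying assumption)."""
    counts = connectivity(i, neighbor_sets)
    ranked = sorted(counts, key=lambda j: (-counts[j], j))
    return {i} | set(ranked[:t_pos])


def cosine(u, v):
    # Assumes non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def node_loss(i, z_schema, z_meta, positives, tau=0.5):
    """Eq. (14), with all nodes outside `positives` treated as negatives."""
    sims = {k: math.exp(cosine(z_schema[i], z_meta[k]) / tau) for k in z_meta}
    return -math.log(sum(sims[j] for j in positives) / sum(sims.values()))


# Toy example with two meta-paths and three nodes (hypothetical values).
neighbor_sets = [{0: {1}, 1: {0}}, {0: {1, 2}, 1: {0}, 2: {0}}]
z_schema = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
z_meta = {0: [1.0, 0.1], 1: [1.0, 0.0], 2: [-0.1, 1.0]}
pos0 = select_positives(0, neighbor_sets, t_pos=1)
loss0 = node_loss(0, z_schema, z_meta, pos0)
```

Averaging `node_loss` over all nodes gives the objective in Eq. (15). In the toy example, node 1 is connected to node 0 through both meta-paths and is therefore selected as its positive.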

For downstream tasks, the embedding $z^{\hat{G}}$ from the meta-path view is employed. Training follows a two-step procedure in each epoch. In the first step, the parameters of LMA are frozen, and we train the HGNN by minimizing the contrastive loss. In the second step, the HGNN parameters are frozen while LMA is trained with the objective of maximizing the contrastive loss.

Dataset	Nodes	NodeTypes	Edges	EdgeTypes	Target	Classes
DBLP	$26\,128$	4	$239\,566$	6	author	4
IMDB	$21\,420$	4	$86\,642$	6	movie	5
ACM	$10\,942$	4	$547\,872$	8	paper	3
Freebase	$180\,098$	8	$1\,057\,688$	36	book	7

Table 2: The statistics of the datasets

5 Experimental Evaluation

Dataset		DBLP		IMDB		ACM		FreeBase
Methods	Training Data	Micro-F1	Macro-F1	Micro-F1	Macro-F1	Micro-F1	Macro-F1	Micro-F1	Macro-F1
GCN	X,A,P,Y	90.84±0.32	91.47±0.34	57.88±1.18	64.82±0.64	92.17±0.24	92.12±0.23	27.84±3.13	60.23±0.92
RGCN	X,A,Y	91.52±0.50	92.07±0.50	58.85±0.26	62.05±0.15	91.55±0.74	91.41±0.77	46.78±0.77	58.33±1.57
HAN	X,A,P,Y	91.67±0.49	92.05±0.62	57.74±0.96	64.63±0.58	90.89±0.43	60.79±0.43	21.31±1.68	54.77±1.4
GTN	X,A,Y	93.52±0.55	93.97±0.54	60.47±0.98	65.14±0.45	91.31±0.70	91.20±0.71	OOM	OOM
HGT	X,A,Y	93.01±0.23	93.49±0.25	63.00±1.19	67.20±0.57	91.12±0.76	91.00±0.76	29.28±2.52	60.51±1.16
GAT	X,A,P,Y	93.83±0.27	93.39±0.30	58.94±1.35	64.86±0.43	92.26±0.94	92.19±0.93	40.73±2.58	65.26±0.45
Mp2vec	A,P	90.25±0.10	91.17±0.10	41.45±1.60	42.46±1.70	61.13±0.40	62.72±0.30	55.94±0.7	58.74±0.80
DGI	X,A,P	89.19±0.90	90.35±0.80	46.13±0.30	47.21±0.90	80.03±3.30	80.15±3.20	53.81±1.10	57.96±0.70
DMGI	X,A,P	89.46±0.60	90.66±0.50	47.49±1.40	61.97±1.30	87.97±0.40	87.82±0.50	52.10±0.70	56.69±1.20
X-GOAL	X,A,P	83.00±0.25	91.90±0.22	57.43±0.50	58.14±0.62	91.22±0.10	91.26±0.17	58.44±1.10	57.91±1.10
HeCo	X,A,P	90.64±0.30	91.59±0.20	58.07±0.50	59.13±0.60	89.04±0.50	88.71±0.50	60.13±1.30	62.24±1.60
LAMP	X,A	92.44±0.32	92.22±0.30	61.85±0.39	62.19±0.50	91.35±0.50	91.27±0.50	61.32±1.20	64.13±1.20

Table 3: Quantitative results on node classification, detailing accuracy percentages and standard deviations. The second column specifies the training data available for each method, where X, A, P, and Y correspond to node features, the adjacency matrix, optimal meta-path combination, and labels, respectively. The best and second best performance for unsupervised models is highlighted in boldface and underline. Instances where the computation surpassed the memory constraints of a 200GB CPU are marked as "OOM".

5.1 Experimental Setup

5.1.1 Datasets:

In our study, we leveraged the HGB benchmark [31], which includes four diverse HIN datasets detailed in Table 2. The DBLP dataset [8] is sourced from the renowned DBLP bibliography website, focusing on a subset of computer science publications and featuring nodes such as authors, papers, terms, and venues. The ACM dataset [56] is also a citation network from the computer science domain. We utilized the Freebase knowledge graph [29], specifically a subgraph with around 1,000,000 edges across eight types of entities, in line with previous research methodologies [48]. Lastly, the IMDB dataset focuses on the IMDB movie database, particularly covering movie genres like Action, Comedy, Drama, Romance, and Thriller.

5.1.2 Baselines and Implementation Details:

We compare LAMP with a diverse set of methods, including five unsupervised techniques: Mp2vec [5], DGI [42], DMGI [35], X-GOAL [24], and HeCo [47], as well as six (semi-)supervised ones: GAT [41, 31], GCN [27, 31], RGCN [37], HAN [46], GTN [53], and HGT [17]. For Mp2vec, we configure parameters with 40 walks per node, a walk length of 100, and a window size of 5. Regarding meta-path selection, for Mp2vec and DGI we evaluate all meta-paths and report the best results; for all other meta-path based methods we report the best performance obtained with their optimal meta-path combination. Unless stated otherwise, default parameters are adopted from the original papers. For GCN and GAT, we follow the approach outlined in [31], enriching the original HIN with additional meta-path instances based on selected meta-paths. Specifically, for GAT, we employ the same edge-type embedding technique in the attention mechanism as in HGB. For LAMP, we do not select an optimal combination and instead use all the pre-defined meta-paths to construct the integrated meta-path sub-graph. We use Glorot initialization [10] with the Adam optimizer [26]. The learning rate ranges from $1\times 10^{-4}$ to $5\times 10^{-2}$, and patience values for early stopping are set between 5 and 200. Dropout rates are tuned between 0.1 and 0.5 in increments of 0.05. LMA utilizes a two-layer GCN, and LAMP integrates a two-layer HGB for node embedding within its contrastive learning framework. For random edge dropping, we search for the best dropping parameter between 0.3 and 0.8. We fix the embedding dimension at 64 for all techniques. Each experiment is run 10 times, and average results are reported. For datasets lacking attributes, nodes receive one-hot ID vectors.

5.2 Node Classification

In the node classification task, we used the learned node embeddings to train a linear classifier in a transductive setting, utilizing all available edges during training. The node-label split was the same across datasets: 24% for training, 6% for validation, and 70% for testing. Classification performance was evaluated using Macro-F1 and Micro-F1 metrics, with results reported for the test set based on optimal validation performance (Table 3). For all baseline methods, we report the best performance with their corresponding optimal meta-path combinations. For LAMP, we report the performance obtained with the combination of all meta-paths to demonstrate its robustness. Notably, LAMP consistently surpassed other unsupervised methods and showed remarkable efficacy against supervised models, particularly on sparser datasets such as IMDB and Freebase. Crucially, LAMP operates without relying on an optimal meta-path combination, setting it apart from other methodologies. We also examined LAMP’s sensitivity to meta-path combinations (Figure 1), demonstrating its superior stability and robustness, even in comparison to supervised approaches.
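The evaluation protocol above can be sketched in plain Python as follows. This is a minimal illustration rather than the evaluation code used in the paper: the helper names `split_indices` and `f1_scores` are hypothetical, the split is assumed to be a uniform random shuffle, and the F1 computation covers only the single-label case (a multi-label dataset would need a per-label variant).

```python
import random
from collections import Counter


def split_indices(n, train=0.24, val=0.06, seed=0):
    """Shuffle node indices 0..n-1 and cut them into train/val/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    # Macro-F1 averages per-class F1; Micro-F1 pools the counts over classes.
    macro = sum(per_class) / len(per_class)
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return macro, micro
```

In the single-label case Micro-F1 coincides with accuracy, whereas Macro-F1 weights every class equally and is therefore more sensitive to rare classes.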

5.3 Sensitivity of Meta-Paths

To examine sensitivity to the choice of meta-path combination, we conducted experiments on the ACM dataset. Our focus was on the variation and the min-max gap in Micro-F1 scores across all possible meta-path combinations. We considered the following candidate meta-paths: "PAP", "PSP", "PTP", "PPSP", and "-PPSP", which collectively form 26 distinct meta-path combinations, as illustrated in Figure 3. Mp2vec and DGI were excluded from these experiments because their design does not support arbitrary meta-path combinations and because they do not reach state-of-the-art (SOTA) performance. The results are presented in Table 4. In these tests, LAMP significantly outperformed existing unsupervised methods and even surpassed some of the supervised methods in terms of Micro-F1 scores, while showing the smallest standard deviation (2.07%) and min-max gap (6.08%) in Table 4. Intriguingly, current state-of-the-art methods, including HeCo and X-GOAL, exhibited substantial sensitivity to the choice of meta-path combination. This finding underscores the importance of robust meta-path handling, especially in self-supervised learning contexts, and highlights the effectiveness of LAMP in addressing this challenge.

Methods	Standard Deviation (%)	Min-Max Gap (%)
DMGI	5.46	25.26
X-GOAL	7.01	24.89
HeCo	11.70	36.69
HAN-1Layer	3.95	11.16
HAN-2Layer	4.49	20.82
LAMP	2.07	6.08

Table 4: Quantitative results on the sensitivity of meta-paths.
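The two statistics reported in Table 4 can be computed as in the short sketch below. Two points are assumptions rather than statements from the paper: the text does not say how the 26 combinations are formed, and taking all subsets of the five candidate meta-paths with at least two members is simply one reading that reproduces that count; likewise, the population standard deviation is used here, although the sample version may have been intended. The function names are hypothetical.

```python
from itertools import combinations
from statistics import pstdev

META_PATHS = ["PAP", "PSP", "PTP", "PPSP", "-PPSP"]


def meta_path_combinations(paths, min_size=2):
    """All subsets of `paths` containing at least `min_size` meta-paths."""
    return [c for k in range(min_size, len(paths) + 1) for c in combinations(paths, k)]


def sensitivity(scores):
    """Return (standard deviation, min-max gap) of per-combination scores."""
    return pstdev(scores), max(scores) - min(scores)


combos = meta_path_combinations(META_PATHS)
print(len(combos))  # 26
```

Given one Micro-F1 score per combination, `sensitivity(scores)` returns the pair of numbers that corresponds to one row of the table.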

5.4 Node Clustering

In our experimental setup, we apply the K-means clustering algorithm to the learned node embeddings. For performance evaluation, we use standard clustering metrics: normalized mutual information (NMI) and adjusted Rand index (ARI). Because K-means is sensitive to initialization, we run the clustering process ten times independently and present the averaged outcomes in Table 5. The IMDB dataset is excluded from this evaluation because of its multi-label structure in the HGB benchmark. Direct comparisons with supervised methods are also omitted, since these models have access to label information during training and are optimized based on validation metrics. The results show that LAMP consistently achieves the best performance across datasets, reaffirming its effectiveness in the clustering context.

Datasets	DBLP		ACM		Freebase
Metrics	NMI	ARI	NMI	ARI	NMI	ARI
Mp2vec	73.55	77.70	48.43	34.65	16.47	17.32
DGI	59.23	61.85	51.73	41.16	18.34	11.29
DMGI	70.06	75.46	51.66	46.64	16.98	16.91
X-GOAL	61.53	78.91	56.77	43.67	18.67	17.44
HeCo	74.51	80.17	56.87	56.94	20.38	20.98
LAMP	77.13	82.73	58.45	59.12	23.44	24.38

Table 5: Quantitative results on node clustering.
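For reference, the two clustering metrics can be computed from cluster assignments as in the following self-contained sketch. It is an illustration only: the paper does not state which NMI normalization was used, so the arithmetic mean of the two entropies is an assumption, and the function names are hypothetical. Scores in Table 5 would correspond to these values multiplied by 100.

```python
from collections import Counter
from math import comb, log


def nmi(labels_true, labels_pred):
    """Normalized mutual information (arithmetic-mean normalization, assumed)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    a, b = Counter(labels_true), Counter(labels_pred)
    mi = sum(c / n * log(c * n / (a[t] * b[p])) for (t, p), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in a.values())
    h_pred = -sum(c / n * log(c / n) for c in b.values())
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if h_true == h_pred else 0.0
    return mi / ((h_true + h_pred) / 2)


def ari(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    joint = Counter(zip(labels_true, labels_pred))
    a, b = Counter(labels_true), Counter(labels_pred)
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)  # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)
```

Both metrics are invariant to a relabeling of the clusters, which is why they suit K-means output, whose cluster ids are arbitrary.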

5.5 Ablation Study

This section evaluates two distinct variants: $\text{{LAMP}}_{\textrm{w.o.mp}}$ (referred to as $\text{{LAMP}}_{var1}$ ) and $\text{{LAMP}}_{\textrm{w.o.unifiedHGNN}}$ (referred to as $\text{{LAMP}}_{var2}$ ). For the $\text{{LAMP}}_{var1}$ version, we freeze the parameter $\gamma$ to cancel out the effect of meta-path importance during LMA learning. The intent behind this is to examine the role of meta-path importance in bridging local and high-order information. On the other hand, $\text{{LAMP}}_{var2}$ replaces the unified HGB encoder with the meta-path and network-schema encoders from HeCo. Within this setup, the meta-path view is processed using the HAN [46] attention mechanism, while a standard GCN tackles the original HIN. For the meta-path view, the LMA edge-pruning technique is applied to each individual meta-path sub-graph.

Table 3 illustrates that both $\text{{LAMP}}_{var1}$ and $\text{{LAMP}}_{var2}$ suffer a considerable decline in performance. (1) Lacking the meta-path importance $\gamma$, $\text{{LAMP}}_{var1}$ struggles to harness sufficient overall structural information and primarily emphasizes local details based on node attributes. Moreover, without the guidance of $\gamma$, LMA tends to prioritize lengthy meta-paths and to neglect potentially valuable shorter ones. This weakens $\text{{LAMP}}_{var1}$’s capability to bridge local and high-order information, which underscores that the guidance from meta-path importance is crucial for the LAMP model. (2) For $\text{{LAMP}}_{var2}$, employing separate HGNN encoders for the two views may be effective in HeCo, but it does not work for LAMP. As shown in Table 6, $\text{{LAMP}}_{var2}$ lags behind in performance across all datasets. Using disparate HGNN encoders inherently amplifies the differences between the embeddings produced by the two views, even when the target node attributes remain consistent across both views. This creates a dilemma for LMA: because the two views already appear distinct, it is hard to determine which edges to prune. This inconsistency can destabilize the model, increasing the risk of training collapse.

Dataset	DBLP		IMDB		ACM		Freebase
Methods	Micro-F1	Macro-F1	Micro-F1	Macro-F1	Micro-F1	Macro-F1	Micro-F1	Macro-F1
$\text{LAMP}_{var1}$	71.05	71.12	34.98	34.05	73.85	73.06	32.06	31.12
$\text{LAMP}_{var2}$	86.33	87.27	53.40	54.20	84.54	84.75	49.51	50.04
LAMP	92.44	92.22	61.85	62.19	91.35	91.27	61.32	64.13

Table 6: Quantitative results with two LAMP variants.

5.6 Analysis of Hyper-parameters

In this section, we examine our model’s sensitivity to two critical hyper-parameters: the threshold for positive samples $T_{pos}$ and the regularization term $\lambda_{reg}$, which determines the proportion of retained edges in LMA. We evaluate node classification on the ACM and DBLP datasets and report both Macro-F1 and Micro-F1 scores.

5.6.1 Analysis of $T_{pos}$

The threshold $T_{pos}$ controls the number of positive samples. We vary its value to observe its impact on performance, as shown in Figure 7(a) and Figure 7(b). As $T_{pos}$ increases, performance initially improves before declining. The optimal thresholds are determined to be 7 for DBLP and 8 for ACM. These performance trends are consistent across both datasets.
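To make the role of such a threshold concrete, the sketch below shows one common way of capping the number of positives per node, in the style of HeCo [47]: candidates are ranked by how many meta-path instances connect them to the target node, and at most $T_{pos}$ are kept. This section does not confirm that LAMP selects positives this way, so the ranking rule, the input structure `connection_counts`, and the function name are all assumptions.

```python
def select_positives(connection_counts, node, t_pos):
    """Pick up to `t_pos` positives for `node`, always including the node itself.

    `connection_counts[node]` maps each candidate to the number of meta-path
    instances linking it to `node` (a hypothetical input structure).
    """
    neighbors = connection_counts.get(node, {})
    # Rank by descending connection count; break ties by node id for determinism.
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    positives = [node]
    for other, _ in ranked:
        if len(positives) >= t_pos:
            break
        if other != node:
            positives.append(other)
    return positives


counts = {0: {1: 3, 2: 5, 3: 1, 4: 5}}
print(select_positives(counts, 0, 3))  # [0, 2, 4]
```

Under this reading, a small threshold keeps only the most strongly connected candidates, while a large one admits weakly connected nodes as positives, which is one plausible explanation for performance rising and then falling as $T_{pos}$ grows.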

5.6.2 Analysis of $\lambda_{reg}$

We also study the effect of adjusting $\lambda_{reg}$, which governs the fraction of edges retained by LMA. Results are presented in Figure 7(c) and Figure 7(d). For both the DBLP and ACM datasets, $\lambda_{reg}=0.3$ yields peak performance, preserving approximately half of the meta-path view edges. Raising $\lambda_{reg}$ beyond 0.5 preserves 70%-80% of edges; this excessive retention introduces redundant information into the model and degrades performance.

6 Related Work

6.1 Heterogeneous Graph Contrastive Learning

HGCL has rapidly evolved, effectively adapting contrastive learning techniques for heterogeneous graphs [35, 25, 47, 24, 3, 57, 60]. Standard HGCL approaches create multiple graph views via meta-path or network-schema based augmentations, then learn representations by contrasting positive and negative samples. DMGI [35], for instance, contrasts the original network with its corrupted counterpart for each meta-path view, integrating a consensus regularization for meta-path fusion. HeCo [47] introduces two augmentation techniques, a meta-path sub-graph view and a network schema view, and minimizes the inter-view information entropy using personalized pairwise InfoNCE. HDMI [25] and X-GOAL [24] are advanced versions of DMGI: HDMI improves semantic attention via high-order mutual information, while X-GOAL proposes a stronger strategy for generating positive and negative samples and obtains node embeddings by simple average pooling over the layer-specific embeddings. CPT-HG [23] presents a pre-training model grounded in contrastive learning, in which sub-graphs derived from positive samples integrate randomly swapped nodes from the negative set.

6.2 HGNN Applications in IR

In recent years, heterogeneous graph neural networks (HGNNs), as a general extension of homogeneous graph neural networks [19, 43, 18, 45, 55, 44, 14], have risen to prominence as a pivotal tool in information retrieval (IR), adept at extracting rich structural and semantic information from heterogeneous graphs. This capability has led to their widespread application across various IR domains, including search engines, recommendation systems, and question-answering systems.

In the context of search engines and matching, Chen et al. [2] proposed a cross-modal retrieval method utilizing heterogeneous graph embeddings. This method preserves cross-modal information, overcoming the limitations of traditional approaches that often lose modality-specific details. Similarly, Guan et al. [12] addressed fashion compatibility modeling by integrating user preferences and attribute entities within a meta-path-guided HGNN framework. Additionally, Yuan et al. [52] introduced the Spatio-Temporal Dual Graph Attention Network (STDGAT) for intelligent query-Point of Interest (POI) matching in location-based services. By leveraging semantic representation, dual graph attention, and spatiotemporal factors, STDGAT enhances matching accuracy, even with partial query keywords.

The domain of recommendation systems has also seen significant advancements through the application of HGNNs. Cai et al. [1] proposed an inductive heterogeneous graph neural network (IHGNN) model tailored for cold-start recommendation scenarios, addressing the challenge of sparse user attribute data. Pang et al. [34] developed a personalized session-based recommendation method using heterogeneous global graph neural networks (HG-GNN), which effectively captures user preferences from both current and historical sessions. Moreover, Song et al. [38] presented a self-supervised, calorie-aware heterogeneous graph network (SCHGN) for food recommendations, integrating user preferences and ingredient relationships to enhance the recommendation quality.

In the arena of question-answering systems, HGNNs have garnered considerable attention. Feng et al. [7] proposed a document-entity heterogeneous graph network (DEHG) that integrates structured and unstructured information sources for multi-hop reasoning in open-domain question answering. Furthermore, Gao et al. [9] introduced HeteroQA, employing a question-aware heterogeneous graph transformer to assimilate multiple information sources from user communities, enriching the question-answering process.

7 Conclusion

Our study reveals the sensitivity of existing methodologies to meta-path combinations in unsupervised heterogeneous graph neural networks. To address this challenge, we introduce LAMP, a meta-path-guided adversarial approach for Heterogeneous Graph Contrastive Learning (HGCL). LAMP excels in capturing local and high-order structural information through dual views and Learnable Meta-Path guided augmentation (LMA) with an HGNN. Empirical tests across various datasets showcase LAMP’s superiority over existing unsupervised models and competitive performance even with supervised models. LAMP holds great potential for future heterogeneous graph contrastive learning research.

References

[1] Desheng Cai, Shengsheng Qian, Quan Fang, Jun Hu, and Changsheng Xu. User cold-start recommendation via inductive heterogeneous graph neural network. ACM Transactions on Information Systems, 41(3):1–27, 2023.
[2] Dapeng Chen, Min Wang, Haobin Chen, Lin Wu, Jing Qin, and Wei Peng. Cross-modal retrieval with heterogeneous graph embedding. In João Magalhães, Alberto Del Bimbo, Shin’ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda, Qin Jin, Vincent Oria, and Laura Toni, editors, MM ’22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022, pages 3291–3300. ACM, 2022.
[3] Mengru Chen, Chao Huang, Lianghao Xia, Wei Wei, Yong Xu, and Ronghua Luo. Heterogeneous graph contrastive learning for recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 544–552. ACM, 2023.
[4] Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum. Explainable conversational question answering over heterogeneous sources via iterative graph neural networks. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, and Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 643–653. ACM, 2023.
[5] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 135–144, 2017.
[6] Chenguang Du, Kaichun Yao, Hengshu Zhu, Deqing Wang, Fuzhen Zhuang, and Hui Xiong. Seq-hgnn: Learning sequential node representation on heterogeneous graph. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’23, pages 1721–1730, New York, NY, USA, 2023. Association for Computing Machinery.
[7] Yue Feng, Zhen Han, Mingming Sun, and Ping Li. Multi-hop open-domain question answering over structured and unstructured knowledge. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 151–156, 2022.
[8] Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. Magnn: Metapath aggregated graph neural network for heterogeneous graph embedding. In Proceedings of The Web Conference 2020, pages 2331–2341, 2020.
[9] Shen Gao, Yuchi Zhang, Yongliang Wang, Yang Dong, Xiuying Chen, Dongyan Zhao, and Rui Yan. Heteroqa: Learning towards question-and-answering through multiple information sources via heterogeneous graph modeling. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pages 307–315, 2022.
[10] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010.
[11] Shannan Guan, Xin Yu, Wei Huang, Gengfa Fang, and Haiyan Lu. Dmmg: Dual min-max games for self-supervised skeleton-based action recognition. IEEE Transactions on Image Processing, 2023.
[12] Weili Guan, Fangkai Jiao, Xuemeng Song, Haokun Wen, Chung-Hsing Yeh, and Xiaojun Chang. Personalized fashion compatibility modeling via metapath-guided heterogeneous graph learning. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pages 482–491, 2022.
[13] Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, and Yan Zhang. Homophily-oriented heterogeneous graph rewiring. In Proceedings of the ACM Web Conference 2023, pages 511–522, 2023.
[14] Haoyu Han, Juanhui Li, Wei Huang, Xianfeng Tang, Hanqing Lu, Chen Luo, Hui Liu, and Jiliang Tang. Node-wise filtering in graph neural networks: A mixture of experts approach. arXiv preprint arXiv:2406.03464, 2024.
[15] Ruining He, Anirudh Ravula, Bhargav Kanagal, and Joshua Ainslie. Realformer: Transformer likes residual attention. arXiv preprint arXiv:2012.11747, 2020.
[16] Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, and Jieping Ye. An attention-based graph neural network for heterogeneous structural learning. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 4132–4139, 2020.
[17] Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In Proceedings of the web conference 2020, pages 2704–2710, 2020.
[18] Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, and Taiji Suzuki. Graph neural networks provably benefit from structural information: A feature learning perspective. arXiv preprint arXiv:2306.13926, 2023.
[19] Wei Huang, Yayong Li, Weitao Du, Jie Yin, Richard Yi Da Xu, Ling Chen, and Miao Zhang. Towards deepening graph neural networks: A gntk-based optimization perspective. arXiv preprint arXiv:2103.03113, 2021.
[20] Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. Are meta-paths necessary? revisiting heterogeneous graph embeddings. In Proceedings of the 27th ACM international conference on information and knowledge management, pages 437–446, 2018.
[21] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016.
[22] Hao Jiang, Chuanzhen Li, Juanjuan Cai, and Jingling Wang. RCENR: A reinforced and contrastive heterogeneous network reasoning model for explainable news recommendation. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, and Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 1710–1720. ACM, 2023.
[23] Xunqiang Jiang, Yuanfu Lu, Yuan Fang, and Chuan Shi. Contrastive pre-training of gnns on heterogeneous graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 803–812, 2021.
[24] Baoyu Jing, Shengyu Feng, Yuejia Xiang, Xi Chen, Yu Chen, and Hanghang Tong. X-goal: multiplex heterogeneous graph prototypical contrastive learning. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 894–904, 2022.
[25] Baoyu Jing, Chanyoung Park, and Hanghang Tong. Hdmi: High-order deep multiplex infomax. In Proceedings of the Web Conference 2021, pages 2414–2424, 2021.
[26] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[27] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[28] Ang Li, Jian Hu, Ke Ding, Xiaolu Zhang, Jun Zhou, Yong He, and Xu Min. Uncertainty-based heterogeneous privileged knowledge distillation for recommendation system. In Hsin-Hsi Chen, Wei-Jou (Edward) Duh, Hen-Hsen Huang, Makoto P. Kato, Josiane Mothe, and Barbara Poblete, editors, Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, pages 2471–2475. ACM, 2023.
[29] Xiang Li, Danhao Ding, Ben Kao, Yizhou Sun, and Nikos Mamoulis. Leveraging meta-path contexts for classification in heterogeneous information networks. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 912–923. IEEE, 2021.
[30] Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. Parameterized explainer for graph neural network. Advances in neural information processing systems, 33:19620–19631, 2020.
[31] Qingsong Lv, Ming Ding, Qiang Liu, Yuxiang Chen, Wenzheng Feng, Siming He, Chang Zhou, Jianguo Jiang, Yuxiao Dong, and Jie Tang. Are we really making much progress? revisiting, benchmarking and refining heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 1150–1160, 2021.
[32] Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016.
[33] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
[34] Yitong Pang, Lingfei Wu, Qi Shen, Yiming Zhang, Zhihua Wei, Fangli Xu, Ethan Chang, Bo Long, and Jian Pei. Heterogeneous global graph neural networks for personalized session-based recommendation. In Proceedings of the fifteenth ACM international conference on web search and data mining, pages 775–783, 2022.
[35] Chanyoung Park, Donghyun Kim, Jiawei Han, and Hwanjo Yu. Unsupervised attributed multiplex network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5371–5378, 2020.
[36] Minjae Park. Cross-view self-supervised learning on heterogeneous graph neural network via bootstrapping, 2022.
[37] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, pages 593–607. Springer, 2018.
[38] Yaguang Song, Xiaoshan Yang, and Changsheng Xu. Self-supervised calorie-aware heterogeneous graph networks for food recommendation. ACM Trans. Multimedia Comput. Commun. Appl., 19(1s), Feb. 2023.
[39] Ke Sun, Zhouchen Lin, and Zhanxing Zhu. Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 5892–5899, 2020.
[40] Susheel Suresh, Pan Li, Cong Hao, and Jennifer Neville. Adversarial graph augmentation to improve graph contrastive learning. Advances in Neural Information Processing Systems, 34:15920–15933, 2021.
[41] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
[42] Petar Veličković, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. In International Conference on Learning Representations (ICLR), 2019.
[43] Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, and Xiaokui Xiao. Single-pass contrastive learning can work for both homophilic and heterophilic graph. arXiv preprint arXiv:2211.10890, 2022.
[44] Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, and Yuxuan Liang. The heterophilic snowflake hypothesis: Training and empowering gnns for heterophilic graphs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3164–3175, 2024.
[45] Li Wang, Wei Huang, Miao Zhang, Shirui Pan, Xiaojun Chang, and Steven Weidong Su. Pruning graph neural networks by evaluating edge properties. Knowledge-Based Systems, 256:109847, 2022.
[46] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The world wide web conference, pages 2022–2032, 2019.
[47] Xiao Wang, Nian Liu, Hui Han, and Chuan Shi. Self-supervised heterogeneous graph neural network with co-contrastive learning. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 1726–1736, 2021.
[48] Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, and Jiawei Han. Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Transactions on Knowledge and Data Engineering, 34(10):4854–4873, 2020.
[49] Xiaocheng Yang, Mingyu Yan, Shirui Pan, Xiaochun Ye, and Dongrui Fan. Simple and efficient heterogeneous graph neural network. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10816–10824, 2023.
[50] Zuoxi Yang. Biomedical information retrieval incorporating knowledge graph for explainable precision medicine. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2486–2486, 2020.
[51] Pengyang Yu, Chaofan Fu, Yanwei Yu, Chao Huang, Zhongying Zhao, and Junyu Dong. Multiplex heterogeneous graph convolutional network. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2377–2387, 2022.
[52] Zixuan Yuan, Hao Liu, Yanchi Liu, Denghui Zhang, Fei Yi, Nengjun Zhu, and Hui Xiong. Spatio-temporal dual graph attention network for query-poi matching. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 629–638, New York, NY, USA, 2020. Association for Computing Machinery.
[53] Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. Graph transformer networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[54] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V Chawla. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 793–803, 2019.
[55] Guibin Zhang, Kun Wang, Wei Huang, Yanwei Yue, Yang Wang, Roger Zimmermann, Aojun Zhou, Dawei Cheng, Jin Zeng, and Yuxuan Liang. Graph lottery ticket automated. In The Twelfth International Conference on Learning Representations, 2024.
[56] Jianan Zhao, Xiao Wang, Chuan Shi, Zekuan Liu, and Yanfang Ye. Network schema preserving heterogeneous information network embedding. In International Joint Conference on Artificial Intelligence (IJCAI), 2020.
[57] Lecheng Zheng, Jinjun Xiong, Yada Zhu, and Jingrui He. Contrastive learning with complex heterogeneity. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2594–2604, 2022.
[58] Shichao Zhu, Chuan Zhou, Shirui Pan, Xingquan Zhu, and Bin Wang. Relation structure-aware heterogeneous graph neural network. In 2019 IEEE international conference on data mining (ICDM), pages 1534–1539. IEEE, 2019.
[59] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, and Liang Wang. Deep graph structure learning for robust representations: A survey. arXiv preprint arXiv:2103.03036, 2021.
[60] Yanqiao Zhu, Yichen Xu, Hejie Cui, Carl Yang, Qiang Liu, and Shu Wu. Structure-enhanced heterogeneous graph contrastive learning. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), pages 82–90. SIAM, 2022.
[61] Yanqiao Zhu, Yichen Xu, Qiang Liu, and Shu Wu. An empirical study of graph contrastive learning. arXiv preprint arXiv:2109.01116, 2021.