SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning

Haoteng Yin†, Muhan Zhang‡, Jianguo Wang†, Pan Li†§

† Department of Computer Science, Purdue University
‡ Institute for Artificial Intelligence, Peking University
§ School of Electrical and Computer Engineering, Georgia Institute of Technology

† {yinht, csjgwang}@purdue.edu, ‡ muhan@pku.edu.cn, § panli@gatech.edu

Abstract.

Subgraph-based graph representation learning (SGRL) has recently emerged as a powerful tool in many prediction tasks on graphs due to its advantages in model expressiveness and generalization ability. Most previous SGRL models face computational challenges associated with the high cost of subgraph extraction for each training or test query. Recently, SUREL was proposed to accelerate SGRL, which samples random walks offline and joins these walks online as a proxy of subgraph for representation learning. Thanks to the reusability of sampled walks across different queries, SUREL achieves state-of-the-art performance in terms of scalability and prediction accuracy. However, SUREL still suffers from high computational overhead caused by node duplication in sampled walks. In this work, we propose a novel framework SUREL+ that upgrades SUREL by using node sets instead of walks to represent subgraphs. This set-based representation eliminates repeated nodes by definition but can also be irregular in size. To address this issue, we design a customized sparse data structure to efficiently store and access node sets and provide a specialized operator to join them in parallel batches. SUREL+ is modularized to support multiple types of set samplers, structural features, and neural encoders to complement the structural information loss after the reduction from walks to sets. Extensive experiments have been performed to validate SUREL+ in the prediction tasks of links, relation types, and higher-order patterns. SUREL+ achieves 3-11 $\times$ speedups of SUREL while maintaining comparable or even better prediction performance; compared to other SGRL baselines, SUREL+ achieves $\sim$ 20 $\times$ speedups and significantly improves the prediction accuracy.

PVLDB Reference Format:
Haoteng Yin, Muhan Zhang, Jianguo Wang, Pan Li. PVLDB, 16(11): 2939-2948, 2023.
doi:10.14778/3611479.3611499

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 16, No. 11 ISSN 2150-8097.

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/Graph-COM/SUREL_Plus.

1. Introduction

Graphs are widely used to model interactions in natural sciences and relationships in social life (Koller et al., 2007; Jumper et al., 2021). Graph-structured data in the real world are highly irregular and often large-scale. To solve inference tasks on graphs, graph representation learning (GRL) that studies quantitative representations of graph-structured data has attracted much attention (Hamilton et al., 2017b; Hamilton, 2020; Wu et al., 2022). Recently, subgraph-based GRL (SGRL) has become an important research direction for researchers studying GRL algorithms and systems, as it achieves far better prediction performance than other approaches on many GRL tasks, especially those involving a set of nodes. Given a set of queried nodes, SGRL models such as SEAL (Zhang and Chen, 2018; Zhang et al., 2021), GraIL (Teru et al., 2020), and SubGNN (Alsentzer et al., 2020) first extract a subgraph around the queried node set (called query-induced subgraph), and then use neural networks to encode extracted subgraphs for prediction. Extensive work shows that SGRL models are more robust (Zeng et al., 2021) and more expressive (Bouritsas et al., 2022; Frasca et al., 2022), whereas canonical graph neural networks (GNNs) including GCN (Kipf and Welling, 2017) and GraphSAGE (Hamilton et al., 2017a) usually fail to make accurate predictions, due to their limited expressive power (Zhang et al., 2021; Garg et al., 2020; Chen et al., 2020), incapability of capturing inter-node distance information (Srinivasan and Ribeiro, 2020; Li et al., 2020), and improper entanglement between receptive field size and model depth (Huang and Zitnik, 2020; Zeng et al., 2021; Yin et al., 2022). An example in Fig. 1 illustrates how SGRL works for link prediction and demonstrates its advantages over GNNs.
Here, canonical GNNs generate and aggregate node-wise representations to predict links, which would map structurally symmetric nodes without distinct features into the same representation and thus lead to the ambiguity issue (Xu et al., 2019; Zhang et al., 2021). So far, the advantages of SGRL methods have been verified in many applications, such as link and relation prediction (Zhang and Chen, 2018; Zhang et al., 2021; Teru et al., 2020), higher-order pattern prediction (Meng et al., 2018; Liu et al., 2022), temporal network modeling (Wang et al., 2021), recommender systems (Zhang and Chen, 2020), anomaly detection (Alsentzer et al., 2020; Cai et al., 2021), graph meta-learning (Huang and Zitnik, 2020), subgraph matching (Liu et al., 2020; Lou et al., 2020), and molecular/protein study in life sciences (Wang and Zhang, 2021).

Figure 1. GNNs cannot correctly predict whether $x$ is more likely linked with $y$ or $z$: $y$ and $z$ have the same representation without distinct features. The representations based on one-hop neighbors (query-induced subgraph) are more expressive in distinguishing node pairs $(x,y)$ and $(x,z)$.

Despite their algorithmic benefits, SGRL methods currently face two major computational challenges: (1) Query Dependency. A subgraph must be extracted for each queried node set, which is not reusable across different queries, and cannot be preprocessed if the query is unknown; (2) Irregularity. The extracted subgraphs are irregularly sized, resulting in poor batch processing and load-balancing performance. As Fig. 3 (a) shows, subgraph extraction in SEAL (Zhang and Chen, 2018; Zhang et al., 2021) is prohibitively slow for practical deployment. This inspired recent work on dedicated hardware acceleration for extracting subgraphs (DGL, 2022; PyG, 2022). However, how to fundamentally improve the scalability and efficiency of SGRL methods remains largely unexplored.

SUREL (Yin et al., 2022) is the state-of-the-art (SOTA) framework that applies algorithm and system co-design to implement SGRL. It decouples the input from specific queries by sampling node-level random walks and uses the joint walks of queried nodes as a proxy of subgraphs. Specifically, SUREL treats each node in the graph as a seed and runs multiple random walks from the seed offline. Given a queried node set, SUREL joins and encodes the sampled walks of all queried nodes online for prediction. The join operation builds the connection between sampled walks of queried nodes so that their joins can function as the query-induced subgraph. To compensate for the structure loss by representing subgraphs in walks, a structural feature termed relative position encoding (RPE) is adopted to record the relative distance between each distinct node in sampled walks and the seed. The RPE is pre-computed offline and attached to sampled walks before being fed into neural networks (NNs) to make predictions. Sampled walks from one seed can be reused for multiple queries whenever that seed node is involved. Through this walk-sharing mechanism, SUREL significantly improves the efficiency of SGRL. The regularity of walks also enables highly parallel walk sampling and online joining with a dedicated system design. However, SUREL still suffers from an inherent drawback of the walk-based representation: high node redundancy in sampled walks (over 55% of the sampled nodes are duplicates, see Fig. 3 (b)). This redundancy raises the following issues: (1) extra space for hosting walks and positional encoding in memory, (2) extra time for operations on duplicated nodes in subsequent routines of walk joining and NN-based encoding, and (3) high workload of data transfer between CPU and GPU.

In this work, we upgrade SUREL and develop a novel framework SUREL+ that again benefits from algorithm and system co-design. The core concept of SUREL+ is simple: instead of using walks to represent subgraphs, we now employ node sets, thereby obviating node duplication. However, this new idea based on node sets also introduces several algorithm and system design difficulties. From an algorithmic perspective, as explored in this study, the transition from induced subgraphs to walks and then to node sets results in a considerable loss of structural information. Therefore, the first priority is to develop a method that can compensate for such loss while maintaining performance. On the system side, SUREL (Yin et al., 2022) utilizes walks, which can be easily stored and processed in an aligned format by controlling sampling parameters. In contrast, node sets are irregular in size, creating difficulties for efficient storage and fast access. To sum up, how to coordinate the designs of both sides constitutes the primary challenge of this work.

SUREL+ tackles the above challenges through its entire pipeline. To avoid node duplication in walk sampling, SUREL+ only keeps unique nodes from neighborhood sampling of seed nodes during preprocessing. To compensate for structural information loss, SUREL+ incorporates various types of set samplers and structure encoders to preserve local graph structures: set samplers employ different graph metrics to measure node importance and determine sampling rules; structure encoders support landing probabilities of random walks (Li et al., 2019), shortest path distances, and personalized PageRank scores (Jeh and Widom, 2003), covering most of the structural features used by previous SGRL models (Zhang and Chen, 2018; Zhang et al., 2021; Li et al., 2020; Teru et al., 2020; Yin et al., 2022). Furthermore, SUREL+ designs a customized sparse data structure, namely SpG, which can efficiently store sampled node sets and achieve fast access. A sparse operator SpJoin is developed accordingly to perform join operations on the sampled node sets and associated structural features for serving queries online. To capture diverse levels of interactions between node and structural features, SUREL+ introduces multiple set neural encoders, such as a multilayer perceptron (MLP) with mean pooling, set attention (Veličković et al., 2018), and LSTM (Hamilton et al., 2017a), which ensure sufficient expressive power and consistent performance across various types of SGRL tasks.

Overall, our contributions can be summarized as follows:

• Algorithm: SUREL+ is a novel SGRL framework (open source), which utilizes reusable node sets associated with various structural features to represent query-induced subgraphs via online joining. Compared with the SOTA baselines, the proposed set-based subgraph representation greatly reduces memory and computation costs without degrading prediction performance.

• System: SUREL+ designs a customized sparse data structure SpG and a sparse join operator SpJoin to support efficient storage and fast access of node sets, which achieves much lower latency and higher throughput than previous SGRL methods.

• We conduct extensive experiments on 9 real-world graphs, with millions/billions of nodes/edges, and demonstrate the advantages of SUREL+ in link/relation-type/motif prediction tasks. SUREL+ is 3-11 $\times$ faster than the current SOTA SGRL method SUREL while maintaining comparable or even better prediction accuracy. SUREL+ also achieves $\sim$ 20 $\times$ speedups with substantial accuracy improvements over other SGRL baselines.

2. Preliminaries

2.1. Notations and Relevant Definitions in SGRL

Let $\mathcal{G}(\mathcal{V},\mathcal{E},X)$ be an attributed graph with node set $\mathcal{V}=\{1,2,...,n\}$ and edge set $\mathcal{E}$, where $X\in\mathbb{R}^{n\times d}$ denotes the $d$-dimensional node attributes. A query $Q\subset\mathcal{V}$ is a node set of interest for a certain type of task. We denote the subgraph induced by query $Q$ as $\mathcal{G}_{Q}$ and the subgraph induced by a single node $u$ as $\mathcal{G}_{u}$, where induced subgraphs are typically within a small number of hops.

Definition 2.0 (Subgraph-based Graph Representation Learning (SGRL)).

Given a query $Q$ of node set over graph $\mathcal{G}$ , SGRL aims to learn a representation of the query-induced subgraph $\mathcal{G}_{Q}$ to make prediction $f(\mathcal{G}_{Q})$ . $f(\cdot)$ is usually a neural network. SGRL tasks come with some labeled queries $\{(Q_{i},y_{i})\}_{i=1}^{L}$ for supervision (positive samples) and other unlabeled queries $\{Q_{i}\}_{i=L+1}^{L+N}$ for inference.

Examples of SGRL Tasks. Link prediction seeks to estimate the likelihood of a link between two nodes in a given graph, where a query $Q$ corresponds to a node pair. It can be further generalized to predict links with types over heterogeneous graphs (Teru et al., 2020) or to predict blood vessels (Paetzold et al., 2021) and chemical bonds (Jumper et al., 2021) in domain-specific graphs. Tasks beyond pairwise relations are named higher-order pattern prediction, where a query $Q$ consists of three or more nodes. In this work, we consider the following setting: given partially observed pairwise relations among the queried nodes in $Q$, predict whether these nodes will establish a certain full higher-order relation of interest (Srinivasan et al., 2021; Liu et al., 2022).

Review of SGRL Methods. The current SGRL pipeline mainly has three parts, as shown in the Algorithm Design section of Fig. 2: subgraph preparation, structural feature construction, and neural encoder to obtain the readout of subgraphs. Classical SGRL models often group query-dependent parts together (e.g., SEAL (Zhang and Chen, 2018; Zhang et al., 2021) couples subgraph extraction with the labeling trick (Zhang et al., 2021)), and then apply GNNs on extracted and labeled subgraphs for prediction. However, such coupling is expensive and makes the computed intermediate results (e.g., labeled subgraphs) not reusable across queries, which motivates recent SGRL methods to decouple them. SUREL (Yin et al., 2022) substitutes explicit subgraph extraction with online joining of multiple pre-sampled walks attached with positional encoding defined on walk landing as structural features, both of which are node-level and thus can be reused to serve multiple queries. Lastly, it applies neural networks to encode joint walks and aggregate their embeddings for prediction.

2.2. Related Works

Scalable SGRL Design. Recent works on SGRL models have primarily focused on efficient subgraph extraction. Various techniques have been proposed, including PPR-based (Bojchevski et al., 2020; Zeng et al., 2021) and random walk-based (Yin et al., 2022) subgraph samplers, and node neighborhood sampling through CUDA kernels in DGL (DGL, 2022) and tensor operations in PyG (PyG, 2022). Some frameworks have customized data structures to better support subgraph operations and gain higher throughput, such as associative arrays in SUREL (Yin et al., 2022), temporal-CSR in TGL (Zhou et al., 2022), and GPU-oriented dictionary in NAT (Luo and Li, 2022). To achieve scalable modeling design, GDGNN (Kong et al., 2022) utilizes node representations along the geodesic path between queried nodes for prediction, partially decoupling structural feature construction from subgraph extraction. BUDDY (Chamberlain et al., 2023) employs subgraph sketches to avoid explicitly constructing subgraphs for link prediction. However, these works either focus on specific aspects of computational issues in SGRL, namely bottlenecks of extraction, storage, and feature construction, or are limited to one type of SGRL task. In contrast, SUREL+ provides a comprehensive co-design approach in scalable sampling, efficient storage, and expressive modeling, offering a general and scalable framework for various SGRL tasks.

3. The framework of SUREL+

This section introduces SUREL+, whose key concept is to sample node sets and encode structural features offline and then join them online as a proxy of query-induced subgraph for representation learning. This approach only keeps distinct nodes in the sampled set for reuse in different queries, effectively addressing memory and computation concerns of node duplication in the walk-based representation adopted by SUREL (Yin et al., 2022). SUREL+ features a modular design that supports various set samplers, structure encoders, and set neural encoders to provide a trade-off between complexity and expressiveness after reducing subgraphs to node sets. Furthermore, SUREL+ introduces a customized sparse data structure SpG and an arithmetic operator SpJoin to store node sets and perform their online joins efficiently. Fig. 2 summarizes and compares SUREL+ and current SGRL models. The following subsections describe these modules in detail.

3.1. Set Samplers and Structure Encoders

SUREL+ uses set samplers to sample a set of nodes from each node’s neighborhood and calls structure encoders to construct the corresponding structural features. Both operations are executed offline: the former is primarily for computational benefits, while the latter offsets the structure loss incurred by reducing subgraphs (adopted by SEAL (Zhang and Chen, 2018; Zhang et al., 2021)) or walks (adopted by SUREL (Yin et al., 2022)) to node sets. Conceptually, SUREL+ represents the node-induced subgraph $\mathcal{G}_{u}$ via a combination of (1) a node set $\mathcal{S}_{u}$ comprising unique nodes sampled from the neighborhood of node $u$ and (2) the associated structural features $\mathcal{Z}_{u}$ that reflect the positions of sampled nodes in $\mathcal{G}_{u}$.

Set Samplers. Two types of set samplers are adopted. The first type, named Walk-based Sampler, samples short-step random walks and then reduces the sampled walks into a set of unique nodes. The second type, named Metric-based Sampler, is based on graph metrics that measure the proximity between neighboring nodes and the seed, such as personalized PageRank (PPR) scores (Jeh and Widom, 2003) or shortest path distances. Specifically, the walk-based sampler runs $M$-many $m$-step random walks, starting from each seed $u$ in parallel on the graph $\mathcal{G}$, and then only puts distinct nodes of sampled walks into the set $\mathcal{S}_{u}$. The metric-based sampler, taking PPR-based (Bojchevski et al., 2020) as an example, first runs the push-flow algorithm (Andersen et al., 2006) to obtain an approximation of the PPR vector for each seed $u$, and then selects the top-$K$ nodes with the highest PPR scores into the set $\mathcal{S}_{u}$. Mathematically, PPR scores are convergent landing probabilities of seeded random walks that reach infinite steps. Therefore, these two samplers complement each other by leveraging either more local or global structures of the graph. We use hyper-parameters $M$, $m$ to control random walks, and $K$ to control metric-based samplers, all of which are set to constants in practice. With these constants fixed, the complexity of the above offline sampling procedures is $O(|\mathcal{V}|)$.
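The metric-based sampler described above can be sketched as follows. This is a minimal illustration rather than the SUREL+ implementation: it assumes an undirected graph given as a hypothetical adjacency dictionary `adj`, uses a simplified non-lazy variant of the push procedure, and the names `approx_ppr`, `ppr_set_sampler`, `alpha`, and `eps` are chosen here for illustration only.

```python
from collections import deque


def approx_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate PPR scores of `seed` via local pushes (simplified, non-lazy variant)."""
    p = {}            # approximate PPR scores accumulated so far
    r = {seed: 1.0}   # residual probability mass still waiting to be pushed
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        res, deg = r.get(v, 0.0), len(adj[v])
        if deg == 0:
            # Isolated node: all remaining mass stays on it.
            p[v] = p.get(v, 0.0) + res
            r[v] = 0.0
            continue
        if res < eps * deg:
            continue
        # Keep an alpha fraction at v and spread the rest evenly over its neighbors.
        p[v] = p.get(v, 0.0) + alpha * res
        r[v] = 0.0
        share = (1.0 - alpha) * res / deg
        for w in adj[v]:
            before = r.get(w, 0.0)
            r[w] = before + share
            threshold = eps * len(adj[w])
            if before < threshold <= r[w]:
                queue.append(w)  # w just became eligible for a push
    return p


def ppr_set_sampler(adj, seed, K, alpha=0.15, eps=1e-4):
    """Return the node set S_u (sorted node ids) and the PPR score of each kept node."""
    p = approx_ppr(adj, seed, alpha, eps)
    top = sorted(p, key=lambda x: (-p[x], x))[:K]  # top-K by score, ties broken by id
    nodes = sorted(top)                            # sorted order, as SpG stores it
    return nodes, [p[x] for x in nodes]
```

Because each push only touches the neighbors of one node and stops once residuals fall below `eps` times the degree, the work per seed stays local, which is in line with treating $K$ as a constant.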

Structure Encoders. The structure encoder constructs structural features $\mathcal{Z}_{u,x}\in\mathbb{R}^{k}$ for each node $x$ in the sampled node set $\mathcal{S}_{u}$. These features prove to be crucial for inference tasks on graphs involving multiple nodes (Zhang et al., 2021) and can be conceptually understood as anchoring sampled node $x$ in the seed $u$’s neighborhood. One possible choice is the landing probabilities of random walks (Li et al., 2019, 2020; Yin et al., 2022): each element $\mathcal{Z}_{u,x}[i]$ stores the number of times node $x$ is landed on at step $i$ across all walks rooted at the seed $u$, divided by the number of walks performed by the sampler. Landing probabilities (LPs) can be computed along with walk sampling. Another option is the shortest path distance (SPD) between $x$ and $u$ (Zhang and Chen, 2018; Li et al., 2020; Zhang et al., 2021), which records their relative distance in terms of reachability. PPR scores (Jeh and Widom, 2003) are also a popular structural feature and can be naturally obtained by running a PPR-based sampler. Hereafter, we denote the collection of structural features for all nodes in $\mathcal{S}_{u}$ as $\mathcal{Z}_{u}=\{\mathcal{Z}_{u,x}|x\in\mathcal{S}_{u}\}$.
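To make the walk-based sampler and the landing-probability encoder concrete, the sketch below samples $M$ walks of $m$ steps from one seed, reduces them to a set of unique nodes, and computes LPs along the way. It is an illustrative single-threaded version under stated assumptions: the graph is a hypothetical adjacency dictionary `adj`, step 0 (the seed itself) is included so that $k=m+1$, and the function name `walk_set_sampler` is not taken from the SUREL+ code base.

```python
import random


def walk_set_sampler(adj, seed, M=4, m=3, rng_seed=0):
    """Sample M m-step walks from `seed`; return unique nodes and their landing probabilities."""
    rng = random.Random(rng_seed)
    counts = {}  # node -> landing counts for steps 0..m
    for _ in range(M):
        v = seed
        counts.setdefault(v, [0] * (m + 1))[0] += 1
        for step in range(1, m + 1):
            if not adj[v]:
                break  # dead end: stop this walk early
            v = rng.choice(adj[v])
            counts.setdefault(v, [0] * (m + 1))[step] += 1
    nodes = sorted(counts)  # S_u: distinct nodes only, in sorted order
    # Z_{u,x}[i] = (#walks that land on x at step i) / M
    lp = {x: [c / M for c in counts[x]] for x in nodes}
    return nodes, lp
```

The returned set never exceeds $mM+1$ nodes and is usually much smaller, since repeated visits only increase a count instead of adding another entry.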

3.2. Set-based Storage - SpG

Set-based subgraph representation has advantages in terms of flexibility and compactness. However, the uneven sizes of sampled node sets pose great challenges to their storage and fast access. Note that, these node sets need to be frequently visited in subsequent online phases. To overcome these obstacles, SUREL+ designs a customized compressed sparse row (CSR) format called SpG, which reorganizes the storage of node sets and their structural features in a memory-efficient manner, as depicted in Fig. 4. Specifically, the node set $\mathcal{S}_{u}$ and its structural features $\mathcal{Z}_{u}$ are stored as a row of SpG, denoted as $\texttt{SpG}[u,:]$ . Multiple node sets and their associated structural features are consolidated into three contiguous arrays:

• indptr $\delta\in\mathbb{Z}^{n+1}$, an integer array that tracks the starting index of each stored node set (row). It records the cumulative sum of the sizes of all node sets $\mathcal{S}_{u},~\forall u\in\mathcal{V}$, i.e., $\delta_{u+1}=\delta_{u}+|\mathcal{S}_{u}|$, where $|\mathcal{S}_{u}|$ represents the size of the set $\mathcal{S}_{u}$. The total number of sampled nodes stored in SpG is $\delta_{n+1}$;

• indices $I\in\mathbb{Z}^{\delta_{n+1}}$, a coalesced array of all node sets $\mathcal{S}_{u},~\forall u\in\mathcal{V}$. The segment $I[\delta_{u}:\delta_{u+1}]$ corresponds to node indices of the set $\mathcal{S}_{u}$ in sorted order. This ordering is particularly useful for speeding up the join operation discussed in Sec. 3.3;

• SFptr $D\in\mathbb{R}^{\delta_{n+1}}$, an array that contains either the values of the structural features $\mathcal{Z}_{u}$ or the indices of their encodings stored in the array $D_{\text{SF}}$. $D_{\text{SF}}$ is introduced to eliminate duplicate encoding of structural features, which typically reside in GPU memory. This two-level indexing can further reduce memory needs when LPs/SPDs are used, as they are likely to have many repeated values, but it is not necessary when using PPR scores since their values tend to be distinct.
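The three arrays can be assembled as in the following sketch, which assumes NumPy and uses hypothetical helper names (`build_spg`, `spg_row`). It uses 0-based indexing, so the total $\delta_{n+1}$ corresponds to `indptr[n]`, and it reserves row 0 of `d_sf` for the all-zero encoding so that pointer 0 can later stand for a missing feature; that reservation is an assumption of this sketch, not something stated in the text.

```python
import numpy as np


def build_spg(node_sets, features):
    """Pack per-seed node sets and features into CSR-style arrays.

    node_sets[u]: sorted unique node ids of S_u
    features[u]:  array of shape (|S_u|, k), row-aligned with node_sets[u]
    """
    sizes = [len(s) for s in node_sets]
    indptr = np.zeros(len(node_sets) + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(sizes)                 # cumulative set sizes
    indices = np.concatenate([np.asarray(s, dtype=np.int64) for s in node_sets])
    all_feats = np.concatenate(features, axis=0)  # shape (total, k)
    # Two-level indexing: keep each distinct k-dim encoding once.
    uniq, inverse = np.unique(all_feats, axis=0, return_inverse=True)
    d_sf = np.vstack([np.zeros((1, all_feats.shape[1])), uniq])  # row 0 = zeros
    sfptr = inverse.reshape(-1) + 1               # pointers start at 1
    return indptr, indices, sfptr, d_sf


def spg_row(indptr, indices, sfptr, d_sf, u):
    """Fetch (S_u, Z_u) for seed u by slicing the contiguous arrays."""
    lo, hi = indptr[u], indptr[u + 1]
    return indices[lo:hi], d_sf[sfptr[lo:hi]]
```

Accessing one row only slices contiguous memory, and `d_sf` grows with the number of distinct encodings $c$ rather than with the number of stored nodes.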

Table 1. Complexity comparison of GRL models. Suppose using $O(|\mathcal{E}|)$-many queries, SGRLs use partial edges ($q\ll|\mathcal{E}|$) for training. $S$ and $K$ denote the average size of extracted subgraphs and sampled node sets, respectively. $L$ is the number of layers. $d$ and $k$ are respective dimensions of node and structural features. Assume $d$ is fixed for all layers. Both SUREL and SUREL+ use the walk-based sampler for $M$-many $m$-step walks. $c$ is the number of distinct $k$-dim structural features. $\delta_{n+1}$ is the size sum of all node sets.

Methods	GNN (Kipf and Welling, 2017)	SEAL (Zhang and Chen, 2018; Zhang et al., 2021)	SUREL (Yin et al., 2022)	SUREL+
Structure	$O(|\mathcal{V}|+|\mathcal{E}|)$	$O(S|\mathcal{E}|)$	$O(mM|\mathcal{V}|)$	$O(\delta_{n+1})$
Feature	$O(d|\mathcal{V}|)$	$O(kS|\mathcal{E}|)$	$O(\delta_{n+1}*k)$	$O(\delta_{n+1}+c*k)$
Time	$O(|\mathcal{E}|Ld+|\mathcal{E}|Ld^{2})$	$O(qS^{L}d^{2})$	$O(qmMd^{2})$	$O(qKd^{2})$

Regarding the cost of SpG, indptr array is of size $|\mathcal{V}|+1$ , and the size of both indices and SFptr arrays is $\delta_{n+1}$ . The compressed encoding array $D_{\text{SF}}$ has a size of $c*k$ , where $c$ is the number of distinct structural features and $k$ denotes feature dimension. The overall complexity of SpG is $O(|\mathcal{V}|+\delta_{n+1}+c*k)$ .

Comparison with Other Methods. Table 1 summarizes the space and time complexity of GRL methods. By adopting the walk-based sampler (sampling $M$-many $m$-step walks), $\delta_{n+1}$ amounts to roughly one-fifth of the $O(mM|\mathcal{V}|)$ space used by SUREL. The metric-based sampler (sampling top-$K$ PPR scores) results in $\delta_{n+1}=K|\mathcal{V}|$ and $K<mM$ in general. Both values are substantially lower than $O(S|\mathcal{E}|)$ used by SEAL, where $S$ is the average size of extracted subgraphs. SUREL+ further reduces the memory footprint when the two-level indexing is employed for hosting structural features and only distinct values are stored in $D_{\text{SF}}$. In practice, $c$ typically remains independent of $|\mathcal{V}|$. SpG enables SUREL+ to handle SGRL tasks more efficiently on large-scale graph data.

3.3. Joining Node Sets via Sparse Operations

The goal of joining node sets is to connect queried nodes and construct query-level subgraphs from pre-sampled node sets to make predictions. For a given query $Q$, we merge relevant node sets $\mathcal{S}_{u},\forall u\in Q$ into $\mathcal{S}_{Q}=\bigcup_{u\in Q}\mathcal{S}_{u}$ and join their node-level structural features $\mathcal{Z}_{u}$ to the query level. In essence, query-level structural features $\mathcal{Z}_{Q}$ record the relative position of each node $x\in\mathcal{S}_{Q}$ with respect to the set of queried nodes $Q$ (equivalently labeling the query-induced subgraph $\mathcal{G}_{Q}$). Specifically, for a node $x$ in $\mathcal{S}_{Q}$, the query-level structural feature $\mathcal{Z}_{Q,x}$ is obtained by merging its node-level ones $\mathcal{Z}_{u,x}$ for all queried nodes $u$ in $Q$ as

$$\mathcal{Z}_{Q,x}=||_{u\in Q}\mathcal{Z}_{u,x}=[\dots\mathcal{Z}_{u,x}\dots]\in\mathbb{R}^{|Q|\times k},\tag{1}$$

where $||$ denotes concatenation. In cases where $\mathcal{Z}_{u,x}$ does not exist because node $x\notin\mathcal{S}_{u}$, it is set to all zeros. For instance, in Fig. 5, node $b$ is in $\mathcal{S}_{v}$ but not in $\mathcal{S}_{u}$, hence $\mathcal{Z}_{u,b}$ is set to zero. $\mathcal{Z}_{Q}$ is a collection of $\mathcal{Z}_{Q,x},\forall x\in\mathcal{S}_{Q}$. Together, $\mathcal{S}_{Q}$ and $\mathcal{Z}_{Q}$ function as the query-induced subgraph $\mathcal{G}_{Q}$, which is later fed into the neural encoder to obtain the query-level readout for prediction.
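As a small worked instance of Eq. (1), consider a hypothetical query $Q=\{u,v\}$ whose sampled sets are $\mathcal{S}_{u}=\{u,a\}$ and $\mathcal{S}_{v}=\{v,a,b\}$. These particular sets are assumed purely for illustration and are not taken from Fig. 5. Writing $\mathbf{0}\in\mathbb{R}^{k}$ for the zero padding:

```latex
\begin{aligned}
\mathcal{S}_{Q}   &= \mathcal{S}_{u}\cup\mathcal{S}_{v}=\{u,v,a,b\},\\
\mathcal{Z}_{Q,u} &= [\,\mathcal{Z}_{u,u}~||~\mathbf{0}\,], \qquad
\mathcal{Z}_{Q,v}  = [\,\mathbf{0}~||~\mathcal{Z}_{v,v}\,],\\
\mathcal{Z}_{Q,a} &= [\,\mathcal{Z}_{u,a}~||~\mathcal{Z}_{v,a}\,], \qquad
\mathcal{Z}_{Q,b}  = [\,\mathbf{0}~||~\mathcal{Z}_{v,b}\,].
\end{aligned}
```

Only node $a$ is shared by both sets, so it is the only entry in which both halves are non-zero.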

The JOIN operator in databases is used to merge tables and establish connections. Concatenation in Eq. (1) requires matching among different node sets with varying sizes and arbitrary node orders, a task for which an outer JOIN is well-suited. In this case, the JOIN operator returns associated values from target sets based on node indices as the specified common field, regardless of whether a match exists. To obtain $\mathcal{Z}_{Q,x}$, node sets $\{\mathcal{S}_{u}\}_{u\in Q}$ are treated as tables: for each $u$ in $Q$, if the index of node $x$ matches one of the node indices in $\mathcal{S}_{u}$, then the associated structural feature $\mathcal{Z}_{u,x}$ is appended; otherwise, the field is filled with zeros. However, iterating over all $\mathcal{S}_{u}$’s to retrieve $\mathcal{Z}_{u,x}$ for each node $x\in\mathcal{S}_{Q}$ is highly inefficient, as its complexity can be $O(|Q|*|\mathcal{S}_{Q}|^{2})$ per query. This becomes even more challenging when performing these operations for massive queries with varying sizes of $\mathcal{S}_{u}$ and $\mathcal{S}_{Q}$.

To tackle this issue, we design an efficient arithmetic operator SpJoin to perform joins on sparse data objects of SpG in parallel. This operator reduces the per-query time complexity to $O(|Q|*|\mathcal{S}_{Q}|)$ by exploiting the fact that the node indices of $\mathcal{S}_{u}$ stored in SpG are unique and sorted. The following demonstrates the use of SpJoin for a query $Q=\{u,v\}$.

Sparse Join Operator. The operator SpJoin performs an outer JOIN for query $Q$ on the sampled node sets from seeds $u$ and $v$ stored in SpG as $\texttt{SpG}[u,:]$ and $\texttt{SpG}[v,:]$ through

$$\texttt{SpJoin}(\texttt{SpG}[u,:],\texttt{SpG}[v,:])=\texttt{SpAdd}(\texttt{mask},\texttt{SpG}[u,:])-1~||~\texttt{SpAdd}(\texttt{mask},\texttt{SpG}[v,:])-1,$$

where $\texttt{mask}=\texttt{bool}(\texttt{SpAdd}(\texttt{SpG}[u,:],\texttt{SpG}[v,:]))$.

As illustrated in Fig. 5, SpJoin consists of three steps:

(1) It utilizes sparse arithmetic operations from SciPy (Virtanen et al., 2020): SpAdd performs an element-wise addition ($X\oplus Y$) of the non-zero elements in $X$ and $Y$; the resulting values are converted to binary via the bool operator and saved in the mask, which corresponds to node indices of the union set $\mathcal{S}_{Q}$.

(2) SpAdd is applied between mask and each $\texttt{SpG}[u,:],\,\forall u\in Q$, followed by subtracting 1 (the ‘-1’ term), which explicitly fills in missing values (all zeros by default) of structural features $\mathcal{Z}_{u,x}$ for every $x$ with $x\not\in\mathcal{S}_{u}$ but $x\in\mathcal{S}_{Q}$.

(3) When the two-level indexing is enabled, the results of SpJoin are pointers saved in SFptr, which can be used to gather the values of structural features $\mathcal{Z}_{Q}$ from the array $D_{\text{SF}}$.
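The three steps above can be sketched with SciPy sparse matrices as follows. This is a single-query illustration under stated assumptions, not the parallel SpJoin operator itself: `spg` is taken to be a `scipy.sparse.csr_matrix` whose stored values are strictly positive integer pointers into $D_{\text{SF}}$ (so pointer 0 can denote the zero encoding), and the function name `sp_join` is hypothetical. SciPy does not allow subtracting a scalar from a sparse matrix directly, so the ‘-1’ is applied to the stored `data` array, which also keeps the resulting zeros as explicit entries.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sp_join(spg, query):
    """Outer-join the rows of `spg` for the nodes in `query`.

    Returns the sorted node ids of S_Q and a pointer matrix of shape (|S_Q|, |Q|),
    where entry 0 means the node is absent from that seed's node set.
    """
    rows = [spg[u, :] for u in query]
    # Step 1: sparse addition, then binarize to obtain the mask of the union S_Q.
    total = rows[0]
    for row in rows[1:]:
        total = total + row
    mask = total.astype(bool).astype(spg.dtype)
    mask.sort_indices()
    # Step 2: adding the mask pads every row to the union pattern; subtracting 1
    # on the stored values restores the pointers and leaves explicit zeros.
    ptrs = []
    for row in rows:
        padded = mask + row
        padded.sort_indices()
        padded.data -= 1
        ptrs.append(padded.data)
    return mask.indices.copy(), np.stack(ptrs, axis=1)


def gather_features(ptrs, d_sf):
    """Step 3: look up encodings and concatenate them over the queried nodes."""
    return np.concatenate([d_sf[ptrs[:, j]] for j in range(ptrs.shape[1])], axis=1)
```

Because all padded rows share the sparsity pattern of `mask`, their `data` arrays are already aligned with `mask.indices`, and no per-node lookup is needed.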

Multithreading is employed to leverage the pattern of single program multiple data in arithmetic operations of SpJoin. Since the processing time of each query linearly depends on the size of $\mathcal{S}_{Q}$ , we further divide queries of each training batch into groups with nearly balanced sums of $|\mathcal{S}_{Q}|$ ’s, and assign one thread per group to mitigate potential delays caused by uneven workloads.
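The text does not specify how queries are partitioned into balanced groups. One simple possibility, shown here purely as an assumed strategy, is the greedy longest-processing-time heuristic: visit queries in decreasing order of $|\mathcal{S}_{Q}|$ and always assign the next one to the currently lightest group. The function name `balance_groups` is hypothetical.

```python
import heapq


def balance_groups(sizes, num_threads):
    """Greedily split query ids into groups with nearly equal total |S_Q|.

    sizes[i] is the (estimated) size of S_Q for query i.
    """
    heap = [(0, g) for g in range(num_threads)]  # (current load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_threads)]
    # Place large queries first so that small ones can even out the loads.
    for q in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        load, g = heapq.heappop(heap)
        groups[g].append(q)
        heapq.heappush(heap, (load + sizes[q], g))
    return groups
```

Since the exact $|\mathcal{S}_{Q}|$ is only known after joining, an implementation along these lines might use an upper bound such as $\sum_{u\in Q}|\mathcal{S}_{u}|$, which is available directly from indptr.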

Comparison with SUREL (Yin et al., 2022). SUREL adopts a hash-based join operator to construct query-level structural features, but its overall computation and memory cost is much higher than that of SUREL+. This is due to the presence of numerous repeated nodes in walks, depicted in Fig. 3 (b). The set-based input of SUREL+ substantially reduces the workload of transferring data from CPU to GPU and also requires fewer per-query operations on GPU to process transmitted $\mathcal{Z}_{Q}$’s. As Table 1 shows, SUREL+ reduces time complexity from $O(mM)$ to $O(K)$, where $K<mM$ is the average size of sampled node sets. These advantages ultimately enable SUREL+ to achieve superior performance in terms of efficiency and scalability.

3.4. Set Neural Encoders

After joining node sets for each query $Q$ , the resulting $(\mathcal{S}_{Q},\mathcal{Z}_{Q})$ acts as the query-induced subgraph $\mathcal{G}_{Q}$ and then is fed into a neural encoder for prediction. The mini-batch training procedure of multiple queries is summarized in Algorithm 1. Next, we introduce neural encoders supported by SUREL+.

The adopted neural encoders are simple. For each $(\mathcal{S}_{Q},\mathcal{Z}_{Q})$ ,

$$h_{Q}=\texttt{AGGR}\left(\{enc(\mathcal{Z}_{Q,x})|x\in\mathcal{S}_{Q}\}\right)\in\mathbb{R}^{d}.\tag{2}$$

Here, $enc(\cdot)$ encodes query-level structural features $\mathcal{Z}_{Q,x}$ using a multilayer perceptron (MLP). If node attributes are present, they can be appended after structural features as $\mathcal{Z}_{Q,u}||X_{u}$. AGGR is used to aggregate the encoded features, which can be any neural encoder applicable to sets, such as mean/sum/max pooling or set transformers. Currently, SUREL+ supports mean pooling, LSTM (Hamilton et al., 2017a), and attention (Veličković et al., 2018) as implementations of AGGR. Note that the LSTM applies random permutations to the elements in the set before encoding them as a sequence, while the attention first computes soft attention scores based on the output of $enc(\cdot)$ for each set element and then performs attention-score-weighted pooling. Sec. 4.4 empirically demonstrates that the choice of AGGR has non-trivial effects on prediction performance. Lastly, a fully connected layer takes the readout $h_{Q}$ as input to make the final prediction $\hat{y}_{Q}$. In our experiments, all SGRL tasks are formulated as binary classification, and thus Binary Cross Entropy is used as the loss function $\mathcal{L}$.
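A forward pass of Eq. (2) can be sketched in NumPy as below, covering the mean and attention variants of AGGR. This is only a shape-level illustration with random, untrained weights: the two-layer MLP, the scoring vector `a`, and the helper names (`enc`, `readout`, `init_params`) are assumptions of this sketch, and the actual SUREL+ encoders may be parameterized differently.

```python
import numpy as np


def enc(z, w1, b1, w2, b2):
    """Two-layer MLP with ReLU, applied independently to every set element."""
    return np.maximum(z @ w1 + b1, 0.0) @ w2 + b2


def aggr_mean(h):
    """Mean pooling over the set dimension."""
    return h.mean(axis=0)


def aggr_attention(h, a):
    """Softmax-weighted pooling; `a` is a hypothetical learnable scoring vector."""
    scores = h @ a
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ h


def readout(z_q, params, aggr="mean"):
    """h_Q = AGGR({enc(Z_{Q,x}) | x in S_Q}); z_q has shape (|S_Q|, |Q| * k)."""
    h = enc(z_q, params["w1"], params["b1"], params["w2"], params["b2"])
    return aggr_mean(h) if aggr == "mean" else aggr_attention(h, params["a"])


def init_params(in_dim, hidden, d, seed=0):
    """Random parameters, standing in for trained weights."""
    rng = np.random.default_rng(seed)
    return {"w1": rng.normal(size=(in_dim, hidden)), "b1": np.zeros(hidden),
            "w2": rng.normal(size=(hidden, d)), "b2": np.zeros(d),
            "a": rng.normal(size=d)}
```

Both variants are invariant to the order of set elements, which is why no permutation step is needed for them, unlike the LSTM aggregator.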

Input: Given a graph $\mathcal{G}(\mathcal{V},\mathcal{E},X)$ , a group of queries $\{(Q,y_{Q})\}$ for training, batch size $B$ , a set SAMPLER, a structure ENCODER, and a set AGGR
Output: A neural network for encoding subgraphs $enc(\cdot)$
1 Preprocessing: SAMPLER and ENCODER $\to(\mathcal{S}_{u},\mathcal{Z}_{u})$ for all $u\in\mathcal{V}$ ; convert and save $(\mathcal{S}_{u},\mathcal{Z}_{u})$ ’s as SpG objects.
2 for each mini-batch $\mathcal{Q}_{B}=\{...,Q,...\}$ do
3     Generate negative training queries (if not given) $\{...,\bar{Q},...\}$ by random sampling and put them into $\mathcal{Q}_{B}$ ;
4     Call SpJoin operator to perform joining on SpG objects $\{(\mathcal{S}_{u},\mathcal{Z}_{u})|u\in Q\}$ for all queries $Q\in\mathcal{Q}_{B}$ in parallel;
5     Encode the joined results $(\mathcal{S}_{Q},\mathcal{Z}_{Q})$ as proxy of subgraphs via Eq. (2) with specified AGGR and get the prediction $\hat{y}_{Q}$ from readout $h_{Q}$ by multithreads;
6     Backward propagation based on the loss $\mathcal{L}(\hat{y}_{Q},y_{Q})$
7 end for

Algorithm 1 The mini-batch training pipeline of SUREL+
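Step 4 of Algorithm 1 can be sketched for a single query as follows. The dictionary-based `join` function is a hypothetical stand-in for illustration only: it reproduces the semantics of the join given in Table 6 (concatenate $\mathcal{Z}_{u,x}$ over $u\in Q$, with zeros when $x\notin\mathcal{S}_{u}$), not the sparse, batched implementation of SpJoin.

```python
def join(query, sets, feat_dim):
    """Join per-node (S_u, Z_u) into (S_Q, Z_Q) for one query.

    `sets[u]` maps each sampled node x in S_u to its feature list Z_{u,x}.
    Nodes missing from S_u contribute all-zero features.
    """
    zeros = [0.0] * feat_dim
    S_Q = sorted(set().union(*(sets[u].keys() for u in query)))
    Z_Q = {x: [f for u in query for f in sets[u].get(x, zeros)] for x in S_Q}
    return S_Q, Z_Q

# Hypothetical pre-sampled node sets with 2-dim structural features.
sets = {
    "u": {"u": [1.0, 0.0], "a": [0.0, 0.5], "b": [0.0, 0.5]},
    "v": {"v": [1.0, 0.0], "b": [0.0, 1.0]},
}
S_Q, Z_Q = join(("u", "v"), sets, feat_dim=2)
print(S_Q)
print(Z_Q["b"])  # features of b w.r.t. u, followed by features w.r.t. v
```

Because each node appears once in $\mathcal{S}_{Q}$, the joined result has one feature row per unique node regardless of how often that node was visited during sampling.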

4. Evaluation

In this section, we aim to evaluate the following questions:

• Regarding space and time complexity, how much improvement can SUREL+ achieve over the SOTA framework SUREL by adopting node sets instead of walks?
• Can SUREL+ provide prediction performance comparable to all baselines, whether or not they use subgraph-based methods?
• How sensitive is SUREL+ to the choice of set samplers, structure encoders, and set neural encoders?
• How do the sparse storage SpG and the parallelism in the SpJoin operator perform, and how do they benefit the overall performance of SUREL+?

4.1. Experiment Setup

Extensive experiments have been performed to evaluate SUREL+ using nine homogeneous, heterogeneous, and higher-order homogeneous graphs on three types of tasks: link prediction, relation type prediction, and higher-order pattern prediction. A homogeneous graph is a graph that does not contain node/link types, while a heterogeneous graph includes various node/link types. In our setting, higher-order graphs are hypergraphs consisting of hyperedges connecting two or more nodes.

Datasets Table 2 summarizes the statistics of the datasets used to benchmark SGRL methods. Five datasets are selected from the Open Graph Benchmark (OGB) (Hu et al., 2020) for link and relation type prediction: the social networks citation2 (citation) and collab (collaboration); the biological networks ppa (protein interaction) and vessel (blood vessels); and one heterogeneous academic network, ogb-mag, which contains the node types paper (P) and author (A) and their extracted relations. The vessel dataset is a large ( $>$ 3M nodes), sparse, biological graph recently constructed from mouse brains (Paetzold et al., 2021), and has unique significance for examining GRL in scientific discovery. The structure of vessels illustrates the spatial organization of the brain’s microvasculature, which can be used for early detection of neurological disorders, e.g. Alzheimer’s and stroke. Two hypergraph datasets collected by (Benson et al., 2018) are used for higher-order pattern prediction: DBLP-coauthor is a temporal hypergraph, where each hyperedge denotes a time-stamped paper connecting all its authors; tags-math contains groups of tags applied to questions on the website math.stackexchange.com as hyperedges. For higher-order pattern prediction tasks, the main computational bottleneck is the number of hyperedges, each of which may connect more than two nodes. Two industry-level graphs, criteo-click with 16.5M records of online banner ad clicks (Diemert et al., 2017) and twitter-2010 with 1.5B user following relations (Kwak et al., 2010), are used to examine model scalability for real-world applications.

Settings For link prediction, OGB’s standard data split is used to isolate validation and test links from the input graph. For relation type and higher-order pattern prediction, we adopt the same graph data preparation procedure as in (Yin et al., 2022): the relations paper-author (P-A, "written by") and paper-paper (P-P, "cited by") are selected; higher-order queries in hypergraph datasets are node triplets, where the goal is to predict whether the triplet will form a hyperedge given that two of them have observed pairwise connections; to learn representations on hypergraphs, we project hyperedges into cliques and treat the projection results as ordinary graphs. All experiments are run 10 times independently, and we report the mean performance and standard deviation.
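The clique projection of hyperedges mentioned above can be sketched as follows. The function name `clique_expand` and the toy hyperedges are hypothetical; the sketch only illustrates the standard clique expansion, in which every pair of nodes sharing a hyperedge becomes an ordinary edge.

```python
from itertools import combinations

def clique_expand(hyperedges):
    """Project each hyperedge onto all pairwise edges among its nodes."""
    edges = set()
    for he in hyperedges:
        for u, v in combinations(sorted(set(he)), 2):
            edges.add((u, v))
    return edges

# Hypothetical hyperedges, e.g. author lists of three papers.
hyperedges = [("a", "b", "c"), ("b", "c"), ("c", "d", "e")]
projected = clique_expand(hyperedges)
print(sorted(projected))
```

Pairs that co-occur in several hyperedges are stored once, which is why the projected edge count in Table 2 need not grow in step with the number of hyperedges.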

Table 2. Summary Statistics for Evaluation Datasets.

Dataset	Type	#Nodes	#Edges	Split(%)
criteo-click	Homo./Bipartite	Campaign: 675; User: 6,142,256	16,468,027	97/1.5/1.5
twitter-2010	Homo./Social.	41,652,230	1,468,364,884	99.98/0.01/0.01
citation2	Homo./Social.	2,927,963	30,561,187	98/1/1
collab	Homo./Social.	235,868	1,285,465	92/4/4
ppa	Homo./Bio.	576,289	30,326,273	70/20/10
vessel	Homo./Bio.	3,538,495	5,345,897	80/10/10
ogb-mag	Hetero.	(P): 736,389; (A): 1,134,649	P-A: 7,145,660; P-P: 5,416,271	99/0.5/0.5
tags-math	Higher.	1,629	projected: 91,685; hyperedges: 822,059	60/20/20
DBLP-coauthor	Higher.	1,924,991	projected: 7,904,336; hyperedges: 3,700,067	60/20/20

Table 3. Prediction Performance for Links, Relation Types and Higher-Order Patterns: the best (bold) and the second best (underlined).

Models	citation2	click	twitter	collab	ppa	vessel	Models	MAG(P-A)	MAG(P-P)	tags-math	DBLP-coauthor
Metric	MRR (%)	MRR (%)	MRR (%)	Hits@50 (%)	Hits@100 (%)	ROC-AUC	Metric	MRR (%)	MRR (%)	MRR (%)	MRR (%)
GCN	84.74±0.21	5.31±0.17	OOM	44.75±1.07	18.67±1.32	43.53±9.61	H*GCN	39.43±0.29	57.43±0.30	51.64±0.27	37.95±2.59
GraphSAINT	79.85±0.40	2.86±0.63	4.12±0.73	53.12±0.52	3.83±1.33	47.14±6.83	H*SAGE	25.35±1.49	60.54±1.60	54.68±2.03	22.91±0.94
GDGNN	86.96±0.28	13.30±0.45	49.86±0.39	54.74±0.48	45.92±2.14	75.84±0.08	R-GCN	37.10±1.05	56.82±4.71	-	-
SEAL	87.67±0.32	OOM	OOM	63.64±0.71	48.80±3.16	80.50±0.21	SUREL	45.33±2.94	82.47±0.26	71.86±2.15	97.66±2.89
SUREL	89.74±0.18	40.39±0.61	OOM	63.34±0.52	53.23±1.03	86.16±0.39	SUREL+	58.81±0.42	80.45±0.13	77.73±0.16	99.83±0.02
SUREL+	88.90±0.06	60.87±0.15	55.67±0.67	64.10±1.06	54.32±0.44	85.73±0.88	/	/	/	/	/

Table 4. Breakdown of Runtime, Memory Consumption for Different Models on Prediction of Link, Relation Type, and Higher-order Pattern. The column Train records the runtime per 10K queries.

Models	Runtime (s)			Memory (GB)		Runtime (s)			Memory (GB)		Runtime (s)			Memory (GB)		Runtime (s)			Memory (GB)
Models	Prep.	Train	Inf.	RAM	SDRAM	Prep.	Train	Inf.	RAM	SDRAM	Prep.	Train	Inf.	RAM	SDRAM	Prep.	Train	Inf.	RAM	SDRAM
Dataset	criteo-click					twitter-2010					citation2					ppa
GCN	3	0.085	8	3.1	62.74	-	-	-	-	OOM	17	21.74	105	9.3	36.84	2	0.026	1.2	4.6	11.35
GraphSAINT	1	0.012	20	13.1	8.79	111	0.009	920	253	76.60	151	1.79	107	9.6	9.78	10	0.003	1.5	4.9	23.06
GDGNN	215	1.43	2,928	16.2	23.77	1204	1.84	9,744	188	79.34	338	2.26	5,460	40.6	16.96	127	1.77	902	21.1	10.27
SEAL	-	-	-	OOM	-	-	-	-	OOM	-	46	3.52	24,626	35.4	5.71	46	10.57	3,988	9.5	12.13
SUREL	2	1.59	2,307	11.7	16.25	-	-	-	OOM	-	151	4.14	6,081	25.1	9.68	31	2.68	1,429	13.6	31.01
SUREL+	22	0.23	502	10.4	11.93	327	0.26	3,779	210	49.44	130	0.35	1,389	16.7	4.75	69	0.72	201	9.8	19.02

Baselines We consider two classes of baselines. Canonical GNNs: GCN (Kipf and Welling, 2017), GraphSAGE (Hamilton et al., 2017a), GraphSAINT (Zeng et al., 2020) and their variants with the prefix ‘H*’ that are directly applied for heterogeneous graphs with node types and for hypergraphs through clique expansion. R-GCN (Schlichtkrull et al., 2018) performs relational message passing on heterogeneous graphs. SGRL Models: SEAL (Zhang and Chen, 2018; Zhang et al., 2021), GDGNN (Kong et al., 2022), and SUREL (Yin et al., 2022). SEAL adopts online subgraph sampling due to its intractable space needs for offline extraction. Fig. 3 (a) compares the time cost for subgraph sampling across different SGRL methods. We use all baselines’ official implementations with tuned hyperparameters to match their reported results.

Hyperparameters By default, SUREL+ uses the walk-based sampler, the structural encoder LP, and the better set neural encoder tuned between mean pooling and attention. SUREL+ adopts a 2-layer MLP as $enc(\cdot)$ in Eq. (2) followed by a 2-layer MLP classifier to map set-aggregated readouts to final predictions. Default training hyperparameters: learning rate lr=1e-3 with early stopping of 5 epochs, dropout p=0.1, and Adam (Kingma and Ba, 2015) as the optimizer. The parameters $M$ and $m$ that control the walk-based sampler, the parameter $K$ that controls the metric-based sampler, and the selection of structure encoders and set neural encoders are studied in Sec. 4.4.

Evaluation Metrics The evaluation metrics include Hits@P, Mean Reciprocal Rank (MRR), and Area Under the ROC Curve (ROC-AUC). Hits@P measures the ratio of positive samples ranked within the top P places against negative ones. MRR first computes the inverse of the rank of the first correct prediction and then averages the obtained reciprocal ranks over a sample of queries. For all datasets adopting MRR, each positive query is paired with 1000 randomly sampled negative test queries, except tags-math, which uses 100, and criteo-click, which uses 650. ROC-AUC follows the standard definition to measure the model’s performance in binary classification.
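As a concrete reading of the two ranking metrics, the following Python sketch computes MRR and Hits@P from raw scores. The function names and toy scores are hypothetical, and two conventions are assumed here rather than taken from the text: ties are broken in favor of the positive, and Hits@P thresholds on the P-th highest negative score (the convention commonly used by OGB-style evaluators).

```python
def reciprocal_rank(pos_score, neg_scores):
    """1 / rank of the positive among its negatives (ties favor the positive)."""
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return 1.0 / rank

def mrr(pos_scores, neg_scores_per_query):
    """Average reciprocal rank over queries, each with its own negatives."""
    rr = [reciprocal_rank(p, negs)
          for p, negs in zip(pos_scores, neg_scores_per_query)]
    return sum(rr) / len(rr)

def hits_at_p(pos_scores, neg_scores, p):
    """Fraction of positives scoring above the p-th highest negative."""
    if len(neg_scores) < p:
        return 1.0
    threshold = sorted(neg_scores, reverse=True)[p - 1]
    return sum(1 for s in pos_scores if s > threshold) / len(pos_scores)

pos = [0.9, 0.4]
negs = [[0.1, 0.5, 0.3], [0.6, 0.5, 0.2]]
print(mrr(pos, negs))                         # (1/1 + 1/3) / 2
print(hits_at_p(pos, [0.8, 0.5, 0.3], p=2))   # threshold 0.5: 1 of 2 positives
```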

Environment We use a server with two Intel Xeon Gold 6248R CPUs, 512GB DRAM, and an NVIDIA A100 (80GB) GPU. SUREL+ is built on PyTorch 1.12 and PyG 2.2. Set samplers are implemented in C, OpenMP, NumPy, Numba, and uhash, integrated into Python scripts; SpG is customized based on the CSR format of SciPy (Virtanen et al., 2020).

4.2. Prediction Accuracy Comparison

Table 3 shows the prediction performance of different methods. SGRL models significantly outperform canonical GNNs on these six link prediction benchmarks, especially on two challenging biological datasets ppa and vessel. Predicting links in biological datasets requires richer structural information that canonical GNNs have limited expressive power to capture. Within SGRL models, SUREL+ achieves comparable performance to SUREL and outperforms SEAL, which validates the effectiveness of the proposed set-based representation for subgraphs. For predictions of relation type and higher-order pattern, we observe additional performance gains (+2 $\sim$ 13%) from SUREL+ compared to SUREL on three of the four datasets. A large performance gap exists between canonical GNNs and SGRL models, particularly in the higher-order case. This demonstrates the inherent limitations of canonical GNNs to make predictions of complex relations involving multiple nodes.

4.3. Efficiency and Scalability Analysis

Improved Efficiency in Training and Inference.

Table 4 compares model runtime and memory usage on the four largest benchmarks. SUREL+ offers a reasonable training time compared with canonical GNNs. It shows clear improvement in inference compared to the current SOTA framework SUREL (3-11 $\times$ speedups across all datasets) and its predecessor SEAL ( $\sim$ 20 $\times$ speedups). SUREL+ achieves comparable or even lower RAM usage than canonical GNNs. Compared to other SGRL models, it can save up to half of the RAM while also using less GPU SDRAM. This is attributed to set-based subgraphs eliminating duplicate nodes along with their structural features, which is consistent with the analysis in Table 1 and the empirical results in Table 4. The key factor that scales SUREL+ to billion-size graphs is its set-based subgraph representation with the sparse design, whereas GCN (full adjacency matrix), SEAL (complex subgraph extraction), and SUREL (dense walks with duplicate nodes) all run out of memory (OOM) on twitter-2010.

Profiling Different Strategies for Offline Processing

Fig. 5(a) reports the time cost of different samplers with multithreading on citation2. Fig. 5(b) shows memory consumption to store different types of sampled data (walks in SUREL (Yin et al., 2022) or sets in SUREL+) and associated structural features (LPs, SPDs, PPR scores). Compared to the SUREL sampler, the walk-based sampler in SUREL+ is more efficient and only adds one extra minute for encoding and converting data to SpG format (slash/dash marked in Fig. 5(a)), while achieving $6.94\times$ , $3.63\times$ and $4.12\times$ memory savings on three OGB datasets, respectively. Those savings are crucial for model scalability as they reduce data transfer from CPU to GPU and reduce GPU operations on duplicate nodes. These two factors dominate the online stage and thus lead to improved memory usage and runtime of SUREL+ in Table 4. In addition, the PPR-based sampler has better scaling performance with more threads. When PPR scores or SPDs are used as structural features, SUREL+ further reduces the memory footprint, though they often slightly harm prediction performance.

Note that, in the above comparison of memory cost, techniques of compressing structural features are adopted both in SUREL (locally) and SUREL+ (globally). When LPs are used as structural features, the two-level indexing in SpG achieves compression of $493\times$ , $11318\times$ , $19527\times$ on three datasets listed in Fig. 5(b).
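One way to picture how indexing can compress structural features is the CSR-style layout sketched below. This is only an assumed illustration: the actual SpG layout is not specified in this section beyond being CSR-based with two-level indexing, and the names `build_sparse_store` and `lookup`, as well as interpreting the second level as a table of distinct feature vectors, are hypothetical.

```python
def build_sparse_store(sets, num_nodes):
    """Pack per-seed {node: feature tuple} maps into CSR-style arrays.

    Level 1: indptr[u]:indptr[u + 1] delimits the row of seed u.
    Level 2: `data` holds ids into `table`, which stores each distinct
    feature vector only once.
    """
    indptr, indices, data = [0], [], []
    table, feat_id = [], {}
    for u in range(num_nodes):
        for x, z in sorted(sets.get(u, {}).items()):
            if z not in feat_id:
                feat_id[z] = len(table)
                table.append(z)
            indices.append(x)
            data.append(feat_id[z])
        indptr.append(len(indices))
    return indptr, indices, data, table

def lookup(store, u):
    """Return (S_u, Z_u) for seed u as parallel lists."""
    indptr, indices, data, table = store
    lo, hi = indptr[u], indptr[u + 1]
    return indices[lo:hi], [table[i] for i in data[lo:hi]]

# Hypothetical landing-probability features (tuples so they are hashable).
sets = {
    0: {0: (1.0, 0.0), 1: (0.0, 0.5), 2: (0.0, 0.5)},
    1: {1: (1.0, 0.0), 0: (0.0, 0.5)},
    2: {2: (1.0, 0.0)},
}
store = build_sparse_store(sets, num_nodes=3)
S_1, Z_1 = lookup(store, 1)
```

In this toy case six stored entries share only two distinct feature vectors, so the feature table is much smaller than one vector per entry; the compression ratios reported above would depend on how often feature vectors repeat in the real datasets.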

Scaling Analysis for SpJoin

Fig. 7 shows the speedups and throughput of the SpJoin operator for constructing query-level structural features via multithreading, where the walk join operation of SUREL is used for comparison. SUREL employs a hash-based search for joining walks, which has unfavorable memory access patterns and suffers from imbalanced workloads due to inconsistent searching times across different threads. SUREL+ gains more benefits from multithreading, thanks to sparse arithmetic operations and batch-wise load balancing used in SpJoin.

4.4. Comparison between Different Set Samplers, Structural Features and Set Neural Encoders

SUREL+ is a modularized framework that supports different set samplers (walk- and metric-based), structural features (LP, SPD, PPR), and set neural encoders AGGR (mean pooling, LSTM, attention).

Table 5 shows the prediction performance and inference runtime obtained with different combinations of structure encoders and set neural encoders. Landing probabilities (LPs) as structural features perform the best on all three OGB datasets while being the slowest for inference. By recording the landing probabilities over different steps of walks, LPs provide structural information at a finer granularity than the scalar values of SPDs and PPR scores. Furthermore, the adopted link prediction task might favor the more local information held by LPs and SPDs over the global information carried by PPR scores. The authors conjecture that other tasks that rely on more global information may favor PPR scores. In comparison, no single set neural encoder always wins. Attention seems to perform the best on average while being slower than mean pooling. LSTM is the slowest. On the two social networks (citation2 and collab), mean pooling can provide comparable prediction results with far fewer parameters. However, prediction on the biological network (ppa) requires more expressive and complicated encoders, where LSTM and attention are favored as they can model more complex interactions between sampled nodes in the union set $\mathcal{S}_{Q}$ .

Fig. 8 compares prediction results and inference time under different hyperparameters $m$ , $M$ , and $K$ of the set samplers, which heavily affect the coverage of sampled neighborhoods and the computation overhead. The performance consistently increases if the walk-based sampler uses a larger $M$ , but this is not guaranteed for a larger $m$ (broader exploration). Better coverage with a larger $K$ is usually beneficial for the metric-based sampler on citation2 but not on collab, which is due to the different characteristics of these two datasets and is also observed by (Yin et al., 2022). In general, small sampling parameters $m$ ( $2\sim 4$ ), $M$ ( $100\sim 400$ ), and $K$ ( $50\sim 200$ ) yield satisfactory performance with fast inference, striking a trade-off between accuracy and efficiency.

Table 5. Prediction Performance and Inference Time of SUREL+ with Different Combinations of Structure Features (LP, SPD, PPR) and Set Neural Encoders (Mean, LSTM, Attn.). The best and the second best are highlighted in bold and underlined accordingly.

Dataset	Metric	PPR+Mean	SPD+Mean	LP+Mean	LP+LSTM	LP+Attention
citation2	MRR (%)	78.59±0.38	87.99±1.07	88.55±0.15	88.46±0.34	88.90±0.06
citation2	Inference time	834s	1057s	1389s	3678s	2171s
collab	Hits@50 (%)	47.15±0.21	62.11±0.13	64.10±1.06	61.31±1.37	62.85±1.19
collab	Inference time	1.4s	1.7s	2.0s	3.5s	2.3s
ppa	Hits@100 (%)	13.28±1.20	41.06±1.70	46.41±1.65	54.45±1.35	54.32±0.44
ppa	Inference time	63s	126s	165s	322s	201s

5. Conclusion

This work proposes a novel framework SUREL+ for scalable subgraph-based graph representation learning. SUREL+ avoids costly subgraph extraction by decoupling it into sampled node sets with structural features, whose join can function as query-induced subgraphs for prediction. SUREL+ benefits from the reusability and compactness of pre-sampled node sets across different queries. Compared to the SOTA framework SUREL, the set-based subgraph of SUREL+ substantially reduces space and time complexity by avoiding heavy node duplication in sampled walks. To handle irregularly sized node sets, SUREL+ designs a customized sparse storage SpG and a sparse join operator SpJoin, providing memory-efficient storage with fast access. In addition, SUREL+ adopts a modular design, enabling users to choose different set samplers, structure encoders, and set neural encoders flexibly based on the nature of their SGRL tasks. Extensive experiments on three types of prediction tasks over nine real-world graph benchmarks show that SUREL+ significantly improves scalability, memory efficiency, and prediction accuracy compared to current SGRL methods and canonical GNNs.

Acknowledgements.

The authors would like to thank Rongzhe Wei and Yanbang Wang for their helpful discussions and valuable feedback. Haoteng Yin and Pan Li are supported by the 2021 JPMorgan Faculty Award, NSF awards OAC-2117997, IIS-2239565.

References

Alsentzer et al. (2020) Emily Alsentzer, Samuel Finlayson, Michelle Li, and Marinka Zitnik. 2020. Subgraph neural networks. Advances in Neural Information Processing Systems 33 (2020), 8017–8029.
Andersen et al. (2006) Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using pagerank vectors. In The 47th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 475–486.
Benson et al. (2018) Austin R Benson, Rediet Abebe, Michael T Schaub, Ali Jadbabaie, and Jon Kleinberg. 2018. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences 115, 48 (2018), E11221–E11230.
Bojchevski et al. (2020) Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, and Stephan Günnemann. 2020. Scaling graph neural networks with approximate pagerank. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2464–2473.
Bouritsas et al. (2022) Giorgos Bouritsas, Fabrizio Frasca, Stefanos P Zafeiriou, and Michael Bronstein. 2022. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).
Cai et al. (2021) Lei Cai, Zhengzhang Chen, Chen Luo, Jiaping Gui, Jingchao Ni, Ding Li, and Haifeng Chen. 2021. Structural temporal graph neural networks for anomaly detection in dynamic graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 3747–3756.
Chamberlain et al. (2023) Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M Bronstein, and Max Hansmire. 2023. Graph Neural Networks for Link Prediction with Subgraph Sketching. In International Conference on Learning Representations.
Chen et al. (2018) Jie Chen, Tengfei Ma, and Cao Xiao. 2018. Fastgcn: fast learning with graph convolutional networks via importance sampling. In International Conference on Learning Representations.
Chen et al. (2020) Zhengdao Chen, Lei Chen, Soledad Villar, and Joan Bruna. 2020. Can graph neural networks count substructures? Advances in Neural Information Processing Systems 33 (2020), 10383–10395.
Chiang et al. (2019) Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 257–266.
DGL (2022) DGL. 2022. 6.7 Using GPU for Neighborhood Sampling — DGL 0.9.1post1 documentation. https://docs.dgl.ai/guide/minibatch-gpu-sampling.html
Diemert et al. (2017) Eustache Diemert, Julien Meynet, Pierre Galland, and Damien Lefortier. 2017. Attribution modeling increases efficiency of bidding in display advertising. In Proceedings of the AdKDD and TargetAd Workshop. ACM, 1–6.
Frasca et al. (2022) Fabrizio Frasca, Beatrice Bevilacqua, Michael M Bronstein, and Haggai Maron. 2022. Understanding and Extending Subgraph GNNs by Rethinking Their Symmetries. Advances in Neural Information Processing Systems 35 (2022).
Garg et al. (2020) Vikas Garg, Stefanie Jegelka, and Tommi Jaakkola. 2020. Generalization and representational limits of graph neural networks. In International Conference on Machine Learning. PMLR, 3419–3430.
Hamilton et al. (2017a) Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017a. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems 30 (2017), 1025–1035.
Hamilton (2020) William L Hamilton. 2020. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 14, 3 (2020), 1–159.
Hamilton et al. (2017b) William L. Hamilton, Rex Ying, and Jure Leskovec. 2017b. Representation Learning on Graphs: Methods and Applications. IEEE Data Eng. Bull. 40, 3 (2017), 52–74.
Hu et al. (2020) Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems 33 (2020), 22118–22133.
Huang and Zitnik (2020) Kexin Huang and Marinka Zitnik. 2020. Graph meta learning via local subgraphs. Advances in Neural Information Processing Systems 33 (2020), 5862–5874.
Jeh and Widom (2003) Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web. 271–279.
Jumper et al. (2021) John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 7873 (2021), 583–589.
Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
Koller et al. (2007) Daphne Koller, Nir Friedman, Sašo Džeroski, Charles Sutton, Andrew McCallum, Avi Pfeffer, Pieter Abbeel, Ming-Fai Wong, Chris Meek, Jennifer Neville, et al. 2007. Introduction to statistical relational learning. MIT press.
Kong et al. (2022) Lecheng Kong, Yixin Chen, and Muhan Zhang. 2022. Geodesic Graph Neural Network for Efficient Graph Representation Learning. Advances in Neural Information Processing Systems 35 (2022).
Kwak et al. (2010) Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th International Conference on World Wide Web. 591–600.
Li et al. (2019) Pan Li, I Chien, and Olgica Milenkovic. 2019. Optimizing generalized pagerank methods for seed-expansion community detection. Advances in Neural Information Processing Systems 32 (2019), 11710–11721.
Li et al. (2020) Pan Li, Yanbang Wang, Hongwei Wang, and Jure Leskovec. 2020. Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning. Advances in Neural Information Processing Systems 33 (2020), 4465–4478.
Liu et al. (2020) Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, and Lifeng Shang. 2020. Neural subgraph isomorphism counting. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1959–1969.
Liu et al. (2022) Yunyu Liu, Jianzhu Ma, and Pan Li. 2022. Neural Predicting Higher-Order Patterns in Temporal Networks. In Proceedings of the Web Conference 2022. ACM, 1340–1351.
Lou et al. (2020) Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, Jure Leskovec, et al. 2020. Neural Subgraph Matching. arXiv preprint arXiv:2007.03092 (2020).
Luo and Li (2022) Yuhong Luo and Pan Li. 2022. Neighborhood-aware Scalable Temporal Network Representation Learning. Learning on Graphs Conference (2022).
Meng et al. (2018) Changping Meng, S Chandra Mouli, Bruno Ribeiro, and Jennifer Neville. 2018. Subgraph pattern neural networks for high-order graph evolution prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
Paetzold et al. (2021) Johannes C Paetzold, Julian McGinnis, Suprosanna Shit, Ivan Ezhov, Paul Büschl, Chinmay Prabhakar, Anjany Sekuboyina, Mihail Todorov, Georgios Kaissis, Ali Ertürk, et al. 2021. Whole Brain Vessel Graphs: A Dataset and Benchmark for Graph Learning and Neuroscience. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
Peng et al. (2022) Jingshu Peng, Zhao Chen, Yingxia Shao, Yanyan Shen, Lei Chen, and Jiannong Cao. 2022. Sancus: staleness-aware communication-avoiding full-graph decentralized training in large-scale graph neural networks. Proceedings of the VLDB Endowment 15, 9 (2022), 1937–1950.
PyG (2022) PyG. 2022. Accelerating PyG on NVIDIA GPUs. https://www.pyg.org//ns-newsarticle-accelerating-pyg-on-nvidia-gpus
Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European semantic web conference. Springer, 593–607.
Srinivasan and Ribeiro (2020) Balasubramaniam Srinivasan and Bruno Ribeiro. 2020. On the equivalence between positional node embeddings and structural graph representations. In International Conference on Learning Representations.
Srinivasan et al. (2021) Balasubramaniam Srinivasan, Da Zheng, and George Karypis. 2021. Learning over Families of Sets-Hypergraph Representation Learning for Higher Order Tasks. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). SIAM, 756–764.
Teru et al. (2020) Komal Teru, Etienne Denis, and Will Hamilton. 2020. Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning. PMLR, 9448–9457.
Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.
Virtanen et al. (2020) Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17 (2020), 261–272.
Wan et al. (2022a) Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, and Yingyan Lin. 2022a. BNS-GCN: Efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling. Proceedings of Machine Learning and Systems 4, 673–693.
Wan et al. (2022b) Cheng Wan, Youjie Li, Cameron R Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin. 2022b. Pipegcn: Efficient full-graph training of graph convolutional networks with pipelined feature communication. In International Conference on Learning Representations.
Wang and Zhang (2021) Xiyuan Wang and Muhan Zhang. 2021. GLASS: GNN with Labeling Tricks for Subgraph Representation Learning. In International Conference on Learning Representations.
Wang et al. (2021) Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, and Pan Li. 2021. Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks. In International Conference on Learning Representations.
Wu et al. (2022) Lingfei Wu, Peng Cui, Jian Pei, Liang Zhao, and Xiaojie Guo. 2022. Graph neural networks: foundation, frontiers and applications. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4840–4841.
Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks?. In International Conference on Learning Representations.
Yin et al. (2022) Haoteng Yin, Muhan Zhang, Yanbang Wang, Jianguo Wang, and Pan Li. 2022. Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning. Proceedings of the VLDB Endowment 15, 11 (2022), 2788–2796.
Zeng et al. (2021) Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Andrey Malevich, Rajgopal Kannan, Viktor Prasanna, Long Jin, and Ren Chen. 2021. Decoupling the depth and scope of graph neural networks. Advances in Neural Information Processing Systems 34 (2021), 19665–19679.
Zeng et al. (2020) Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2020. Graphsaint: Graph sampling based inductive learning method. In International Conference on Learning Representations.
Zhang and Chen (2018) Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. Advances in Neural Information Processing Systems 31 (2018), 5165–5175.
Zhang and Chen (2020) Muhan Zhang and Yixin Chen. 2020. Inductive Matrix Completion Based on Graph Neural Networks. In International Conference on Learning Representations.
Zhang et al. (2021) Muhan Zhang, Pan Li, Yinglong Xia, Kai Wang, and Long Jin. 2021. Labeling Trick: A Theory of Using Graph Neural Networks for Multi-Node Representation Learning. Advances in Neural Information Processing Systems 34 (2021), 9061–9073.
Zhou et al. (2022) Hongkuan Zhou, Da Zheng, Israt Nisa, Vasileios Ioannidis, Xiang Song, and George Karypis. 2022. TGL: A General Framework for Temporal GNN Training on Billion-Scale Graphs. Proceedings of the VLDB Endowment 15, 8 (2022), 1572–1580.

Appendix A Notations

Frequently used symbols are summarized in Table 6.

Appendix B More Details

Table 6. Summary of Frequently Used Notations.

Symbol	Meaning
Q	a query (set of nodes), i.e. $Q=\{u,v,w\}$
$\mathcal{Q}$	a set of queries, i.e. $Q\in\mathcal{Q}$
$\mathcal{G}_{u}$	a subgraph induced by node $u$
$\mathcal{G}_{Q}$	a subgraph induced by query $Q$
$\mathcal{S}_{u}$	a set of unique nodes sampled from the neighborhood of the seed node $u$
$\mathcal{Z}_{u,x}$	structural features of node $x$ regarding the seed node $u$ (all zeros if $x\notin\mathcal{S}_{u}$ )
$\mathcal{Z}_{u}$	collection of structural features for all nodes in $\mathcal{S}_{u}$, i.e. $\mathcal{Z}_{u}=\{\mathcal{Z}_{u,x}\mid x\in\mathcal{S}_{u}\}$
$\Vert$	the concatenation that joins node-level structural features, e.g. joining $\mathcal{Z}_{\cdot,x}$ for a query $Q=\{u,v,w\}$ gives $[\mathcal{Z}_{u,x},\mathcal{Z}_{v,x},\mathcal{Z}_{w,x}]$
$\mathcal{Z}_{Q,x}$	query-level structural features for node $x$ regarding the query $Q$, $\mathcal{Z}_{Q,x}=\Vert_{u\in Q}\mathcal{Z}_{u,x}$
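To make the notation concrete, the following is a minimal sketch of the concatenation that defines $\mathcal{Z}_{Q,x}$ in Table 6. It uses plain Python dictionaries rather than the customized sparse data structure of SUREL+; the function name `join_features` and the dictionary layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: z[u][x] holds Z_{u,x} for every x in the node set S_u.
Features = Dict[int, List[float]]


def join_features(query: Sequence[int], z: Dict[int, Features], dim: int) -> Features:
    """Return Z_{Q,x} for every node x in the union of S_u over u in Q."""
    zeros = [0.0] * dim
    # Union of the node sets of all seeds in the query.
    nodes = sorted({x for u in query for x in z[u]})
    joined: Features = {}
    for x in nodes:
        row: List[float] = []
        for u in query:
            # Z_{u,x} is all zeros when x was not sampled around seed u.
            row.extend(z[u].get(x, zeros))
        joined[x] = row
    return joined
```

For a query of $|Q|$ seeds with $d$-dimensional node-level features, each joined row has $|Q|\cdot d$ entries, and the position of each block identifies the seed it came from.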

B.1. Other Related Works

Scalable GNN Design. GNNs are currently the most widely used toolbox for graph representation learning, although they face certain challenges when directly applied to subgraph-based methods. To address the scalability of GNNs, current studies focus on improving graph subsampling and mini-batch training techniques (Chiang et al., 2019; Zeng et al., 2020). However, graph subsampling used in GNNs fundamentally differs from subgraph extraction in SGRL. The goal of subsampling is to handle GPU memory overflow during full-batch training of GNN models. For SGRL, subgraphs sampled around a query serve as features for making predictions. Consequently, the scaling techniques developed for GNNs cannot be directly applied to SGRL. Another direction is to deploy distributed GNN systems for industry-level graphs. Unfortunately, these specialized techniques, including pipelining (Wan et al., 2022b), partitioned parallelism (Wan et al., 2022a), and update with staleness (Peng et al., 2022), do not address the main bottleneck of subgraph extraction for SGRL methods.

B.2. Model Design

Benefits of Subgraph-based Graph Representation Learning. First, subgraph-based representation is versatile across different types of tasks, especially when the queries involve multiple nodes and relations, e.g. the existence of a link, the property of a motif, or the development of higher-order patterns, whereas canonical GNNs are limited in handling such polyadic dynamics via node-wise representations (Srinivasan and Ribeiro, 2020; Wang and Zhang, 2021). Second, subgraph-based models are more expressive: paired with structural features, they can obtain the most expressive structural representations (Srinivasan and Ribeiro, 2020; Li et al., 2020; Wang and Zhang, 2021; Bouritsas et al., 2022). In contrast, canonical GNNs cannot capture intra-distance information and joint relations over multiple nodes, which are critical to distinguishing nodes in structural symmetry and making predictions over them (see also the example in Fig. 2). Lastly, subgraph-based methods decouple the model depth from the receptive field since extracted subgraphs are localized to certain hops: adding more layers for non-linearity neither contaminates embeddings with irrelevant nodes nor causes over-smoothing, as it does in canonical GNNs. This results in a more robust representation and is particularly beneficial for modeling relations beyond singletons.

Table 7. Summary Statistics and Experimental Setup for Evaluation Datasets.

Dataset	Type	#Nodes	#Edges	Avg. Node Deg.	Density	Split Ratio	Split Type	Metric
criteo-click	Homo./Bipartite	Campaign(C): 675; User(U): 6,142,256	16,468,027	2.68	N/A	97/1.5/1.5	Time	MRR
twitter-2010	Homo./Social.	41,652,230	1,468,364,884	35.25	0.00017%	99.98/0.01/0.01	Random	MRR
citation2	Homo./Social.	2,927,963	30,561,187	20.7	0.00036%	98/1/1	Time	MRR
collab	Homo./Social.	235,868	1,285,465	8.2	0.0046%	92/4/4	Time	Hits@50
ppa	Homo./Bio.	576,289	30,326,273	73.7	0.018%	70/20/10	Throughput	Hits@100
vessel	Homo./Bio.	3,538,495	5,345,897	3.02	0.000085%	80/10/10	Random	AUC-ROC
ogb-mag	Hetero.	Paper(P): 736,389; Author(A): 1,134,649	P-A: 7,145,660; P-P: 5,416,271	21.7	N/A	99/0.5/0.5	Time	MRR
tags-math	Higher.	1,629	91,685 (projected); 822,059 (hyperedges)	N/A	N/A	60/20/20	Time	MRR
DBLP-coauthor	Higher.	1,924,991	7,904,336 (projected); 3,700,067 (hyperedges)	N/A	N/A	60/20/20	Time	MRR

Table 8. [Extended] Breakdown of Runtime, Memory Consumption for Different Models on Prediction of Link, Relation Type, and Higher-order Pattern. The column Train records the runtime per 10K queries.

Models	Runtime (s)			Memory (GB)		Runtime (s)			Memory (GB)		Runtime (s)			Memory (GB)		Runtime (s)			Memory (GB)
Models	Prep.	Train	Inf.	RAM	SDRAM	Prep.	Train	Inf.	RAM	SDRAM	Prep.	Train	Inf.	RAM	SDRAM	Prep.	Train	Inf.	RAM	SDRAM
Dataset	citation2					ppa					collab					vessel
GCN	17	21.74	105	9.3	36.84	2	0.026	1.2	4.6	11.35	2	0.005	0.05	2.5	5.50	5	0.076	0.3	2.8	36.98
GraphSAINT	151	1.79	107	9.6	9.78	10	0.003	1.5	4.9	23.06	1	0.004	0.08	2.5	8.11	5	0.008	15	6.9	10.21
GDGNN	338	2.26	5,460	40.6	16.96	127	1.77	902	21.1	10.27	14	0.74	15	4.3	1.08	25	0.85	84	7.2	8.03
SEAL	46	3.52	24,626	35.4	5.71	46	10.57	3,988	9.5	12.13	5	4.05	37	4.0	6.20	6	10.69	998	6.2	2.46
SUREL	151	4.14	6,081	25.1	9.68	31	2.68	1,429	13.6	31.01	1	2.13	17	3.4	9.86	5	1.57	32	5.8	5.18
SUREL+	130	0.35	1,389	16.7	4.75	69	0.72	201	9.8	19.02	7	0.27	2	2.8	3.37	3	0.31	3	3.3	1.25
Dataset	MAG(P-A)					MAG(P-P)					tags-math					DBLP-coauthor
H*GCN	3	0.03	9	5.0	21.56	4	0.03	13	5.5	21.66	2	0.004	1.3	2.4	3.10	-	0.58	95	8.0	25.80
H*SAGE	3	0.03	10	5.0	20.29	4	0.03	13	5.5	20.28	1	0.003	1.3	2.4	3.10	-	0.32	77	7.5	24.70
R-GCN	1	0.52	5	5.3	26.34	1	0.52	4	5.1	31.41	-	-	-	-	-	-	-	-	-	-
SUREL	10	3.20	1,998	7.3	7.18	15	0.99	1,924	8.1	16.66	-	2.13	341	3.0	5.95	11	1.29	949	9.7	7.79
SUREL+	58	0.33	101	7.2	2.95	77	0.13	168	8.1	13.49	1	0.67	116	2.4	5.70	8	0.24	315	3.8	3.16

B.3. Datasets

The full statistics of the benchmark datasets are summarized in Table 7. OGB datasets (https://ogb.stanford.edu/docs/dataset_overview/) are selected to benchmark our proposed framework and other baselines. The benchmark contains large-scale graphs (millions of nodes/edges) for real-world applications (e.g., academic and biological networks) and provides standard, open-sourced evaluation metrics and toolkits. Note that vessel is a newly added benchmark of a biological graph, with more than 3M nodes and sparse vessel structures extracted from the whole mouse brain (Paetzold et al., 2021), where nodes represent bifurcation points and edges represent the blood vessels. Each node is associated with features of its physical location in the coordinate space $(x,y,z)$. The introduction of vessel provides a unique opportunity to examine graph representation learning approaches in neuroscience, especially in scaling subgraph-based methods to handle sparse and spatial graphs with millions of nodes and edges for scientific discovery.

criteo-click contains a sample of 30 days of Criteo live traffic data, where each record corresponds to one impression (a banner) displayed to a user and indicates whether it was clicked (Diemert et al., 2017). Each record has 9 contextual features that are aggregated into a 270-dimensional edge feature. There are 675 unique campaign banners and 6.1M users, forming a bipartite graph of 16.5M edges: 97% of the edges are used for training, and the rest are evenly split for validation and testing in temporal order. The task is to predict which campaign a user is most likely to click among 651 candidates. twitter-2010 is an industry-level social network with 1.5B user-following relations (Kwak et al., 2010). An edge $(i,j)$ of this network indicates that user $i$ is followed by user $j$. For evaluation, 1% of the Twitter users who follow 10 to 1000 accounts are randomly sampled. The task is to recommend which account each of them will most likely follow among 1001 candidates. The OGB-formatted files of these two datasets are accessible via Box at https://purdue.box.com/v/SGRL-LSC-dataset.
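Both ranking tasks above score one true target against a fixed list of candidates and are evaluated by MRR (Table 7). A minimal sketch of that metric follows. The function names are hypothetical, and the tie-breaking rule (only strictly higher-scored negatives push the positive down) is an assumption; the official OGB evaluator may treat ties differently.

```python
from typing import Sequence


def reciprocal_rank(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Reciprocal rank of the positive candidate among its negatives."""
    # Assumption: ties are resolved in favor of the positive candidate.
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return 1.0 / rank


def mean_reciprocal_rank(
    pos_scores: Sequence[float], neg_scores: Sequence[Sequence[float]]
) -> float:
    """Average the reciprocal rank over all queries."""
    ranks = [reciprocal_rank(p, n) for p, n in zip(pos_scores, neg_scores)]
    return sum(ranks) / len(ranks)
```

With 651 candidates per query, for example, `neg_scores` would hold the 650 scores of the non-clicked campaigns for each query.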

B.4. Baselines

For link prediction and relation type prediction, baseline models are selected based on their scalability and prediction performance from the current OGB leaderboard (https://ogb.stanford.edu/docs/leader_linkprop/). All models listed on the leaderboard are publicly accessible. We adopt the numbers reported on the leaderboard after verifying them. For the rest of the baselines, we benchmark these models using their official implementations with tuned hyperparameters as listed below.

• Canonical GNNs: graph auto-encoder models that use graph convolution layers to learn node-wise representations, including GCN (Kipf and Welling, 2017), GraphSAGE (Hamilton et al., 2017a), and their more scalable variants that employ graph subsampling, such as GraphSAINT (Zeng et al., 2020).
• R-GCN (Schlichtkrull et al., 2018; https://github.com/pyg-team/pytorch_geometric/blob/master/examples): a relational GCN that models heterogeneous graphs with different types of nodes/links.
• SEAL (Zhang and Chen, 2018; https://github.com/facebookresearch/SEAL_OGB): applies GCN to query-induced subgraphs attached with double radius node labeling to obtain a subgraph-level readout for link prediction. SEAL shows great empirical performance on multiple graph machine learning benchmarks and promotes the deployment of subgraph-based models for scientific discovery. The implementation we tested is the one specialized for OGB datasets provided in (Zhang et al., 2021).
• GDGNN (Kong et al., 2022; https://github.com/woodcutter1998/gdgnn): a subgraph-based model that aggregates node representations generated by GNNs along geodesic paths between queried nodes for fast inference.
• SUREL (Yin et al., 2022; https://github.com/Graph-COM/SUREL): a walk-based computation framework to accelerate subgraph-based methods, where subgraphs are decomposed into pre-sampled walks that are then joined online to substitute for the query-induced subgraph in prediction. By adopting the walk-based representation, SUREL achieves state-of-the-art scalability and prediction accuracy on SGRL tasks.

All canonical GNN baselines (https://github.com/snap-stanford/ogb/tree/master/examples/linkproppred) come with three GCNConv/SAGEConv layers of 256 hidden dimensions, and a tuned dropout ratio in $\{0,0.5\}$ for full-batch training. Canonical GNNs aggregate all node embeddings involved in a query as the representation of the link/hyperedge, which is later fed into an MLP classifier for the final prediction. In addition, all GNN models need to use the full training data (edges/triplets) to generate robust node representations. The hypergraph datasets do not come with raw node features, and thus GNN baselines use randomly initialized features as input, which are trained along with the other model parameters. R-GCN uses RGCNConv layers that support message passing with multiple relation types between different types of nodes, where the edge types (relations) are used as input in addition to node features.

Subgraph-based models only use partial edges/triplets for training. For SEAL, 1-hop enclosing subgraphs are extracted online during the training and inference. Then, it applies three GCN layers of 32 hidden dimensions plus a sort pooling and several 1D convolution layers to generate a readout of the target subgraph for prediction. SUREL consists of a 2-layer MLP for query-level relative position encoding (RPE) and a 2-layer RNN to encode joined walks with attached RPEs. The hidden dimension of both networks is set to 64. The obtained readout of joined walks is aggregated and fed into a 2-layer MLP classifier to make predictions. GDGNN employs GINLayer as its backbone to obtain node embeddings. The horizontal geodesic representation is used for predictions, which finds the shortest path between two nodes in a query and aggregates node representations generated by GNNs along the found geodesic path. The max search distance for geodesic is the same as the number of GNN layers. For collab, ppa, citation2 and vessel, the threshold of distance is set to 4, 4, 3, and 2, respectively. The hidden dimension of all fully connected layers is set to 32.
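The geodesic search described for GDGNN can be pictured as a breadth-first search that gives up beyond a distance threshold. The sketch below is an illustrative stand-in, not GDGNN's implementation: the function name `bounded_shortest_path` and the adjacency-list input are assumptions, and the aggregation of GNN node representations along the returned path is omitted.

```python
from collections import deque
from typing import Dict, List, Optional


def bounded_shortest_path(
    adj: Dict[int, List[int]], src: int, dst: int, max_dist: int
) -> Optional[List[int]]:
    """Return one shortest path from src to dst with at most max_dist edges, or None."""
    if src == dst:
        return [src]
    parent: Dict[int, Optional[int]] = {src: None}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_dist:
            continue  # do not expand beyond the distance threshold
        for nbr in adj.get(node, []):
            if nbr in parent:
                continue  # already reached by a path that is no longer
            parent[nbr] = node
            if nbr == dst:
                # Walk the parent pointers back to the source.
                path = [dst]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((nbr, dist + 1))
    return None
```

With a threshold of 2, as used for vessel, two queried nodes that are three or more hops apart would have no geodesic path under this sketch.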

Appendix C Architecture and Hyperparameter

SUREL+ uses a 2-layer MLP with ReLU activation for encoding structural features and supports three set neural encoders, including mean pooling, LSTM, and attention. LSTM interprets elements to be aggregated in a set as a sequence (Hamilton et al., 2017a); attention first calculates soft attention scores for elements in a set and then performs attention-score-weighted average pooling. The hidden dimension of all parameterized layers is set to 96. Lastly, hidden representations of query-level joined node sets are fed into a 2-layer MLP classifier for final predictions.
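As a rough illustration of two of the set encoders named above, the sketch below implements mean pooling and attention-score-weighted average pooling over a set of hidden vectors in plain Python. The scoring function (a dot product with a vector `w`, which would be learned in practice) and the function names are assumptions made for this example; the paper does not specify the form of the attention scores here.

```python
import math
from typing import List, Sequence


def mean_pool(h: Sequence[Sequence[float]]) -> List[float]:
    """Average the hidden vectors of all elements in the set."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


def attention_pool(h: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Softmax-weighted average of the hidden vectors.

    Assumption: each element is scored by a dot product with the vector w.
    """
    scores = [sum(a * b for a, b in zip(row, w)) for row in h]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]  # soft attention scores, summing to 1
    dim = len(h[0])
    return [sum(a * row[d] for a, row in zip(alpha, h)) for d in range(dim)]
```

Both functions are invariant to the order of the elements, which is not true of the LSTM encoder that treats the set as a sequence.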

The walk-based sampler builds on the sampling function from the SubGAcc library (https://github.com/VeritasYin/subg_acc) developed by the authors, which also provides support for efficient structural feature compression and index remapping. The metric-based sampler is adopted from the fast PPR approximation in (Bojchevski et al., 2020).

Table 9. Hyperparameters Used for Benchmarking SUREL+.

Dataset	#steps $m$	#walks $M$	#negative samples $k$	Structural Feature	Set Neural Encoder
criteo-click	4	200	10	LP	Mean
twitter-2010	4	100	25	LP	Mean
citation2	4	100	10	LP	Mean
collab	3	200	10	LP	Mean
ppa	4	200	20	LP	Attn.
vessel	2	50	5	LP	Mean
MAG (P-A)	3	200	10	LP	Mean
MAG (P-P)	4	100	10	LP	Mean
tags-math	4	200	10	LP	Mean
DBLP-coauthor	3	100	10	LP	Mean

We follow the inductive setting for link and relation prediction: only a portion of the samples is used for training. Over the training graph, we randomly select 5% of the links as positive training queries, each paired with $k$ negative samples ($k=10$ by default). We mask these links and use the remaining 95% of the links to compute each node’s structural features in the split training set via the structure encoder. For vessel, as the input graph is very sparse, we first sort the nodes in the training set by degree and then randomly pick 5% of the nodes; the edges of their 2-hop induced subgraphs are used for training, and the rest are reserved for structural feature construction. For higher-order pattern prediction, we use the given graph before timestamp $t$ to sample node sets and encode their structural features. The model parameters are optimized with the triplets provided in the training set. No node features are used in SUREL+, except for vessel, where the normalized physical location of each node is appended to its structural features, and similarly for the contextual features in criteo-click.
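The link-prediction split described above can be sketched as follows. This is an illustration under stated assumptions rather than the SUREL+ pipeline: the function name `split_training_edges` is hypothetical, and negatives are drawn by uniformly corrupting the second endpoint of each positive query, which the text does not specify.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def split_training_edges(
    edges: List[Edge], num_nodes: int, query_ratio: float = 0.05, k: int = 10, seed: int = 0
) -> Tuple[List[Edge], List[Edge], Dict[Edge, List[int]]]:
    """Split edges into masked positive queries and edges kept for structural features."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_query = max(1, round(query_ratio * len(shuffled)))
    pos_queries = shuffled[:n_query]  # masked links used as training queries
    observed = shuffled[n_query:]     # remaining links used for structural features
    existing = {frozenset(e) for e in edges}
    negatives: Dict[Edge, List[int]] = {}
    for u, v in pos_queries:
        sampled: List[int] = []
        # Assumption: replace v by a uniformly drawn node w that is not linked to u.
        # This loop presumes a sparse graph in which such nodes exist.
        while len(sampled) < k:
            w = rng.randrange(num_nodes)
            if w != u and frozenset((u, w)) not in existing:
                sampled.append(w)
        negatives[(u, v)] = sampled
    return pos_queries, observed, negatives
```

Structural features would then be computed on the graph built from `observed` only, so that the masked query links do not leak into the features used to predict them.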

Table 8 presents the extended version of Table 4. The results reported in Table 3 and the profiling of SUREL+ in Tables 4 and 8 are obtained with the combination of hyperparameters listed in Table 9. The dropout rate on vessel is set to $p=0.2$. The metric-based sampler is adopted to obtain the results of using PPR and SPD as structural features in Table 5. Its sampling size $K$ is set to $50$, $50$, and $150$ for collab, ppa, and citation2, respectively. The walk-based sampler is used for the results with LP as structural features, and its sampling parameters are listed in Table 9. The rest of the hyperparameters remain the same as reported in Sec. 4.1. The SUREL+ framework, including the SubGAcc library, is open-source and free for academic use under the BSD-2-Clause license.