Graph Neural Networks in Recommender Systems: A Survey

Authors: Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, Bin Cui

ACM Computing Surveys, Volume 55, Issue 5

Article No.: 97, Pages 1 - 37

https://doi.org/10.1145/3535101

Published: 03 December 2022

Abstract

With the explosive growth of online information, recommender systems play a key role in alleviating information overload. Owing to their important application value, new works in this field keep emerging. In recommender systems, the main challenge is to learn effective user/item representations from their interactions and side information (if any). Recently, graph neural network (GNN) techniques have been widely utilized in recommender systems, since most of the information in recommender systems essentially has a graph structure and GNNs excel at graph representation learning. This article aims to provide a comprehensive review of recent research efforts on GNN-based recommender systems. Specifically, we provide a taxonomy of GNN-based recommendation models according to the types of information used and recommendation tasks. Moreover, we systematically analyze the challenges of applying GNN on different types of data and discuss how existing works in this field address these challenges. Furthermore, we state new perspectives pertaining to the development of this field. We collect the representative papers along with their open-source implementations in https://github.com/wusw14/GNN-in-RS.

1 Introduction

With the rapid development of e-commerce and social media platforms, recommender systems have become indispensable tools for many businesses [15, 25, 84, 183, 190, 200]. They take various forms depending on the industry, like product suggestions on online e-commerce websites (e.g., Amazon and Taobao) or playlist generators for video and music services (e.g., YouTube, Netflix, and Spotify). Users rely on recommender systems to alleviate the information overload problem and explore what they are interested in from the vast sea of items (e.g., products, movies, news, or restaurants). Therefore, accurately modeling users’ preferences from their historical interactions (e.g., click, watch, read, and purchase) lies at the heart of an effective recommender system.

Broadly speaking, in the past decades, the mainstream modeling paradigm in recommender systems has evolved from neighborhood methods [6, 60, 95, 123] to representation-learning-based frameworks [25, 77, 78, 125, 143]. Item-based neighborhood methods [6, 95, 123] directly recommend items to users that are similar to the historical items they have interacted with. In a sense, they represent users’ preferences by directly using their historical interacted items. Early item-based neighborhood approaches have achieved great success in real-world applications because of their simplicity, efficiency, and effectiveness.

An alternative approach is representation-learning-based methods that try to encode both users and items as continuous vectors (i.e., embeddings) in a shared space, thus making them directly comparable. Representation-based models have sparked a surge of interest since the Netflix Prize competition [7] demonstrated that matrix factorization models are superior to classic neighborhood methods for recommendations. After that, various methods have been proposed to learn the representations of users and items, from matrix factorization [77, 78] to deep learning models [25, 58, 125, 200]. Nowadays, deep learning models have become a dominant methodology for recommender systems in both academic research and industrial applications, owing to their ability to effectively capture non-linear and non-trivial user-item relationships and to easily incorporate abundant data sources, e.g., contextual, textual, and visual information.

Among all those deep learning algorithms, one line is graph-learning-based methods, which consider the information in recommender systems from the perspective of graphs [151]. Most of the data in recommender systems have a graph structure essentially [8, 190]. For example, the interaction data in a recommendation application can be represented by a bipartite graph between user and item nodes, with observed interactions represented by links. Even the item transitions in users’ behavior sequences can also be constructed as graphs. The benefit of formulating recommendation as a task on graphs becomes especially evident when incorporating structured external information, e.g., the social relationship among users [33, 172] and the knowledge graph related to items [146, 196]. In this way, graph learning provides a unified perspective to model the abundant heterogeneous data in recommender systems. Early efforts in graph-learning-based recommender systems utilize graph embedding techniques to model the relations between nodes, which can be further divided into factorization-based methods, distributed-representation-based methods, and neural-embedding-based methods [151]. Inspired by the superior ability of GNN in learning on graph-structured data, a great number of GNN-based recommendation models have emerged recently.

Nevertheless, providing a unified framework to model the abundant data in recommendation applications is only part of the reason for the widespread adoption of GNN in recommender systems. Another reason is that, different from traditional methods that only implicitly capture the collaborative signals (i.e., using user-item interactions as the supervised signals), GNN can naturally and explicitly encode the crucial collaborative signal (i.e., topological structure) to improve the user and item representations. In fact, using collaborative signals to improve representation learning in recommender systems is not a new idea that originated from GNN [41, 69, 76, 184, 203]. Early efforts, such as SVD++ [76] and FISM [69], have already demonstrated the effectiveness of the interacted items in user representation learning. In view of the user-item interaction graph, these previous works can be seen as using one-hop neighbors to improve user representation learning. The advantage of GNN is that it provides powerful and systematic tools to explore multi-hop relationships that have been proven to be beneficial to the recommender systems [55, 155, 190].

With these advantages, GNN has achieved remarkable success in recommender systems in the past few years. In academic research, many works demonstrate that GNN-based models outperform previous methods and achieve new state-of-the-art results on the public benchmark datasets [55, 155, 210]. Meanwhile, plenty of their variants have been proposed and applied to various recommendation tasks, e.g., session-based recommendation [115, 175], point-of-interest (POI) recommendation [10, 92, 177], group recommendation [59, 153], multimedia recommendation [164, 165], and bundle recommendation [11]. In industry, GNN has also been deployed in web-scale recommender systems to produce high-quality recommendation results [32, 114, 190]. For example, Pinterest developed and deployed a random-walk-based Graph Convolutional Network (GCN) model named PinSage on a graph with 3 billion nodes and 18 billion edges, and gained substantial improvements in user engagement in online A/B tests.

Differences between this survey and existing ones. There exist surveys focusing on different perspectives of recommender systems [4, 16, 22, 28, 45, 117, 200]. However, there are very few comprehensive reviews that position existing works and current progress of applying GNN in recommender systems. For example, Zhang et al. [200] and Batmaz et al. [4] focus on most of the deep-learning techniques in recommender systems while ignoring GNN. Chen et al. [16] summarize the studies on the bias issue in recommender systems. Guo et al. [45] review knowledge-graph-based recommendations, and Wang et al. [150] present a comprehensive survey of session-based recommendation. These two works only include some of the GNN methods applied in the corresponding sub-fields and examine a limited number of works. To the best of our knowledge, the most relevant survey published formally is a short paper [151], which presents a review of graph-learning-based systems and briefly discusses the application of GNN in recommendation. One recent survey under review [40] classifies the existing works in GNN-based recommender systems from four perspectives of recommender systems, i.e., stage, scenario, objective, and application. Such a taxonomy emphasizes recommender systems but pays insufficient attention to applying GNN techniques in recommender systems. Besides, this survey [40] provides little discussion of the advantages and limitations of existing methods. There are some comprehensive surveys on the GNN techniques [179, 208], but they only roughly discuss recommender systems as one of the applications.

Given the impressive pace at which the GNN-based recommendation models are growing, we believe it is important to summarize and describe all the representative methods in one unified and comprehensible framework. This survey summarizes the literature on the advances of GNN-based recommendation and discusses open issues and future directions in this field. To this end, more than 100 studies were shortlisted and classified in this survey.

Contribution of this survey. The goal of this survey is to thoroughly review the literature on the advances of GNN-based recommender systems and discuss further directions. The researchers and practitioners who are interested in recommender systems could have a general understanding of the latest developments in the field of GNN-based recommendation. The key contributions of this survey are summarized as follows:

\(\bullet\) New taxonomy. We propose a systematic classification schema to organize the existing GNN-based recommendation models. Specifically, we categorize the existing works based on the type of information used and recommendation tasks into five categories: user-item collaborative filtering, sequential recommendation, social recommendation, knowledge-graph-based recommendation, and other tasks (including POI recommendation, multimedia recommendation, etc.).

\(\bullet\) Comprehensive review. For each category, we demonstrate the main issues to deal with. Moreover, we introduce the representative models and illustrate how they address these issues.

\(\bullet\) Future research. We discuss the limitations of current methods and propose nine potential future directions.

The remainder of this article is organized as follows: Section 2 introduces the preliminaries for recommender systems and graph neural networks. Then, it discusses the motivations of applying GNNs in recommender systems and categorizes the existing GNN-based recommendation models. Sections 3 through 7 summarize the main issues of models in each category, describe how existing works tackle these challenges, and analyze their advantages and limitations. Section 8 gives a summary of the mainstream benchmark datasets, widely adopted evaluation metrics, and real-world applications. Section 9 discusses the challenges and points out nine future directions in this field. Finally, we conclude the survey in Section 10.

2 Backgrounds and Categorization

Before diving into the details of this survey, we give a brief introduction to recommender systems and GNN techniques. We also discuss the motivation of utilizing GNN techniques in recommender systems. Furthermore, we propose a new taxonomy to classify the existing GNN-based models. Throughout this article, we use bold uppercase characters to denote matrices, bold lowercase characters to denote vectors, italic bold uppercase characters to denote sets, and calligraphic fonts to denote graphs. For easy reading, we summarize the notations that will be used throughout the article in Table 1.

Notations	Descriptions
\(\mathcal {U} / \mathcal {I}\)	The set of users/items
\(\mathbf {R}=\lbrace r_{u,i}\rbrace\)	Interaction between users and items
\(\mathcal {G}_{\mathrm{S}}\)	Social relationship between users
\(\mathcal {G}_{\mathrm{KG}}\)	Knowledge graph
\(\mathcal {E}_{\mathrm{KG}}=\lbrace e_i\rbrace\)	The set of entities in the knowledge graph
\(\mathcal {R}_{\mathrm{KG}}=\lbrace r_{e_i,e_j}\rbrace\)	The set of relations in the knowledge graph
\(\mathbf {A}\)	Adjacency matrix of graph
\(\mathbf {A}^{\mathrm{in}} / \mathbf {A}^{\mathrm{out}}\)	In & out adjacency matrix of directed graph
\(\mathcal {N}_v\)	Neighborhood set of node \(v\)
\(\mathbf {h}_v^{(l)}\)	Hidden state of node embedding at layer \(l\)
\(\mathbf {n}_v^{(l)}\)	Aggregated representation of the neighbors of node \(v\) at layer \(l\)
\(\mathbf {h}_u^{*}\)	Final representation of user \(u\)
\(\mathbf {h}_i^{*}\)	Final representation of item \(i\)
\(\mathbf {h}_u^{S}\)	Final representation of user \(u\) in the social space
\(\mathbf {h}_u^{I}\)	Final representation of user \(u\) in the item space
\(\mathbf {W}^{(l)}\)	Transformation matrix at layer \(l\)
\(\mathbf {W}_r^{(l)}\)	Transformation matrix of relation \(r\) at layer \(l\)
\(\mathbf {b}^{(l)}\)	Bias term at layer \(l\)
\(\oplus\)	Vector concatenation
\(\odot\)	Element-wise multiplication operation

Table 1. Key Notations Used in This Article

2.1 Recommender Systems

Recommender systems infer users’ preferences from user-item interactions or static features and further recommend items that users might be interested in [1]. This has been a popular research area for decades because it has great application value and its challenges are still not well addressed. Formally, given a user \(u\in \mathcal {U}\), the task is to estimate the user’s preference for any item \(i\in \mathcal {I}\) from the learned user representation \(\mathbf {h}_u^*\) and item representation \(\mathbf {h}_i^*\) , i.e.,

\begin{equation} y_{u,i}=f(\mathbf {h}_u^*,\mathbf {h}_i^*), \end{equation}

(1)

where the score function \(f(\cdot)\) can be a dot product, cosine similarity, a multi-layer perceptron, and so forth, and \(y_{u,i}\) denotes the preference score of user \(u\) for item \(i\) , which is usually expressed as a probability.
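As a minimal sketch of Equation (1), the following Python snippet scores items for one user with a dot product or cosine similarity and ranks them. The function names, the toy vectors, and the use of a sigmoid to map a raw score to a probability are illustrative assumptions, not part of any surveyed model.

```python
import math

def dot_score(h_u, h_i):
    """Dot-product score f(h_u, h_i)."""
    return sum(a * b for a, b in zip(h_u, h_i))

def cosine_score(h_u, h_i):
    """Cosine similarity between the two representations (0 if either is zero)."""
    norm = math.sqrt(dot_score(h_u, h_u)) * math.sqrt(dot_score(h_i, h_i))
    return dot_score(h_u, h_i) / norm if norm > 0 else 0.0

def sigmoid(x):
    """Squash a raw score into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def rank_items(h_u, item_reprs, score=dot_score):
    """Return item ids sorted by descending preference score y_{u,i}."""
    return sorted(item_reprs, key=lambda i: score(h_u, item_reprs[i]), reverse=True)

# Hypothetical final representations h_u^* and h_i^*.
h_u = [1.0, 0.0]
items = {"i1": [0.9, 0.1], "i2": [0.0, 1.0], "i3": [0.5, 0.5]}
ranking = rank_items(h_u, items)
```

Swapping `score=cosine_score` changes only the scoring function; the representations themselves are what the GNN models discussed in this survey are meant to learn.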

According to the types of information used to learn user/item representations, the research of recommender systems can usually be classified into specific types of tasks. The user-item collaborative filtering recommendation aims to capture the collaborative signal by leveraging only the user-item interactions; i.e., the user/item representations are jointly learned from pairwise data [58, 78, 80, 121, 125, 178]. When the timestamps of the user’s historical behavior are known or the historical behavior is organized in chronological order, the user representations can be enhanced via exploring the sequential patterns in her/his historical interactions [53, 61, 70, 85, 97, 119, 131, 136, 150]. According to whether the users are anonymous or not and whether the behaviors are segmented into sessions, works in this field can be further divided into sequential recommendation and session-based recommendation. The session-based recommendation can be viewed as a sub-type of sequential recommendation in which users are anonymous and behaviors are segmented into sessions [117]. In this survey, we do not distinguish them and refer to them collectively as the much broader term “sequential recommendation” for simplicity, since our main focus is the contribution of GNN to recommendation, and the differences between them are negligible for the application of GNN. In addition to sequential information, another line of research exploits the social relationship to enhance the user representations, which is classified as social recommendation [43, 65, 103, 104, 105, 138]. The social recommendation assumes that users with social relationships tend to have similar user representations, based on the social influence theory that connected people influence each other. Besides the user representation enhancement, many efforts try to enhance the item representations by leveraging a knowledge graph, which expresses relationships between items through attributes. These works are usually categorized as knowledge-graph-based recommender systems, which incorporate the semantic relations among items into collaborative signals.

2.2 Graph Neural Network Techniques

Recently, systems based on variants of GNN have demonstrated ground-breaking performances on many tasks related to graph data, such as physical systems [5, 122], protein structure [37], and knowledge graph [49]. In this part, we first introduce the definition of graphs, and then give a brief summary of the existing GNN techniques.

A graph is represented as \(\mathcal {G}=(\mathcal {V},\mathcal {E})\) , where \(\mathcal {V}\) is the set of nodes and \(\mathcal {E}\) is the set of edges. Let \(v_i\in \mathcal {V}\) be a node and \(e_{ij}=(v_i, v_j)\in \mathcal {E}\) be an edge pointing from \(v_j\) to \(v_i\) . The neighborhood of a node \(v\) is denoted as \(\mathcal {N}_v=\lbrace u\in \mathcal {V} | (v,u)\in \mathcal {E} \rbrace\) . Generally, graphs can be categorized as:

\(\bullet\) Directed/Undirected Graph. A directed graph is a graph with all edges directed from one node to another. An undirected graph is considered as a special case of directed graphs where there is a pair of edges with inverse directions if two nodes are connected.

\(\bullet\) Homogeneous/Heterogeneous Graph. A homogeneous graph consists of one type of nodes and edges, and a heterogeneous graph has multiple types of nodes or edges.

\(\bullet\) Hypergraph. A hypergraph is a generalization of a graph in which an edge can join any number of vertices.

Given the graph data, the main idea of GNN is to iteratively aggregate feature information from neighbors and integrate the aggregated information with the current central node representation during the propagation process [179, 208]. From the perspective of network architecture, GNN stacks multiple propagation layers, which consist of the aggregation and update operations. The formulation of propagation is

\begin{equation} \begin{aligned}\text{Aggregation:}\quad \mathbf {n}_v^{(l)} &=\operatorname{Aggregator}_{l}\left(\left\lbrace \mathbf {h}_{u}^{(l)}, \forall u \in \mathcal {N}_{v}\right\rbrace \right)\!, \\ \text{Update:}\quad \mathbf {h}_v^{(l+1)} &=\operatorname{Updater}_{l}\left(\mathbf {h}_v^{(l)},\mathbf {n}_v^{(l)}\right), \end{aligned} \end{equation}

(2)

where \(\mathbf {h}_{u}^{(l)}\) denotes the representation of node \(u\) at the \(l{\rm {th}}\) layer, and \(\operatorname{Aggregator}_{l}\) and \(\operatorname{Updater}_{l}\) represent the function of aggregation operation and update operation at the \(l{\rm {th}}\) layer, respectively. In the aggregation step, existing works either treat each neighbor equally with the mean-pooling operation [50, 89] or differentiate the importance of neighbors with the attention mechanism [140]. In the update step, the representation of the central node and the aggregated neighborhood will be integrated into the updated representation of the central node. In order to adapt to different scenarios, various strategies are proposed to better integrate the two representations, such as GRU mechanism [89], concatenation with nonlinear transformation [50] and sum operation [140]. To learn more about GNN techniques, we refer the readers to the surveys [179, 208].
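The aggregation/update scheme of Equation (2) can be sketched as a generic layer that takes the two operations as arguments. This is an illustrative sketch only: the dictionary-based graph encoding, the function names, and the choice of mean-pooling with a sum update are assumptions made here for readability.

```python
def propagate(h, neighbors, aggregator, updater):
    """One GNN layer: h maps node -> vector, neighbors maps node -> list of nodes."""
    new_h = {}
    for v, h_v in h.items():
        # Aggregation: summarize the layer-l representations of v's neighbors.
        n_v = aggregator([h[u] for u in neighbors[v]], dim=len(h_v))
        # Update: combine the central node with the aggregated neighborhood.
        new_h[v] = updater(h_v, n_v)
    return new_h

def mean_aggregator(vectors, dim):
    """Mean-pooling; an isolated node receives a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sum_updater(h_v, n_v):
    """Sum update: add the aggregated neighborhood to the central node."""
    return [a + b for a, b in zip(h_v, n_v)]

# Path graph a - b - c with one-hot initial features.
h0 = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h1 = propagate(h0, nbrs, mean_aggregator, sum_updater)
h2 = propagate(h1, nbrs, mean_aggregator, sum_updater)
```

After one layer, node `a` carries no information from `c` (its third component is still zero); after two stacked layers it does, which is how stacking propagation layers reaches multi-hop neighbors.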

Here, we briefly summarize the aggregation and update operations of five typical GNN frameworks that are widely adopted in the field of recommendation.

\(\bullet\) GCN [73] approximates the first-order eigendecomposition of the graph Laplacian to iteratively aggregate information from neighbors. Concretely, it updates the embedding by

\begin{equation} \begin{aligned}\text{Aggregation:} \quad \mathbf {n}_{v}^{(l)} = \sum _{j \in \mathcal {N}_{v}} d_{vv}^{-\frac{1}{2}} \tilde{a}_{vj} d_{jj}^{-\frac{1}{2}}\mathbf {h}^{(l)}_j,\quad \text{Update:} \quad \mathbf {h}_{v}^{(l+1)} = \delta \left(\mathbf {W}^{(l)}\mathbf {n}_{v}^{(l)} \right)\!, \end{aligned} \end{equation}

(3)

where \(\delta (\cdot)\) is the nonlinear activation function, like ReLU; \(\mathbf {W}^{(l)}\) is the learnable transformation matrix for layer \(l\) ; \(\tilde{a}_{vj}\) is the adjacency weight ( \(\tilde{a}_{vv}=1\) ); and \(d_{jj}=\Sigma _k \tilde{a}_{jk}\) .
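A minimal per-node sketch of the GCN layer in Equation (3) follows. It assumes that the self-loop weight of 1 is included in both the sum and the degree, uses ReLU as the activation, and takes a hand-picked identity matrix in place of a learned transformation; the data layout and names are illustrative.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, xi) for xi in x]

def gcn_layer(h, adj, W):
    """h: node -> vector; adj: node -> {neighbor: weight}, without self-loops."""
    # Add self-loops (weight 1) and compute degrees d_vv as row sums.
    adj_sl = {v: {**adj[v], v: 1.0} for v in h}
    deg = {v: sum(adj_sl[v].values()) for v in h}
    out = {}
    for v in h:
        n_v = [0.0] * len(h[v])
        for j, a_vj in adj_sl[v].items():
            coef = a_vj / math.sqrt(deg[v] * deg[j])   # symmetric degree normalization
            n_v = [n + coef * x for n, x in zip(n_v, h[j])]
        out[v] = relu(matvec(W, n_v))                  # update step
    return out

# Two connected nodes with one-hot features and an identity transformation.
h = {0: [1.0, 0.0], 1: [0.0, 1.0]}
adj = {0: {1: 1.0}, 1: {0: 1.0}}
W = [[1.0, 0.0], [0.0, 1.0]]
h_next = gcn_layer(h, adj, W)
```

In this toy graph both nodes have degree 2 once self-loops are added, so each node ends up with the average of its own and its neighbor's features.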

\(\bullet\) GraphSAGE [50] samples a fixed-size neighborhood for each node, proposes mean/sum/max-pooling aggregators, and adopts the concatenation operation for update:

\begin{equation} \begin{aligned}\text{Aggregation:}\quad \mathbf {n}_{v}^{(l)} &= \operatorname{Aggregator}_{l} \left(\left\lbrace \mathbf {h}_{u}^{(l)}, \forall u \in \mathcal {N}_{v}\right\rbrace \right)\!, \\ \text{Update:}\quad \mathbf {h}_{v}^{(l+1)} &=\delta \left(\mathbf {W}^{(l)} \cdot \bigl [\mathbf {h}_{v}^{(l)} \oplus \mathbf {n}_{v}^{(l)}\bigr ]\right)\!, \end{aligned} \end{equation}

(4)

where \(\operatorname{Aggregator}_{l}\) denotes the aggregation function at the \(l{\rm {th}}\) layer, \(\delta (\cdot)\) is the nonlinear activation function, and \(\mathbf {W}^{(l)}\) is the learnable transformation matrix.
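The following sketch illustrates Equation (4) with a mean aggregator over a fixed-size sampled neighborhood. Sampling with replacement when a node has fewer neighbors than the target size is an assumption of this sketch, as are the function names, the toy graph, and the hand-picked matrix `W`.

```python
import random

def sample_neighbors(neighbors, size, rng):
    """Fixed-size neighborhood: subsample if too many, resample with replacement if too few."""
    if not neighbors:
        return []
    if len(neighbors) >= size:
        return rng.sample(neighbors, size)
    return [rng.choice(neighbors) for _ in range(size)]

def graphsage_layer(h, neighbors, W, size, rng):
    """h: node -> vector; W has shape d_out x (2 * d_in) because of the concatenation."""
    out = {}
    for v, h_v in h.items():
        sampled = sample_neighbors(neighbors[v], size, rng)
        if sampled:
            n_v = [sum(col) / len(sampled) for col in zip(*(h[u] for u in sampled))]
        else:
            n_v = [0.0] * len(h_v)
        concat = h_v + n_v                       # concatenation of h_v and n_v
        out[v] = [max(0.0, sum(w * x for w, x in zip(row, concat))) for row in W]
    return out

h = {"u": [1.0, 2.0], "a": [2.0, 0.0], "b": [2.0, 0.0]}
neighbors = {"u": ["a", "b"], "a": ["u"], "b": ["u"]}
# W keeps the first coordinate of h_v and the first coordinate of n_v.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
out = graphsage_layer(h, neighbors, W, size=2, rng=random.Random(0))
```

Because the central node and its neighborhood occupy separate halves of the concatenated vector, the transformation can weight them differently, unlike a plain sum.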

\(\bullet\) GAT [140] assumes that the influence of neighbors is neither identical nor pre-determined by the graph structure, and thus it differentiates the contributions of neighbors by leveraging the attention mechanism and updates the vector of each node by attending over its neighbors:

\begin{equation} \begin{aligned}\text{Aggregation:} \quad & \mathbf {n}_{v}^{(l)} =\sum _{j \in \mathcal {N}_{v}} \alpha _{v j} \mathbf {h}^{(l)}_{j}, \quad \alpha _{v j} =\frac{\exp \left(\text{Att}(\mathbf {h}^{(l)}_{v},\mathbf {h}^{(l)}_{j})\right)}{\sum _{k \in \mathcal {N}_{v}} \exp \left(\text{Att}(\mathbf {h}^{(l)}_{v},\mathbf {h}^{(l)}_{k})\right)},\\ \text{Update:} \quad & \mathbf {h}_{v}^{(l+1)} = \delta \Bigl (\mathbf {W}^{(l)}\mathbf {n}_{v}^{(l)}\Bigr), \end{aligned} \end{equation}

(5)

where \(\text{Att}(\cdot)\) is an attention function and a typical \(\text{Att}(\cdot)\) is \(\operatorname{LeakyReLU}(\mathbf {a}^{T}[\mathbf {W}^{(l)}\mathbf {h}^{(l)}_{v} \oplus \mathbf {W}^{(l)}\mathbf {h}^{(l)}_{j}])\) , \(\mathbf {W}^{(l)}\) is responsible for transforming the node representations at the \(l{\rm {th}}\) propagation, and \(\mathbf {a}\) is the learnable parameter.
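To make the attention weights of Equation (5) concrete, the sketch below computes \(\alpha_{vj}\) for one node with the LeakyReLU attention function given above and then applies the update. The negative slope of 0.2, the identity matrix `W`, and the attention vector `a` are hypothetical values chosen for illustration.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_attention(h, v, neighbors, W, a):
    """Return {j: alpha_vj} for j in N_v, using LeakyReLU(a^T [W h_v (+) W h_j])."""
    wh_v = matvec(W, h[v])
    logits = {}
    for j in neighbors[v]:
        concat = wh_v + matvec(W, h[j])
        logits[j] = leaky_relu(sum(ai * ci for ai, ci in zip(a, concat)))
    m = max(logits.values())                  # subtract the max for numerical stability
    exp = {j: math.exp(s - m) for j, s in logits.items()}
    z = sum(exp.values())
    return {j: e / z for j, e in exp.items()}

def gat_update(h, v, neighbors, W, a):
    """Attention-weighted aggregation followed by ReLU(W n_v)."""
    alpha = gat_attention(h, v, neighbors, W, a)
    n_v = [sum(alpha[j] * h[j][k] for j in neighbors[v]) for k in range(len(h[v]))]
    return [max(0.0, x) for x in matvec(W, n_v)]

h = {"v": [1.0, 0.0], "j": [1.0, 0.0], "k": [0.0, 1.0]}
neighbors = {"v": ["j", "k"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, -1.0]                     # hypothetical attention parameters
alpha = gat_attention(h, "v", neighbors, W, a)
uniform = gat_attention(h, "v", neighbors, W, [0.0, 0.0, 0.0, 0.0])
h_v_next = gat_update(h, "v", neighbors, W, a)
```

With an all-zero attention vector the weights collapse to uniform mean-pooling, which shows that attention generalizes the equal-weight aggregators.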

\(\bullet\) GGNN [89] adopts a gated recurrent unit (GRU) [89] in the update step:

\begin{equation} \begin{aligned}\text{Aggregation:} \quad \mathbf {n}_v^{(l)} = \frac{1}{|\mathcal {N}_{v}|}\sum _{j \in \mathcal {N}_{v}}\mathbf {h}^{(l)}_j, \quad \text{Update:}\quad \mathbf {h}_v^{(l+1)} = \text{GRU}\left(\mathbf {h}_v^{(l)},\mathbf {n}_v^{(l)}\right)\!. \end{aligned} \end{equation}

(6)

GGNN executes the recurrent function several times over all nodes [179], which might face the scalability issue when it is applied in large graphs.
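A compact sketch of Equation (6) is given below: mean aggregation followed by a GRU cell that treats \(\mathbf{h}_v^{(l)}\) as the previous state and \(\mathbf{n}_v^{(l)}\) as the input. The gate convention (the update gate weights the candidate state), the omission of bias terms, and the parameter names in `p` are assumptions of this sketch; GRU formulations differ slightly across libraries.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def gru_update(h_v, n_v, p):
    """GRU cell with state h_v and input n_v; p holds the six weight matrices."""
    z = [sigmoid(s) for s in add(matvec(p["Wz"], n_v), matvec(p["Uz"], h_v))]  # update gate
    r = [sigmoid(s) for s in add(matvec(p["Wr"], n_v), matvec(p["Ur"], h_v))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h_v)]
    cand = [math.tanh(s) for s in add(matvec(p["Wh"], n_v), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_v, cand)]

def ggnn_layer(h, neighbors, p):
    """Mean aggregation over neighbors, then a GRU update per node."""
    out = {}
    for v, h_v in h.items():
        nb = [h[u] for u in neighbors[v]]
        n_v = [sum(col) / len(nb) for col in zip(*nb)] if nb else [0.0] * len(h_v)
        out[v] = gru_update(h_v, n_v, p)
    return out

zeros = [[0.0, 0.0], [0.0, 0.0]]
p = {name: zeros for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = {"a": [2.0, -4.0], "b": [0.0, 0.0]}
out = ggnn_layer(h, {"a": ["b"], "b": ["a"]}, p)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so each node simply keeps half of its previous state; learned weights let the gates decide per dimension how much neighborhood information to admit.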

\(\bullet\) HGNN [36] is a typical hypergraph neural network, which encodes high-order data correlation in a hypergraph structure. The hyperedge convolutional layer is in the following formulation:

\begin{equation} \text{Aggregation:}\quad \mathbf {N}^{(l)} = \mathbf {D}_v^{-\frac{1}{2}} \mathbf {E}\mathbf {W}^0\mathbf {D}_e^{-1}\mathbf {E}^T\mathbf {D}_v^{-\frac{1}{2}}\mathbf {H}^{(l)},\quad \text{Update:} \quad \mathbf {H}^{(l+1)} =\delta \left(\mathbf {W}^{(l)}\mathbf {N}^{(l)}\right)\!, \end{equation}

(7)

where \(\delta (\cdot)\) is the nonlinear activation function, like ReLU; \(\mathbf {W}^{(l)}\) is the learnable transformation matrix for layer \(l\) ; \(\mathbf {E}\) is the hypergraph incidence matrix; \(\mathbf {W}^0\) is the diagonal matrix of hyperedge weights; and \(\mathbf {D}_e\) and \(\mathbf {D}_v\) denote the diagonal matrices of the edge degrees and the vertex degrees, respectively.
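The hyperedge convolution of Equation (7) can be read as two message-passing steps: nodes send degree-scaled features to their hyperedges, and hyperedges send the result back to their member nodes. The sketch below follows that reading under several assumptions: `E` is a 0/1 node-by-hyperedge incidence matrix, `w` holds the hyperedge weights (the diagonal of \(\mathbf{W}^0\)), every node belongs to at least one hyperedge, and the transformation is applied to each node vector separately.

```python
import math

def hgnn_layer(H, E, w, W):
    """H: n x d node features; E: n x m incidence matrix; w: m hyperedge weights; W: d_out x d."""
    n, m, d = len(H), len(w), len(H[0])
    d_v = [sum(w[e] * E[v][e] for e in range(m)) for v in range(n)]   # vertex degrees
    d_e = [sum(E[v][e] for v in range(n)) for e in range(m)]          # hyperedge degrees
    # Node -> hyperedge: gather degree-scaled node features inside each hyperedge.
    M = [[w[e] / d_e[e] * sum(E[v][e] * H[v][k] / math.sqrt(d_v[v]) for v in range(n))
          for k in range(d)] for e in range(m)]
    # Hyperedge -> node: scatter hyperedge features back to the member nodes.
    N = [[sum(E[v][e] * M[e][k] for e in range(m)) / math.sqrt(d_v[v]) for k in range(d)]
         for v in range(n)]
    # Update: ReLU of the transformed aggregated features, node by node.
    return [[max(0.0, sum(wk * x for wk, x in zip(row, N[v]))) for row in W] for v in range(n)]

# Three nodes, two hyperedges: e0 = {0, 1} and e1 = {1, 2}, unit weights.
E = [[1, 0], [1, 1], [0, 1]]
H = [[2.0], [0.0], [0.0]]
out = hgnn_layer(H, E, w=[1.0, 1.0], W=[[1.0]])
```

Node 2 receives nothing from node 0 in a single layer because they share no hyperedge, while node 1, which sits in both hyperedges, receives a signal damped by its larger vertex degree.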

2.3 Why Graph Neural Network for Recommendation

In the past few years, many works on GNN-based recommendation have been proposed. Before diving into the details of the latest developments, it is beneficial to understand the motivations of applying GNN to recommender systems.

The most intuitive reason is that GNN techniques have been demonstrated to be powerful in representation learning for graph data in various domains [44, 208], and most of the data in recommendation has essentially a graph structure as shown in Figure 1. For instance, the user-item interaction data can be represented by a bipartite graph (as shown in Figure 1(a)) between the user and item nodes, where the link represents the interaction between the corresponding user and item. Besides, a sequence of items can be transformed into the sequence graph, where each item can be connected with one or more subsequent items. Figure 1(b) shows an example of a sequence graph where there is an edge between consecutive items. Compared to the original sequence data, a sequence graph allows more flexibility to item-to-item relationships. Beyond that, some side information also naturally has a graph structure, such as a social relationship and knowledge graph, as shown in Figures 1(c) and 1(d).
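As a small illustration of the first two graph types described above, the following sketch builds a user-item bipartite graph from interaction pairs and a sequence graph from one behavior sequence. Storing the graphs as adjacency sets and weighting sequence edges by how often a transition repeats are choices made here for illustration.

```python
from collections import defaultdict

def build_bipartite(interactions):
    """interactions: iterable of (user, item) pairs -> adjacency sets for both sides."""
    user_nbrs, item_nbrs = defaultdict(set), defaultdict(set)
    for u, i in interactions:
        user_nbrs[u].add(i)
        item_nbrs[i].add(u)
    return user_nbrs, item_nbrs

def build_sequence_graph(sequence):
    """Directed graph with an edge between consecutive items; weights count repeats."""
    edges = defaultdict(int)
    for src, dst in zip(sequence, sequence[1:]):
        edges[(src, dst)] += 1
    return dict(edges)

user_nbrs, item_nbrs = build_bipartite([("u1", "i1"), ("u1", "i2"), ("u2", "i2")])
seq_graph = build_sequence_graph(["i1", "i2", "i3", "i2", "i3"])
```

In the sequence graph, the repeated transition from `i2` to `i3` becomes a single edge with weight 2, and the cycle between `i2` and `i3` is explicit, which a plain ordered list does not expose.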

Fig. 1.

Due to the specific characteristic of different types of data in recommendation, a variety of models have been proposed to effectively learn their pattern for better recommendation results, which is a big challenge for the model design. Considering the information in recommendation from the perspective of the graph, a unified GNN framework can be utilized to address all these tasks. For example, the task of non-sequential recommendation is to learn the effective node representations, i.e., user/item representations, and to further predict user preferences. The task of sequential recommendation is to learn the informative graph representation, i.e., sequence representation. Both node representation and graph representation can be learned through GNN. Besides, it is more convenient and flexible to incorporate additional information (if available) compared to the non-graph perspective. For instance, the social network can be integrated into the user-item bipartite relationship as a unified graph. Both the social influence and collaborative signal can be captured during the iterative propagation.

Moreover, GNN can explicitly encode the crucial collaborative signal of user-item interactions to enhance the user/item representations through the propagation process. Utilizing collaborative signals for better representation learning is not a completely new idea. For instance, SVD++ [76] incorporates the representations of interacted items to enrich the user representations. ItemRank [41] constructs the item-item graph from interactions and adopts the random-walk algorithm to rank items according to user preferences. Note that SVD++ can be seen as using one-hop neighbors (i.e., items) to improve user representations, while ItemRank utilizes two-hop neighbors to improve item representations. Compared with the non-graph model, GNN is more flexible and convenient to model multi-hop connectivity from user-item interactions, and the captured CF signals in high-hop neighbors have been demonstrated to be effective for recommendation.

2.4 Categories of Graph-neural-network-based Recommendation

In this survey, we propose a new taxonomy to classify the existing GNN-based models. Based on the types of information used and recommendation tasks, the existing works are categorized into user-item collaborative filtering, sequential recommendation, social recommendation, knowledge-graph-based recommendation, and other tasks. In addition to the former four types of tasks, there are other recommendation tasks, such as POI recommendation, multimedia recommendation, and bundle recommendation. Since the studies utilizing GNN in these tasks are not that abundant, we group them into one category and discuss their current developments, respectively.

The rationale of classification is as follows: The graph structure depends to a large extent on the type of information. For example, a social network is naturally a homogeneous graph, and user-item interaction can be considered either a bipartite graph or two homogeneous graphs (i.e., user-user and item-item graphs). Besides, the information type also plays a key role in designing an efficient GNN architecture, such as aggregation and update operations and network depth. For instance, a knowledge graph has multi-type entities and relations, which requires considering such heterogeneity during propagation. Moreover, recommendation tasks are highly related to the type of information used. For example, the social recommendation is to make a recommendation by utilizing the social network information, and the knowledge-graph-based recommendation is to enhance the item representation by leveraging semantic relations among items in the knowledge graph. This survey is mainly for the readers interested in the development of GNN in recommender systems. Thus, our taxonomy is primarily from the perspective of recommender systems but also takes the GNN into account.

3 User-item Collaborative Filtering

Given the user-item interaction data, the basic idea of user-item collaborative filtering is to use the items a user has interacted with to enhance the user representation and to use the users who have interacted with an item to enrich the item representation. Inspired by the advantage of GNN techniques in simulating the information diffusion process, recent efforts have studied the design of GNN methods in order to exploit high-order connectivity from user-item interactions more efficiently. Figure 2 illustrates the pipeline of applying GNN to user-item interaction information.

Fig. 2.

To take full advantage of GNN methods on capturing collaborative signals from user-item interactions, there are four main issues to deal with:

\(\bullet\) Graph Construction. Graph structure is essential for the scope and type of information to propagate. The original bipartite graph consists of a set of user/item nodes and the interactions between them. Should GNN be applied over the heterogeneous bipartite graph or should the homogeneous graph be constructed based on two-hop neighbors? Considering computational efficiency, how should representative neighbors be sampled for graph propagation instead of operating on the full graph?

\(\bullet\) Neighbor Aggregation. How should the information be aggregated from neighbor nodes? Specifically, should the model differentiate the importance of neighbors, model the affinity between the central node and its neighbors, or model the interactions among neighbors?

\(\bullet\) Information Update. How should the central node representation and the aggregated representation of its neighbors be integrated?

\(\bullet\) Final Node Representation. Predicting the user’s preference for the items requires the overall user/item representation. Should the node representation in the last layer or the combination of the node representations in all layers be used as the final node representation?

3.1 Graph Construction

Most works [8, 18, 55, 82, 132, 135, 142, 155, 173, 197, 205] apply the GNN on the original user-item bipartite graph directly. There are two issues in directly applying GNN on the original graph. The first is effectiveness: the original graph structure might not be sufficient for learning user/item representations. The second is efficiency: aggregating the information of the full neighborhoods of nodes incurs a high computation cost, especially for large-scale graphs [190].

One strategy to address the first issue is to enrich the original graph structure by adding edges, such as links between two-hop neighbors and hyperedges. For instance, Multi-GCCF [133] and DGCF [101] add edges between two-hop neighbors on the original graph to obtain the user-user and item-item graph. In this way, the proximity information among users and items can be explicitly incorporated into user-item interactions. DHCF [66] introduces the hyperedges and constructs the user/item hypergraphs in order to capture explicit hybrid high-order correlations. Another strategy is to introduce virtual nodes for enriching the user-item interactions. For example, DGCF [156] introduces virtual intent nodes and decomposes the original graph into a corresponding subgraph for each intent, which represents the node from different aspects and has better expressive power. HiGNN [91] creates new coarsened user-item graphs by clustering similar users/items and taking the clustered centers as new nodes in order to explicitly capture hierarchical relationships among users and items.
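The idea of linking two-hop neighbors can be sketched as follows: two users are connected when they share interacted items, which yields a user-user graph (the item-item graph is built symmetrically). The co-interaction count used as an edge weight and the `min_common` threshold are illustrative assumptions and do not reproduce the construction rule of any specific cited model.

```python
from collections import defaultdict
from itertools import combinations

def two_hop_user_graph(interactions, min_common=1):
    """Connect users that share at least `min_common` items; weight = shared item count."""
    item_users = defaultdict(set)
    for u, i in interactions:
        item_users[i].add(u)
    common = defaultdict(int)
    for users in item_users.values():
        # Every pair of users of the same item is a two-hop pair in the bipartite graph.
        for a, b in combinations(sorted(users), 2):
            common[(a, b)] += 1
    return {pair: c for pair, c in common.items() if c >= min_common}

interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i2")]
all_edges = two_hop_user_graph(interactions)
strong_edges = two_hop_user_graph(interactions, min_common=2)
```

Raising the threshold keeps only user pairs with strong overlap, which limits how dense the added graph becomes when popular items connect many users.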

In terms of the second issue, sampling strategies are proposed to make GNN efficient and scalable to large-scale graph-based recommendation tasks. PinSage [190] designs a random-walk-based sampling method to obtain fixed-size neighborhoods consisting of the nodes with the highest visit counts. In this way, those nodes that are not directly adjacent to the central node may also become its neighbors. Multi-GCCF [133] and NIA-GCN [132] randomly sample a fixed number of neighbors. Sampling is a tradeoff between preserving the original graph information and computational efficiency. The performance of the model depends on the sampling strategy, and more efficient sampling strategies for neighborhood construction deserve further study.
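The two sampling styles described above can be sketched on a toy graph. This is only an illustration and not the implementation of PinSage or Multi-GCCF; the function names, the adjacency-dictionary format, and the parameters `num_walks`, `walk_length`, and `top_t` are assumptions made for the example.

```python
import random
from collections import Counter

def random_walk_neighbors(adj, start, num_walks=200, walk_length=3, top_t=2, seed=0):
    """Return the top_t nodes most often visited by short random walks from start.

    adj maps each node to a list of adjacent nodes (hypothetical toy format).
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            if not adj[node]:
                break
            node = rng.choice(adj[node])
            if node != start:
                visits[node] += 1
    return [n for n, _ in visits.most_common(top_t)]

def uniform_neighbors(adj, node, size, seed=0):
    """Uniformly sample at most `size` direct neighbors of node."""
    rng = random.Random(seed)
    neighbors = adj[node]
    return list(neighbors) if len(neighbors) <= size else rng.sample(neighbors, size)

# Toy bipartite graph: users u*, items i*.
adj = {
    "u1": ["i1", "i2"],
    "u2": ["i1", "i2", "i3"],
    "i1": ["u1", "u2"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}
sampled = random_walk_neighbors(adj, "u1")
```

Because the walks are longer than one hop, the sampled neighborhood of `u1` may contain nodes such as `u2` or `i3` that are not directly adjacent to it, whereas `uniform_neighbors` only ever returns direct neighbors.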

3.2 Neighbor Aggregation

The aggregation step is of vital importance for information propagation over the graph structure, as it decides how much of the neighbors’ information should be propagated. Mean-pooling is one of the most straightforward aggregation operations [8, 133, 135, 197, 198], which treats neighbors equally:

\begin{equation} {\bf n}_u^{(l)}=\frac{1}{\left|\mathcal {N}_{u}\right|}\sum _{i \in \mathcal {N}_{u}}\mathbf {W}^{(l)} \mathbf {h}_{i}^{(l)}. \end{equation}

(8)

Mean-pooling is easy to implement but might be inappropriate when the importance of neighbors differs significantly. Following the traditional GCN, some works employ “degree normalization” [18, 55, 173], which assigns weights to nodes based on the graph structure:

\begin{equation} {\bf n}_u^{(l)} =\sum _{i \in \mathcal {N}_{u}} \frac{1}{\sqrt {\left|\mathcal {N}_{u}\right|\left|\mathcal {N}_{i}\right|}}\mathbf {W}^{(l)} \mathbf {h}_{i}^{(l)}. \end{equation}

(9)

Owing to the random-walk sampling strategy, PinSage [190] adopts the normalized visit counts as the importance of neighbors when aggregating the vector representations of neighbors. However, these aggregation functions determine the importance of neighbors according to the graph structure but ignore the relationships between the connected nodes.
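The difference between Equations (8) and (9) can be seen on a small example. The sketch below sets \(\mathbf {W}^{(l)}\) to the identity to keep it short, which is an assumption of the example; the toy graph and embeddings are hypothetical.

```python
import math

def mean_pool(u, adj, h):
    """Eq. (8) with W set to identity: every neighbor gets weight 1/|N_u|."""
    dim = len(h[adj[u][0]])
    return [sum(h[i][d] for i in adj[u]) / len(adj[u]) for d in range(dim)]

def degree_norm(u, adj, h):
    """Eq. (9) with W set to identity: neighbor i gets weight 1/sqrt(|N_u||N_i|)."""
    dim = len(h[adj[u][0]])
    out = [0.0] * dim
    for i in adj[u]:
        w = 1.0 / math.sqrt(len(adj[u]) * len(adj[i]))
        for d in range(dim):
            out[d] += w * h[i][d]
    return out

# Item i1 has one interacting user, item i2 has two.
adj = {"u1": ["i1", "i2"], "u2": ["i2"], "i1": ["u1"], "i2": ["u1", "u2"]}
h = {"i1": [1.0, 0.0], "i2": [0.0, 1.0]}

n_mean = mean_pool("u1", adj, h)    # both items weighted 1/2
n_norm = degree_norm("u1", adj, h)  # i1 weighted 1/sqrt(2), i2 weighted 1/2
```

Under degree normalization the more popular item `i2` contributes less than `i1`, while mean-pooling weights them equally; in both cases the weights depend only on the graph structure.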

Motivated by common sense that the embeddings of items in line with the user’s interests should be passed more to the user (analogously for the items), MCCF [158] and DisenHAN [107] leverage the attention mechanism to learn the weights of neighbors [107, 146]. NGCF [155] employs element-wise product to augment the items’ features the user cares about or the users’ preferences for features the item has. Take the user node as an example; the aggregated neighbor representation is calculated as follows:

\begin{equation} {\bf n}_u^{(l)} =\sum _{i \in \mathcal {N}_{u}} \frac{1}{\sqrt {\left|\mathcal {N}_{u}\right|\left|\mathcal {N}_{i}\right|}}\left(\mathbf {W}_{1}^{(l)} \mathbf {h}_{i}^{(l)}+\mathbf {W}_{2}^{(l)}\Bigl (\mathbf {h}_{i}^{(l)} \odot \mathbf {h}_{u}^{(l)}\Bigr)\right). \end{equation}

(10)
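A minimal sketch of Equation (10) follows, assuming \(\mathbf {W}_{1}^{(l)}\) and \(\mathbf {W}_{2}^{(l)}\) are identity matrices so that only the effect of the element-wise product is visible. The function name and toy values are hypothetical and not taken from the NGCF implementation.

```python
import math

def ngcf_aggregate(u, adj, h):
    """Eq. (10) with W1 = W2 = identity (an assumption to keep the sketch short)."""
    dim = len(h[u])
    out = [0.0] * dim
    for i in adj[u]:
        w = 1.0 / math.sqrt(len(adj[u]) * len(adj[i]))
        for d in range(dim):
            # Neighbor message plus its element-wise interaction with the user.
            out[d] += w * (h[i][d] + h[i][d] * h[u][d])
    return out

adj = {"u1": ["i1", "i2"], "i1": ["u1"], "i2": ["u1"]}
h = {"u1": [1.0, 0.0], "i1": [2.0, 2.0], "i2": [0.0, 4.0]}
n_u1 = ngcf_aggregate("u1", adj, h)
```

The user embedding is non-zero only in the first dimension, so the interaction term amplifies the first dimension of the item messages and leaves the second dimension unchanged.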

NIA-GCN [132] argues that existing aggregation functions fail to preserve the relational information within the neighborhood, and thus proposes the pairwise neighborhood aggregation approach to explicitly capture the interactions among neighbors. Concretely, it applies element-wise multiplication between every two neighbors to model the user-user/item-item relationships.

3.3 Information Update

Given the information aggregated from its neighbors, how to update the representation of the node is essential for iterative information propagation. According to whether to retain the information of the node itself, the existing methods can be divided into two directions. One is to discard the original information of the user or item node completely and use the aggregated representation of neighbors as the new central node representation [8, 55, 156, 197], which might overlook the intrinsic user preference or the intrinsic item property.

Another is to take both the node itself ( \(\mathbf {h}_u^{(l)}\) ) and its neighborhood message ( \(\mathbf {n}_u^{(l)}\) ) into consideration to update node representations. The most straightforward way is to combine these two representations linearly with a sum-pooling or mean-pooling operation [132, 155, 173, 198]. Inspired by GraphSAGE [50], some works [82, 133, 190] adopt a concatenation function with a non-linear transformation to integrate these two representations as follows:

\begin{equation} {\bf h}_u^{(l+1)} = \sigma \left(\mathbf {W}^{(l)} \cdot (\mathbf {h}_{u}^{(l)}\oplus \mathbf {n}_{u}^{(l)})+\mathbf {b}^{(l)}\right), \end{equation}

(11)

where \(\sigma\) denotes the activation function, e.g., ReLU, LeakyReLU, and sigmoid. Compared to linear combination, the concatenation operation with feature transformation allows more complex feature interaction. LightGCN [55] and LR-GCCF [18] observe that nonlinear activation contributes little to the overall performance, and they simplify the update operation by removing the non-linearities, thereby retaining or even improving performance and increasing computational efficiency.
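The update in Equation (11) and its simplified counterpart can be contrasted in a few lines. This is a sketch under stated assumptions: ReLU is chosen as \(\sigma\), the weight matrix and vectors are hypothetical, and `simplified_update` only illustrates the idea of dropping the transformation and non-linearity rather than reproducing LightGCN or LR-GCCF.

```python
def concat_update(h_u, n_u, W, b):
    """Eq. (11): sigma(W (h_u concat n_u) + b), with ReLU as sigma."""
    x = h_u + n_u  # list concatenation plays the role of the concat operator
    z = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return [max(0.0, v) for v in z]

def simplified_update(n_u):
    """Simplified update: no feature transformation and no non-linearity."""
    return list(n_u)

h_u, n_u = [1.0, -1.0], [0.5, 2.0]
W = [[1.0, 0.0, 1.0, 0.0],   # hypothetical 2x4 weight matrix
     [0.0, 1.0, 0.0, -1.0]]
b = [0.0, 0.0]

h_concat = concat_update(h_u, n_u, W, b)
h_simple = simplified_update(n_u)
```

The concatenation variant lets each output dimension mix any dimension of the node and its neighborhood message, at the cost of the extra parameters in `W` and `b`.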

3.4 Final Node Representation

Applying the aggregation and update operations layer by layer generates the representations of nodes for each depth of GNN. The overall representations of users and items are required for the final prediction task.

A mainstream approach is to use the node vector in the last layer as the final representation, i.e., \({\bf h}_u^*={\bf h}_u^{(L)}\) [8, 82, 135, 161, 190, 197]. However, the representations obtained in different layers emphasize the messages passed over different connections [155]. Specifically, the representations in the lower layer reflect the individual feature more, while those in the higher layer reflect the neighbor feature more. To take advantage of the connections expressed by the output of different layers, recent studies employ different methods to integrate the messages from different layers:

\begin{equation} \begin{aligned}\text{Mean-pooling:} \quad &\mathbf {h}_u^*=\frac{1}{L+1}\sum _{l=0}^{L}\mathbf {h}_u^{(l)},\\ \text{Sum-pooling:} \quad &\mathbf {h}_u^*=\sum _{l=0}^{L}\mathbf {h}_u^{(l)},\\ \text{Weighted-pooling:} \quad &\mathbf {h}_u^*=\frac{1}{L+1}\sum _{l=0}^{L}\alpha ^{(l)}\mathbf {h}_u^{(l)},\\ \text{Concatenation:} \quad &\mathbf {h}_u^*=\mathbf {h}_u^{(0)}\oplus \mathbf {h}_u^{(1)}\oplus \cdots \oplus \mathbf {h}_u^{(L)}, \end{aligned} \end{equation}

(12)

where \(\alpha ^{(l)}\) is a learnable parameter. Note that mean-pooling and sum-pooling can be seen as two special cases of weighted pooling. Compared to mean-pooling and sum-pooling, weighted pooling allows more flexibility to differentiate the contribution of different layers. Among these four methods, the former three all belong to the linear operation, and only the concatenation operation preserves information from all layers.
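The four options in Equation (12) can be written directly as follows. The layer outputs are hypothetical toy values, and \(\alpha ^{(l)}\) is passed in as a fixed list here, whereas in a real model it would be learned.

```python
def mean_pooling(layers):
    return [sum(col) / len(layers) for col in zip(*layers)]

def sum_pooling(layers):
    return [sum(col) for col in zip(*layers)]

def weighted_pooling(layers, alpha):
    # Follows Eq. (12): weighted sum of the layers, scaled by 1/(L+1).
    return [sum(a * v for a, v in zip(alpha, col)) / len(layers) for col in zip(*layers)]

def concatenation(layers):
    return [v for layer in layers for v in layer]

# h^(0), h^(1), h^(2) for one user, so L = 2.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

h_mean = mean_pooling(layers)
h_sum = sum_pooling(layers)
h_cat = concatenation(layers)
```

Setting every \(\alpha ^{(l)}\) to 1 makes `weighted_pooling` coincide with mean-pooling, and setting every \(\alpha ^{(l)}\) to \(L+1\) makes it coincide with sum-pooling, which is the sense in which the two are special cases.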

3.5 Summary

Corresponding to the discussion at the beginning of this section, we briefly summarize the existing works from four issues:

\(\bullet\) Graph Construction. The most straightforward way is to directly use the original user-item bipartite graph. If some nodes have few neighbors in the original graph, it would be beneficial to enrich the graph structure by adding either edges or nodes. When dealing with large-scale graphs, it is necessary to sample the neighborhood for computational efficiency. Sampling is a tradeoff between effectiveness and efficiency, and a more effective sampling strategy deserves further study.

\(\bullet\) Neighbor Aggregation. When neighbors are more heterogeneous, aggregating neighbors with attentive weights would be preferable to equal weights and degree normalization; otherwise, the latter two are preferable for easier calculation. Explicitly modeling the influence among neighbors or the affinity between the central node and neighbors might bring additional benefits but needs to be verified on more datasets.

\(\bullet\) Information Update. Compared to discarding the original node, updating the node with its original representation and the aggregated neighbor representation would be preferable. Recent works show that simplifying the traditional GCN by removing the transformation and non-linearity operation can achieve better performance than the original ones.

\(\bullet\) Final Node Representation. To obtain overall user/item representation, utilizing the representations from all layers is preferable to directly using the last layer representation. In terms of the function of integrating the representations from all layers, weighted-pooling allows more flexibility, and concatenation preserves information from all layers.

Figure 3 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.

Fig. 3.

4 Sequential Recommendation

Sequential recommendation predicts users’ next preferences based on their most recent activities, which seeks to model sequential patterns among successive items and generate accurate recommendations for users [117]. From the perspective of adjacency between items, sequences of items can be modeled as graph-structured data. Inspired by the advantage of GNN, it is becoming popular to utilize GNN to capture the transition pattern from users’ sequential behaviors by transforming them into the sequence graph.

Figure 4 illustrates the overall framework of GNN in sequential recommendation. To fully utilize GNN in the sequential recommendation, there are three main issues to deal with:

Fig. 4.

\(\bullet\) Graph Construction. To apply GNN in the sequential recommendation, the sequence data should be transformed into a sequence graph. Is it sufficient to construct a subgraph for each sequence independently? Would it be better to add edges among several consecutive items than only between the two consecutive items?

\(\bullet\) Information Propagation. To capture the transition patterns, which propagation mechanism is more appropriate? Is it necessary to distinguish the sequential order of the linked items?

\(\bullet\) Sequential Preference. To get the user’s temporal preference, the item representations in a sequence should be integrated. Should one simply apply attentive pooling or leverage RNN structure to enhance consecutive time patterns?

4.1 Graph Construction

Unlike the user-item interactions, which have essentially a bipartite graph structure, the sequential behaviors are naturally expressed in the order of time, i.e., sequences, instead of sequence graphs. Constructing a graph based on the original bipartite graph is optional and mainly driven by the scalability or heterogeneity issue, whereas the construction of a sequence graph based on users’ sequential behaviors is a necessity for applying GNN in sequential recommendation. Figure 5 shows the representative graph construction strategies for sequential behaviors.

Fig. 5.

Constructing the directed graph for each sequence by treating each item in the sequence as a node and adding edges between two consecutively clicked items is the most straightforward way [48, 115, 116, 175, 185]. However, in most scenarios, the length of the user sequence is short; e.g., the average length on the preprocessed Yoochoose1/4¹ dataset is 5.71 [175]. A sequence graph constructed from a single and short sequence consists of a small number of nodes and connections, and some nodes might even have only one neighbor, which contains too limited knowledge to reflect users’ dynamic preferences and cannot take full advantage of GNN in graph learning. To tackle this challenge, recent works propose several strategies to enrich the original sequence graph structure, which can be divided into two mainstreams.

One mainstream is to utilize additional sequences to enrich the item-item transitions. The additional sequences can be other types of behavior sequences [152], the historical sequences of the same user [176], or part/all of the sequences in the whole dataset [14, 163, 206, 207]. For instance, HetGNN [152] utilizes all behavior sequences and constructs edges between two consecutive items in the same sequence with their behavior types as the edge types. A-PGNN [176] deals with the case in which users are known, and thus incorporates the user’s historical sequences with the current sequence to enrich the item-item connections. GCE-GNN [163] and DAT-MDI [14] exploit the item transitions in all sessions to assist the transition patterns in the current sequence, thereby leveraging both the local context and the global context. Different from GCE-GNN [163] and DAT-MDI [14], which treat all the transitions equally, TASRec [207] attaches more importance to the more recent transitions when augmenting the graph. Instead of incorporating all the sessions, DGTN [206] only adds similar sessions to the current session, based on the assumption that similar sequences are more likely to reflect similar transition patterns. All these methods introduce more information into the original graph and improve the performance compared to a single sequence graph.

Another mainstream approach is to adjust the graph structure of the current sequence. For example, assuming the current node has a direct influence on more than one consecutive item, MA-GNN [102] extracts three subsequent items and adds edges between them. Considering that only adding edges between consecutive items might neglect the relationships between distant items, SGNN-HN [113] introduces a virtual “star” node as the center of the sequence, which is linked with all the items in the current sequence. The vector-wise representation of the “star” node reflects the overall characteristics of the whole sequence. Hence, each item can gain some knowledge of the items without direct connections through the “star” node. Chen and Wong [19] point out that existing graph construction methods ignore the sequential information of neighbors and are ineffective at capturing long-term dependencies. Therefore, they propose LESSR, which constructs two graphs from one sequence: one distinguishes the order of neighbors, and the other allows the short-cut path from the item to all the items after it.

In addition to the above two mainstreams, other graph construction methods have emerged recently. Inspired by the advantage of the hypergraph in modeling beyond-pairwise relations, the hypergraph has been leveraged to capture the high-order relations among items and the cross-session information. SHARE [148] constructs a hypergraph for each session, of which the hyperedges are defined by various sizes of sliding windows. DHCN [182] takes each session as one hyperedge and integrates all the sessions in one hypergraph. To explicitly incorporate cross-session relationships, DHCN [182] and COTREC [181] construct the session-to-session graph, which takes each session as a node and assigns the weights based on the shared items.
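The most straightforward construction in this subsection, together with the star-node enrichment, can be sketched as follows. The function names and the dictionary-based graph format are assumptions made for illustration and do not come from any of the cited implementations.

```python
def build_sequence_graph(sequence):
    """Directed graph with an edge from each item to the next clicked item."""
    nodes = list(dict.fromkeys(sequence))  # unique items in first-seen order
    in_nbrs = {v: [] for v in nodes}
    out_nbrs = {v: [] for v in nodes}
    for src, dst in zip(sequence, sequence[1:]):
        if dst not in out_nbrs[src]:
            out_nbrs[src].append(dst)
            in_nbrs[dst].append(src)
    return nodes, in_nbrs, out_nbrs

def add_star_node(nodes, star="*"):
    """Link a virtual star node with every item in the sequence."""
    links = {v: [star] for v in nodes}
    links[star] = list(nodes)
    return links

# A hypothetical session in which item "b" is clicked twice.
nodes, in_nbrs, out_nbrs = build_sequence_graph(["a", "b", "c", "b", "d"])
star_links = add_star_node(nodes)
```

In the plain sequence graph, item `d` has a single neighbor, which illustrates how sparse such graphs are for short sessions; through the star node, `d` becomes two hops away from every other item.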

4.2 Information Propagation

Given a built sequence graph, it is essential to design an efficient propagation mechanism to capture transition patterns among items. The GGNN framework is widely adopted to propagate information on the directed graph. Specifically, it employs mean-pooling to aggregate the information of the previous items and the next items, respectively; combines the two aggregated representations; and utilizes the GRU [89] component to integrate the information of neighbors and the central node. The propagation functions are given as follows:

\begin{equation} \begin{split} \mathbf {n}_{i_{s,t}}^{\mathrm{in}(l)}=\frac{1}{|\mathcal {N}_{i_{s,t}}^{\mathrm{in}}|}\Sigma _{j\in \mathcal {N}_{i_{s,t}}^{\mathrm{in}}}\mathbf {h}_j^{(l)}&, \quad \mathbf {n}_{i_{s,t}}^{\mathrm{out}(l)}=\frac{1}{|\mathcal {N}_{i_{s,t}}^{\mathrm{out}}|}\Sigma _{j\in \mathcal {N}_{i_{s,t}}^{\mathrm{out}}}\mathbf {h}_j^{(l)},\\ \mathbf {n}_{i_{s,t}}^{(l)}=\mathbf {n}_{i_{s,t}}^{\mathrm{in}(l)}\oplus \mathbf {n}_{i_{s,t}}^{\mathrm{out}(l)}&, \quad \mathbf {h}_{i_{s,t}}^{(l+1)}={\bf GRU}\left(\mathbf {h}_{i_{s,t}}^{(l)},\mathbf {n}_{i_{s,t}}^{(l)}\right)\!, \end{split} \end{equation}

(13)

where \(\mathcal {N}_{i_{s,t}}^{\mathrm{in}}\) , \(\mathcal {N}_{i_{s,t}}^{\mathrm{out}}\) denote the neighborhood set of previous items and next items, respectively, and \({\bf GRU}(\cdot)\) represents the GRU component. Different from the pooling operation, the gate mechanism in GRU decides what information is to be preserved and discarded. Unlike GGNN, which treats the neighbors equally, the attention mechanism is also utilized to differentiate the importance of neighbors [12, 115, 163]. All the above methods adopt the permutation-invariant aggregation function during the message passing, ignoring the order of items within the neighborhood, which may lead to the loss of information [19]. To address this issue, LESSR [19] preserves the order of items in the graph construction and leverages the GRU component [89] to aggregate the neighbors sequentially, as in the following equation:

\begin{equation} \mathbf {n}_{i_{s,t},k}^{(l)}={\bf GRU}^{(l)}\left(\mathbf {n}_{i_{s,t},k-1}^{(l)}, \mathbf {h}_{i_{s,t},k}^{(l)}\right), \end{equation}

(14)

where \(\mathbf {h}_{i_{s,t},k}^{(l)}\) represents the \(k{\rm {th}}\) item in the neighborhood of \(i_{s,t}\) ordered by time, and \(\mathbf {n}_{i_{s,t},k}^{(l)}\) denotes the neighborhood representation after aggregating \(k\) items.
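Equations (13) and (14) can be contrasted with a small, untrained GRU cell. This is a sketch under stated assumptions: the `GRUCell` class uses a standard gate formulation without bias terms and random weights, all names and toy vectors are hypothetical, and none of it reproduces the GGNN or LESSR code.

```python
import math
import random

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Minimal GRU cell with randomly initialised, untrained weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.Wz, self.Uz = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wr, self.Ur = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wh, self.Uh = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)

    def __call__(self, h, x):
        # Update gate z, reset gate r, candidate state, then interpolation.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def mean_pool(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def ggnn_step(h_self, h_in, h_out, gru):
    """Eq. (13): mean-pool in/out neighbors, concatenate, update with a GRU."""
    dim = len(h_self)
    n = mean_pool(h_in, dim) + mean_pool(h_out, dim)
    return gru(h_self, n)

def ordered_aggregate(neighbors, gru, dim):
    """Eq. (14): feed the time-ordered neighbors to a GRU one at a time."""
    n = [0.0] * dim
    for h_k in neighbors:
        n = gru(n, h_k)
    return n

dim = 2
a, b, c = [0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]
update_gru = GRUCell(input_dim=2 * dim, hidden_dim=dim, seed=0)
h_new = ggnn_step(c, [a, b], [], update_gru)

order_gru = GRUCell(input_dim=dim, hidden_dim=dim, seed=1)
n_ab = ordered_aggregate([a, b], order_gru, dim)
n_ba = ordered_aggregate([b, a], order_gru, dim)
```

Swapping the two incoming neighbors leaves `ggnn_step` unchanged because mean-pooling is permutation-invariant, whereas `n_ab` and `n_ba` differ because the GRU consumes the neighbors in order.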

For the sequence graph with hypergraph structure, DHCN [182] adopts the typical hypergraph neural network HGNN [36], which treats the nodes equally during propagation. To differentiate the importance of items within the same hyperedge, SHARE [148] designs two attention mechanisms for propagation. One propagates the information of item nodes to the hyperedges that connect them, and the other propagates the information of the hyperedges to the connected item nodes. For user-aware sequential recommendation, A-PGNN [176] and GAGA [116] implicitly incorporate the user information and augment the representations of items in the neighborhood with user representation.

4.3 Sequential Preference

Due to the limited number of propagation iterations, GNN cannot effectively capture long-range dependency among items [19]. Therefore, the representation of the last item (or any item) in the sequence is not sufficient to reflect the user’s sequential preference. Besides, most of the graph construction methods that transform sequences into graphs lose part of the sequential information [19]. In order to obtain an effective sequence representation, existing works propose several strategies to integrate the item representations in the sequence.

Considering that the items in a sequence have different levels of priority, the attention mechanism is widely adopted for integration. Some works [113, 116, 175, 206] calculate the attentive weights between the last item and all the items in the sequence, aggregate the item representations as the global preference, and incorporate it with local preference (i.e., the last item representation) as the overall preference. In this way, the overall preference relies heavily on the relevance of the last item to the user preference. Inspired by the superiority of the multi-layer self-attention strategy in sequence modeling, GC-SAN [185] stacks multiple self-attention layers on top of the item representations generated by GNN to capture long-range dependencies.
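The last-item attention readout described above can be sketched as follows. The cited works use learned attention networks; the plain dot-product scoring and the concatenation of global and local preference used here are simplifying assumptions, and the function name is hypothetical.

```python
import math

def attention_readout(item_vectors):
    """Global preference as a softmax(dot(last, item))-weighted sum of items."""
    last = item_vectors[-1]
    scores = [sum(a * b for a, b in zip(last, v)) for v in item_vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(last)
    global_pref = [sum(w * v[d] for w, v in zip(weights, item_vectors)) for d in range(dim)]
    # Overall preference: global preference concatenated with the last item.
    return weights, global_pref + last

weights, session_vec = attention_readout([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Items that resemble the last item receive larger weights, which makes concrete why the overall preference depends heavily on how representative the last item is.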

In addition to leveraging the attention mechanism for sequence integration, sequential signals are explicitly incorporated into the integration process. For instance, NISER [48] and GCE-GNN [163] add the positional embeddings, which reflect the relative order of the items, to effectively obtain position-aware item representations. To balance the consecutive time and flexible transition pattern, FGNN [115] employs the GRU with the attention mechanism to iteratively update the user preference with item representations in the sequence.

All the above works integrate the item representations within the user’s behavior sequence to generate the representation of sequential preference. Apart from these methods, DHCN [182] and COTREC [181] enrich the sequence graph by the session-to-session graph in the graph construction step. Therefore, they combine the sequential representation learned from the session-to-session graph and the one aggregated from items at this step.

4.4 Summary

This part briefly summarizes the reviewed works in terms of the three main issues.

\(\bullet\) Graph Construction. The most straightforward construction is to add edges between the two consecutive items. When the sequence length is short, utilizing additional sequences can enrich the sequence graph, and it would be preferable if the additional sequences are more similar to the original sequence. Another line is to adjust the graph structure of the behavior sequence. There is no accepted statement on which method is better. Moreover, incorporating the session-to-session graph into the sequence graph is also used to gain further improvements.

\(\bullet\) Information Propagation. Most of the propagation methods are variants of the propagation methods in traditional GNN frameworks, and there is no consensus on which method is better. Some complex propagation methods, such as LESSR [19], achieve performance gain at the cost of more computation. Whether to adopt complex propagation methods in practice depends on the tradeoff between computation costs and performance gains.

\(\bullet\) Sequential Preference. To obtain the sequential preference, an attention mechanism is widely adopted to integrate the representations of items in the sequence. Beyond that, adding positional embeddings makes the relative order of the items explicit and can bring some improvement. Whether leveraging RNN structure can boost performance for all the sequential recommendation tasks requires further investigation.

Figure 6 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.

Fig. 6.

5 Social Recommendation

With the emergence of online social networks, social recommender systems have been proposed to utilize each user’s local neighbors’ preferences to enhance user modeling [43, 65, 103, 104, 172]. All these works assume users with social relationships should have similar representations based on the social influence theory that connected people would influence each other. Some of them directly use such relationship as a regularizer to constrain the final user representations [65, 104, 105, 138], while others leverage such relationship as input to enhance the original user embeddings [43, 103].

From the perspective of graph learning, the early works mentioned above can be seen as modeling the first-order neighbors of each user. However, in practice, a user might be influenced by her/his friends’ friends. Overlooking the high-order influence diffusion in previous works might lead to the suboptimal recommendation performance [172]. Thanks to the ability to simulate how users are influenced by the recursive social diffusion process, GNN has become a popular choice to model the social information in recommendation.

To incorporate relationships among users into interaction behaviors by leveraging GNN, there are two main issues to deal with:

\(\bullet\) Influence of Friends. Do friends have equal influence? If not, how can one distinguish the influence of different friends?

\(\bullet\) Preference Integration. Users are involved in two types of relationships, i.e., social relationships with their friends and interactions with items. How can one integrate the user representations from the social influence perspective and interaction behavior?

5.1 Influence of Friends

Generally, a social graph only contains information about whether the users are friends, but the strengths of social ties are usually unknown. To propagate the information of friends, it is essential to decide the influence of friends. DiffNet [172] treats the influence of friends equally by leveraging mean-pooling operation. However, the assumption of equal influence is not in accordance with the actual situation, and the influence of a user is unsuitable to be simply determined by the number of her/his friends. Indeed, users are more likely to be influenced by friends with strong social ties or similar preferences. Therefore, the attention mechanism is widely leveraged to differentiate the influence of neighbors [2, 33, 110, 130, 171, 174]. For example, Song et al. [130] propose DGRec, which dynamically infers the influence of neighbors based on their current interests. It first models dynamic users’ behaviors with a recurrent neural network and then acquires the social influence with a graph attention neural network. Compared to the mean-pooling operation, the attention mechanism boosts the overall performance, which further verifies the assumption that different friends have different influence power.
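The contrast between equal influence and attention-based influence can be illustrated with a toy user and two friends. The dot-product similarity used as the attention score is an assumption of this sketch, not the scoring function of DiffNet, DGRec, or any other cited model.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def friend_influence(user_vec, friend_vecs, use_attention=True):
    """Aggregate friends' embeddings with equal or similarity-based weights."""
    if use_attention:
        weights = softmax([sum(a * b for a, b in zip(user_vec, f)) for f in friend_vecs])
    else:
        weights = [1.0 / len(friend_vecs)] * len(friend_vecs)
    dim = len(user_vec)
    agg = [sum(w * f[d] for w, f in zip(weights, friend_vecs)) for d in range(dim)]
    return weights, agg

user = [1.0, 0.0]
friends = [[0.9, 0.1], [-1.0, 0.5]]  # the first friend has similar preferences

w_equal, _ = friend_influence(user, friends, use_attention=False)
w_att, _ = friend_influence(user, friends)
```

With attention, the friend whose embedding is closer to the user's receives most of the weight, while mean-pooling assigns one half to each friend regardless of similarity.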

Moreover, a recent work, named ESRF [191], argues that social relations are not always reliable. The unreliability of social information lies in two aspects: on the one hand, the users with explicit social connections might have no influence power; on the other hand, the obtained social relationships might be incomplete. Considering that indiscriminately incorporating unreliable social relationships into recommendation may lead to poor performance, ESRF leverages the autoencoder mechanism to modify the observed social relationships by filtering irrelevant relationships and identifying new neighbors. Similarly, DiffNetLG [129] involves implicit local influence to predict the unobserved social relationships and then utilizes both explicit and implicit social relations to make recommendations.

5.2 Preference Integration

Users in social recommendation are involved in two types of relationships: one is the user-item interactions and the other is the social graph. To enhance the user preference representation by leveraging social information, there are two strategies for combining the information from these two networks:

\(\bullet\) To learn the user representation from these two networks respectively [33, 172, 174] and then integrate them into the final preference vector, as illustrated in Figure 7(a).

\(\bullet\) To combine the two networks into one unified network [171] and apply GNN to propagate information, as illustrated in Figure 7(b).

Fig. 7. Two strategies for social enhanced general recommendation.

The advantage of the first strategy lies in two aspects: on the one hand, we can differentiate the depth of the diffusion process of two networks since they are treated separately; on the other hand, any advanced method for a user-item bipartite graph can be directly applied, and for the social network, which is a homogeneous graph, GNN techniques are well suited to simulating the influence process since they were originally proposed for homogeneous graphs. As for the integration of the user representations learned from two relationships, there are two main mechanisms, i.e., linear combination and non-linear combination. Among the linear combinations, DiffNet [172] treats the user representations from two spaces equally and combines them with a sum-pooling operation. Instead of an equal-weight combination, DANSER [174] dynamically allocates weights according to the user-item paired features. Among the non-linear combinations, multi-layer perceptrons over the concatenated vector are widely adopted to enhance the feature interactions [33, 47, 186].

The advantage of integrating the two graphs into one unified network is that both the higher-order social influence diffusion in the social network and interest diffusion in the user-item bipartite graph can be simulated in a unified model, and these two kinds of information simultaneously reflect users’ preferences. DiffNet++ [171] designs a two-level attention network to update user nodes at each layer. Specifically, it first aggregates the information of neighbors in the bipartite graph (i.e., interacted items) and social network (i.e., friends) by utilizing the GAT mechanism, respectively. Considering that different users may have different preferences in balancing these two relationships, it further leverages another attention network to fuse the two hidden states of neighbors. Similarly, SEFrame [20] builds a heterogeneous graph that fuses the knowledge from social relationships, user-item interactions, and item transitions, and employs a two-level attention network for propagation. So far, there is no evidence to show which strategy always achieves better performance.

5.3 Summary

Corresponding to the discussion at the beginning of this section, we briefly summarize the current works in terms of the two issues:

\(\bullet\) Influence of Friends. Compared to assigning equal weights to friends, differentiating the influence of different friends is more appropriate. An emerging direction is to automatically modify the social relationships, which is beneficial given the presence of noise in social networks.

\(\bullet\) Preference Integration. The strategies for combining the two sources of information depend on whether to consider the two graphs separately or unify them into one graph. For the separate graphs, user preference is an integration of the overall representations learned from these two graphs. For the unified graph, a commonly adopted strategy is the hierarchical aggregation schema.

Figure 8 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.

Fig. 8.

6 Knowledge-graph-based Recommendation

A social network that reflects relationships between users is utilized to enhance user representation, while a knowledge graph that expresses relationships between items through attributes is leveraged to enhance the item representation. Incorporating a knowledge graph into recommendation can bring two-facet benefits [144]: (1) the rich semantic relations among items in a knowledge graph can help explore their connections and improve the item representation, and (2) a knowledge graph connects a user’s historically interacted-with items and recommended items, which enhances the interpretability of the results [189].

Despite the above benefits, utilizing a knowledge graph in recommendation is rather challenging due to its complex graph structure, i.e., multi-type entities and multi-type relations. Previous works preprocess a knowledge graph by knowledge graph embedding (KGE) methods to learn the embeddings of entities and relations, such as [26, 146, 196, 202]. The limitation of commonly used KGE methods is that they focus on modeling rigorous semantic relatedness with the transition constraints, which makes them more suitable for graph-related tasks such as link prediction than for recommendation [145]. Meta-path-based methods manually define meta-paths that carry the high-order information and feed them into a predictive model, and thus they require domain knowledge and are rather labor-intensive for a complicated knowledge graph [147, 154].

Given the user-item interaction information as well as the knowledge graph, the knowledge-graph-based recommendation seeks to take full advantage of the rich information in the knowledge graph, which can help to estimate the users’ preferences for items by explicitly capturing relatedness between items. For the effectiveness of knowledge-graph-based recommendation, there are two main issues to deal with:

\(\bullet\) Graph Construction. How can one effectively integrate the collaborative signals from user-item interactions and the semantic information from the knowledge graph? Should the user nodes be explicitly incorporated into the knowledge graph, or should they be implicitly used to distinguish the importance of different relations?

\(\bullet\) Relation-aware Aggregation. One characteristic of a knowledge graph is that it has multiple types of relations between entities. How can one design a relation-aware aggregation function to aggregate information from linked entities?

6.1 Graph Construction

For the stage of graph construction, one main concern is how to effectively integrate the collaborative signals and knowledge information.

One direction is to incorporate the user nodes into the knowledge graph. For instance, KGAT [154], MKGAT [134], and CKAN [162] combine the user-item bipartite graph and knowledge graph into one unified graph by taking the user nodes as one type of entity and the relation between users and items as “interaction.” Recent efforts focus on the entities and relations relevant to the user-item pair. Therefore, they construct the subgraph that links the user-item pair with the user’s historical interacted-with items and the related semantics in the knowledge graph [35, 127]. Based on the assumption that a shorter path between two nodes reflects more reliable connections, AKGE [127] constructs the subgraph by the following steps: pre-train the embeddings of entities in the knowledge graph by TransR [93], calculate the pairwise Euclidean distance between two linked entities, and keep the \(K\) paths with the shortest distance between the target user and item node. The potential limitation is that the subgraph structure depends on the pre-trained entity embeddings and the definition of distance measurement. ATBRG [35] exhaustively searches the multi-layer entity neighbors for the target item and the items from the user’s historical behaviors and restores the paths connecting the user behaviors and the target item by multiple overlapped entities. In order to emphasize the information-intensive entities, ATBRG further prunes the entities with a single link, which can also help control the scale of the graph. Although these methods can obtain subgraphs more relevant to the user-item pair, it is quite time-consuming to pre-train the entity embedding or search and prune paths exhaustively. An effective and efficient subgraph construction strategy is worthy of further investigation.
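The AKGE-style path selection described above can be illustrated with a small sketch. It is not the authors' implementation: the graph format (`adj`), the embedding dictionary (`emb`), the `max_hops` limit, and the brute-force enumeration of simple paths (instead of a dedicated K-shortest-path algorithm) are all assumptions made for illustration, and the embeddings are assumed to have been pre-trained elsewhere (e.g., by TransR).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_shortest_paths(adj, emb, source, target, k, max_hops=4):
    """Enumerate simple paths with at most max_hops edges and keep the k shortest.

    adj: dict node -> list of neighbor nodes (hypothetical toy format)
    emb: dict node -> pre-trained embedding vector (assumed given)
    The length of a path is the sum of Euclidean distances of its edges.
    """
    found = []  # list of (total_distance, path)

    def dfs(node, path, dist):
        if node == target:
            found.append((dist, list(path)))
            return
        if len(path) - 1 >= max_hops:  # hop budget exhausted
            return
        for nxt in adj.get(node, ()):
            if nxt in path:  # keep paths simple (no repeated nodes)
                continue
            path.append(nxt)
            dfs(nxt, path, dist + euclidean(emb[node], emb[nxt]))
            path.pop()

    dfs(source, [source], 0.0)
    found.sort(key=lambda item: item[0])
    return found[:k]

# Toy example: user "u", target item "t", two candidate paths.
adj = {"u": ["i1", "i2"], "i1": ["e1"], "e1": ["t"], "i2": ["t"]}
emb = {"u": (0, 0), "i1": (1, 0), "e1": (2, 0), "t": (3, 0), "i2": (0, 5)}
best = k_shortest_paths(adj, emb, "u", "t", k=1)
```

The sketch also makes the stated limitation visible: the selected subgraph changes whenever the embeddings or the distance function change.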

Another direction is to implicitly use the user nodes to distinguish the importance of different relations. For instance, KGCN [147] and KGNN-LS [145] take the user nodes as queries to assign weights to different relations. In terms of graph construction, this line of research emphasizes the users’ preferences toward relations instead of the collaborative signal in user-item interactions.

6.2 Relation-aware Aggregation

To fully capture the semantic information in a knowledge graph, both the linked entities (i.e., \(e_i, e_j\) ) and the relations in between (i.e., \(r_{e_i, e_j}\) ) should be taken into consideration during the propagation process. Besides, from the perspective of recommender systems, the role of users might also have an influence. Owing to the advantage of GAT in adaptively assigning weights based on the connected nodes, most of the existing works apply the variants of the traditional GAT over the knowledge graph; i.e., the central node is updated by the weighted average of the linked entities, and the weights are assigned according to the score function, denoted as \(a(e_i, e_j, r_{e_i, e_j}, u)\) . The key challenge is to design a reasonable and effective score function.

For the works [35, 134, 154] that regard the user nodes as one type of entity, the users’ preferences are expected to be spilled over to the entities in the knowledge graph during the propagation process since the item nodes would be updated with the information of interacted users and related attributes, and then the other entities would contain users’ preferences with iterative diffusion. Therefore, these works do not explicitly model users’ interests in relations but differentiate the influence of entities by the connected nodes and their relations. For instance, inspired by the transition relationship in a knowledge graph, KGAT [154] assigns the weight according to the distance between the linked entities in the relation space:

\begin{equation} a(e_i, e_j, r_{e_i, e_j}, u)=\left(\mathbf {W}_{r} \mathbf {e}_{j}\right)^{\top } \tanh \left(\mathbf {W}_{r} \mathbf {e}_{i}+\mathbf {e}_{r_{e_i, e_j}}\right), \end{equation}

(15)

where \(\mathbf {W}_{r}\) is the transformation matrix for the relation, which maps the entity into relation space. In this way, the closer entities would pass more information to the central node. These methods are more appropriate for the constructed subgraph containing user nodes, since it is difficult for the users’ interests to extend to all the related entities by stacking a limited number of GNN layers.
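As a rough illustration of Equation (15), the following sketch computes the unnormalized score for each neighbor and then normalizes the scores over the neighborhood. The softmax normalization and all helper names (`matvec`, `kgat_score`, `softmax`) are assumptions introduced here, and plain Python lists stand in for learned parameters.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def kgat_score(W_r, e_i, e_j, e_r):
    """Unnormalized score (W_r e_j)^T tanh(W_r e_i + e_r) from Equation (15)."""
    proj_i = matvec(W_r, e_i)
    proj_j = matvec(W_r, e_j)
    hidden = [math.tanh(p + r) for p, r in zip(proj_i, e_r)]
    return sum(a * b for a, b in zip(proj_j, hidden))

def softmax(scores):
    """Normalize scores over a neighborhood (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

# Toy example: identity relation projection, two neighbors of the central node e_i.
W_r = [[1.0, 0.0], [0.0, 1.0]]
e_i, e_r = [1.0, 0.0], [0.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
weights = softmax([kgat_score(W_r, e_i, e_j, e_r) for e_j in neighbors])
```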

For the works that do not combine the two sources of graphs, these studies [145, 147] explicitly characterize users’ interests in relations by assigning weights according to the connecting relation and the specific user. For example, the score function adopted by KGCN [147] is the dot product of the user embedding and the relation embedding, i.e.,

\begin{equation} a(e_i, e_j, r_{e_i, e_j}, u)=\mathbf {u}^{\top }\mathbf {r_{e_i, e_j}}. \end{equation}

(16)

In this way, the entities whose relations are more consistent with users’ interests will spread more information to the central node.
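A minimal sketch of Equation (16) follows, assuming the user-relation scores are softmax-normalized over the neighborhood before the neighbors are summed. The function names and the toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def user_relation_weights(user, relations):
    """Softmax over u^T r for the relations linking a node to its neighbors."""
    scores = [dot(user, r) for r in relations]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def aggregate(weights, neighbor_embs):
    """Weighted sum of neighbor embeddings."""
    dim = len(neighbor_embs[0])
    return [sum(w * e[d] for w, e in zip(weights, neighbor_embs)) for d in range(dim)]

# Toy example: the user embedding aligns with the first relation.
user = [1.0, 0.0]
relations = [[2.0, 0.0], [0.0, 2.0]]
neighbor_embs = [[1.0, 1.0], [-1.0, -1.0]]
w = user_relation_weights(user, relations)
central_update = aggregate(w, neighbor_embs)
```

In this toy case the neighbor reached through the first relation dominates the update, which matches the intuition stated above.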

6.3 Summary

Corresponding to the discussion at the beginning of this section, we briefly summarize the current works in terms of the two issues:

\(\bullet\) Graph Construction. Existing works either consider the user nodes as one type of entity or implicitly use the user nodes to differentiate the relations. The first direction can be further divided into the overall unified graph or the specific subgraph for the user-item pair. Compared to the overall unified graph, the user-item subgraph has the advantage of focusing on more related entities and relations, but it requires more computation time and the performance depends on the construction of the subgraph, which still requires further investigation.

\(\bullet\) Relation-aware Aggregation. The variants of GAT are widely leveraged to aggregate information from linked entities, taking into account the relations. For the graphs that do not explicitly incorporate user nodes, user representations are utilized to assign weights to the relations.

Figure 9 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.

Fig. 9.

7 Other Tasks

In addition to these four types of tasks, researchers have started to utilize GNN for improving the performance of other recommendation tasks, such as POI recommendation and multimedia recommendation. In this section, we will summarize the recent developments for each task respectively.

POI recommendation plays a key role in location-based services. It utilizes geographical information to capture the geographical influence among POIs and users’ historical check-ins to model transition patterns. In the field of POI recommendation, there are several kinds of graph data, such as the user-POI bipartite graph, the sequence graph based on check-ins, and the geographical graph, in which the POIs within a certain distance are connected and the edge weights depend on the distance between POIs [10, 92]. SGRec [88] enriches the check-in sequence with the correlated POIs belonging to other check-ins, which allows collaborative signals to be propagated across sequences. Chang et al. [10] argue that the more often users consecutively visit two POIs, the greater the geographical influence between these two POIs. Hence, the check-ins not only reflect users’ dynamic preferences but also indicate the geographical influence among POIs. To explicitly incorporate the information of geographical distribution among POIs, the edge weights in the sequence graph depend on the distance between POIs [10].
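The geographical graph mentioned above can be sketched as follows. The haversine distance, the radius threshold, and the weighting function `1 / (1 + d)` are assumptions chosen for illustration; the cited works may define distance and edge weights differently.

```python
import math
from itertools import combinations

def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def build_geo_graph(pois, radius_km):
    """Connect POIs closer than radius_km; closer pairs get larger weights.

    pois: dict poi_id -> (lat, lon). Returns dict (id_a, id_b) -> weight, id_a < id_b.
    """
    edges = {}
    for a, b in combinations(sorted(pois), 2):
        d = haversine_km(pois[a], pois[b])
        if d <= radius_km:
            edges[(a, b)] = 1.0 / (1.0 + d)  # assumed distance-decay weighting
    return edges

pois = {"a": (0.0, 0.0), "b": (0.0, 0.01), "c": (0.0, 1.0)}
geo_edges = build_geo_graph(pois, radius_km=10.0)
```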

Group recommendation aims to suggest items to a group of users instead of an individual one [59] based on their historical behaviors. There exist three types of relationships: user-item, where each user interacts with several items; user-group, where a group consists of several users; and group-item, where a group of users all choose the same item. “Group” can be regarded as a bridge connecting the users and the items in group recommendation, and it can either be treated as a part of the graph or not. Here are two representative works corresponding to these two strategies, respectively. GAME [59] introduces the “group node” in the graph and applies the GAT to assign appropriate weights to each interacted neighbor. Through propagation, the group representation can be iteratively updated with the interacted items and users. However, this approach cannot be directly applied to tasks where groups change dynamically and new groups are constantly formed. Different from the former transductive method, GLS-GRL [153] learns the group representation in an inductive way by constructing the corresponding graph for each group specifically. The group representation is generated by integrating the representations of the users in the group, which can address the new group problem.

Bundle recommendation aims to recommend a set of items as a whole to a user. There are three types of relationships: user-item, where each user interacts with several items; user-bundle, where users choose the bundles; and bundle-item, where a bundle consists of several items. In group recommendation, a “group” is made up of users; in bundle recommendation, a “bundle” is a set of items. Analogously, the key challenge is to obtain the bundle representation. BGCN [11] unifies the three relationships into one graph and designs the item-level and bundle-level propagation from the users’ perspective. HFGN [87] considers the bundle as the bridge through which users interact with the items. Correspondingly, it constructs a hierarchical structure upon user-bundle interactions and bundle-item mappings and further captures the item-item interactions within a bundle.

Click-through rate (CTR) prediction is an essential task for recommender systems in large-scale industrial applications, which predicts the click rate based on multi-type features. The key challenges of CTR prediction are to model feature interactions and capture user interests. Inspired by the information diffusion process of GNN, a recent work, Fi-GNN [90], employs GNN to capture the high-order interactions among features. Specifically, it constructs a feature graph, where each node corresponds to a feature field and different fields are connected with each other through edges. Hence, the task of modeling feature interactions is converted into propagating node information across the graph. Despite its considerable performance, Fi-GNN ignores the collaborative signals implicit in user behaviors. DG-ENN [46] designs both the attribute graph and user-item collaborative graph and utilizes GNN techniques to capture the high-order feature interactions and collaborative signals. To further alleviate the sparsity issue of the user-item interactions, DG-ENN enriches the original user-item interaction relationships with user-user similarity relationships and item-item transitions.

Multimedia recommendation has been a core service to help users identify multimedia contents of interest. The main characteristic is that the contents are in multi-modality, e.g., text, images, and videos. Recently, researchers have started to adopt GNN to capture the collaborative signals from users’ interactions with multi-modal contents. For instance, MMGCN [165] constructs a user-item bipartite graph for each modality and applies GNN to propagate information for each graph, respectively. The overall user/item representations are the sum of the user/item representations of different modalities. GRCN [164] utilizes the multi-modal contents to refine the connectivity of user-item interactions. For each propagation layer, GRCN takes the maximum value of the user-item similarities in different modalities as the weight of the user-item interaction edges and uses the corresponding weights to aggregate neighbors. MKGAT [134] unifies the user nodes and multi-modal knowledge graph into one graph and employs a relation-aware graph attention network to propagate information. Considering the multi-modal characteristic of entities, MKGAT designs the entity encoder to map each specific data type into a condensed vector.
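The GRCN-style edge weighting described above (taking the maximum user-item similarity across modalities) can be sketched as below. Cosine similarity and the dictionary layout of per-modality vectors are assumptions for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def edge_weight(user_by_modality, item_by_modality):
    """Weight of a user-item edge: the largest similarity over all modalities."""
    return max(
        cosine(user_by_modality[m], item_by_modality[m])
        for m in user_by_modality
    )

# Toy example: the user matches the item in the text modality only.
user_vecs = {"visual": [1.0, 0.0], "text": [0.0, 1.0]}
item_vecs = {"visual": [0.0, 1.0], "text": [0.0, 2.0]}
weight = edge_weight(user_vecs, item_vecs)
```

Taking the maximum means that one strongly matching modality is enough to keep an interaction edge influential, even if the other modalities disagree.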

8 Datasets, Evaluation Metrics, and Applications

In this section, we introduce the commonly adopted datasets and evaluation metrics for different recommendation tasks and summarize the real-world applications of GNN-based recommendation. This section can help researchers find suitable datasets and evaluation metrics to test their methods and get an overview of the practical applications of GNN-based recommendation.

8.1 Datasets

This part introduces the commonly adopted public datasets for GNN-based recommender systems, as summarized in Table 2. Due to the page limit, we do not list the datasets used by other recommendation tasks, and we refer readers to the published works.

Table 2.

Task	Dataset	Paper
User-Item CF	MovieLens-100K	GC-MC [8], STAR-GCN [197], DHCF [66]
	MovieLens-1M	GC-MC [8], SpectralCF [205], STAR-GCN [197], Bi-HGNN [82], DGCN-BinCF [142]
	MovieLens-10M	GC-MC [8], STAR-GCN [197], DGCN-BinCF [142]
	Amazon	SpectralCF [205], NGCF [155], LR-GCCF [18], LightGCN [55], DGCF [156], AGCN [173], NIA-GCN [132], Multi-GCCF [133]
	Gowalla	NGCF [155], LR-GCCF [18], LightGCN [55], HashGNN [135], DGCF [156], NIA-GCN [132], Multi-GCCF [133]
	Yelp	NGCF [155], LightGCN [55], DGCN-BinCF [142], DGCF [156], Multi-GCCF [133]
Sequential Recommendation	Yoochoose	SR-GNN [175], NISER [48], FGNN [115], HetGNN [152], DGTN [206], DAT-MDI [14], SGNN-HN [113], SHARE [148], DHCN [182]
	Diginetica	SR-GNN [175], GC-SAN [185], NISER [48], FGNN [115], DGTN [206], GCE-GNN [163], DAT-MDI [14], TASRec [207], COTREC [181], SGNN-HN [113], LESSR [19], SHARE [148], DHCN [182]
	Retailrocket	GC-SAN [185], NISER [48], TASRec [207], COTREC [181]
	Amazon	MA-GNN [102], TGSRec [34]
	LastFM	GAG [116], LESSR [19]
	Gowalla	GAG [116], LESSR [19], SERec [20], DGSR [199]
Social Recommendation	Yelp	DiffNet [172], DiffNet++ [171], DGRec [130], GNN-SoR [47], KCGN [62]
	Flickr	DiffNet [172], DiffNet++ [171], MHCN [192]
	Ciao	GraphRec [33], GNN-SoR [47], SR-HGNN [186]
	Epinions	GraphRec [33], DANSER [174], KCGN [62], SR-HGNN [186]
	LastFM	ESRF [191], GNN-SoR [47], MHCN [192]
	Douban	ESRF [191], DGRec [130], SR-HGNN [186], MHCN [192]
KG-based Recommendation	MovieLens-1M	AKGE [127], TGCN [13], KHGT [180]
	MovieLens-10M	MKGAT [134]
	MovieLens-20M	KGCN [147], KGNN-LS [145], CKAN [162]
	Book-Crossing	KGCN [147], KGNN-LS [145], CKAN [162]
	LastFM	KGCN [147], KGNN-LS [145], KGAT [154], AKGE [127], TGCN [13], CKAN [162]
	Dianping	KGNN-LS [145], MKGAT [134], CKAN [162]
	Yelp	KGAT [154], AKGE [127], KHGT [180]
	Amazon	KGAT [154], IntentGC [204], ATBRG [35]
	Taobao	IntentGC [204], ATBRG [35]

Table 2. Datasets of GNN-based Recommendation Tasks

The MovieLens² datasets are collected from the MovieLens website, among which three stable benchmark datasets with different scales of rating pairs, i.e., MovieLens-100K, MovieLens-1M, and MovieLens-20M, are most commonly used [51]. Each dataset contains user-item rating pairs with timestamps, the movie’s attributes and tags, and user demographic features. The ratings range from 1 to 5, with a minimum interval of 1. The MovieLens datasets are widely adopted as benchmark datasets in user-item collaborative filtering tasks and knowledge-graph-based recommendation.

The Amazon³ dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs) [52]. The full dataset is split into sub-datasets by categories, e.g., Amazon-Books, Amazon-Instant Video, and Amazon-Electronics. The sub-datasets in Amazon are usually adopted to test the performance in user-item collaborative filtering and sequential recommendation.

The Yelp⁴ dataset contains user check-ins and is still being updated. The Yelp dataset is widely adopted in user-item collaborative filtering and POI recommendation tasks. Existing works usually select 1 year of data for experiments; e.g., NGCF [155] uses the 2018 edition of the Yelp dataset.

The Gowalla⁵ dataset is a check-in dataset obtained from Gowalla, where users share their locations by checking in [23]. In addition to the check-in information, the Gowalla dataset also contains the social relationships among users. Gowalla is a classical dataset for POI recommendation and is adopted in user-item collaborative filtering and sequential recommendation as well.

The Yoochoose⁶ dataset is obtained from the RecSys Challenge 2015, which contains a stream of user clicks on an e-commerce website within 6 months. Instead of the entire dataset, most of the recent studies use the most recent fractions 1/64 and 1/4 of the sequences as the experimental datasets, named Yoochoose1/64 and Yoochoose1/4, respectively.
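The "most recent fraction" preprocessing can be sketched as follows. The session format (a timestamp paired with an item sequence) and the function name are hypothetical, and individual papers may differ in details such as which split the fraction is taken from.

```python
def most_recent_fraction(sessions, denominator):
    """Keep the most recent 1/denominator of the sessions.

    sessions: list of (timestamp, item_sequence) tuples (assumed format).
    """
    ordered = sorted(sessions, key=lambda s: s[0])  # oldest first
    keep = len(ordered) // denominator
    return ordered[len(ordered) - keep:]

# Toy example with 128 sessions: 1/64 keeps the 2 newest, 1/4 keeps the 32 newest.
sessions = [(t, [t % 5, (t + 1) % 5]) for t in range(128)]
recent_64 = most_recent_fraction(sessions, 64)
recent_4 = most_recent_fraction(sessions, 4)
```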

The Diginetica⁷ dataset is provided by CIKM Cup 2016 and contains transactional data in chronological order. Diginetica is commonly used in session-based recommendation.

The RetailRocket⁸ dataset has been collected from a real-world e-commerce website and contains 6 months of user browsing activities.

The LastFM⁹ dataset contains musician listening information from a set of 2,000 users and the attributes of artists from the Last.fm¹⁰ online music system [81]. This dataset is widely adopted by sequential recommendation, social recommendation, and knowledge-graph-based recommendation.

The Epinions dataset and Ciao dataset are shared by Tang et al. [137]. Each dataset contains the users’ ratings (from 1 to 5), reviews toward items, and directed trust relationships between users. These two datasets have been widely used as benchmarks for social recommendation.

The Book-Crossing¹¹ dataset contains 1 million ratings (ranging from 0 to 10) of books and the attributes of books (e.g., title, author) in the Book-Crossing community. This dataset is widely used as a benchmark for knowledge-graph-based recommendation.

8.2 Evaluation Metrics

It is essential to select adequate metrics to evaluate the performance of compared methods. Table 3 summarizes the evaluation metrics adopted by different recommendation tasks.

Table 3.

Task	Metric	Paper
User-Item CF	Recall	SpectralCF [205], NGCF [155], LightGCN [55], DGCN-BinCF [142], DHCF [66], DGCF [156], NIA-GCN [132], Multi-GCCF [133]
	MAP	SpectralCF [205], DGCN-BinCF [142]
	NDCG	NGCF [155], LR-GCCF [18], LightGCN [55], AGCN [173], HashGNN [135], DGCN-BinCF [142], DHCF [66], DGCF [156], NIA-GCN [132], Multi-GCCF [133]
	HR	LR-GCCF [18], AGCN [173], HashGNN [135], DHCF [66], PinSage [190]
Sequential Recommendation	Precision	SR-GNN [175], FGNN [115], GCE-GNN [163], COTREC [181], SGNN-HN [113], DHCN [182]
	MRR	SR-GNN [175], GC-SAN [185], NISER [48], FGNN [115], GAG [116], HetGNN [152], A-PGNN [176], DGTN [206], GCE-GNN [163], COTREC [181], SGNN-HN [113], LESSR [19], SURGE [12], SHARE [148], DHCN [182], SERec [20], TGSRec [34]
	HR	GC-SAN [185], HetGNN [152], LESSR [19], SHARE [148], SERec [20], DGSR [199]
	Recall	NISER [48], GAG [116], A-PGNN [176], DGTN [206], DAT-MDI [14], TASRec [207], MA-GNN [102], TGSRec [34]
	NDCG	HetGNN [152], DAT-MDI [14], TASRec [207], MA-GNN [102], SURGE [12], DGSR [199]
Social Recommendation	HR	DiffNet [172], DiffNet++ [171], KCGN [62]
	NDCG	DiffNet [172], DiffNet++ [171], ESRF [191], DGRec [130], GNN-SoR [47], KCGN [62], MHCN [192]
	AUC	DANSER [174], HGP [72]
	Precision	DANSER [174], ESRF [191], MHCN [192]
	Recall	ESRF [191], DGRec [130], MHCN [192]
KG-based Recommendation	AUC	KGCN [147], IntentGC [204], ATBRG [35], CKAN [162]
	F1	KGCN [147], CKAN [162]
	Recall	KGNN-LS [145], KGAT [154], MKGAT [134]
	NDCG	KGAT [154], AKGE [127], MKGAT [134], TGCN [13], KHGT [180]
	HR	AKGE [127], TGCN [13], KHGT [180]

Table 3. Evaluation Metrics of GNN-based Recommendation Tasks

HR measures the proportion of users who have at least one click on the recommended items, i.e.,

\begin{equation} {\bf HR}@K = \frac{1}{|\mathcal {U}|}\Sigma _{u\in \mathcal {U}}I(|R^K(u)\cap T(u)| > 0), \end{equation}

(17)

where \(T(u)\) denotes the ground-truth item set, \(R^K(u)\) denotes the top-K recommended item set, and \(I(\cdot)\) is the indicator function.
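Equation (17) translates almost directly into code. In the sketch below, the input format (one ranked list and one ground-truth set per user) and the function name are assumptions.

```python
def hit_ratio_at_k(recommended, ground_truth, k):
    """HR@K: share of users with at least one ground-truth item in their top-K list.

    recommended: dict user -> ranked list of items
    ground_truth: dict user -> set of held-out items
    """
    users = list(ground_truth)
    hits = sum(1 for u in users if set(recommended[u][:k]) & ground_truth[u])
    return hits / len(users)

recommended = {"a": [1, 2, 3], "b": [4, 5, 6]}
ground_truth = {"a": {3}, "b": {9}}
hr_at_3 = hit_ratio_at_k(recommended, ground_truth, 3)
hr_at_2 = hit_ratio_at_k(recommended, ground_truth, 2)
```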

Precision, Recall, and F1 are widely adopted to evaluate the accuracy of top-K recommendation. Precision@K measures the fraction of the items the user will click among the recommended K items. Recall@K measures the proportion of the number of user clicks in the recommended K items to the entire click set. F1@K is the combination of Precision@K and Recall@K:

\begin{equation} \begin{aligned}{\bf Precision}@K(u) =& \frac{|R^K(u)\cap T(u)|}{K},\quad {\bf Recall}@K(u) = \frac{|R^K(u)\cap T(u)|}{|T(u)|},\\ {\bf F1}@K(u) =& \frac{2\times {\bf Precision}@K(u) \times {\bf Recall}@K(u)}{{\bf Precision}@K(u)+{\bf Recall}@K(u)}. \end{aligned} \end{equation}

(18)

The overall metric is the average over all the users, e.g., \({\bf Precision}@K = \frac{1}{|\mathcal {U}|}\Sigma _{u\in \mathcal {U}} {\bf Precision}@K(u)\) .
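A per-user sketch of Equation (18) is given below. Returning zero when the denominators vanish is a convention assumed here, not something the equations specify.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Per-user Precision@K, Recall@K, and F1@K following Equation (18).

    recommended: ranked list of items; relevant: set of ground-truth items.
    """
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0  # assumed convention
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mean_over_users(values):
    """Overall metric: the average of the per-user values."""
    return sum(values) / len(values)

p, r, f1 = precision_recall_f1_at_k([1, 2, 3, 4], {2, 4, 9}, k=4)
```

Here two of the four recommended items are relevant, so the precision is 1/2, the recall is 2/3, and F1 is their harmonic mean, 4/7.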

NDCG differentiates the contributions of the accurately recommended items based on their ranking positions:

\begin{equation} {\bf NDCG}@K = \frac{1}{|\mathcal {U}|}\Sigma _{u\in \mathcal {U}}\frac{\Sigma _{k=1}^K \frac{I(R^K_k(u)\in T(u))}{\log {(k+1)}}}{\Sigma _{k=1}^K \frac{1}{\log {(k+1)}}}, \end{equation}

(19)

where \(R^K_k(u)\) denotes the \(k{\rm {th}}\) item in the recommended list \(R^K(u)\) .
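The per-user term of Equation (19) can be sketched as below. The code follows the equation as written, so the normalizer sums over all K positions. Many implementations instead truncate the ideal ranking at the number of ground-truth items, which is not what this sketch does. The logarithm base cancels in the ratio.

```python
import math

def ndcg_at_k(recommended, relevant, k):
    """Per-user NDCG@K with the normalizer of Equation (19)."""
    dcg = sum(
        1.0 / math.log(pos + 1)
        for pos, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    norm = sum(1.0 / math.log(pos + 1) for pos in range(1, k + 1))
    return dcg / norm

top_hit = ndcg_at_k([1, 2], {1}, k=2)      # relevant item ranked first
bottom_hit = ndcg_at_k([2, 1], {1}, k=2)   # relevant item ranked second
```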

MAP is a widely adopted ranking metric, which measures the average precision over users:

\begin{equation} {\bf MAP}@K = \frac{1}{|\mathcal {U}|}\Sigma _{u\in \mathcal {U}}\Sigma _{k=1}^K \frac{I(R^K_k(u)\in T(u))\mathbf {Precision}@k(u)}{K}. \end{equation}

(20)
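The following sketch follows Equation (20) as written, dividing each user's sum by K. The input format and function names are assumptions.

```python
def average_precision_at_k(recommended, relevant, k):
    """Per-user term of Equation (20): sum of Precision@k at hit positions, divided by K."""
    hits = 0
    total = 0.0
    for pos, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / pos  # Precision@pos at a position holding a relevant item
    return total / k

def map_at_k(recommended, ground_truth, k):
    """MAP@K: mean of the per-user values over all users."""
    users = list(ground_truth)
    return sum(
        average_precision_at_k(recommended[u], ground_truth[u], k) for u in users
    ) / len(users)

ap = average_precision_at_k([1, 2, 3], {1, 3}, k=3)
```

For this list the hits occur at positions 1 and 3, so the sum is 1 + 2/3 and the result is 5/9.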

AUC is the probability that the model ranks a clicked item more highly than a non-clicked item. When the implicit feedback estimation is considered a binary classification problem, AUC is widely adopted to evaluate the performance:

\begin{equation} {\bf AUC}(u) = \frac{\Sigma _{i\in T(u)}\Sigma _{j\in \mathcal {I}\setminus T(u)} I(\hat{r}_i > \hat{r}_j)}{|T(u)||\mathcal {I}\!\setminus \!T(u)|}. \end{equation}

(21)

The overall AUC is the average over all the users.
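Equation (21) counts correctly ordered (clicked, non-clicked) pairs, which can be sketched with a direct double loop. The input format is an assumption, and the quadratic pair enumeration is meant for clarity rather than efficiency.

```python
def user_auc(scores, positives):
    """Per-user AUC following Equation (21).

    scores: dict item -> predicted score over the whole item set
    positives: set of clicked items T(u); all other items count as non-clicked
    """
    pos = [i for i in scores if i in positives]
    neg = [j for j in scores if j not in positives]
    correct = sum(1 for i in pos for j in neg if scores[i] > scores[j])
    return correct / (len(pos) * len(neg))

scores = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.1}
auc = user_auc(scores, positives={"a", "b"})
```

In the toy example, three of the four (clicked, non-clicked) pairs are ordered correctly, giving an AUC of 0.75.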

8.3 Applications

In this part, we summarize the real-world applications of GNN-based recommendation models according to the existing works published by the industry.

Product (advertisement) recommendation on e-commerce platforms is one of the most common application scenarios [35, 82, 87, 135, 180, 204]. For instance, IntentGC [204] leverages both explicit preferences in user-item interactions and heterogeneous relationships in a knowledge graph by graph convolutional networks, and is deployed at the Alibaba platform for recommending products to users. Another application is content recommendation, e.g., recommending news articles to users. For example, Wu et al. [174] deploy DANSER on a real-world article recommender system, WeChat Top Story, by exploiting the user-article interactions and social relationships. App recommendation has also begun to utilize GNN-based models; e.g., GraphSAIL is deployed in the recommender system of a mainstream app store, which selects several hundred apps from the universal set for each user [188]. Besides, Ying et al. [190] deploy PinSage at Pinterest, which can generate higher-quality recommendations of images than comparable deep learning and graph-based alternatives.

9 Future Research Directions and Open Issues

While GNN has achieved great success in recommender systems, this section outlines several promising prospective research directions.

9.1 Diverse and Uncertain Representation

In addition to heterogeneity in data types (e.g., node types like user and item and edge types like different behavior types), users in the graph usually also have diverse and uncertain interests [21, 79]. Representing each user as a single vector (a point in the low-dimensional vector space) as in previous works makes it hard to capture such characteristics in users’ interests. Thus, how to represent users’ multiple and uncertain interests is a direction worth exploring.

A natural choice is to extend such a single vector to multiple vectors with various methods [99, 100, 166], e.g., disentangled representation learning [106, 107] or capsule networks [83]. Some works on GNN-based recommendation have also begun to represent users with multiple vectors. For instance, DGCF [156] explicitly adds orthogonal constraints for multi-aspect representations and iteratively updates the adjacent relationships between the linked nodes for each aspect, respectively. The research on multiple vector representation for recommendation, especially for GNN-based recommendation models, is still in the preliminary stage, and many issues need to be studied in the future, e.g., how to disentangle the embedding pertinent to users’ intents, how to set the number of interests for each user in an adaptive way, and how to design an efficient and effective propagation schema for multiple vector representations.

Another feasible solution is to represent each user as a density instead of a vector. Representing data as a density (usually a multi-dimensional Gaussian distribution) provides many advantages, e.g., better encoding uncertainty for a representation and its relationships and expressing asymmetries more naturally than dot product, cosine similarity, or Euclidean distance. Specifically, Gaussian embedding has been widely used to model the data uncertainty in various domains, e.g., word embedding [141], document embedding [42, 112], and network/graph embedding [9, 54, 160]. For recommendation, Dos Santos et al. [31] and Jiang et al. [67] also deploy Gaussian embedding to capture users’ uncertain preferences for improving user representations and recommendation performance. Density-based representation, e.g., Gaussian embedding, is an interesting direction that is worth exploring but has not been well studied in the GNN-based recommendation models.

9.2 Scalability of GNN in Recommendation

In industrial recommendation scenarios where the datasets include billions of nodes and edges while each node contains millions of features, it is challenging to directly apply the traditional GNN due to the large memory usage and long training time. To deal with large-scale graphs, there are two mainstream approaches: one is to reduce the size of the graph by sampling to make existing GNN applicable; the other is to design a scalable and efficient GNN by changing the model architecture. Sampling is a natural and widely adopted strategy for training on large graphs. For example, GraphSAGE [50] randomly samples a fixed number of neighbors, and PinSage [190] employs the random walk strategy for sampling. Besides, some works [35, 127] reconstruct a small-scale subgraph from the original graph for each user-item pair. However, sampling inevitably loses part of the information, and few studies focus on how to design an effective sampling strategy that balances effectiveness and scalability.
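Fixed-size neighbor sampling in the spirit of GraphSAGE can be sketched as below. Sampling with replacement when a node has fewer neighbors than requested is an assumption of this sketch, as are the function name and the adjacency format.

```python
import random

def sample_neighbors(adj, node, num_samples, rng):
    """Uniformly sample a fixed number of neighbors for one node.

    adj: dict node -> list of neighbors; rng: a random.Random instance.
    """
    neighbors = adj.get(node, [])
    if not neighbors:
        return []
    if len(neighbors) >= num_samples:
        return rng.sample(neighbors, num_samples)  # without replacement
    # Fewer neighbors than requested: sample with replacement (assumed behavior).
    return [rng.choice(neighbors) for _ in range(num_samples)]

adj = {"u": ["a", "b", "c", "d"], "v": ["a"]}
rng = random.Random(0)
sampled_u = sample_neighbors(adj, "u", 2, rng)
sampled_v = sample_neighbors(adj, "v", 3, rng)
```

Because every node contributes the same number of sampled neighbors, the cost of one layer is bounded regardless of node degree, at the price of discarding some edges in each step.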

The other mainstream approach is to decouple the model by removing the non-linearities and collapsing the weight matrices between consecutive layers [38, 55, 168]. As the neighbor-averaged features need to be precomputed only once, these models are more scalable and incur no communication cost during model training. However, they are limited by their choice of aggregators and updaters, as compared to traditional GNN with higher flexibility in learning [17]. Therefore, more work is needed to handle large-scale graphs.
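The precomputation idea can be illustrated with a small sketch. Averaging each node with its neighbors (including itself) is an assumed simplification; the cited models use their own normalization schemes.

```python
def precompute_propagated_features(adj, features, num_hops):
    """Apply neighbor averaging num_hops times, with no weights or non-linearities.

    adj: dict node -> list of neighbors; features: dict node -> list of floats.
    The result can be computed once and then fed to a simple downstream model.
    """
    current = {n: list(v) for n, v in features.items()}
    for _ in range(num_hops):
        updated = {}
        for node, vec in current.items():
            group = [node] + list(adj.get(node, ()))  # include the node itself
            updated[node] = [
                sum(current[n][d] for n in group) / len(group)
                for d in range(len(vec))
            ]
        current = updated
    return current

adj = {"a": ["b"], "b": ["a"]}
features = {"a": [0.0], "b": [2.0]}
smoothed = precompute_propagated_features(adj, features, num_hops=1)
```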

9.3 Dynamic Graphs in Recommendation

In real-world recommender systems, not only the objects such as users and items but also the relationships between them are changing over time. To keep recommendations up to date, the systems should be iteratively updated with newly arriving information. From the perspective of graphs, the constantly updated information brings about dynamic graphs instead of static ones. Static graphs are stable and thus comparatively easy to model, while dynamic graphs introduce changing structures. An interesting prospective research problem is how to design the corresponding GNN framework in response to the dynamic graphs in practice. Existing studies in recommendation pay little attention to dynamic graphs. As far as we know, GraphSAIL [188] is the first attempt to address incremental learning on GNN for recommender systems, which deals with the changing of interactions, i.e., the edges between nodes. To balance update and preservation, it constrains the embedding similarity between the central node and its neighborhood in successively learned models and keeps the incrementally learned embedding close to its previous version. Dynamic graphs in recommendation are a largely under-explored area that deserves further study.

9.4 Reception Field of GNN in Recommendation

The reception field of a node refers to a set of nodes including the node itself and its neighbors reachable within \(K\) hops [118], where \(K\) is the number of propagation iterations. Generally, the aggregation step \(K\) is the same as the number of GNN layers in coupled GNNs (e.g., GCN and GraphSAGE). In addition, some recent graph diffusion-based works [74, 75, 109] decouple the operations of aggregation and update and embrace a larger reception field with a larger aggregation step. Nodes with low degree need a deep GNN architecture to enlarge their reception field for sufficient neighborhood information. However, as the number of propagation steps increases, the reception field of nodes with high degree will expand too much and may introduce noise, which could lead to the over-smoothing problem [86] and a consequent drop in performance.
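The definition above corresponds to a breadth-first search that stops after \(K\) hops. The sketch below uses a hypothetical adjacency dictionary, and the toy graph contrasts a low-degree node with a hub.

```python
from collections import deque

def reception_field(adj, node, k):
    """Return the node itself plus all nodes reachable within k hops."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == k:  # do not expand beyond k hops
            continue
        for nxt in adj.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# "hub" links to many leaves; "tail" hangs off a single leaf.
adj = {
    "hub": ["x1", "x2", "x3", "x4"],
    "x1": ["hub", "tail"], "x2": ["hub"], "x3": ["hub"], "x4": ["hub"],
    "tail": ["x1"],
}
field_tail_1 = reception_field(adj, "tail", 1)
field_hub_1 = reception_field(adj, "hub", 1)
```

With a single propagation step the hub already covers five nodes while the tail node covers two, which mirrors the degree imbalance discussed above.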

For the graph data in recommendation, the degree of nodes exhibits a long-tail distribution; i.e., active users have lots of interactions with items, while cold users have few interactions, and the same holds for popular items and cold items. Therefore, applying the same propagation step on all the nodes may be suboptimal. There are only a few emerging works that adaptively decide the propagation step for each node in order to obtain a reasonable reception field [71, 96, 98]. As a result, how to adaptively select a suitable reception field for each user or item in GNN-based recommendation is still an issue worthy of research.

9.5 Self-supervised Learning

Self-supervised learning (SSL) is an emerging paradigm for improving the utilization of data, which can help alleviate the sparsity issue. Inspired by the success of SSL in other areas, recent efforts have applied SSL to recommender systems and made remarkable achievements [170, 209]. In the field of GNN-based recommender systems, there are also a few attempts to employ SSL. For instance, COTREC [181] designs a contrastive learning task by maximizing the agreement between the representations of the last-clicked item and the predicted item samples, accompanied with the given session representation. DHCN [182] maximizes the mutual information between the session representations learned via the session-to-session graph and item-session hypergraph. The key challenge is how to design an effective supervision signal corresponding to the main task. Considering the prevalence of the sparsity issue in recommender systems, we believe self-supervised learning in GNN-based recommender systems is a promising direction.
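As a generic illustration of "maximizing the agreement" between two representations, the sketch below computes an InfoNCE-style contrastive loss. This is a common contrastive objective and not necessarily the exact loss used by COTREC or DHCN; the cosine similarity, the temperature value, and the function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.2):
    """Contrastive loss: small when the anchor is closer to the positive than to negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]  # equals -log softmax(logits)[0]

# Two views of the same session agree (low loss) or disagree (high loss).
low = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
high = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```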

9.6 Robustness in GNN-based Recommendation

Recent studies show that GNN can be easily fooled by small perturbations on the input [187]; i.e., the performance of GNN will be greatly reduced if the graph structure contains noise. In real-world recommendation scenarios, it is a common phenomenon that the relationships between nodes are not always reliable. For instance, users may accidentally click on items, and some social relationships cannot be captured. In addition, attackers may also inject fake data into recommender systems. Due to the vulnerability of GNN to noisy data, it is of great practical significance to construct a robust recommender system that is able to generate stable recommendations even in the presence of shilling attacks [201]. In the field of GNN, there are emerging efforts in graph adversarial learning to enhance the robustness [56, 187, 211]. A few attempts in GNN-based recommendation have started to pay attention to robustness. For instance, GraphRf [201] jointly learns the rating prediction and fraudster detection, where the probability of being a fraudster determines the reliability of the user’s rating in the rating prediction component. How to build a more robust recommender system is worth exploring but has not been well studied in GNN-based recommender systems.

9.7 Privacy Preservation

Due to the strict privacy protection required by the General Data Protection Regulation,¹² privacy preservation in recommender systems has attracted considerable attention in academia and industry, since most of the data may be deemed confidential or private, e.g., social networks and historical behavior [57]. An emerging paradigm is to use federated learning to train recommender systems without uploading users’ data to the central server [94, 111, 149]. However, the local user data only contains first-order user-item interactions, which makes it challenging to capture high-order connectivity without privacy leakage [167]. Another line of work employs differential privacy to guarantee user privacy in the recommendation procedure [39, 128]. One limitation of differential privacy is that it usually decreases performance [29].

Some efforts have focused on privacy preservation in GNN-based recommendation. For instance, FedGNN [167] trains the GNN model locally based on the local user-item graph. It proposes pseudo interacted items to protect the items and a user-item graph expansion method to exploit the high-order interactions. Based on local differential privacy (LDP) [24], FedGNN adds noise to the local gradient of each model, which may decrease the model accuracy. To reduce the amount of noise while maintaining privacy protection, PPGRec [120] converts the LDP model into the central differential privacy model and only adds noise to the aggregated global gradient. With society’s increasing emphasis on privacy protection, privacy preservation in GNN-based recommendation should be an attractive direction due to its practical value.
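The difference between perturbing each local gradient and perturbing only the aggregated gradient can be sketched as follows. This is a schematic illustration, not the FedGNN or PPGRec implementation: the helper names, the coordinate-wise clipping, the Laplace noise, and the `bound` and `scale` values are assumptions, and the sketch performs no privacy accounting (choosing the noise scale for a given privacy guarantee is out of scope here).

```python
import random

def clip(grad, bound):
    """Clip every coordinate to [-bound, bound] to limit one client's influence."""
    return [max(-bound, min(bound, g)) for g in grad]

def laplace_sampler(scale, seed=0):
    """Return a zero-argument function that draws Laplace(0, scale) noise."""
    rng = random.Random(seed)
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return lambda: rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean_columns(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def aggregate_local_noise(client_grads, bound, noise):
    """Local style: every client perturbs its own clipped gradient before upload."""
    noisy = [[g + noise() for g in clip(grad, bound)] for grad in client_grads]
    return mean_columns(noisy)

def aggregate_central_noise(client_grads, bound, noise):
    """Central style: clipped gradients are averaged first, then perturbed once."""
    mean = mean_columns([clip(grad, bound) for grad in client_grads])
    return [m + noise() for m in mean]

# Hypothetical gradients from two clients.
grads = [[2.0, -3.0], [0.5, 0.5]]
print(aggregate_local_noise(grads, bound=1.0, noise=laplace_sampler(0.1)))
print(aggregate_central_noise(grads, bound=1.0, noise=laplace_sampler(0.1)))
```

In the local variant, noise is drawn once per client per coordinate; in the central variant, it is drawn once per coordinate of the aggregate. The central variant relies on the aggregation step itself being trusted or otherwise protected, which this sketch does not model.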

9.8 Fairness in GNN-based Recommender System

Recent years have seen a surge of research efforts on recommendation biases to achieve fairness [16]. For instance, the recommendation performance for users of different demographic groups should be close, and each item should have an equal probability of overall exposure. With the widespread adoption of GNN, there is increasing societal concern that GNN could make discriminatory decisions [30]. Some explorations have been made toward alleviating biases in GNN-based recommender systems. For instance, NISER [48] applies a normalization operation over the representations to handle popularity bias. FairGNN [27] employs an adversarial learning paradigm to eliminate the bias of GNN by leveraging graph structures and limited sensitive information. Due to the prevalence of biases in recommender systems and society’s growing focus on fairness, ensuring fairness while maintaining comparable performance in GNN-based recommender systems deserves further study.
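The effect of normalizing representations on popularity bias can be shown with a small example. This is a sketch of the general idea only, not NISER's implementation; the `scale` factor and the toy embeddings are assumptions. The premise illustrated here is that a frequently updated (popular) item may end up with a large embedding norm, which inflates its dot-product score regardless of how well its direction matches the session.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)

def score_items(session, items, normalize=False, scale=1.0):
    """Score each item against the session representation.

    With normalize=True, both sides are L2-normalized, so the score is a
    scaled cosine similarity and no longer depends on embedding norms.
    `scale` is a hypothetical temperature-like factor.
    """
    if normalize:
        session = l2_normalize(session)
        items = [l2_normalize(item) for item in items]
    return [scale * dot(session, item) for item in items]

session = [1.0, 0.0]
popular_item = [3.0, 3.0]  # large norm, only partly aligned with the session
niche_item = [1.0, 0.1]    # small norm, closely aligned with the session

raw = score_items(session, [popular_item, niche_item])
normalized = score_items(session, [popular_item, niche_item], normalize=True)
print(raw)         # the popular item ranks first
print(normalized)  # the niche item ranks first
```

After normalization, the ranking is driven by direction alone, so a well-matched long-tail item is no longer outscored merely because a popular item has a larger norm.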

9.9 Explainability

Explainability is beneficial for recommender systems: on the one hand, explainable recommendations let users know why the items are recommended and can be persuasive; on the other hand, practitioners can know why the model works, which can guide further improvements [200]. Due to the significance of explainability, much research has focused on designing explainable recommendation models or conducting post hoc interpretations [63, 126, 139].

With the proliferation of GNN, recent efforts have investigated GNN explainability methods [194]. The methods can be divided into two categories: instance-level methods provide example-specific explanations by identifying the input features that are important for a prediction [3, 124, 195]; model-level methods provide high-level interpretations and a generic understanding of how deep graph models work [193]. There are also some attempts at explainability in GNN-based recommendation [64, 108, 157]. Most of them utilize the semantic information in a knowledge graph and conduct post hoc interpretations. So far, explainable GNN-based recommender systems have not been fully explored, which should be an interesting and beneficial direction.

10 Conclusion

Owing to the superiority of GNN in learning on graph data, utilizing GNN techniques in recommender systems has gained increasing interest in academia and industry. In this survey, we provided a comprehensive review of the most recent works on GNN-based recommender systems. We proposed a classification scheme for organizing existing works. For each category, we briefly clarified the main issues, detailed the corresponding strategies adopted by the representative models, and discussed their advantages and limitations. Furthermore, we suggested several promising directions for future research. We hope this survey can provide readers with a general understanding of the recent progress in this field and shed some light on future developments.

Footnotes

¹

The dataset is available at http://2015.recsyschallenge.com/challege.html. Note that this work preprocesses the dataset by filtering out the sequences of length 1 and items appearing fewer than five times.

²

https://grouplens.org/datasets/movielens/.

³

http://jmcauley.ucsd.edu/data/amazon/links.html.

⁴

https://www.yelp.com/dataset.

⁵

http://snap.stanford.edu/data/loc-gowalla.html.

⁶

http://2015.recsyschallenge.com/challege.html.

⁷

http://cikm2016.cs.iupui.edu/cikm-cup.

⁸

https://www.kaggle.com/retailrocket/ecommerce-dataset.

⁹

http://mtg.upf.edu/static/datasets/last.fm/lastfm-dataset-1K.tar.gz.

¹⁰

https://www.last.fm/.

¹¹

http://www2.informatik.uni-freiburg.de/~cziegler/BX/.

¹²

https://gdpr-info.eu/.

References

[1]

Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. TKDE 17, 6 (2005), 734–749.
