1 Introduction
With the rapid development of e-commerce and social media platforms, recommender systems have become indispensable tools for many businesses [
15,
25,
84,
183,
190,
200]. They take various forms depending on the industry, such as product suggestions on e-commerce websites (e.g., Amazon and Taobao) or playlist generators for video and music services (e.g., YouTube, Netflix, and Spotify). Users rely on recommender systems to alleviate the information overload problem and to explore what interests them in the vast sea of items (e.g., products, movies, news, or restaurants). Therefore, accurately modeling users’ preferences from their historical interactions (e.g., click, watch, read, and purchase) lies at the heart of an effective recommender system.
Broadly speaking, in the past decades, the mainstream modeling paradigm in recommender systems has evolved from neighborhood methods [
6,
60,
95,
123] to representation-learning-based frameworks [
25,
77,
78,
125,
143]. Item-based neighborhood methods [
6,
95,
123] directly recommend items to users that are similar to the historical items they have interacted with. In a sense, they represent users’ preferences by directly using their historical interacted items. Early item-based neighborhood approaches have achieved great success in real-world applications because of their simplicity, efficiency, and effectiveness.
An alternative approach is representation-learning-based methods that try to encode both users and items as continuous vectors (i.e., embeddings) in a shared space, thus making them directly comparable. Representation-based models have sparked a surge of interest since the Netflix Prize competition [
7] demonstrated that matrix factorization models are superior to classic neighborhood methods for recommendations. After that, various methods have been proposed to learn the representations of users and items, from matrix factorization [
77,
78] to deep learning models [
25,
58,
125,
200]. Nowadays, deep learning models have become the dominant methodology for recommender systems in both academic research and industrial applications, owing to their ability to capture non-linear and non-trivial user-item relationships effectively and to incorporate abundant data sources easily, e.g., contextual, textual, and visual information.
Among all those deep learning algorithms, one line is graph-learning-based methods, which consider the information in recommender systems from the perspective of graphs [
151]. Most of the data in recommender systems essentially have a graph structure [
8,
190]. For example, the interaction data in a recommendation application can be represented by a bipartite graph between user and item nodes, with observed interactions represented by links. Even the item transitions in users’ behavior sequences can also be constructed as graphs. The benefit of formulating recommendation as a task on graphs becomes especially evident when incorporating structured external information, e.g., the social relationship among users [
33,
172] and the knowledge graph related to items [
146,
196]. In this way, graph learning provides a unified perspective to model the abundant heterogeneous data in recommender systems. Early efforts in graph-learning-based recommender systems utilize graph embedding techniques to model the relations between nodes, which can be further divided into factorization-based methods, distributed-representation-based methods, and neural-embedding-based methods [
151]. Inspired by the superior ability of GNN in learning on graph-structured data, a great number of GNN-based recommendation models have emerged recently.
Nevertheless, providing a unified framework to model the abundant data in recommendation applications is only part of the reason for the widespread adoption of GNN in recommender systems. Another reason is that, different from traditional methods that only implicitly capture the collaborative signals (i.e., using user-item interactions as the supervised signals), GNN can naturally and explicitly encode the crucial collaborative signal (i.e., topological structure) to improve the user and item representations. In fact, using collaborative signals to improve representation learning in recommender systems is not a new idea that originated from GNN [
41,
69,
76,
184,
203]. Early efforts, such as SVD++ [
76] and FISM [
69], have already demonstrated the effectiveness of the interacted items in user representation learning. In view of the user-item interaction graph, these previous works can be seen as using one-hop neighbors to improve user representation learning. The advantage of GNN is that it provides powerful and systematic tools to explore multi-hop relationships that have been proven to be beneficial to the recommender systems [
55,
155,
190].
With these advantages, GNN has achieved remarkable success in recommender systems in the past few years. In academic research, a lot of works demonstrate that GNN-based models outperform previous methods and achieve new state-of-the-art results on the public benchmark datasets [
55,
155,
210]. Meanwhile, plenty of their variants are proposed and applied to various recommendation tasks, e.g., session-based recommendation [
115,
175],
point-of-interest (POI) recommendation [
10,
92,
177], group recommendation [
59,
153], multimedia recommendation [
164,
165], and bundle recommendation [
11]. In industry, GNN has also been deployed in web-scale recommender systems to produce high-quality recommendation results [
32,
114,
190]. For example, Pinterest developed and deployed a random-walk-based Graph Convolutional Network (GCN) model named PinSage on a graph with 3 billion nodes and 18 billion edges, and gained substantial improvements in user engagement in online A/B tests.
Differences between this survey and existing ones. There exist surveys focusing on different perspectives of recommender systems [
4,
16,
22,
28,
45,
117,
200]. However, there are very few comprehensive reviews that position existing works and current progress of applying GNN in recommender systems. For example, Zhang et al. [
200] and Batmaz et al. [
4] focus on most of the deep-learning techniques in recommender systems while ignoring GNN. Chen et al. [
16] summarize the studies on the bias issue in recommender systems. Guo et al. [
45] review knowledge-graph-based recommendations, and Wang et al. [
150] present a comprehensive survey of session-based recommendation. These two works only include some of the GNN methods applied in the corresponding sub-fields and examine a limited number of works. To the best of our knowledge, the most relevant formally published survey is a short paper [
151], which presents a review of graph-learning-based systems and briefly discusses the application of GNN in recommendation. One recent survey under review [
40] classifies the existing works in GNN-based recommender systems from four perspectives of recommender systems, i.e., stage, scenario, objective, and application. Such taxonomy emphasizes recommender systems but pays insufficient attention to applying GNN techniques in recommender systems. Besides, this survey [
40] provides few discussions on the advantages and limitations of existing methods. There are some comprehensive surveys on the GNN techniques [
179,
208], but they only roughly discuss recommender systems as one of the applications.
Given the impressive pace at which GNN-based recommendation models are growing, we believe it is important to summarize and describe all the representative methods in one unified and comprehensible framework. This survey summarizes the literature on the advances of GNN-based recommendation and discusses open issues and future directions in this field. To this end, more than 100 studies were shortlisted and classified in this survey.
Contribution of this survey. The goal of this survey is to thoroughly review the literature on the advances of GNN-based recommender systems and discuss further directions. The researchers and practitioners who are interested in recommender systems could have a general understanding of the latest developments in the field of GNN-based recommendation. The key contributions of this survey are summarized as follows:
\(\bullet\) New taxonomy. We propose a systematic classification schema to organize the existing GNN-based recommendation models. Specifically, we categorize the existing works based on the type of information used and recommendation tasks into five categories: user-item collaborative filtering, sequential recommendation, social recommendation, knowledge-graph-based recommendation, and other tasks (including POI recommendation, multimedia recommendation, etc.).
\(\bullet\) Comprehensive review. For each category, we demonstrate the main issues to deal with. Moreover, we introduce the representative models and illustrate how they address these issues.
\(\bullet\) Future research. We discuss the limitations of current methods and propose nine potential future directions.
The remainder of this article is organized as follows: Section 2 introduces the preliminaries for recommender systems and graph neural networks. Then, it discusses the motivations of applying GNNs in recommender systems and categorizes the existing GNN-based recommendation models. Sections 3 through 7 summarize the main issues of models in each category and how existing works tackle these challenges, and analyze their advantages and limitations. Section 8 gives a summary of the mainstream benchmark datasets, widely adopted evaluation metrics, and real-world applications. Section 9 discusses the challenges and points out nine future directions in this field. Finally, we conclude the survey in Section 10.
2 Backgrounds and Categorization
Before diving into the details of this survey, we give a brief introduction to recommender systems and GNN techniques. We also discuss the motivation of utilizing GNN techniques in recommender systems. Furthermore, we propose a new taxonomy to classify the existing GNN-based models. Throughout this article, we use bold uppercase characters to denote matrices, bold lowercase characters to denote vectors, italic bold uppercase characters to denote sets, and calligraphic fonts to denote graphs. For easy reading, we summarize the notations that will be used throughout the article in Table
1.
2.1 Recommender Systems
Recommender systems infer users’ preferences from user-item interactions or static features and further recommend items that users might be interested in [
1]. This has been a popular research area for decades because it has great application value and the challenges in this field are still not well addressed. Formally, given a user \(u\), the task is to estimate the user’s preference for any item \(i\in \mathcal {I}\) by the learned user representation \(h_u^*\) and item representation \(h_i^*\), i.e.,
\[y_{u,i}=f(h_u^*, h_i^*),\]
where the score function \(f(\cdot)\) can be a dot product, cosine similarity, a multi-layer perceptron, and so forth, and \(y_{u,i}\) denotes the preference score for user \(u\) on item \(i\), which is usually expressed as a probability.
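As a minimal illustration of the score function, the following Python sketch computes a dot-product and a cosine score between a user and two items and ranks the items. The embeddings, the item names, and the use of a sigmoid to map the dot product to a probability are assumptions made for this example only.

```python
import numpy as np

def dot_score(h_u, h_i):
    """Inner product between user and item embeddings."""
    return float(np.dot(h_u, h_i))

def cosine_score(h_u, h_i, eps=1e-12):
    """Cosine similarity; eps guards against zero-norm embeddings."""
    denom = np.linalg.norm(h_u) * np.linalg.norm(h_i) + eps
    return float(np.dot(h_u, h_i) / denom)

def sigmoid(x):
    """Map a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 3-dimensional embeddings for one user and two items.
h_u = np.array([1.0, 0.0, 1.0])
items = {"i1": np.array([1.0, 0.0, 1.0]), "i2": np.array([0.0, 1.0, 0.0])}

# Squash the dot-product score so that it can be read as a probability.
probs = {name: sigmoid(dot_score(h_u, h_i)) for name, h_i in items.items()}

# Recommend items in descending order of predicted preference.
ranking = sorted(probs, key=probs.get, reverse=True)
```

In practice, the embeddings would be learned from interaction data rather than fixed by hand, and the choice of score function is a modeling decision.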
According to the types of information used to learn user/item representations, the research of recommender systems can usually be classified into specific types of tasks. The
user-item collaborative filtering recommendation aims to capture the collaborative signal by leveraging only the user-item interactions; i.e., the user/item representations are jointly learned from pairwise data [
58,
78,
80,
121,
125,
178]. When the timestamps of the user’s historical behavior are known or the historical behavior is organized in chronological order, the user representations can be enhanced via exploring the sequential patterns in her/his historical interactions [
53,
61,
70,
85,
97,
119,
131,
136,
150]. According to whether the users are anonymous or not and whether the behaviors are segmented into sessions, works in this field can be further divided into
sequential recommendation and
session-based recommendation. The session-based recommendation can be viewed as a sub-type of sequential recommendation with anonymous and session assumptions [
117]. In this survey, we do not distinguish them and refer to them collectively as the much broader term “sequential recommendation” for simplicity since our main focus is the contribution of GNN to recommendation, and the differences between them are negligible for the application of GNN. In addition to sequential information, another line of research exploits the social relationship to enhance the user representations, which is classified as
social recommendation [
43,
65,
103,
104,
105,
138]. Social recommendation assumes that users with social relationships tend to have similar user representations, based on the social influence theory that connected people influence each other. Besides user representation enhancement, many efforts try to enhance the item representations by leveraging a knowledge graph, which expresses relationships between items through attributes. These works are usually categorized as
knowledge-graph-based recommender systems, which incorporate the semantic relations among items into collaborative signals.
2.2 Graph Neural Network Techniques
Recently, systems based on variants of GNN have demonstrated ground-breaking performances on many tasks related to graph data, such as physical systems [
5,
122], protein structure [
37], and knowledge graph [
49]. In this part, we first introduce the definition of graphs, and then give a brief summary of the existing GNN techniques.
A graph is represented as \(\mathcal {G}=(\mathcal {V},\mathcal {E})\) , where \(\mathcal {V}\) is the set of nodes and \(\mathcal {E}\) is the set of edges. Let \(v_i\in \mathcal {V}\) be a node and \(e_{ij}=(v_i, v_j)\in \mathcal {E}\) be an edge pointing from \(v_j\) to \(v_i\) . The neighborhood of a node \(v\) is denoted as \(\mathcal {N}(v)=\lbrace u\in \mathcal {V} | (v,u)\in \mathcal {E} \rbrace\) . Generally, graphs can be categorized as:
\(\bullet\) Directed/Undirected Graph. A directed graph is a graph with all edges directed from one node to another. An undirected graph is considered as a special case of directed graphs where there is a pair of edges with inverse directions if two nodes are connected.
\(\bullet\) Homogeneous/Heterogeneous Graph. A homogeneous graph consists of one type of nodes and edges, and a heterogeneous graph has multiple types of nodes or edges.
\(\bullet\) Hypergraph. A hypergraph is a generalization of a graph in which an edge can join any number of vertices.
Given the graph data, the main idea of GNN is to iteratively aggregate feature information from neighbors and integrate the aggregated information with the current central node representation during the propagation process [
179,
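208]. From the perspective of network architecture, GNN stacks multiple propagation layers, which consist of the aggregation and update operations. The formulation of propagation is
\[\mathbf {n}_{u}^{(l)}=\operatorname{Aggregator}_{l}\big (\lbrace \mathbf {h}_{v}^{(l)}, \forall v\in \mathcal {N}(u)\rbrace \big),\quad \mathbf {h}_{u}^{(l+1)}=\operatorname{Updater}_{l}\big (\mathbf {h}_{u}^{(l)}, \mathbf {n}_{u}^{(l)}\big),\]
where \(\mathbf {h}_{u}^{(l)}\) denotes the representation of node \(u\) at the \(l{\rm {th}}\) layer, \(\mathbf {n}_{u}^{(l)}\) denotes the aggregated representation of its neighbors, and \(\operatorname{Aggregator}_{l}\) and \(\operatorname{Updater}_{l}\) represent the function of aggregation operation and update operation at the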
\(l{\rm {th}}\) layer, respectively. In the aggregation step, existing works either treat each neighbor equally with the mean-pooling operation [
50,
89] or differentiate the importance of neighbors with the attention mechanism [
140]. In the update step, the representation of the central node and the aggregated neighborhood will be integrated into the updated representation of the central node. In order to adapt to different scenarios, various strategies are proposed to better integrate the two representations, such as GRU mechanism [
89], concatenation with nonlinear transformation [
50] and sum operation [
140]. To learn more about GNN techniques, we refer the readers to the surveys [
179,
208].
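The aggregation and update steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a specific published model: the aggregator is mean-pooling, the updater is a plain sum, there are no learnable parameters, and the toy graph and all function names are hypothetical.

```python
import numpy as np

def mean_aggregator(h, neighbors):
    """Average the representations of a node's neighbors.

    Assumes every node has at least one neighbor.
    """
    return np.mean([h[v] for v in neighbors], axis=0)

def sum_updater(h_u, n_u):
    """Integrate the central node with its aggregated neighborhood."""
    return h_u + n_u

def propagate(h, adj, num_layers):
    """Apply aggregation and update to every node for num_layers rounds."""
    for _ in range(num_layers):
        # The comprehension reads the previous-layer h for all nodes
        # before h is rebound to the new-layer representations.
        h = {u: sum_updater(h[u], mean_aggregator(h, adj[u])) for u in adj}
    return h

# Hypothetical path graph a - b - c with one-hot initial features.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h0 = {"a": np.array([1.0, 0.0, 0.0]),
      "b": np.array([0.0, 1.0, 0.0]),
      "c": np.array([0.0, 0.0, 1.0])}

h1 = propagate(h0, adj, num_layers=1)
h2 = propagate(h0, adj, num_layers=2)
```

After one layer, node `a` carries no information from node `c`; after two layers it does, which illustrates how stacking layers reaches multi-hop neighbors.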
Here, we briefly summarize the aggregation and update operations of five typical GNN frameworks that are widely adopted in the field of recommendation.
\(\bullet\) GCN [
73] approximates the first-order eigendecomposition of the graph Laplacian to iteratively aggregate information from neighbors. Concretely, it updates the embedding by
\[\mathbf {h}_{v}^{(l+1)}=\delta \Big (\sum _{j\in \mathcal {N}(v)\cup \lbrace v\rbrace }\frac{\tilde{a}_{vj}}{\sqrt {d_{vv}d_{jj}}}\mathbf {W}^{(l)}\mathbf {h}_{j}^{(l)}\Big),\]
where \(\delta (\cdot)\) is the nonlinear activation function, like ReLU; \(\mathbf {W}^{(l)}\) is the learnable transformation matrix for layer \(l\); \(\tilde{a}_{vj}\) is the adjacency weight (\(\tilde{a}_{vv}=1\)); and \(d_{jj}=\Sigma _k \tilde{a}_{jk}\).
\(\bullet\) GraphSAGE [
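50] samples a fixed-size neighborhood for each node, proposes mean/sum/max-pooling aggregators, and adopts the concatenation operation for update:
\[\mathbf {n}_{v}^{(l)}=\operatorname{Aggregator}_{l}\big (\lbrace \mathbf {h}_{u}^{(l)}, \forall u\in \mathcal {N}(v)\rbrace \big),\quad \mathbf {h}_{v}^{(l+1)}=\delta \big (\mathbf {W}^{(l)}\cdot [\mathbf {h}_{v}^{(l)}\oplus \mathbf {n}_{v}^{(l)}]\big),\]
where \(\operatorname{Aggregator}_{l}\) denotes the aggregation function at the \(l{\rm {th}}\) layer, \(\delta (\cdot)\) is the nonlinear activation function, and \(\mathbf {W}^{(l)}\) is the learnable transformation matrix.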
\(\bullet\) GAT [
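140] assumes that the influence of neighbors is neither identical nor pre-determined by the graph structure, and thus it differentiates the contributions of neighbors by leveraging the attention mechanism and updates the vector of each node by attending over its neighbors:
\[\alpha _{vj}=\frac{\exp \big (\text{Att}(\mathbf {h}_{v}^{(l)},\mathbf {h}_{j}^{(l)})\big)}{\sum _{k\in \mathcal {N}(v)}\exp \big (\text{Att}(\mathbf {h}_{v}^{(l)},\mathbf {h}_{k}^{(l)})\big)},\quad \mathbf {h}_{v}^{(l+1)}=\delta \Big (\sum _{j\in \mathcal {N}(v)}\alpha _{vj}\mathbf {W}^{(l)}\mathbf {h}_{j}^{(l)}\Big),\]
where \(\text{Att}(\cdot)\) is an attention function, a typical choice being \(\operatorname{LeakyReLU}(\mathbf {a}^{T}[\mathbf {W}^{(l)}\mathbf {h}^{(l)}_{v} \oplus \mathbf {W}^{(l)}\mathbf {h}^{(l)}_{j}])\); \(\delta (\cdot)\) is the nonlinear activation function; \(\mathbf {W}^{(l)}\) is responsible for transforming the node representations at the \(l{\rm {th}}\) propagation; and \(\mathbf {a}\) is the learnable parameter.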
\(\bullet\) GGNN [
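89] adopts a gated recurrent unit (GRU) [89] in the update step:
\[\mathbf {n}_{v}^{(l)}=\frac{1}{|\mathcal {N}(v)|}\sum _{j\in \mathcal {N}(v)}\mathbf {h}_{j}^{(l)},\quad \mathbf {h}_{v}^{(l+1)}=\operatorname{GRU}\big (\mathbf {h}_{v}^{(l)},\mathbf {n}_{v}^{(l)}\big).\]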
GGNN executes the recurrent function several times over all nodes [
179], which might raise scalability issues when it is applied to large graphs.
\(\bullet\) HGNN [
36] is a typical hypergraph neural network, which encodes high-order data correlation in a hypergraph structure. The hyperedge convolutional layer is formulated as
\[\mathbf {H}^{(l+1)}=\delta \big (\mathbf {D}_v^{-1/2}\mathbf {E}\mathbf {D}_e^{-1}\mathbf {E}^{T}\mathbf {D}_v^{-1/2}\mathbf {H}^{(l)}\mathbf {W}^{(l)}\big),\]
where \(\delta (\cdot)\) is the nonlinear activation function, like ReLU; \(\mathbf {W}^{(l)}\) is the learnable transformation matrix for layer \(l\); \(\mathbf {E}\) is the hypergraph incidence matrix; and \(\mathbf {D}_e\) and \(\mathbf {D}_v\) denote the diagonal matrices of the edge degrees and the vertex degrees, respectively.
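To make the contrast between structure-determined weights (GCN) and learned weights (GAT) concrete, the following NumPy sketch implements one propagation step of each in dense form. It is a simplified illustration under several assumptions: a single attention head, dense matrices, rows of `H` as node vectors (so the transformation is written `H @ W`), and every node having at least one neighbor. The function names and the toy graph are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A, H, W):
    """One GCN step with self-loops and symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops (a_vv = 1)
    d = A_tilde.sum(axis=1)                 # d_jj = sum_k a_jk
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return relu(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

def gat_layer(A, H, W, a, slope=0.2):
    """One single-head attention step; each node attends over its neighbors."""
    Z = H @ W                               # transformed node representations
    out = np.zeros_like(Z)
    for v in range(Z.shape[0]):
        nbrs = np.flatnonzero(A[v])
        # Att(v, j) = LeakyReLU(a^T [z_v concatenated with z_j])
        e = np.array([a @ np.concatenate([Z[v], Z[j]]) for j in nbrs])
        e = np.where(e > 0, e, slope * e)
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                # softmax over the neighborhood
        out[v] = alpha @ Z[nbrs]
    return relu(out)

# Hypothetical star graph: node 0 is connected to nodes 1 and 2.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
H = np.eye(3)                               # one-hot input features
W = np.eye(3)                               # identity stands in for learned weights

H_gcn = gcn_layer(A, H, W)
H_gat = gat_layer(A, H, W, a=np.zeros(6))   # zero attention vector: uniform weights
```

With a zero attention vector, the attention weights are uniform and the GAT step reduces to mean-pooling over neighbors; a trained `a` would weight the neighbors unequally.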
2.3 Why Graph Neural Network for Recommendation
In the past few years, many works on GNN-based recommendation have been proposed. Before diving into the details of the latest developments, it is beneficial to understand the motivations of applying GNN to recommender systems.
The most intuitive reason is that GNN techniques have been demonstrated to be powerful in representation learning for graph data in various domains [
44,
208], and most of the data in recommendation essentially has a graph structure, as shown in Figure
1. For instance, the user-item interaction data can be represented by a bipartite graph (as shown in Figure
1(a)) between the user and item nodes, where the link represents the interaction between the corresponding user and item. Besides, a sequence of items can be transformed into the sequence graph, where each item can be connected with one or more subsequent items. Figure
1(b) shows an example of a sequence graph where there is an edge between consecutive items. Compared to the original sequence data, a sequence graph allows more flexibility in modeling item-to-item relationships. Beyond that, some side information also naturally has a graph structure, such as social relationships and knowledge graphs, as shown in Figures
1(c) and
1(d).
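The two constructions described above can be sketched as follows. This is an illustrative example only: the interaction list, the item sequence, and the function names are hypothetical, and the sequence graph uses the simplest rule of one directed edge per pair of consecutive items.

```python
from collections import defaultdict

def build_bipartite_graph(interactions):
    """Map each user to the items they interacted with, and vice versa."""
    user_to_items, item_to_users = defaultdict(set), defaultdict(set)
    for user, item in interactions:
        user_to_items[user].add(item)
        item_to_users[item].add(user)
    return dict(user_to_items), dict(item_to_users)

def build_sequence_graph(sequence):
    """Add a directed edge between each pair of consecutive items."""
    edges = defaultdict(set)
    for src, dst in zip(sequence, sequence[1:]):
        edges[src].add(dst)
    return dict(edges)

# Hypothetical user-item interactions.
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3")]
user_to_items, item_to_users = build_bipartite_graph(interactions)

# A repeated item ("i1") becomes a single node with several outgoing edges.
seq_graph = build_sequence_graph(["i1", "i2", "i1", "i3"])
```

In the sequence graph, the repeated item `i1` is linked to both `i2` and `i3`, a relationship that the flat sequence only expresses implicitly.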
Due to the specific characteristic of different types of data in recommendation, a variety of models have been proposed to effectively learn their pattern for better recommendation results, which is a big challenge for the model design. Considering the information in recommendation from the perspective of the graph, a unified GNN framework can be utilized to address all these tasks. For example, the task of non-sequential recommendation is to learn the effective node representations, i.e., user/item representations, and to further predict user preferences. The task of sequential recommendation is to learn the informative graph representation, i.e., sequence representation. Both node representation and graph representation can be learned through GNN. Besides, it is more convenient and flexible to incorporate additional information (if available) compared to the non-graph perspective. For instance, the social network can be integrated into the user-item bipartite relationship as a unified graph. Both the social influence and collaborative signal can be captured during the iterative propagation.
Moreover, GNN can explicitly encode the crucial collaborative signal of user-item interactions to enhance the user/item representations through the propagation process. Utilizing collaborative signals for better representation learning is not a completely new idea. For instance, SVD++ [
76] incorporates the representations of interacted items to enrich the user representations. ItemRank [
41] constructs the item-item graph from interactions and adopts the random-walk algorithm to rank items according to user preferences. Note that SVD++ can be seen as using one-hop neighbors (i.e., items) to improve user representations, while ItemRank utilizes two-hop neighbors to improve item representations. Compared with non-graph models, GNN is more flexible and convenient for modeling multi-hop connectivity from user-item interactions, and the collaborative filtering (CF) signals captured from high-hop neighbors have been demonstrated to be effective for recommendation.
2.4 Categories of Graph-neural-network-based Recommendation
In this survey, we propose a new taxonomy to classify the existing GNN-based models. Based on the types of information used and recommendation tasks, the existing works are categorized into user-item collaborative filtering, sequential recommendation, social recommendation, knowledge-graph-based recommendation, and other tasks. In addition to the former four types of tasks, there are other recommendation tasks, such as POI recommendation, multimedia recommendation, and bundle recommendation. Since the studies utilizing GNN in these tasks are not that abundant, we group them into one category and discuss their current developments, respectively.
The rationale of classification is as follows: The graph structure depends to a large extent on the type of information. For example, a social network is naturally a homogeneous graph, and user-item interaction can be considered either a bipartite graph or two homogeneous graphs (i.e., user-user and item-item graphs). Besides, the information type also plays a key role in designing an efficient GNN architecture, such as aggregation and update operations and network depth. For instance, a knowledge graph has multi-type entities and relations, which requires considering such heterogeneity during propagation. Moreover, recommendation tasks are highly related to the type of information used. For example, the social recommendation is to make a recommendation by utilizing the social network information, and the knowledge-graph-based recommendation is to enhance the item representation by leveraging semantic relations among items in the knowledge graph. This survey is mainly for the readers interested in the development of GNN in recommender systems. Thus, our taxonomy is primarily from the perspective of recommender systems but also takes the GNN into account.
3 User-item Collaborative Filtering
Given the user-item interaction data, the basic idea of user-item collaborative filtering is essentially to use the items that a user has interacted with to enhance the user representation, and to use the users who have interacted with an item to enrich the item representation. Inspired by the advantage of GNN techniques in simulating the information diffusion process, recent efforts have studied the design of GNN methods in order to exploit high-order connectivity from user-item interactions more efficiently. Figure
2 illustrates the pipeline of applying GNN to user-item interaction information.
To take full advantage of GNN methods on capturing collaborative signals from user-item interactions, there are four main issues to deal with:
\(\bullet\) Graph Construction. Graph structure is essential for the scope and type of information to propagate. The original bipartite graph consists of a set of user/item nodes and the interactions between them. Should GNN be applied over the heterogeneous bipartite graph or should the homogeneous graph be constructed based on two-hop neighbors? Considering computational efficiency, how should representative neighbors be sampled for graph propagation instead of operating on the full graph?
\(\bullet\) Neighbor Aggregation. How should the information be aggregated from neighbor nodes? Specifically, should the model differentiate the importance of neighbors, model the affinity between the central node and its neighbors, or model the interactions among neighbors?
\(\bullet\) Information Update. How should the central node representation and the aggregated representation of its neighbors be integrated?
\(\bullet\) Final Node Representation. Predicting the user’s preference for the items requires the overall user/item representation. Should the node representation in the last layer or the combination of the node representations in all layers be used as the final node representation?
3.1 Graph Construction
Most works [
8,
18,
55,
82,
132,
135,
142,
155,
173,
197,
205] apply the GNN on the original user-item bipartite graph directly. There are two issues in directly applying GNN on the original graph: one is effectiveness, in that the original graph structure might not be sufficient for learning user/item representations; the other is efficiency, in that aggregating the information of the full neighborhoods of nodes incurs a high computation cost, especially for large-scale graphs [
190].
One strategy to address the first issue is to enrich the original graph structure by adding edges, such as links between two-hop neighbors and hyperedges. For instance, Multi-GCCF [
133] and DGCF [
101] add edges between two-hop neighbors on the original graph to obtain the user-user and item-item graph. In this way, the proximity information among users and items can be explicitly incorporated into user-item interactions. DHCF [
66] introduces the hyperedges and constructs the user/item hypergraphs in order to capture explicit hybrid high-order correlations. Another strategy is to introduce virtual nodes for enriching the user-item interactions. For example, DGCF [
156] introduces virtual intent nodes and decomposes the original graph into a corresponding subgraph for each intent, which represents the node from different aspects and has better expressive power. HiGNN [
91] creates new coarsened user-item graphs by clustering similar users/items and taking the clustered centers as new nodes in order to explicitly capture hierarchical relationships among users and items.
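As one possible illustration of enriching the graph with two-hop edges, the sketch below derives user-user and item-item graphs from the interaction matrix by linking nodes that share at least one neighbor. This co-occurrence rule is an assumption made for the example; the cited models may construct or threshold these edges differently. The matrix `R` and the function name are hypothetical.

```python
import numpy as np

# Hypothetical interaction matrix R: rows are users, columns are items.
R = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])

def two_hop_graph(R, axis="user"):
    """Connect nodes of the same type that share at least one neighbor."""
    co = R @ R.T if axis == "user" else R.T @ R   # co-occurrence counts
    np.fill_diagonal(co, 0)                        # drop self-connections
    return (co > 0).astype(int)

user_user = two_hop_graph(R, "user")
item_item = two_hop_graph(R, "item")
```

Here users 0 and 1 become neighbors because both interacted with item 1, whereas users 0 and 2 share no item and stay unconnected.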
In terms of the second issue, sampling strategies are proposed to make GNN efficient and scalable to large-scale graph-based recommendation tasks. PinSage [
190] designs a random-walk-based sampling method to obtain the fixed size of neighborhoods with the highest visit counts. In this way, those nodes that are not directly adjacent to the central node may also become its neighbors. Multi-GCCF [
133] and NIA-GCN [
132] randomly sample a fixed number of neighbors. Sampling is a tradeoff between preserving the original graph information and computational efficiency. The performance of the model depends on the sampling strategy, and more efficient sampling strategies for neighborhood construction deserve further study.
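The idea of random-walk-based neighborhood sampling can be sketched as follows. This is a simplified illustration in the spirit of the strategy described above, not PinSage's actual implementation: the walk parameters, the toy graph, and the function name are hypothetical, and every node is assumed to have at least one neighbor.

```python
import random
from collections import Counter

def random_walk_neighbors(adj, start, num_walks, walk_length, top_k, seed=0):
    """Rank nodes by visit count over short random walks from `start`."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            node = rng.choice(adj[node])
            if node != start:
                visits[node] += 1
    total = sum(visits.values())
    # Keep the top_k most visited nodes with normalized visit counts.
    return [(n, c / total) for n, c in visits.most_common(top_k)]

# Hypothetical undirected graph; "d" is two hops away from "a".
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}

sampled = random_walk_neighbors(adj, "a", num_walks=200, walk_length=2, top_k=3)
sampled_nodes = [n for n, _ in sampled]
```

Node `d` is not adjacent to `a` but can still enter the sampled neighborhood, and the normalized visit counts can serve as importance weights during aggregation.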
3.2 Neighbor Aggregation
The aggregation step is of vital importance for information propagation over the graph structure, as it decides how much of the neighbors’ information should be propagated. Mean-pooling is one of the most straightforward aggregation operations [
8,
133,
135,
197,
198], which treats neighbors equally:
\[\mathbf {n}_{u}^{(l)}=\frac{1}{|\mathcal {N}(u)|}\sum _{i\in \mathcal {N}(u)}\mathbf {h}_{i}^{(l)}.\]
Mean-pooling is easy to implement but might be inappropriate when the importance of neighbors differs significantly. Following the traditional GCN, some works employ “degree normalization” [
18,
55,
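173], which assigns weights to nodes based on the graph structure:
\[\mathbf {n}_{u}^{(l)}=\sum _{i\in \mathcal {N}(u)}\frac{1}{\sqrt {|\mathcal {N}(u)||\mathcal {N}(i)|}}\mathbf {h}_{i}^{(l)}.\]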
Owing to the random-walk sampling strategy, PinSage [
190] adopts the normalized visit counts as the importance of neighbors when aggregating the vector representations of neighbors. However, these aggregation functions determine the importance of neighbors according to the graph structure but ignore the relationships between the connected nodes.
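The difference between mean-pooling and degree normalization can be seen on a small example. The sketch below computes the aggregated neighbor representation for every user in matrix form; the interaction matrix, the item embeddings, and the function names are hypothetical, and the learnable transformation matrix is omitted for brevity.

```python
import numpy as np

# Hypothetical interactions: user 0 -> items 0 and 1; user 1 -> item 1.
R = np.array([[1.0, 1.0],
              [0.0, 1.0]])
item_emb = np.array([[1.0, 0.0],
                     [0.0, 1.0]])

def mean_pool(R, item_emb):
    """Weight every interacted item by 1 / |N_u|."""
    deg_u = R.sum(axis=1, keepdims=True)
    return (R / deg_u) @ item_emb

def degree_norm(R, item_emb):
    """Weight item i for user u by 1 / sqrt(|N_u| * |N_i|)."""
    deg_u = R.sum(axis=1, keepdims=True)
    deg_i = R.sum(axis=0, keepdims=True)
    return (R / np.sqrt(deg_u * deg_i)) @ item_emb

n_mean = mean_pool(R, item_emb)
n_deg = degree_norm(R, item_emb)
```

For user 0, mean-pooling weights both items equally, whereas degree normalization gives the less popular item 0 a larger weight than item 1. Both sets of weights depend only on the graph structure, not on the embeddings.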
Motivated by common sense that the embeddings of items in line with the user’s interests should be passed more to the user (analogously for the items), MCCF [
158] and DisenHAN [
107] leverage the attention mechanism to learn the weights of neighbors [
107,
146]. NGCF [
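155] employs the element-wise product to amplify the item features that the user cares about, or the user preferences for features that the item has. Taking the user node as an example, the aggregated neighbor representation is calculated as follows:
\[\mathbf {n}_{u}^{(l)}=\sum _{i\in \mathcal {N}(u)}\frac{1}{\sqrt {|\mathcal {N}(u)||\mathcal {N}(i)|}}\Big (\mathbf {W}_{1}^{(l)}\mathbf {h}_{i}^{(l)}+\mathbf {W}_{2}^{(l)}\big (\mathbf {h}_{i}^{(l)}\odot \mathbf {h}_{u}^{(l)}\big)\Big),\]
where \(\mathbf {W}_{1}^{(l)}\) and \(\mathbf {W}_{2}^{(l)}\) are learnable transformation matrices and \(\odot\) denotes the element-wise product.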
NIA-GCN [
132] argues that existing aggregation functions fail to preserve the relational information within the neighborhood, and thus proposes the pairwise neighborhood aggregation approach to explicitly capture the interactions among neighbors. Concretely, it applies element-wise multiplication between every two neighbors to model the user-user/item-item relationships.
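The effect of the element-wise product on the aggregated message can be illustrated with a small sketch. It follows the NGCF-style aggregation described above for a single user, under the assumption of degree-normalized messages with two transformation matrices; the names `W1` and `W2`, the embeddings, and the item degrees are hypothetical, and identity matrices stand in for learned weights.

```python
import numpy as np

def ngcf_aggregate(h_u, item_embs, item_degrees, W1, W2):
    """Aggregate item messages for one user with an affinity term."""
    deg_u = len(item_embs)
    n_u = np.zeros(W1.shape[0])
    for h_i, deg_i in zip(item_embs, item_degrees):
        norm = 1.0 / np.sqrt(deg_u * deg_i)
        # The element-wise product h_i * h_u is large on dimensions
        # where the user and the item agree, so those items pass more.
        n_u += norm * (W1 @ h_i + W2 @ (h_i * h_u))
    return n_u

h_u = np.array([1.0, 0.0])
item_embs = [np.array([1.0, 0.0]),   # aligned with the user's interest
             np.array([0.0, 1.0])]   # orthogonal to the user's interest
item_degrees = [1, 1]
W1 = np.eye(2)
W2 = np.eye(2)

n_u = ngcf_aggregate(h_u, item_embs, item_degrees, W1, W2)
```

The item aligned with the user contributes twice as much on its dimension as the orthogonal item does on its own, even though both have the same degree.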
3.3 Information Update
Given the information aggregated from its neighbors, how to update the representation of the node is essential for iterative information propagation. According to whether to retain the information of the node itself, the existing methods can be divided into two directions. One is to discard the original information of the user or item node completely and use the aggregated representation of neighbors as the new central node representation [
8,
55,
156,
197], which might overlook the intrinsic user preference or the intrinsic item property.
Another is to take both the node itself (
\(\mathbf {h}_u^{(l)}\) ) and its neighborhood message (
\(\mathbf {n}_u^{(l)}\) ) into consideration to update node representations. The most straightforward way is to combine these two representations linearly with sum-pooling or mean-pooling operation [
132,
155,
173,
198]. Inspired by the GraphSAGE [
50], some works [
82,
133,
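190] adopt the concatenation function with non-linear transformation to integrate these two representations as follows:
\[\mathbf {h}_{u}^{(l+1)}=\sigma \big (\mathbf {W}^{(l)}\cdot [\mathbf {h}_{u}^{(l)}\oplus \mathbf {n}_{u}^{(l)}]\big),\]
where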
\(\sigma\) denotes the activation function, e.g., ReLU, LeakyReLU, and sigmoid. Compared to linear combination, the concatenation operation with feature transformation allows more complex feature interaction. LightGCN [
55] and LR-GCCF [
18] observe that nonlinear activation contributes little to the overall performance, and they simplify the update operation by removing the non-linearities, thereby retaining or even improving performance and increasing computational efficiency.
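A simplified, purely linear propagation of this kind can be sketched as follows. The example is an assumption-laden illustration rather than the exact procedure of either model: it uses a dense, symmetrically normalized adjacency matrix of the bipartite graph, drops both the transformation matrix and the activation, and uses hypothetical names and a toy interaction matrix.

```python
import numpy as np

def normalized_adjacency(R):
    """Symmetric degree-normalized adjacency of the user-item bipartite graph."""
    n_users, n_items = R.shape
    A = np.zeros((n_users + n_items, n_users + n_items))
    A[:n_users, n_users:] = R
    A[n_users:, :n_users] = R.T
    d = A.sum(axis=1)
    # Isolated nodes (degree 0) receive a zero scaling factor.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def simplified_propagation(A_hat, E0, num_layers):
    """Linear propagation: no transformation matrix and no activation."""
    layers = [E0]
    for _ in range(num_layers):
        layers.append(A_hat @ layers[-1])
    return layers

# Hypothetical interactions: user 0 -> items 0 and 1; user 1 -> item 1.
R = np.array([[1.0, 1.0],
              [0.0, 1.0]])
A_hat = normalized_adjacency(R)
E0 = np.eye(4)                       # stand-in for learnable embeddings
layers = simplified_propagation(A_hat, E0, num_layers=2)
```

Because each layer is a single matrix product with no parameters of its own, the only trainable quantities in such a design are the initial embeddings.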
3.4 Final Node Representation
Applying the aggregation and update operations layer by layer generates the representations of nodes for each depth of GNN. The overall representations of users and items are required for the final prediction task.
A mainstream approach is to use the node vector in the last layer as the final representation, i.e.,
\({\bf h}_u^*={\bf h}_u^{(L)}\) [
8,
82,
135,
161,
190,
197]. However, the representations obtained in different layers emphasize the messages passed over different connections [
155]. Specifically, the representations in the lower layer reflect the individual feature more, while those in the higher layer reflect the neighbor feature more. To take advantage of the connections expressed by the output of different layers, recent studies employ different methods to integrate the messages from different layers:
where
\(\alpha ^{(l)}\) is a learnable parameter. Note that mean-pooling and sum-pooling can be seen as two special cases of weighted pooling. Compared to mean-pooling and sum-pooling, weighted pooling allows more flexibility to differentiate the contribution of different layers. Among these four methods, the first three are linear operations, and only the concatenation operation preserves information from all layers.
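The four integration methods named above (mean-pooling, sum-pooling, weighted pooling, and concatenation) can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the per-layer weights `alphas` are fixed numbers here rather than learned parameters, and whether the input embedding layer is included in `layers` is left to the caller.

```python
def mean_pool(layers):
    """Average the per-layer vectors dimension by dimension."""
    count = len(layers)
    return [sum(col) / count for col in zip(*layers)]

def sum_pool(layers):
    """Sum the per-layer vectors dimension by dimension."""
    return [sum(col) for col in zip(*layers)]

def weighted_pool(layers, alphas):
    """Weighted sum of the per-layer vectors, one weight per layer."""
    return [sum(a * v for a, v in zip(alphas, col)) for col in zip(*layers)]

def concat(layers):
    """Concatenate the per-layer vectors into one longer vector."""
    return [v for h in layers for v in h]

# Hypothetical representations of one node at three depths.
layers = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0]]

mean_repr = mean_pool(layers)
sum_repr = sum_pool(layers)
weighted_repr = weighted_pool(layers, [0.2, 0.3, 0.5])
concat_repr = concat(layers)
```

Setting every weight to 1 reproduces sum-pooling and setting every weight to 1/3 reproduces mean-pooling for this three-layer input, which is the sense in which both are special cases of weighted pooling. Concatenation is the only variant whose output keeps each layer's values separately, at the cost of a longer vector.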
3.5 Summary
Corresponding to the discussion at the beginning of this section, we briefly summarize the existing works in terms of four issues:
\(\bullet\) Graph Construction. The most straightforward way is to directly use the original user-item bipartite graph. If some nodes have few neighbors in the original graph, it would be beneficial to enrich the graph structure by adding either edges or nodes. When dealing with large-scale graphs, it is necessary to sample the neighborhood for computational efficiency. Sampling is a tradeoff between effectiveness and efficiency, and a more effective sampling strategy deserves further study.
\(\bullet\) Neighbor Aggregation. When neighbors are more heterogeneous, aggregating neighbors with attentive weights would be preferable to equal weights and degree normalization; otherwise, the latter two are preferable for easier calculation. Explicitly modeling the influence among neighbors or the affinity between the central node and neighbors might bring additional benefits but needs to be verified on more datasets.
\(\bullet\) Information Update. Compared to discarding the original node, updating the node with its original representation and the aggregated neighbor representation would be preferable. Recent works show that simplifying the traditional GCN by removing the transformation and non-linearity operations can achieve better performance than the original model.
\(\bullet\) Final Node Representation. To obtain overall user/item representation, utilizing the representations from all layers is preferable to directly using the last layer representation. In terms of the function of integrating the representations from all layers, weighted-pooling allows more flexibility, and concatenation preserves information from all layers.
Figure
3 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.
4 Sequential Recommendation
Sequential recommendation predicts users’ next preferences based on their most recent activities; it seeks to model sequential patterns among successive items and generate accurate recommendations for users [
117]. From the perspective of adjacency between items, sequences of items can be modeled as graph-structured data. Inspired by the advantage of GNN, it is becoming popular to utilize GNN to capture the transition pattern from users’ sequential behaviors by transforming them into the sequence graph.
Figure
4 illustrates the overall framework of GNN in sequential recommendation. To fully utilize GNN in the sequential recommendation, there are three main issues to deal with:
\(\bullet\) Graph Construction. To apply GNN in the sequential recommendation, the sequence data should be transformed into a sequence graph. Is it sufficient to construct a subgraph for each sequence independently? Would it be better to add edges among several consecutive items than only between the two consecutive items?
\(\bullet\) Information Propagation. To capture the transition patterns, which propagation mechanism is more appropriate? Is it necessary to distinguish the sequential order of the linked items?
\(\bullet\) Sequential Preference. To get the user’s temporal preference, the item representations in a sequence should be integrated. Should one simply apply attentive pooling or leverage RNN structure to enhance consecutive time patterns?
4.1 Graph Construction
Unlike the user-item interactions, which have essentially a bipartite graph structure, the sequential behaviors are naturally expressed in the order of time, i.e., sequences, instead of sequence graphs. Constructing a graph based on the original bipartite graph is optional and mainly driven by the scalability or heterogeneity issue, whereas the construction of a sequence graph based on users’ sequential behaviors is a necessity for applying GNN in sequential recommendation. Figure
5 shows the representative graph construction strategies for sequential behaviors.
Constructing the directed graph for each sequence by treating each item in the sequence as a node and adding edges between two consecutively clicked items is the most straightforward way [
48,
115,
116,
175,
185]. However, in most scenarios, the length of the user sequence is short; e.g., the average length on the preprocessed Yoochoose1/4
dataset is 5.71 [
175]. A sequence graph constructed from a single and short sequence consists of a small number of nodes and connections, and some nodes might even have only one neighbor, which contains too limited knowledge to reflect users’ dynamic preferences and cannot take full advantage of GNN in graph learning. To tackle this challenge, recent works propose several strategies to enrich the original sequence graph structure, which can be divided into two mainstreams.
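The most straightforward construction described above, one directed graph per sequence with edges between consecutively clicked items, can be sketched as follows. The function name `build_session_graph` and the choice to store repeat counts on edges are assumptions made for illustration.

```python
from collections import defaultdict

def build_session_graph(session):
    """Build a directed graph from one click sequence.

    Returns the unique items (in first-seen order) and a dict that maps
    each (source, target) pair of consecutive clicks to its count.
    """
    nodes = list(dict.fromkeys(session))
    edge_counts = defaultdict(int)
    for src, dst in zip(session, session[1:]):
        edge_counts[(src, dst)] += 1
    return nodes, dict(edge_counts)

# Hypothetical session in which item "b" is clicked twice.
nodes, edges = build_session_graph(["a", "b", "c", "b", "d"])
```

For a five-click session this yields only four nodes and four edges, and nodes "a" and "d" each have a single neighbor, which illustrates why short sequences give GNNs little structure to work with.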
One mainstream is to utilize additional sequences to enrich the item-item transitions. The additional sequences can be other types of behavior sequences [
152], the historical sequences of the same user [
176], or part/all of the sequences in the whole dataset [
14,
163,
206,
207]. For instance, HetGNN [
152] utilizes all behavior sequences and constructs edges between two consecutive items in the same sequence with their behavior types as the edge types. A-PGNN [
176] handles the case where users are known and incorporates the user’s historical sequences with the current sequence to enrich the item-item connections. GCE-GNN [
163] and DAT-MDI [
14] exploit the item transitions in all sessions to assist the transition patterns in the current sequence, which leverage the local context and global context. Different from GCE-GNN [
163] and DAT-MDI [
14], which treat all transitions equally, TASRec [
207] attaches more importance to recent transitions. Instead of incorporating all the sessions, DGTN [
206] only adds similar sessions to the current session, based on the assumption that similar sequences are more likely to reflect similar transition patterns. All these methods introduce more information into the original graph and improve the performance compared to a single sequence graph.
Another mainstream approach is to adjust the graph structure of the current sequence. For example, assuming the current node has a direct influence on more than one consecutive item, MA-GNN [
102] extracts three subsequent items and adds edges between them. Considering that only adding edges between consecutive items might neglect the relationships between distant items, SGNN-HN [
113] introduces a virtual “star” node as the center of the sequence, which is linked with all the items in the current sequence. The vector-wise representation of the “star” node reflects the overall characteristics of the whole sequence. Hence, each item can gain some knowledge of the items without direct connections through the “star” node. Chen and Wong [
19] point out that existing graph construction methods ignore the sequential order of neighbors and capture long-term dependencies ineffectively. Therefore, they propose LESSR, which constructs two graphs from one sequence: one preserves the order of neighbors, and the other adds shortcut paths from each item to all the items after it.
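The structural adjustments discussed in this paragraph can be illustrated with a small sketch. It is a simplified reading, not the exact construction of MA-GNN, SGNN-HN, or LESSR: the function names, the `window` parameter, and the `"<star>"` placeholder node are all hypothetical.

```python
def window_edges(session, window=3):
    """Directed edges from each item to up to `window` subsequent items."""
    edges = set()
    for i, src in enumerate(session):
        for dst in session[i + 1 : i + 1 + window]:
            if src != dst:  # skip self-loops caused by repeated items
                edges.add((src, dst))
    return edges

def star_edges(session, star="<star>"):
    """Bidirectional edges between a virtual star node and every item."""
    edges = set()
    for item in set(session):
        edges.add((star, item))
        edges.add((item, star))
    return edges

def shortcut_edges(session):
    """Edges from each item to all items that appear after it."""
    return window_edges(session, window=len(session))

session = ["a", "b", "c", "d", "e"]
multi_hop = window_edges(session)   # "a" reaches "b", "c", "d" but not "e"
star = star_edges(session)          # every item is two hops from any other
shortcuts = shortcut_edges(session) # "a" reaches "e" directly
```

The three variants trade graph density for reach: the windowed graph links nearby items only, the star node connects distant items through one shared hub, and the shortcut graph links every ordered pair directly.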
In addition to the above two mainstreams, other graph construction methods have emerged recently. Inspired by the advantage of the hypergraph in modeling beyond-pairwise relations, the hypergraph has been leveraged to capture the high-order relations among items and the cross-session information. SHARE [
148] constructs a hypergraph for each session, of which the hyperedges are defined by various sizes of sliding windows. DHCN [
182] takes each session as one hyperedge and integrates all the sessions in one hypergraph. To explicitly incorporate cross-session relationships, DHCN [
182] and COTREC [
181] construct the session-to-session graph, which takes each session as a node and assigns the weights based on the shared items.
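The hypergraph and session-to-session constructions above can be sketched in a few lines. The sketch assumes sliding-window sizes of 2 and 3 and uses the Jaccard overlap of item sets as the session-to-session weight; both are illustrative choices, since the text only states that hyperedges come from sliding windows of various sizes and that weights are based on shared items.

```python
def sliding_window_hyperedges(session, sizes=(2, 3)):
    """One hyperedge (a set of items) per sliding window of each size."""
    hyperedges = []
    for width in sizes:
        for start in range(len(session) - width + 1):
            hyperedges.append(frozenset(session[start:start + width]))
    return hyperedges

def session_overlap_weights(sessions):
    """Weight each pair of sessions by the Jaccard overlap of their items.

    Pairs that share no item get no edge.
    """
    item_sets = [set(s) for s in sessions]
    weights = {}
    for p in range(len(item_sets)):
        for q in range(p + 1, len(item_sets)):
            shared = item_sets[p] & item_sets[q]
            if shared:
                weights[(p, q)] = len(shared) / len(item_sets[p] | item_sets[q])
    return weights

hyperedges = sliding_window_hyperedges(["a", "b", "c", "d"])
weights = session_overlap_weights([["a", "b", "c"], ["b", "c", "d"], ["e"]])
```

In this toy input the first two sessions share two of four distinct items and receive weight 0.5, while the third session stays isolated.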
4.2 Information Propagation
Given a built sequence graph, it is essential to design an efficient propagation mechanism to capture transition patterns among items. The GGNN framework is widely adopted to propagate information on the directed graph. Specifically, it employs mean-pooling to aggregate the information of the previous items and the next items, respectively; combines the two aggregated representations; and utilizes the GRU [
89] component to integrate the information of neighbors and the central node. The propagation functions are given as follows:
where
\(\mathcal {N}_{i_{s,t}}^{\mathrm{in}}\) ,
\(\mathcal {N}_{i_{s,t}}^{\mathrm{out}}\) denote the neighborhood set of previous items and next items, respectively, and
\({\bf GRU}(\cdot)\) represents the GRU component. Different from the pooling operation, the gate mechanism in GRU decides what information is to be preserved and discarded. Unlike GGNN, which treats the neighbors equally, the attention mechanism is also utilized to differentiate the importance of neighbors [
12,
115,
163]. All the above methods adopt the permutation-invariant aggregation function during the message passing, ignoring the order of items within the neighborhood, which may lead to the loss of information [
19]. To address this issue, LESSR [
19] preserves the order of items in the graph construction and leverages the GRU component [
89] to aggregate the neighbors sequentially, as in the following equation:
where
\(\mathbf {h}_{i_{s,t},k}^{(l)}\) represents the
\(k{\rm {th}}\) item in the neighborhood of
\(i_{s,t}\) ordered by time, and
\(\mathbf {n}_{i_{s,t},k}^{(l)}\) denotes the neighborhood representation after aggregating
\(k\) items.
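For reference, one way to write the two propagation schemes described in this subsection, using only the symbols defined above, is given below. This is a reconstruction from the prose, not a verbatim copy of the original equations: the use of \(\oplus\) (concatenation) to combine the incoming and outgoing messages and the intermediate symbols \(\mathbf {n}_{i_{s,t}}^{\mathrm{in}(l)}\) and \(\mathbf {n}_{i_{s,t}}^{\mathrm{out}(l)}\) are assumptions.

```latex
\begin{aligned}
\mathbf{n}_{i_{s,t}}^{\mathrm{in}(l)} &= \frac{1}{|\mathcal{N}_{i_{s,t}}^{\mathrm{in}}|} \sum_{j \in \mathcal{N}_{i_{s,t}}^{\mathrm{in}}} \mathbf{h}_{j}^{(l)}, \qquad
\mathbf{n}_{i_{s,t}}^{\mathrm{out}(l)} = \frac{1}{|\mathcal{N}_{i_{s,t}}^{\mathrm{out}}|} \sum_{j \in \mathcal{N}_{i_{s,t}}^{\mathrm{out}}} \mathbf{h}_{j}^{(l)}, \\
\mathbf{n}_{i_{s,t}}^{(l)} &= \mathbf{n}_{i_{s,t}}^{\mathrm{in}(l)} \oplus \mathbf{n}_{i_{s,t}}^{\mathrm{out}(l)}, \qquad
\mathbf{h}_{i_{s,t}}^{(l+1)} = \mathbf{GRU}\big(\mathbf{h}_{i_{s,t}}^{(l)}, \mathbf{n}_{i_{s,t}}^{(l)}\big), \\
\mathbf{n}_{i_{s,t},k}^{(l)} &= \mathbf{GRU}\big(\mathbf{n}_{i_{s,t},k-1}^{(l)}, \mathbf{h}_{i_{s,t},k}^{(l)}\big).
\end{aligned}
```

The first two lines correspond to the GGNN-style update (mean-pooling over previous and next items, then a GRU step), and the last line to the order-aware aggregation attributed to LESSR, where neighbors are consumed one at a time in temporal order.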
For the sequence graph with hypergraph structure, DHCN [
182] adopts the typical hypergraph neural network HGNN [
36], which treats the nodes equally during propagation. To differentiate the importance of items within the same hyperedge, SHARE [
148] designs two attention mechanisms for propagation. One aggregates the information of item nodes into the hyperedges that contain them, and the other passes the information of the hyperedges back to the connected item nodes. For user-aware sequential recommendation, A-PGNN [
176] and GAGA [
116] implicitly incorporate the user information and augment the representations of items in the neighborhood with user representation.
4.3 Sequential Preference
Due to the limited iteration of propagation, GNN cannot effectively capture long-range dependency among items [
19]. Therefore, the representation of the last item (or any single item) in the sequence is not sufficient to reflect the user’s sequential preference. Besides, most graph construction methods lose part of the sequential information when transforming sequences into graphs [
19]. In order to obtain the effective sequence representation, existing works propose several strategies to integrate the item representations in the sequence.
Considering that the items in a sequence have different levels of priority, the attention mechanism is widely adopted for integration. Some works [
113,
116,
175,
206] calculate the attentive weights between the last item and all the items in the sequence, aggregate the item representations as the global preference, and incorporate it with local preference (i.e., the last item representation) as the overall preference. In this way, the overall preference relies heavily on the relevance of the last item to the user preference. Inspired by the superiority of the multi-layer self-attention strategy in sequence modeling, GC-SAN [
185] stacks multiple self-attention layers on top of the item representations generated by GNN to capture long-range dependencies.
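The last-item attention readout described above can be sketched as follows. The sketch assumes dot-product scores followed by a softmax and combines the global and local preferences by concatenation; the cited works may use other score functions and a learned projection, and the function name `session_preference` is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def session_preference(item_vecs):
    """Return (global_pref, local_pref, overall_pref) for one sequence.

    The local preference is the last item's vector; the global preference
    is an attention-weighted sum of all items, scored against the last item.
    """
    local = item_vecs[-1]
    weights = softmax([dot(local, h) for h in item_vecs])
    global_pref = [sum(w * h[d] for w, h in zip(weights, item_vecs))
                   for d in range(len(local))]
    overall = global_pref + local  # concatenation of the two views
    return global_pref, local, overall

# Hypothetical item representations produced by the GNN layers.
global_pref, local_pref, overall = session_preference(
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
)
```

Because every attention score is computed against the last item, items that resemble it dominate the global preference, which is the dependence on the last item noted in the text.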
In addition to leveraging the attention mechanism for sequence integration, sequential signals are explicitly incorporated into the integration process. For instance, NISER [
48] and GCE-GNN [
163] add the positional embeddings, which reflect the relative order of the items, to effectively obtain position-aware item representations. To balance the consecutive time and flexible transition pattern, FGNN [
115] employs the GRU with the attention mechanism to iteratively update the user preference with item representations in the sequence.
All the above works integrate the item representations within the user’s behavior sequence to generate the representation of sequential preference. Apart from these methods, DHCN [
182] and COTREC [
181] enrich the sequence graph by the session-to-session graph in the graph construction step. Therefore, they combine the sequential representation learned from the session-to-session graph and the one aggregated from items at this step.
4.4 Summary
This part briefly summarizes the reviewed works in terms of the three main issues.
\(\bullet\) Graph Construction. The most straightforward construction is to add edges between two consecutive items. When the sequence is short, utilizing additional sequences can enrich the sequence graph, preferably sequences that are similar to the original one. Another line is to adjust the graph structure of the behavior sequence. There is no consensus on which method is better. Moreover, incorporating the session-to-session graph into the sequence graph is also used to gain further improvements.
\(\bullet\) Information Propagation. Most of the propagation methods are variants of the propagation methods in traditional GNN frameworks, and there is no consensus on which method is better. Some complex propagation methods, such as LESSR [
19], achieve performance gain at the cost of more computation. Whether to adopt complex propagation methods in practice depends on the tradeoff between computation costs and performance gains.
\(\bullet\) Sequential Preference. To obtain the sequential preference, an attention mechanism is widely adopted to integrate the representations of items in the sequence. Beyond that, adding positional embeddings can reinforce the relative order of the items and bring some improvement. Whether leveraging an RNN structure can boost performance for all sequential recommendation tasks requires further investigation.
Figure
6 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.
5 Social Recommendation
With the emergence of online social networks, social recommender systems have been proposed to utilize each user’s local neighbors’ preferences to enhance user modeling [
43,
65,
103,
104,
172]. All these works assume users with social relationships should have similar representations based on the social influence theory that connected people would influence each other. Some of them directly use such relationships as regularizers to constrain the final user representations [
65,
104,
105,
138], while others leverage such relationships as input to enhance the original user embeddings [
43,
103].
From the perspective of graph learning, the early works mentioned above can be seen as modeling the first-order neighbors of each user. However, in practice, a user might be influenced by her/his friends’ friends. Overlooking the high-order influence diffusion in previous works might lead to suboptimal recommendation performance [
172]. Thanks to the ability to simulate how users are influenced by the recursive social diffusion process, GNN has become a popular choice to model the social information in recommendation.
To incorporate relationships among users into interaction behaviors by leveraging GNN, there are two main issues to deal with:
\(\bullet\) Influence of Friends. Do friends have equal influence? If not, how can one distinguish the influence of different friends?
\(\bullet\) Preference Integration. Users are involved in two types of relationships, i.e., social relationships with their friends and interactions with items. How can one integrate the user representations from the social influence perspective and interaction behavior?
5.1 Influence of Friends
Generally, a social graph only contains information about whether the users are friends, but the strengths of social ties are usually unknown. To propagate the information of friends, it is essential to decide the influence of friends. DiffNet [
172] treats the influence of friends equally by leveraging mean-pooling operation. However, the assumption of equal influence is not in accordance with the actual situation, and the influence of a user is unsuitable to be simply determined by the number of her/his friends. Indeed, users are more likely to be influenced by friends with strong social ties or similar preferences. Therefore, the attention mechanism is widely leveraged to differentiate the influence of neighbors [
2,
33,
110,
130,
171,
174]. For example, Song et al. [
130] propose DGRec, which dynamically infers the influence of neighbors based on their current interests. It first models dynamic users’ behaviors with a recurrent neural network and then acquires the social influence with a graph attention neural network. Compared to the mean-pooling operation, the attention mechanism boosts the overall performance, which further verifies the assumption that different friends have different influence power.
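The contrast between equal and attentive friend influence can be made concrete with a short sketch. It assumes dot-product similarity between a user and each friend as the attention score, which is only one possible choice; the function names are hypothetical and the sketch does not reproduce DiffNet or DGRec.

```python
import math

def mean_influence(friend_vecs):
    """Equal influence: average the friends' vectors."""
    count = len(friend_vecs)
    return [sum(col) / count for col in zip(*friend_vecs)]

def attentive_influence(user_vec, friend_vecs):
    """Attentive influence: weight friends by softmax of dot-product similarity.

    Returns the aggregated vector and the per-friend weights.
    """
    scores = [sum(u * f for u, f in zip(user_vec, fv)) for fv in friend_vecs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [sum(w * fv[d] for w, fv in zip(weights, friend_vecs))
                  for d in range(len(user_vec))]
    return aggregated, weights

user = [1.0, 0.0]
friends = [[1.0, 0.0], [0.0, 1.0]]  # the first friend shares the user's preference

equal = mean_influence(friends)
attentive, weights = attentive_influence(user, friends)
```

Under mean-pooling both friends contribute equally, whereas the attentive variant shifts weight toward the friend whose representation is closer to the user's.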
Moreover, a recent work, named ESRF [
191], argues that social relations are not always reliable. The unreliability of social information lies in two aspects: on the one hand, the users with explicit social connections might have no influence power; on the other hand, the obtained social relationships might be incomplete. Considering that indiscriminately incorporating unreliable social relationships into recommendation may lead to poor performance, ESRF leverages the autoencoder mechanism to modify the observed social relationships by filtering irrelevant relationships and discovering new neighbors. Similarly, DiffNetLG [
129] involves implicit local influence to predict the unobserved social relationship and then utilizes both explicit and implicit social relations to make recommendations.
5.2 Preference Integration
Users in social recommendation are involved in two types of relationships: one is the user-item interactions and the other is the social graph. To enhance the user preference representation by leveraging social information, there are two strategies for combining the information from these two networks:
•
To learn the user representation from these two networks respectively [
33,
172,
174] and then integrate them into the final preference vector, as illustrated in Figure
7(a)
•
To combine the two networks into one unified network [
171] and apply GNN to propagate information, as illustrated in Figure
7(b)
The advantage of the first strategy lies in two aspects: on the one hand, we can differentiate the depth of the diffusion process of the two networks since they are treated separately; on the other hand, any advanced method for a user-item bipartite graph can be directly applied, and for the social network, a homogeneous graph, GNN techniques are well suited to simulating the influence process since they were originally proposed for homogeneous graphs. As for the integration of the user representations learned from the two relationships, there are two main mechanisms, i.e., linear combination and non-linear combination. Among linear combinations, DiffNet [
172] treats the user representations from two spaces equally and combines them with a sum-pooling operation. Instead of an equal-weight combination, DANSER [
174] dynamically allocates weights according to the user-item paired features. Among non-linear combinations, multi-layer perceptrons over the concatenated vector are widely adopted to enhance the feature interactions [
33,
47,
186].
The advantage of integrating the two graphs into one unified network is that both the higher-order social influence diffusion in the social network and interest diffusion in the user-item bipartite graph can be simulated in a unified model, and these two kinds of information simultaneously reflect users’ preferences. DiffNet++ [
171] designs a two-level attention network to update user nodes at each layer. Specifically, it first aggregates the information of neighbors in the bipartite graph (i.e., interacted items) and social network (i.e., friends) by utilizing the GAT mechanism, respectively. Considering that different users may have different preferences in balancing these two relationships, it further leverages another attention network to fuse the two hidden states of neighbors. Similarly, SEFrame [
20] utilizes a heterogeneous graph network to fuse the knowledge from social relationships, user-item interactions, and item transitions, and employs a two-level attention network for propagation. So far, there is no evidence that either strategy consistently achieves better performance.
5.3 Summary
Corresponding to the discussion at the beginning of this section, we briefly summarize the current works in terms of the two issues:
\(\bullet\) Influence of Friends. Compared to assigning equal weights to friends, differentiating the influence of different friends is more appropriate. An emerging direction is to automatically modify the social relationships, which is beneficial given the presence of noise in social networks.
\(\bullet\) Preference Integration. The strategies for combining the two sources of information depend on whether to consider the two graphs separately or unify them into one graph. For the separate graphs, user preference is an integration of the overall representations learned from these two graphs. For the unified graph, a commonly adopted strategy is the hierarchical aggregation schema.
Figure
8 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.
6 Knowledge-graph-based Recommendation
A social network that reflects relationships between users is utilized to enhance user representation, while a knowledge graph that expresses relationships between items through attributes is leveraged to enhance the item representation. Incorporating a knowledge graph into recommendation can bring two-facet benefits [
144]: (1) the rich semantic relations among items in a knowledge graph can help explore their connections and improve the item representation, and (2) a knowledge graph connects a user’s historically interacted-with items and recommended items, which enhances the interpretability of the results [
189].
Despite the above benefits, utilizing a knowledge graph in recommendation is rather challenging due to its complex graph structure, i.e., multi-type entities and multi-type relations. Previous works preprocess a knowledge graph by
knowledge graph embedding (KGE) methods to learn the embeddings of entities and relations, such as [
26,
146,
196,
202]. The limitation of commonly used KGE methods is that they focus on modeling rigorous semantic relatedness with the transition constraints, which are more suitable for graph-related tasks such as link prediction than for recommendation [
145]. Meta-path-based methods manually define meta-paths that carry the high-order information and feed them into a predictive model, and thus they require domain knowledge and are rather labor-intensive for a complicated knowledge graph [
147,
154].
Given the user-item interaction information as well as the knowledge graph, the knowledge-graph-based recommendation seeks to take full advantage of the rich information in the knowledge graph, which can help to estimate the users’ preferences for items by explicitly capturing relatedness between items. For the effectiveness of knowledge-graph-based recommendation, there are two main issues to deal with:
\(\bullet\) Graph Construction. How can one effectively integrate the collaborative signals from user-item interactions and the semantic information from the knowledge graph? Should the user nodes explicitly be incorporated into the knowledge graph or the user nodes be implicitly used to distinguish the importance of different relations?
\(\bullet\) Relation-aware Aggregation. One characteristic of a knowledge graph is that it has multiple types of relations between entities. How can one design a relation-aware aggregation function to aggregate information from linked entities?
6.1 Graph Construction
For the stage of graph construction, one main concern is how to effectively integrate the collaborative signals and knowledge information.
One direction is to incorporate the user nodes into the knowledge graph. For instance, KGAT [
154], MKGAT [
134], and CKAN [
162] combine the user-item bipartite graph and knowledge graph into one unified graph by taking the user nodes as one type of entity and the relation between users and items as “interaction.” Recent efforts focus on the entities and relations relevant to the user-item pair. Therefore, they construct the subgraph that links the user-item pair with the user’s historical interacted-with items and the related semantics in the knowledge graph [
35,
127]. Based on the assumption that a shorter path between two nodes reflects more reliable connections, AKGE [
127] constructs the subgraph by the following steps: pre-train the embeddings of entities in the knowledge graph by TransR [
93], calculate the pairwise Euclidean distance between two linked entities, and keep the
\(K\) paths with the shortest distance between the target user and item node. The potential limitation is that the subgraph structure depends on the pre-trained entity embeddings and the definition of distance measurement. ATBRG [
35] exhaustively searches the multi-layer entity neighbors for the target item and the items from the user’s historical behaviors and restores the paths connecting the user behaviors and the target item by multiple overlapped entities. In order to emphasize the information-intensive entities, ATBRG further prunes the entities with a single link, which can also help control the scale of the graph. Although these methods can obtain subgraphs more relevant to the user-item pair, it is quite time-consuming to pre-train the entity embedding or search and prune paths exhaustively. An effective and efficient subgraph construction strategy is worthy of further investigation.
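The pruning step attributed to ATBRG above, removing entities with a single link, can be sketched as a one-pass degree filter. This is a simplified illustration under assumptions: the function name, the `protected` argument (used here to keep the target item and the user's behavior items), and the single-pass behavior are not taken from the original method.

```python
from collections import Counter

def prune_single_link_entities(edges, protected=()):
    """Drop every edge that touches an unprotected node of degree 1.

    `edges` is a list of undirected (node, node) pairs. Degrees are computed
    once on the input graph, so the pruning is a single pass.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = set(protected)
    drop = {node for node, d in degree.items() if d == 1 and node not in keep}
    return [(u, v) for u, v in edges if u not in drop and v not in drop]

# Hypothetical subgraph around a target item and two behavior items.
edges = [
    ("item1", "brand"), ("item2", "brand"), ("target", "brand"),
    ("item1", "color"), ("item2", "genre"),
]
pruned = prune_single_link_entities(edges, protected={"item1", "item2", "target"})
```

In this toy graph only the shared entity "brand" survives, since it is the one entity that actually connects the user's behaviors to the target item.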
Another direction is to implicitly use the user nodes to distinguish the importance of different relations. For instance, KGCN [
147] and KGNN-LS [
145] take the user nodes as queries to assign weights to different relations. In terms of graph construction, this line of research emphasizes the users’ preferences toward relations instead of the collaborative signal in user-item interactions.
6.2 Relation-aware Aggregation
To fully capture the semantic information in a knowledge graph, both the linked entities (i.e., \(e_i, e_j\) ) and the relations in between (i.e., \(r_{e_i, e_j}\) ) should be taken into consideration during the propagation process. Besides, from the perspective of recommender systems, the role of users might also have an influence. Owing to the advantage of GAT in adaptively assigning weights based on the connected nodes, most of the existing works apply the variants of the traditional GAT over the knowledge graph; i.e., the central node is updated by the weighted average of the linked entities, and the weights are assigned according to the score function, denoted as \(a(e_i, e_j, r_{e_i, e_j}, u)\) . The key challenge is to design a reasonable and effective score function.
For the works [
35,
134,
154] that regard the user nodes as one type of entity, the users’ preferences are expected to be spilled over to the entities in the knowledge graph during the propagation process since the item nodes would be updated with the information of interacted users and related attributes, and then the other entities would contain users’ preferences with iterative diffusion. Therefore, these works do not explicitly model users’ interests in relations but differentiate the influence of entities by the connected nodes and their relations. For instance, inspired by the transition relationship in a knowledge graph, KGAT [
154] assigns the weight according to the distance between the linked entities in the relation space:
where
\(\mathbf {W}_{r}\) is the transformation matrix for the relation, which maps the entity into relation space. In this way, the closer entities would pass more information to the central node. These methods are more appropriate for the constructed subgraph containing user nodes, since it is difficult for the users’ interests to extend to all the related entities by stacking a limited number of GNN layers.
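For reference, the relation-space score used by KGAT is commonly written in the following form, adapted here to the notation \(a(e_i, e_j, r_{e_i, e_j}, u)\) of this section. Treating \(e_i\) as the central entity and \(e_j\) as its neighbor, and writing the relation embedding as \(\mathbf {e}_{r_{e_i, e_j}}\), are assumptions made for this sketch; the score does not depend on \(u\) because user nodes are already part of the graph.

```latex
a(e_i, e_j, r_{e_i, e_j}, u) = \big(\mathbf{W}_{r}\,\mathbf{e}_j\big)^{\top} \tanh\big(\mathbf{W}_{r}\,\mathbf{e}_i + \mathbf{e}_{r_{e_i, e_j}}\big)
```

The score is larger when the projected neighbor \(\mathbf {W}_{r}\mathbf {e}_j\) is aligned with the projected central entity translated by the relation, which matches the statement that closer entities pass more information.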
For the works that do not combine the two sources of graphs, these studies [
145,
147] explicitly characterize users’ interests in relations by assigning weights according to the connecting relation and specific user. For example, the score function adopted by KGCN [
147] is the dot product of the user embedding and the relation embedding, i.e., \(a(e_i, e_j, r_{e_i, e_j}, u) = \mathbf {u}^{\top } \mathbf {r}_{e_i, e_j}\).
In this way, the entities whose relations are more consistent with users’ interests will spread more information to the central node.
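A minimal sketch of this user-specific relation weighting is shown below. It assumes the dot-product scores are normalized with a softmax over the neighborhood before the weighted sum; the function name `aggregate_neighbors` and the toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_neighbors(user_vec, neighbors):
    """Aggregate linked entities with user-specific relation weights.

    `neighbors` is a list of (relation_vec, entity_vec) pairs. Each neighbor
    is scored by the dot product of the user and relation embeddings, and the
    scores are softmax-normalized (an assumed normalization).
    """
    scores = [dot(user_vec, rel) for rel, _ in neighbors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbors[0][1])
    aggregated = [sum(w * ent[d] for w, (_, ent) in zip(weights, neighbors))
                  for d in range(dim)]
    return aggregated, weights

user = [1.0, 0.0]
neighbors = [
    ([2.0, 0.0], [1.0, 1.0]),  # relation aligned with the user's interest
    ([0.0, 2.0], [3.0, 3.0]),  # relation orthogonal to the user's interest
]
aggregated, weights = aggregate_neighbors(user, neighbors)
```

The same neighborhood would be aggregated differently for a user whose embedding aligns with the second relation, which is how this line of work personalizes propagation without adding user nodes to the graph.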
6.3 Summary
Corresponding to the discussion at the beginning of this section, we briefly summarize the current works in terms of the two issues:
\(\bullet\) Graph Construction. Existing works either consider the user nodes as one type of entity or implicitly use the user nodes to differentiate the relations. The first direction can be further divided into the overall unified graph or the specific subgraph for the user-item pair. Compared to the overall unified graph, the user-item subgraph has the advantage of focusing on more related entities and relations, but it requires more computation time and the performance depends on the construction of the subgraph, which still requires further investigation.
\(\bullet\) Relation-aware Aggregation. The variants of GAT are widely leveraged to aggregate information from linked entities, taking into account the relations. For the graphs that do not explicitly incorporate user nodes, user representations are utilized to assign weights to the relations.
Figure
9 summarizes the typical strategies for each of the main issues and lists the representative works accordingly.
7 Other Tasks
In addition to these four types of tasks, researchers have started to utilize GNN for improving the performance of other recommendation tasks, such as POI recommendation and multimedia recommendation. In this section, we will summarize the recent developments for each task respectively.
POI recommendation plays a key role in location-based service, which utilizes the geographical information to capture geographical influence among POIs and users’ historical check-ins to model the transition patterns. In the field of POI recommendation, there are several kinds of graph data, such as the user-POI bipartite graph, the sequence graph based on check-ins, and the geographical graph; i.e., the POIs within a certain distance are connected and the edge weights depend on the distance between POIs [
10,
92]. SGRec [
88] enriches the check-in sequence with the correlated POIs belonging to other check-ins, which allows collaborative signals to be propagated across sequences. Chang et al. [
10] argue that the more often users consecutively visit two POIs, the greater the geographical influence between these two POIs. Hence, the check-ins not only reflect users’ dynamic preferences but also indicate the geographical influence among POIs. To explicitly incorporate the information of geographical distribution among POIs, the edge weights in the sequence graph depend on the distance between POIs [
10].
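The geographical graph described above can be sketched as follows. This is a minimal illustration, not the construction used in any cited work: the haversine distance, the 2 km threshold, the exponential decay used as the edge weight, and the POI names and coordinates are all assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    # Great-circle distance between two (lat, lon) points given in degrees.
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def build_geo_graph(pois, max_km=2.0):
    """Connect POIs closer than max_km; weight edges by exp(-distance).

    pois: dict mapping POI id -> (lat, lon).
    Returns a dict mapping (id_a, id_b) with id_a < id_b to an edge weight.
    The decay function is an assumed choice; any decreasing function of
    distance would fit the description in the text.
    """
    edges = {}
    ids = sorted(pois)
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            d = haversine_km(pois[a], pois[b])
            if d <= max_km:
                edges[(a, b)] = math.exp(-d)
    return edges

# Hypothetical POIs: two close together, one far away.
pois = {
    "cafe": (40.7580, -73.9855),
    "museum": (40.7614, -73.9776),
    "airport": (40.6413, -73.7781),
}
edges = build_geo_graph(pois)
```

Only the two nearby POIs are connected here; the distant one remains isolated in the geographical graph, although it could still be linked to the others through the user-POI or sequence graphs.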
Group recommendation aims to suggest items to a group of users rather than to an individual user [59], based on their historical behaviors. There exist three types of relationships: user-item, where each user interacts with several items; user-group, where a group consists of several users; and group-item, where a group of users all choose the same item. The “group” can be regarded as a bridge connecting the users and the items, and it can either be treated as a part of the graph or not. Two representative works correspond to these two strategies. GAME [59] introduces a “group node” into the graph and applies GAT to assign appropriate weights to each interacted neighbor. Through propagation, the group representation is iteratively updated with the interacted items and users. However, this approach cannot be directly applied when groups change dynamically and new groups are constantly formed. Unlike this transductive method, GLS-GRL [153] learns the group representation in an inductive way by constructing a graph specifically for each group. The group representation is generated by integrating the representations of the users in the group, which addresses the new-group problem.
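The difference between the transductive and inductive strategies above can be made concrete with a small sketch. Mean pooling is used here as a deliberately simple stand-in for "integrating the user representations"; it is an assumption for illustration and not the aggregation used by GLS-GRL. All names and vectors are hypothetical.

```python
def mean_pool(vectors):
    # Element-wise mean of equally sized vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Learned user embeddings (hypothetical values).
user_emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}

# Transductive view: one learned embedding per group seen during training.
group_table = {"g1": [0.5, 0.5]}

def group_repr_inductive(members):
    # Inductive view: derive the group vector from its members on the fly.
    return mean_pool([user_emb[u] for u in members])

# A group that did not exist at training time has no entry in group_table,
# but it can still be represented from its members.
new_group_members = ["u2", "u3"]
new_group_vec = group_repr_inductive(new_group_members)
```

Because the inductive representation depends only on member embeddings, a newly formed group needs no retraining, which is the property the text attributes to the inductive strategy.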
Bundle recommendation aims to recommend a set of items as a whole to a user. There are three types of relationships: user-item, where each user interacts with several items; user-bundle, where users choose bundles; and bundle-item, where a bundle consists of several items. For group recommendation, the “group” is made up of users; for bundle recommendation, the “group” is a set of items, i.e., the bundle. Analogously, the key challenge is to obtain the bundle representation. BGCN [11] unifies the three relationships into one graph and designs item-level and bundle-level propagation from the users’ perspective. HFGN [87] considers the bundle as the bridge through which users interact with items. Correspondingly, it constructs a hierarchical structure upon user-bundle interactions and bundle-item mappings and further captures the item-item interactions within a bundle.
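Unifying the three relationships into one graph can be sketched as below. This is only an illustration of the data structure, under the assumption that nodes are tagged with their type so that user, bundle, and item ids cannot collide; it does not reproduce the propagation scheme of BGCN or HFGN, and the ids are hypothetical.

```python
from collections import defaultdict

def build_unified_graph(user_item, user_bundle, bundle_item):
    """Merge three edge lists into one undirected, typed adjacency map.

    Each node is a (type, id) tuple, e.g. ("user", "u1").
    """
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for u, i in user_item:
        link(("user", u), ("item", i))
    for u, b in user_bundle:
        link(("user", u), ("bundle", b))
    for b, i in bundle_item:
        link(("bundle", b), ("item", i))
    return adj

# Hypothetical interactions.
graph = build_unified_graph(
    user_item=[("u1", "i1")],
    user_bundle=[("u1", "b1")],
    bundle_item=[("b1", "i1"), ("b1", "i2")],
)
bundle_neighbors = graph[("bundle", "b1")]
```

In the unified graph a bundle node is adjacent both to the users who chose it and to the items it contains, so a GNN layer over this graph can pass information along all three relationship types.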
Click-through rate (CTR) prediction is an essential task for recommender systems in large-scale industrial applications; it predicts the click rate based on multi-type features. The key challenges of CTR prediction are to model feature interactions and capture user interests. Inspired by the information diffusion process of GNN, a recent work, Fi-GNN [90], employs GNN to capture the high-order interactions among features. Specifically, it constructs a feature graph, where each node corresponds to a feature field and different fields are connected with each other through edges. Hence, the task of modeling feature interactions is converted into propagating node information across the graph. Despite its considerable performance, Fi-GNN ignores the collaborative signals implicit in user behaviors. DG-ENN [46] designs both an attribute graph and a user-item collaborative graph and uses GNN techniques to capture the high-order feature interactions and collaborative signals. To further alleviate the sparsity of user-item interactions, DG-ENN enriches the original user-item interaction relationships with user-user similarity relationships and item-item transitions.
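The idea of turning feature interaction into message passing on a feature graph can be sketched as follows. This is a heavily simplified stand-in: it assumes a fully connected field graph, uniform averaging of neighbor states, and a fixed 0.5 mixing coefficient, none of which is claimed to match the actual Fi-GNN update. Field names and vectors are hypothetical.

```python
def propagate_fields(field_states, mix=0.5):
    """One simplified message-passing step on a fully connected field graph.

    field_states: dict mapping field name -> state vector (at least 2 fields).
    Each field averages the states of all other fields and blends the result
    with its own state using the assumed coefficient `mix`.
    """
    names = list(field_states)
    dim = len(field_states[names[0]])
    new_states = {}
    for name in names:
        others = [field_states[m] for m in names if m != name]
        message = [sum(v[i] for v in others) / len(others) for i in range(dim)]
        new_states[name] = [
            (1 - mix) * own + mix * msg
            for own, msg in zip(field_states[name], message)
        ]
    return new_states

# Hypothetical embeddings of three feature fields for one impression.
states = {
    "user_id": [1.0, 0.0],
    "item_id": [0.0, 1.0],
    "device": [1.0, 1.0],
}
updated = propagate_fields(states)
```

After one step each field state already mixes information from every other field; stacking further steps lets higher-order combinations of fields accumulate in the node states.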
Multimedia recommendation has been a core service to help users identify multimedia contents of interest. Its main characteristic is that the contents are multi-modal, e.g., text, images, and videos. Recently, researchers have started to adopt GNN to capture the collaborative signals from users’ interactions with multi-modal contents. For instance, MMGCN [165] constructs a user-item bipartite graph for each modality and applies GNN to propagate information on each graph separately. The overall user/item representations are the sum of the user/item representations of the different modalities. GRCN [164] utilizes the multi-modal contents to refine the connectivity of user-item interactions. For each propagation layer, GRCN takes the maximum of the user-item similarities across modalities as the weight of the user-item interaction edge and uses these weights to aggregate neighbors. MKGAT [134] unifies the user nodes and the multi-modal knowledge graph into one graph and employs a relation-aware graph attention network to propagate information. Considering the multi-modal characteristic of entities, MKGAT designs an entity encoder to map each specific data type into a condensed vector.
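The two modality-fusion strategies mentioned above, summing per-modality representations and taking the maximum similarity across modalities as an edge weight, can be sketched side by side. The dot product as the similarity measure, the function names, and the toy vectors are assumptions for illustration; the sketch omits the graph propagation that produces the per-modality representations in the first place.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_by_sum(per_modality):
    # Sum-style fusion: add the representations of all modalities.
    vectors = list(per_modality.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) for i in range(dim)]

def edge_weight_by_max(user_by_modality, item_by_modality):
    # Max-style edge weighting: keep the largest user-item similarity
    # found in any single modality (dot product is an assumed similarity).
    return max(
        dot(user_by_modality[m], item_by_modality[m])
        for m in user_by_modality
    )

# Hypothetical per-modality representations of one user and one item.
user = {"visual": [1.0, 0.0], "text": [0.0, 2.0]}
item = {"visual": [0.5, 0.5], "text": [1.0, 1.0]}

fused_user = fuse_by_sum(user)
weight = edge_weight_by_max(user, item)
```

Here the textual modality yields the larger similarity, so it determines the edge weight, while the summed representation keeps contributions from both modalities.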