1 Introduction
Recommender systems help users quickly find desired information in large volumes of data in many application scenarios, such as book recommendations on Amazon, paper recommendations on AMiner, and video recommendations on YouTube. However, the inherent data sparsity issue still seriously hinders the performance of existing recommendation approaches [1]. Cross-domain recommendations (CDRs) provide a promising solution [2, 3]: they leverage preference knowledge in the source domain to improve entity recommendation performance in the target domain. Despite many research efforts devoted to developing CDRs, most existing approaches [4, 5] consider interactive rating records based only on overlapping entities, which makes it difficult to substantially improve recommendation accuracy.
A knowledge graph (KG) is a heterogeneous information network [6] that can capture structured semantic information and contextual preference features between nodes, where nodes correspond to entities, items, and so forth, and edges correspond to relations. Essentially, knowledge graph embedding (KGE) can expose user preference features to some extent and yield satisfactory representations for network training [7]. Therefore, collaborative knowledge graphs with rich preference information in some of the literature [8, 9] have been demonstrated to effectively improve recommendation performance.
KG-based recommendation approaches fall mainly into two categories: path-based schemes and embedding-based schemes. Path-based schemes extract higher-order connectivity information of nodes carrying rich preference signals, which often requires manual design of traversal principles. While these approaches contribute to the interpretability of recommendations, their performance relies heavily on domain knowledge, which limits their scalability. Recently, motivated by the development of deep learning, embedding-based schemes that aggregate the feature embeddings of entities in knowledge graphs have been proposed to improve recommendation performance in an end-to-end manner. However, these schemes still depend on too many parameters and are limited to single-domain recommendations [10], so they cannot be applied well to cross-domain scenarios.
To tackle this challenge, in this article, we attempt to expose preference signals from entity interactions in the collaborative knowledge graph to learn similar preference features within and across domains for target entities, which can enhance the recommendation performance of the target domain. Figure 1 illustrates our cross-domain recommendation between music and movie domains with the collaborative knowledge graph. Suppose that, for a given overlapping user \({u}_3\) in the movie domain B, our recommender system needs to recommend movies that the user may like. Since \({u}_3\) also belongs to the users in the music domain A, according to a known path \({u}_3 \to {i}_2 \to {e}_2 \to {i}_3\) in the collaborative knowledge graph, the system judges that \({u}_3\) is likely to prefer \({i}_3\). Then, based on the preferences of overlapping entities across domains, we can generate final predictions for the target entity \({u}_3\). Similarly, for non-overlapping entities, we can use overlapping entities as bridging nodes to connect the two domains and make preference predictions.
Therefore, in this article, we propose a novel preference-aware graph attentive model with a collaborative knowledge graph for cross-domain recommendations, which uses knowledge graphs to capture entity semantic information and users' potential long-distance preferences across domains. Specifically, we leverage a trainable and personalized graph representation scheme to transform entities or items into preference-aware embeddings. Considering the limitations of existing solutions, a preference-aware graph attention network is then devised to aggregate preference features of similar entities within domains and across domains. Finally, the fused entity features with rich contextual preference information are obtained, and a cross-domain Bayesian personalized ranking is proposed to generate predictive results for different cross-domain recommendation scenarios. Experimental results show that our proposed Preference-aware Graph Attention network model with Collaborative Knowledge Graph (PGACKG) achieves significant gains over state-of-the-art approaches in recommendation accuracy.
Our work makes the following main contributions:
•
We propose the PGACKG, a new cross-domain recommendation framework based on graph attention networks, which can leverage similar entity preferences across domains to improve prediction performance in target domains. To the best of our knowledge, this is the first attempt to apply knowledge graphs to cross-domain recommender systems to enhance their performance.
•
We propose to learn preference-aware embeddings for network training by exploiting a graph representation model. Some collaborative preference signals are exposed to learn entities with different preferences in the collaborative knowledge graph.
•
We propose a preference-aware graph attention network model to aggregate the preferences from similar entities within domains and across domains. Intuitively, we employ a multi-hop reasoning process to extract entity preference features within domains, and develop a node random walking scheme to model preference features across domains.
•
We conduct extensive experiments on several real datasets to demonstrate the effectiveness of the PGACKG. Experimental results indicate that our proposed PGACKG consistently outperforms state-of-the-art baseline recommendation approaches. Meanwhile, the data sparsity issue is greatly alleviated to improve recommendation performance in the target domain.
The remainder of this article is organized as follows. Section 2 reviews recent work relevant to our research topic. Section 3 formulates the cross-domain recommendation problem studied in this article. Section 4 presents our proposed PGACKG model in detail. Section 5 reports extensive experiments on four real-world datasets that demonstrate the effectiveness of our approach. Section 6 summarizes our conclusions and future research directions.
3 Problem Formulations
In typical CDR scenarios, it is assumed that there is a dense source domain \(S\) and a sparse target domain \(T\). Generally, each domain has a set of \(M\) users \(U = \{ {u}_1, {u}_2, \ldots, {u}_M \}\), a set of \(N\) items \(I = \{ {i}_1, {i}_2, \ldots, {i}_N \}\), and an interaction rating matrix \(R \in \mathbb{R}^{M \times N}\). An entry \({r}_{mn} = 1\) indicates some historical preference behavior (browsing, liking, forwarding, etc.) of user \({u}_m\) on item \({i}_n\); otherwise, \({r}_{mn} = 0\). Additionally, the existence of some shared entities (users or items) in the source domain and the target domain is a prerequisite for CDRs. For convenience, we assume in this article that the two domains have some overlapping users.
Collaborative Knowledge Graph (CKG). Inspired by previous work [55], we define a collaborative KG that encodes user behaviors and item information as a multi-relational graph. The CKG not only reflects the preference interactions between users and items in the recommender system, but also exposes the relationships between entities in the real world. Normally, it is composed of entity-relation-entity triples \(( h,r,t )\), where \(h \in E\), \(r \in \Omega\), and \(t \in E\) denote the head entity, relation, and tail entity of a knowledge triple, and \(E\) and \(\Omega\) represent the sets of entities and relations in the CKG, respectively. For example, the triple \(( {Vin\ Diesel,\ ActorOf,\ Fast\ \& \ Furious\ 9} )\) presents the fact that Vin Diesel is an actor of the movie “Fast & Furious 9.” Items in the recommender system are regarded as a special type of entity in the CKG, that is, \(i\) belongs to \(E\).
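To make the CKG definition concrete, the following minimal sketch stores a toy CKG as a set of \(( h,r,t )\) triples obtained by merging user-item interactions with item-side knowledge triples. The relation name `Interact`, the helper functions, and the toy user `u3` are illustrative assumptions and are not taken from the PGACKG implementation.

```python
from collections import defaultdict


def build_ckg(interactions, kg_triples, interact_relation="Interact"):
    """Merge user-item interactions and knowledge triples into one triple set.

    interactions: iterable of (user, item) pairs with r_mn = 1.
    kg_triples:   iterable of (head, relation, tail) knowledge triples.
    The relation name used for interactions is a hypothetical choice.
    """
    triples = {(u, interact_relation, i) for u, i in interactions}
    triples.update(kg_triples)
    return triples


def adjacency(triples):
    """Undirected adjacency map so that paths can be followed in both directions."""
    adj = defaultdict(set)
    for h, r, t in triples:
        adj[h].add((r, t))
        adj[t].add((r, h))
    return adj


# Toy data: one interaction and the example triple from the text.
interactions = [("u3", "Fast & Furious 9")]
kg_triples = [("Vin Diesel", "ActorOf", "Fast & Furious 9")]

ckg = build_ckg(interactions, kg_triples)
entities = {h for h, _, _ in ckg} | {t for _, _, t in ckg}   # the set E
relations = {r for _, r, _ in ckg}                           # the set Omega
```

In this representation the item "Fast & Furious 9" is an ordinary member of the entity set, which mirrors the statement that items are a special type of entity in the CKG.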
Problem Description. Given a dense source domain interaction matrix \(X\), a sparse target domain interaction matrix \(Y\), and the CKG \({G}_c\), we aim to predict whether user \(u\) in the target domain \(Y\) has potential interest in an item \(i\) with which the user has had no interaction before. In other words, the problem to be solved in this article is to predict the probability that \(u\) in the target domain \(Y\) would adopt item \(i\) based on the known rating matrices in the source and target domains. We list the main symbols used in this article in Table 1.
5 Experiments
In this section, we conduct empirical experiments on real datasets to verify the effectiveness of our proposed approach and answer the following research questions:
RQ1: Does our proposed PGACKG method achieve better performances on different datasets than existing advanced baseline recommendations?
RQ2: How does the PGACKG perform over different user groups with different interaction sparsity levels?
RQ3: How do different parameter settings (e.g., Model Depth, Iteration Times for Layers, Dropout, Random Walks) affect the proposed PGACKG?
RQ4: How do different variants or key components affect the PGACKG?
RQ5: How does the PGACKG perform in visualization experiments?
5.1 Dataset Descriptions
To effectively verify the recommendation performance of our proposed PGACKG approach and some baseline methods, four public real-world datasets are adopted as our experimental datasets: three Amazon datasets
1 (AmazonMusic, AmazonMovie, AmazonBook) and the Book-Crossing dataset.
2 For these datasets, we keep only those items that have more than 10 interactions with the users to ensure that the model has sufficiently dense entities for training. This article mainly focuses on how CKGs can improve the prediction accuracy of cross-domain recommendations, rather than on how to construct knowledge graphs. Thus, Microsoft Satori,
3 a widely used structured tool, is adopted to directly construct a CKG for each dataset.
\(☆ \) Book-Crossing is an online book community. Its original dataset can provide over 1 million ratings (explicit/implicit) about more than 270,000 books. The constructed CKG contains 21,341 entities, 14 relations, and 52,832 CKG triples.
\(☆ \) Amazon is one of the largest online e-commerce companies in the United States, where users are free to rate products from 1 to 5. In the experiment, we use data from three domains to verify recommendation performance: books, music, and movies. The statistics for these datasets are shown in Table
2.
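The interaction-count filter described in this subsection can be sketched as follows. This is only an illustration under the stated threshold (more than 10 interactions per item); the function name and the toy records are assumptions, and the actual preprocessing pipeline may differ.

```python
from collections import Counter


def filter_items(interactions, min_interactions=10):
    """Keep only interactions whose item has MORE than `min_interactions` records."""
    counts = Counter(item for _, item in interactions)
    return [(u, i) for u, i in interactions if counts[i] > min_interactions]


# Toy data: "popular" has 11 interactions, "rare" has only one.
raw = [(f"u{k}", "popular") for k in range(11)] + [("u0", "rare")]
filtered = filter_items(raw)
```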
5.2 Experimental Settings
5.2.1 Setup and Metrics.
Based on partially overlapping user or item entities, three cross-domain recommendation tasks in different scenarios are designed to more comprehensively verify the validity of our proposed approach. Their detailed tasks are described here.
✓
Task 1: Book-Crossing (BCr) versus AmazonBook (ABo); there are some overlapping item entities.
✓
Task 2: AmazonBook (ABo) versus AmazonMovie (AMo); there are some overlapping user entities.
✓
Task 3: AmazonMovie (AMo) versus AmazonMusic (AMu); there are some overlapping user entities.
For task 1, when BCr is used as auxiliary data, ABo is treated as the target data, and vice versa. Our proposed PGACKG framework and the baseline approaches are implemented in TensorFlow. For a fair comparison, we tune the parameters of our proposed model and set those of the baselines following their original articles. For example, the embedding size is fixed to 64 as suggested in [62], and all models are optimized with the Adam optimizer. Using grid search, the learning rate is tuned in {0.00001, 0.0001, 0.001, 0.01}, the regularization parameter is searched in {0.001, 0.002, 0.004, 0.008, 0.01}, and the mini-batch size is set to 1024. To better model preference signals encoded in multi-order connectivity and reduce some incidental noise, the node depth of the proposed PGACKG is controlled at 2. Furthermore, for each dataset, 70% of entity interaction records are randomly selected for model training, while the remaining records are split into two 15% portions used for parameter tuning and testing, respectively.
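The data split and the hyperparameter grid described above can be sketched as follows. The function name, the fixed random seed, and the toy records are assumptions made for illustration; only the 70/15/15 proportions and the candidate values come from the text.

```python
import itertools
import random


def split_interactions(records, train=0.70, valid=0.15, seed=0):
    """Randomly split records into train / validation / test portions."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n_train = int(round(train * len(records)))
    n_valid = int(round(valid * len(records)))
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])


# Candidate values listed in the experimental setup.
LEARNING_RATES = [0.00001, 0.0001, 0.001, 0.01]
REGULARIZATION = [0.001, 0.002, 0.004, 0.008, 0.01]
grid = list(itertools.product(LEARNING_RATES, REGULARIZATION))

train_set, valid_set, test_set = split_interactions(range(100))
```

Each `(learning_rate, regularization)` pair in `grid` would be evaluated on the validation portion, and the test portion would be used only once for the final report.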
To effectively observe Top-K performance and preference ranking, two commonly used ranking indicators are adopted to evaluate recommendation quality: Hit Ratio (HR@K) and Normalized Discounted Cumulative Gain (NDCG@K) [65]. The HR@K measures whether the predicted item is in the Top-K list, whereas the NDCG@K measures the ranking quality of the hit position. Higher values of both metrics indicate better ranking results. To obtain a fairer ranking result, we adopt the full-ranking protocol during testing. Although sampled protocols can speed up the computation of ranking metrics during testing, they generally yield biased results that hardly reflect the real tendency of recommendation performance, as mentioned in the literature [66]. Therefore, the full-ranking protocol is used to evaluate the recommendation quality of the proposed PGACKG and the baseline approaches.
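The two metrics under a full-ranking protocol can be sketched as follows. This is a minimal illustration that assumes binary relevance and counts a hit when any held-out item of the user appears in the Top-K list; the function names and the toy scores are hypothetical, and the exact variants used in the experiments may differ.

```python
import math


def full_ranking(scores, train_items):
    """Rank every item not seen in training, highest predicted score first."""
    candidates = [i for i in scores if i not in train_items]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)


def hit_ratio_at_k(ranked, test_items, k):
    """1.0 if at least one held-out item appears in the Top-K list, else 0.0."""
    return 1.0 if any(i in test_items for i in ranked[:k]) else 0.0


def ndcg_at_k(ranked, test_items, k):
    """Binary-relevance NDCG: discounted gain of hits over the ideal ordering."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, i in enumerate(ranked[:k]) if i in test_items)
    ideal_hits = min(k, len(test_items))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: item "a" was seen in training, item "c" is held out.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
ranked = full_ranking(scores, train_items={"a"})
hr_at_2 = hit_ratio_at_k(ranked, {"c"}, k=2)
ndcg_at_2 = ndcg_at_k(ranked, {"c"}, k=2)
```

Here the held-out item is ranked second among all unseen items, so it counts as a hit at K = 2 but receives a discounted gain because it is not in the first position.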
5.2.2 Baseline Methods.
To demonstrate its effectiveness, we compare our proposed PGACKG with the following seven baseline approaches in two groups: single-domain recommendations (NGCF, KGAT, and CGAT) and cross-domain recommendations (KerKT, SemStim, GFM, and CD-GNN). All baseline models are representative or advanced KG-based or graph-based recommendation methods. Their detailed descriptions are as follows.
NGCF [
62]: This approach can leverage graph structure to project collaborative signals to an embedding propagation process based on a user-item KG, and a sufficient entity-embedding mechanism can be generated to promote a collaborative filtering recommendation.
KGAT [
55]: This is a recommendation scheme that combines matrix factorization and KG techniques, which can model high-order connectivities in the KG and recursively refine the node embeddings by employing a graph attention network.
CGAT [
67]: It is a state-of-the-art single-domain recommendation framework in which both local and non-local graph contexts are captured simultaneously by exploiting graph attention networks in the item KG.
KerKT [
42]: This is a representative cross-domain recommendation solution. In this solution, diffusion kernel completion is used to associate the source and target domain knowledge to improve the accuracy of rating prediction based on overlapping entities.
SemStim [
68]: This approach exploits semantic links generated in a KG (e.g., DBpedia) to assist the target domain in making cross-domain recommendations with an unsupervised graph algorithm.
GFM [
69]: This model applies graph factorization machines on the KG structure to compute entity embeddings by propagating and aggregating multi-order interactions from the neighborhood in the source and target domains.
CD-GNN [
70]: This framework adopts a graph neural network encoder to capture more representations for inactive entities and relations; the preference features of multi-order neighbors in the KG are considered for cross-domain recommendations.
5.3 Performance Comparison (for RQ1)
To answer RQ1, we observe the average experimental results and compare the performance of our proposed PGACKG model with that of the baselines. The length K of the full-ranking Top-K item lists is set to 5, 15, and 30. Note that for single-domain recommendation models (NGCF, KGAT, and CGAT), we train them and report their results on each domain. Tables 3 to 5 show the performance of all methods for the three tasks, followed by a summary of our observations.
•
\(☆ \) The CGAT outperforms the other single-domain recommendation (SDR) models on all datasets. For example, it improves over the NGCF with regard to NDCG@5, 15, 30 by 7.61%, 7.23%, and 9.17%, respectively, when Book-Crossing is used as the target domain in task 1. Likewise, the PGACKG outperforms the other CDR models on all datasets. For example, it improves over the KerKT with regard to HR@5, 15, 30 by 9.90%, 10.77%, and 11.26%, respectively, when AmazonBook is used as the target domain in task 1. A possible reason is that the graph attention mechanism better helps the models obtain the context preferences of the target entities in the KG, which can significantly improve the prediction performance of the recommender system.
•
\(☆ \) The CDR approaches (KerKT, SemStim, GFM, CD-GNN, and PGACKG) yield better performance than the SDR methods (NGCF, KGAT, and CGAT) in the three tasks. Moreover, compared with the KG-free solution (KerKT), the cross-domain models based on KG or graph structure achieve better predictive performance, especially our proposed PGACKG. Our PGACKG model exploits preference-aware entity embeddings within domains and across domains to capture user preference features more accurately, whereas the other baseline models employ only aligned entity embeddings. The reason might be that the other baselines fail to fully explore the preference information of entities, whereas our approach better captures preference signals in the collaborative KG through attentive preference generation mechanisms within domains and across domains.
•
\(☆ \) The PGACKG performs consistently better than the other baseline approaches in all three tasks. Our PGACKG improves over the best-performing baseline (with results marked in Tables 3 to 5) with regard to HR@5, 15, 30 by 6.24%, 7.71%, and 7.88% and with regard to NDCG@5, 15, 30 by 13.85%, 10.19%, and 10.57%, respectively, when AmazonMusic is used as the target domain in task 3. By stacking multiple preference embedding layers, the PGACKG is capable of capturing the preference features of higher-order entity nodes in the collaborative KG for recommendations, while the CD-GNN explores the preferences of only first-order neighbors to guide the embedding learning. Moreover, compared with the other baselines, the PGACKG also uses preference-aware graph attention layers to encode more preference signals into the embedding network for training. These results indicate that our proposed PGACKG outperforms all baseline models in terms of cross-domain rating predictions.
5.4 Interaction Sparsity Study (for RQ2)
The advantage of incorporating the collaborative KG into cross-domain recommendation is that it can help alleviate the data sparsity issue in the target domain, which always limits the improvement of recommendation performance. In scenarios with insufficient interactions, models have difficulty learning strong representations for item predictions. Therefore, we observe the performance of the proposed PGACKG and three other cross-domain baseline approaches with KG (SemStim, GFM, CD-GNN) in alleviating the data sparsity issue.
To this end, we perform experiments over user groups with different sparsity levels and further redefine three specific tasks in the experimental setting section. BCr, ABo, and AMo in the three tasks are sequentially used as the target domains; the other three datasets are used as the source domains. Each target domain is divided into four groups {<10, [10, 40), [40, 80), >=80} based on interaction number per user in the test set, while keeping the interaction number in the source domain unchanged. Figure
3 exhibits the results measured by NDCG@15 on different user groups under the three tasks.
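The user grouping used in this study can be sketched as follows. The bucket boundaries are those given above; the function names and the toy interaction list are assumptions for illustration.

```python
from collections import Counter, defaultdict


def sparsity_group(n_interactions):
    """Map a per-user interaction count to one of the four sparsity groups."""
    if n_interactions < 10:
        return "<10"
    if n_interactions < 40:
        return "[10, 40)"
    if n_interactions < 80:
        return "[40, 80)"
    return ">=80"


def group_users(interactions):
    """Group users by how many (user, item) records they have."""
    counts = Counter(user for user, _ in interactions)
    groups = defaultdict(list)
    for user, n in counts.items():
        groups[sparsity_group(n)].append(user)
    return dict(groups)


# Toy data: u1 has 3 interactions, u2 has 45.
toy = [("u1", f"i{k}") for k in range(3)] + [("u2", f"i{k}") for k in range(45)]
groups = group_users(toy)
```

A metric such as NDCG@15 would then be averaged separately over the users of each group.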
The PGACKG usually outperforms other KG-based cross-domain baseline models, especially on fairly sparse user groups in all three tasks. These experimental results demonstrate the effectiveness of the PGACKG in alleviating the data sparsity issue. The potential reason is that the proposed PGACKG can not only aggregate the preference features of similar entities within domains but also aggregate the preference features of similar entities across domains to model the preferences of target users, which can adequately leverage the rich knowledge from the source domain and improve the prediction performance for target users.
5.5 Study of PGACKG (for RQ3)
To answer RQ3, we investigate how different parameter settings affect the recommendation performance of our proposed PGACKG approach. First, we explore the effect of model depth on network training. Then, we analyze the influence of iteration times for layers on prediction performance. Next, we study the influence of message dropout and node dropout. Finally, we examine the settings of the random walks.
5.5.1 Effect of Model Depth.
To investigate whether the PGACKG can benefit from the preference embedding transformation, the model depth L is changed from 1 to 4. BCr, ABo, and AMo in three tasks are sequentially used as the target domains; the other three datasets are used as the source domains. Table
6 shows the experimental results from all three tasks; some conclusions are summarized as follows.
Increasing the model depth generally enhances recommendation performance. The PGACKG-3 and PGACKG-2 consistently perform better than the PGACKG-1 across all three tasks. This may be because multi-order neighbors across domains can capture more entity preference signals for training than first-order neighbors alone.
In addition, the PGACKG-4 achieves only marginal improvements by stacking one more layer over the PGACKG-3, and it appears to overfit in task 2. These results indicate that preference collaborative signals can be adequately captured by considering third-order entity relations, which is consistent with some findings in [71].
5.5.2 Effect of Iteration Times for Layers.
To further investigate how the number of iterations affects recommendation performance, we set K to 15 and use AmazonBook as the target domain in task 1. Since the performance trends of other tasks are similar to those of task 1, Figure 4 presents only the changing trend of HR@15 and NDCG@15 with different iteration times in task 1. From Figure 4, we have the following observations.
Before the performance plateaus, more iterations consistently yield better performance, and the proposed PGACKG approach converges accordingly. The performance of our proposed model gradually stabilizes when the number of iterations reaches 6. This observation demonstrates the efficiency of preference-aware embedding propagation and the better model capacity of the PGACKG.
We can also see from Figure 4 and Table 6 that the number of iterations has a slightly smaller effect on the proposed PGACKG approach than the model depth discussed in the previous subsection. This may be because multiple epochs of network training allow our proposed model to converge within a small number of iterations.
5.5.3 Effect of Dropout.
To prevent overfitting, message dropout and node dropout techniques are adopted to train our proposed model, following previous work [62]. For our three tasks, the experimental performance is averaged to observe the influence of the message dropout ratio and the node dropout ratio on our proposed PGACKG approach, as shown in Figure 5.
Comparing the two dropout strategies, Figure 5 shows that node dropout achieves better performance than message dropout in most cases. For example, setting the node Dropout Ratio (DR) to 0.2 yields the highest HR@15 of 0.377, which is 3.01% higher than that of message dropout. One possible reason is that dropping out the drifting noise from the high-order propagation process makes the preference-aware embedding capture more preference signals for recommendations. Hence, node dropout appears to be a more effective strategy than message dropout for mitigating overfitting in GNNs.
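The difference between the two strategies can be illustrated on a plain edge list. The sketch below assumes the usual reading of these terms: message dropout discards individual propagated messages independently, whereas node dropout blocks a node and discards every message it sends. The function names are hypothetical, the rescaling of surviving messages is omitted, and real implementations operate on embedding tensors or sparse adjacency matrices rather than Python lists.

```python
import random


def message_dropout(edges, ratio, rng):
    """Drop each message (edge) independently with probability `ratio`."""
    return [e for e in edges if rng.random() >= ratio]


def node_dropout(edges, ratio, rng):
    """Block whole sender nodes: all messages of a dropped node are discarded."""
    senders = sorted({src for src, _ in edges})
    dropped = {n for n in senders if rng.random() < ratio}
    return [(src, dst) for src, dst in edges if src not in dropped]


# Toy message graph: (sender, receiver) pairs.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
rng = random.Random(0)
kept_messages = message_dropout(edges, 0.2, rng)
kept_nodes = node_dropout(edges, 0.2, rng)
```

With node dropout, a sender that survives keeps all of its outgoing messages, so the remaining neighborhood structure stays intact for the surviving nodes.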
5.5.4 Effect of Random Walks.
The proposed PGACKG approach employs the random walk algorithm to aggregate similar nodes across domains in a collaborative KG. To investigate its impact on our proposed method, we observe and analyze the walking length L and the sampling number N for random walks. Tables
7 and
8 exhibit the performance of the PGACKG with respect to different settings of L and N in random walk exploration. NDCG@15 is adopted to evaluate the performance of our proposed approach on three tasks.
As can be seen from Table
7, better performance can be achieved by setting the walking length
\({\rm{L}}\) around 16. Further increasing L introduces more noise and training complexity, resulting in lower model performance. In addition, it can be seen from the results in Table
8 that our approach achieves the best performance when performing 24 random walk samplings. This indicates that the most similar nodes across domains can be captured by setting the sampling number
\({\rm{N}}\) to around 24 for random walks.
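The roles of the walking length and the sampling number can be illustrated with a minimal random walk sketch. It assumes uniform transitions over an adjacency list and ranks candidate nodes by visit frequency; the function names, the `top_k` parameter, and the toy graph are hypothetical, and the walk strategy actually used by the PGACKG may differ.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Take up to `length` uniform random steps starting from `start`."""
    walk = [start]
    for _ in range(length):
        neighbors = adj.get(walk[-1])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def similar_nodes_by_visits(adj, start, walk_length=16, num_walks=24,
                            top_k=3, seed=0):
    """Run `num_walks` walks of `walk_length` steps and rank nodes by visit count."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        for node in random_walk(adj, start, walk_length, rng)[1:]:
            if node != start:
                visits[node] += 1
    return [node for node, _ in visits.most_common(top_k)]


# Toy graph in which item i1 bridges users u1 and u2.
adj = {
    "u1": ["i1", "i2"],
    "i1": ["u1", "u2"],
    "i2": ["u1"],
    "u2": ["i1"],
}
neighbors_of_u1 = similar_nodes_by_visits(adj, "u1")
```

Longer walks reach more distant nodes but also visit more loosely related ones, which is consistent with the observation that increasing the walking length beyond a certain point adds noise.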
5.6 Ablation Study (for RQ4)
To investigate the importance of each key component of the PGACKG, we perform ablation studies to evaluate the performance of the following three variants. Each variant of our proposed approach is described here in detail.
PGACKG/GA deletes the collaborative KG and graph attention mechanism from PGACKG, and considers and aggregates only the preference features of similar neighbors of the target entity to make recommendations based on partially overlapping entities.
PGACKG/WP removes the attentive preference features within domains from the final fusion features, and considers only the attentive preference features across domains to generate cross-domain recommendations.
PGACKG/AP retains the attentive preference features within domains, which is contrary to PGACKG/WP, and deletes the attentive preference features across domains from the final fusion features to generate the final recommendation.
Additionally, we design three tasks to validate the comparative results of ablation studies. In the three tasks, BCr, ABo, and AMo are sequentially used as the target domain datasets, and ABo, AMo, and AMu are sequentially used as the source domain datasets. The number of interactions of each entity used in the target domain is no more than 5 to ensure data sparsity, whereas the number of interactions of each entity used in the source domain is more than 15 to ensure the richness of knowledge. The ablation results are shown in Table
9.
In all comparisons, PGACKG/GA is slightly better than PGACKG/WP, which achieves the worst performance. This indicates that exploiting only similar preferences across domains cannot capture the target entity's preference features well and is even inferior to some previous models that directly aggregate similar preference features.
Compared with PGACKG/GA and PGACKG/WP, PGACKG/AP has greatly improved the recommendation performance, indicating that the attentive preference features within domains are essential for recommendations. Additionally, our PGACKG outperforms PGACKG/AP, indicating that preference-aware embeddings within domains and across domains cooperate to enhance the quality of recommendations.
In summary, PGACKG is consistently superior to other variants on all tasks in terms of HR@15 and NDCG@15 metrics. Such results demonstrate that the collaborative KG can significantly facilitate the improvement of cross-domain recommendation accuracy, thus showing the effectiveness of the graph attention mechanism in our approach.
5.7 Visualization (for RQ5)
In this section, we investigate the preference embeddings in the target domain feature space to further analyze why the PGACKG can improve the performance of the model. ABo is used as the source dataset and BCr is used as the target dataset. Four categories of users are randomly selected for the visualization experiments, following the default setting of the t-SNE [72, 73] in Scikit-learn. Additionally, the best cross-domain baseline, CD-GNN, is adopted for comparison with our proposed PGACKG.
As can be seen from Figure
6(a), different user embeddings in a CD-GNN do not have discernible boundaries and clusters, and the transformed embeddings are scattered across target domain feature space. However, in Figure
6(b), different user embeddings exhibit distinct clusters, which demonstrates that user embeddings transformed by the PGACKG can inject some preference signals into representation learning. The visual experiments also explain the fundamental reason why the PGACKG can achieve better performance.
6 Conclusion
In this work, we devise a novel cross-domain recommendation framework that employs preference-aware graph attention networks in the CKG to explore high-order semantic preferences among entities, thereby improving cross-domain recommendation accuracy. The PGACKG leverages a graph embedding model to obtain preference-aware embeddings with rich semantic preference signals. A preference-aware graph attention network model is proposed to aggregate the preferences of similar entities within domains and across domains in the CKG via multi-hop reasoning and a visit-frequency-based node random walk model, respectively. We fuse entity preference features within domains and across domains to remodel users' final preferences, and a cross-domain Bayesian personalized ranking (CBPR) is proposed to generate the cross-domain recommendations. Compared with state-of-the-art baselines, the superiority of the PGACKG has been verified by extensive experiments on four real-world datasets.
For future work, we would like to explore the PGACKG on more KG-based cross-domain recommendation scenarios. We also intend to develop different preference aggregation strategies to capture the dynamic interactions of entities in temporal KGs.