Introduction

The rapid growth of the Internet has facilitated an explosion of information, exacerbating the issue of information overload. Recommender systems are critical in helping users discover items of interest across various platforms, such as E-commerce, online entertainment, online education, and social networks1. Knowledge graphs (KGs) have demonstrated significant potential in enhancing both the accuracy and interpretability of recommendations. The rich entity and relation information in KGs not only uncover diverse relationships among items but also explain user preferences. Recently, knowledge-aware recommendation has garnered substantial research interest, with graph neural networks (GNNs) emerging as the dominant models in this domain2,3,4,5,6.

GNN-based recommendation models employ an informative aggregation paradigm to integrate multi-hop neighbors into node representations, offering a robust mechanism for generating permutation-invariant aggregation on the neighbors of a node3,4,7. However, these models commonly struggle with sparse supervision signals and redundant entity relations, which can limit the beneficial effects of KGs on recommendation performance.

Like traditional collaborative filtering, GNN-based models rely on abundant user-item interaction data to capture user preferences. Severe sparsity in interactions can lead to degeneration issues, such as collapsing node embedding distributions into narrow cones, which results in indistinct node representations6,8,9,10. To address sparse supervision signals, several works adopt meta-learning strategies to learn shared prior knowledge across users and enhance model generalization11,12,13. However, these approaches often fail to incorporate auxiliary data and are solely dependent on limited interaction data, increasing susceptibility to noise and reducing robustness in representing user preferences14. Other approaches incorporate ancillary data, such as user attributes15,16 and social relationships17,18. Yet, such data are often challenging to obtain owing to privacy concerns and streamlined registration processes.

Contrastive learning, a recent self-supervised learning technique, addresses the issue of sparse supervision signals by learning discriminative embeddings from unlabelled data, maximizing the distance between negative samples while minimizing it for positive ones6,8,10. Another promising approach is intent learning, which enhances user-item connections by inserting intermediate nodes into interactions to uncover underlying intents7,19.

In contrast, the problem of redundant entity relations has received significantly less attention than the issue of sparse supervision signals. Redundant entity relations refer to those within KGs that contribute little to feature extraction for users or items, and may even introduce noise during GNN aggregation. These relations are typically identified based on long-tail distributions and subsequently removed or merged via clustering techniques5.

Despite their successes, existing approaches for tackling sparse supervision signals and redundant entity relations exhibit several limitations: (1) Most approaches address only one of the two problems, limiting overall improvements in recommendation accuracy. (2) Many employ multi-view contrastive learning, resulting in substantial computational overhead. To this end, we propose a Dual-Intent-View Contrastive Learning (DIVCL) framework for knowledge-aware recommender systems. DIVCL addresses both challenges simultaneously by combining contrastive and intent learning while reducing computational complexity. The main contributions of this study are as follows:

(1) Entity Relation Filtering Based on Intent Relevance: We propose an entity relation filtering strategy grounded in intent relevance. Intents, which are shared across user-item interactions, are employed as metrics to remove redundant relations and provide fine-grained representations of interactions. This approach jointly alleviates the issues of sparse supervision signals and redundant entity relations within the KG structure.

(2) Dual-View Representation Learning: DIVCL captures user preferences using dual-view GNN-based representation learning. The local view focuses on the user-item interaction graph (IG), while the global view considers the user-item-entity graph (IG + KG). This dual-view approach enhances user and item representations through both intent and contrastive learning. Furthermore, DIVCL is computationally lighter compared to existing multi-view contrastive learning models.

(3) Comprehensive Experimental Evaluation: We conduct extensive experiments on three benchmark datasets. Notably, we introduce a new dataset, the Fabric Mall dataset. Experimental results demonstrate that DIVCL achieves superior performance compared to state-of-the-art approaches, validating the effectiveness of our model.

The remainder of this paper is organized as follows: Sect. 2 discusses related work. Section 3 formulates the problem. The methodology is presented in Sect. 4. Section 5 provides an interpretation of experimental results, and conclusions are drawn in Sect. 6.

Related work

Knowledge-aware recommendation approaches

Knowledge-aware recommendation has evolved from embedding-based and path-based approaches to GNN-based ones20. Embedding-based approaches utilize graph embedding models such as TransE21, TransH22, TransD23 and TransR24 to preprocess KGs, incorporating learned entity and relation embeddings into recommendation tasks6,25,26. However, these approaches typically achieve low recommendation accuracy, as they emphasize semantic relatedness over user preference. COMET27 improves recommendation accuracy by simultaneously modelling the high-order interaction patterns among historical interactions and embedding dimensions. Path-based approaches, on the other hand, explore various connection patterns among items within KGs to provide additional guidance for recommendations28,29. These approaches rely heavily on manually designed meta-paths, which demand extensive domain knowledge and laborious effort.

GNN-based approaches, built upon graph convolutional networks (GCNs)30, aggregate information from neighboring nodes into the representation of the target node and incorporate high-order neighbors by stacking multiple GNN layers5. KGCN31 serves as an early GNN-based recommendation model that utilizes GCN to aggregate neighborhood information when computing the representation of a given entity in the KG, subsequently predicting user engagement with items. Numerous extensions have since been proposed, such as KGAT3 and CKAN2, which introduce attention mechanisms to differentiate the importance of neighbors. KGNN-LS32 and KNI33 combine label smoothness regularization with neighborhood interaction to enhance the information aggregation process, while MKM-SR34 integrates user micro-behaviors and item knowledge into multi-task learning. GNN-based approaches have demonstrated powerful capabilities in effectively generating local, permutation-invariant aggregation over the neighbors of a node, which positions them as the foundation for addressing sparse supervision signals and redundant entity relations in this study.

Intent-oriented recommendation approaches

The concept of user intent has become increasingly prevalent in E-commerce applications such as Taobao and Amazon. User intent is often manifested as automatically generated search suggestions, based on users’ historical behavior. Traditionally, intent generation relies heavily on domain knowledge and the extraction of handcrafted features. To automate intent generation, MEIRec35 treats queries as user intents and defines users, queries, and items in a semantic order, using meta-path-guided neighbors to generate intents from rich interaction data.

Subsequently, user intents have been treated as latent, unobservable variables and used to enrich user-item interaction features rather than to directly assist item search. DGCF19 models user intents as fine-grained representations of user-item interactions and generates disentangled representations using a GNN model. RAISE36 models user intents as user-item pairs, which are mined from text reviews by differentiating the importance of review information with a co-attention network. KGIN7 treats each intent as an attentive combination of entity relations, promoting independence among different intents to enhance model capability and interpretability. It also employs an informative aggregation scheme for GNNs, recursively integrating relation sequences through long-range connectivity (i.e., relational paths). Furthermore, user intents have been extended to session-based recommendations, where they are treated as sequentially related37,38,39,40,41.

The concept of user intent has inspired new approaches in recommendation system design, achieving notable success in mitigating sparse supervision signals. However, intent has yet to be applied to tackle redundant entity relations, and its combination with contrastive learning remains rare. This paper aims to bridge these gaps in the research.

Contrastive learning recommendation approaches

Contrastive learning approaches derive node representations by contrasting positive pairs against negative pairs6. Typically, positive pairs are different augmentations of the same sample, while negative pairs are derived from distinct samples. The learning objective is to maximize the mutual information between positive data transformations while enhancing the discrimination of negative samples. Since contrastive learning does not rely on class labels, it effectively addresses the issue of sparse supervision signals through self-supervised learning.

In recent years, various data augmentation techniques have been introduced in recommendation models. For example, SGL8 performs contrastive learning between the original graph and a corrupted graph based on user-item interactions. Other models, such as MBCLRec42, CML43 and KMCLR44, conduct contrastive learning across multi-behavioral relationships, including page views, favorites, carts, and purchases. MSICL45, CDGCL46 and DMM-Rec47 build contrastive views using different modalities of item-side information, minimizing the dissimilarity between modalities of the same sample while maximizing it between negative samples within each modality. BUCL10 creates contrastive views from graph embedding and relation-aware attention, aiming to capture various knowledge and item representation. DGI48, HeCo49, GMI50, MVGRL51, MCCLK6, AMMCN52, KGCL53 and KRCL54 create contrastive views from multi-level user-item-entity graphs, aiming to extract comprehensive graph features and structure information in a self-supervised manner.

In terms of learning methodology, many of these models are GNN-based. For instance, MVGRL51, MCCLK6 and AMMCN52 utilize multi-view contrastive learning, while SGL8 incorporates supervised recommendation tasks within a multi-task framework. ICL55, in particular, proposes an intent contrastive learning approach for sequential recommendation, which inserts latent intent variables into each user-item interaction. ICL learns intent distributions from unlabeled user behavior sequences through contrastive learning, maximizing the agreement between a sequence and its corresponding intent. However, ICL is not a knowledge-aware recommendation approach and relies solely on interaction sequences.

In real-world scenarios, multi-behavior and multi-modal data are often unavailable, and multi-view or multi-task contrastive learning approaches typically suffer from excessive resource consumption and low training efficiency. This study aims to address these challenges by adopting a dual-view approach with KG integration and combining intent and contrastive learning to tackle both sparse supervision signals and redundant entity relations.

The comparison of DIVCL with recent approaches

A comparison of DIVCL with recent models is provided in Table 1. The main distinctions between DIVCL and existing models lie in the multi-purpose use of intents, the dual-view contrastive learning, and the intent-relevance-based KG filtering strategy.

Problem formulation

To ensure the availability of data, this study assumes that the recommendation model is based on user-item interactions and item knowledge graphs (KGs). User-item interactions represent the supervised signals, which may include actions such as page views, favorites, cart additions, and purchases. The item KGs consist of entities and entity relations associated with the items.

Table 1 The comparison of DIVCL with recent models.

Let U be a set of users and I a set of items. The interactions are represented as a matrix \({\mathbf{Y}} \in {{\mathbb{R}}^{\left| {\mathbf{U}} \right| \times \left| {\mathbf{I}} \right|}}\), where |U| is the size of U, and |I| is the size of I. Let yui∈Y denote the interaction between a user u∈U and an item i∈I. If there is an interaction between u and i, then yui=1; otherwise, yui=0. Y represents the interaction graph (IG), which consists of user and item nodes.

Let V (⊃I) be a set of entities including the items, item attributes, taxonomy, and external commonsense knowledge, and let Re be a set of entity relations. A KG is a graph-structured data model that contains V and Re, represented as a set of triples \({\mathbf{G}}=\{(h,r,t)|h,t \in {\mathbf{V}}, r \in {\mathbf{Re}}\}\).

Given Y and G, the task of the recommendation system is to learn a function that predicts how likely a user is to adopt an item.
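For concreteness, the following is a minimal Python sketch of how these two inputs can be represented, assuming a SciPy sparse matrix for Y and a plain list of (h, r, t) index triples for G; the sizes and triples are illustrative, not taken from the paper's datasets.

```python
# Toy instantiation of the inputs Y (interaction graph) and G (item KG).
import numpy as np
from scipy.sparse import csr_matrix

num_users, num_items = 4, 5
rows = [0, 0, 1, 2, 3]   # user indices u with y_ui = 1
cols = [1, 3, 0, 4, 2]   # item indices i with y_ui = 1
Y = csr_matrix((np.ones(len(rows)), (rows, cols)),
               shape=(num_users, num_items))   # |U| x |I| matrix of y_ui

# KG triples (h, r, t); items share ids 0..num_items-1 with the entity set V.
G = [(1, 0, 7), (3, 1, 6), (0, 0, 5)]   # (head entity, relation, tail entity)
```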

Methodology

It is assumed that both the interaction graph (IG) and the knowledge graph (KG) are prepared and serve as the initial inputs to the model. The IG represents the local graph, while the combination of the IG and KG (IG + KG) forms the global graph. These graphs are treated as two initial views for GNN-based information aggregation. The workflow of the proposed DIVCL model is depicted in Fig. 1, comprising four key components: (1) Global intent view encoder, which injects intents and generates the embedding representations for the IG + KG. (2) Local intent view encoder, which injects intents and produces embedding representations for the IG. (3) Contrastive learning, which is performed between the local and global intent views to obtain discriminative node embeddings. (4) Model prediction, which estimates the likelihood of a user adopting an item based on the generated user and item embeddings.

Initially, the global and local view encoders run in parallel, and their outputs are connected through contrastive learning. Finally, the user and item embeddings from both views are concatenated and serve as inputs to the model’s prediction component. The details of each component are elaborated in the following subsections.

Global intent view encoder

The global intent view encoder begins with the interaction graph (IG) and knowledge graph (KG) (denoted as Y and G, respectively). It proceeds through three main stages: intent injection, KG filtering, and GNN-based information aggregation, ultimately generating the user and item embeddings. The KG filtering step introduces a novel strategy for removing redundant entity relations before aggregating information, thus improving the quality of the embeddings. The remaining components of the encoder are modeled following the approach of KGIN7, an approach whose effectiveness has been recently validated.

Fig. 1
figure 1

The workflow of the proposed DIVCL model. The user, item, entity and intent nodes are represented as yellow, blue, green and brown circles, respectively. A line between two nodes means an edge of the IG or KG. The intent nodes relate to all user nodes. A small solid line box represents a graph. A line with the label x means the line is removed from the previous graph. An arrow line represents the transformation relationship between graphs, and the text above and below the arrow line indicates the transformation name.

Intent injection

Let P be the set of intents shared by all users. An interaction pair (u, i) is decomposed into a set of triples {(u, p, i)|p∈P} by inserting P. Each intent p∈P is assigned a distribution over KG relations, and its embedding is created by an attention mechanism as follows:

$${{\mathbf{e}}_p}=\sum\nolimits_{{r \in {\mathbf{Re}}}} {\alpha (r,p){{\mathbf{e}}_r}}$$
(1)

where ep is the embedding representation of intent p, er is the embedding representation of relation r, and \(\alpha (r,p)\) represents the importance of r as follows:

$$\alpha (r,p)=\frac{{\exp ({w_{rp}})}}{{\sum\nolimits_{{r^\prime \in {\mathbf{Re}}}} {\exp ({w_{r^\prime p}})} }}$$
(2)

where \(w_{rp}\) is a trainable weight specific to relation r and intent p.
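As a concrete illustration, the following PyTorch sketch builds the intent embedding matrix from Eqs. (1)-(2); the tensor names and sizes are assumptions for illustration, not taken from the released implementation.

```python
import torch
import torch.nn.functional as F

num_relations, num_intents, dim = 8, 4, 64                       # illustrative sizes
e_r = torch.nn.Parameter(torch.randn(num_relations, dim))        # relation embeddings e_r
w = torch.nn.Parameter(torch.randn(num_relations, num_intents))  # trainable weights w_rp

alpha = F.softmax(w, dim=0)   # Eq. (2): normalize over relations r for each intent p
E_P = alpha.t() @ e_r         # Eq. (1): each row of E_P is one intent embedding e_p
```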

KG filtering

In the KG, certain entities and relations may be irrelevant to the recommendation context, thereby introducing noise into the learning process. To address this, we propose the KG filtering strategy, i.e., the entity relation filtering based on intent relevance described in the contributions. In this strategy, intents serve as the assessment criteria for filtering: the projected distance of a relation within the latent space of intents is used to evaluate the relation’s influence on the intent space, which in turn determines whether the relation is retained or discarded.

The process begins by projecting the entity embeddings into the latent space of intents, as follows:

$${\mathbf{e}}_{v}^{{\text{p}}}={{\mathbf{e}}_v} \times {\mathbf{E}}_{{\text{P}}}^{{\text{T}}}$$
(3)

where v∈V denotes an entity in the KG, ev denotes the embedding of v, \({\mathbf{E}}_{{\text{P}}}^{{\text{T}}}\) denotes the transpose of the intent embedding matrix in the global view, of which each row is an intent embedding vector \({{\mathbf{e}}_p}\), and \({\mathbf{e}}_{v}^{{\text{p}}}\) denotes the projected embedding of v.

For a triple \((h,r,t) \in {\mathbf{G}}\), the influence of the relation r over the intent space is evaluated by the structural similarity between h and t in the latent space of the intents, as follows:

$${\text{sim}}(h,t)=\frac{{{\mathbf{e}}_{h}^{{\text{p}}} \cdot {\mathbf{e}}_{t}^{{\text{p}}}}}{{\left\| {{\mathbf{e}}_{h}^{{\text{p}}}} \right\| \times \left\| {{\mathbf{e}}_{t}^{{\text{p}}}} \right\|}}$$
(4)

where sim(h,t) is the cosine similarity between the projected embeddings of h and t, and is negatively correlated with the influence of the relation r over the intent space.

To simplify the threshold setting for the KG filtering strategy, sim(·) is normalized to [0, 1] as follows:

$$\text{sim}^\prime(h,t)=\frac{{1+\text{sim}(h,t)}}{2}$$
(5)

where \(\text{sim}^\prime(h,t)\) is the normalized sim(h,t).

Thereafter, the KG is filtered as follows:

$${\mathbf{G}}^{\prime} = \{ (h,r,t) | {\text{sim}}^{\prime}(h,t) \leq \theta , (h,r,t) \in {\mathbf{G}} \}$$
(6)

where \({\mathbf{G}}^{\prime}\) represents the preserved KG, and θ represents the filtering threshold, a hyperparameter that controls which relations are retained.
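The filtering step can be sketched in a few lines of PyTorch, applying Eqs. (3)-(6) to a batch of triples; `E_V`, `E_P`, and the default threshold are hypothetical names and values for illustration.

```python
import torch
import torch.nn.functional as F

def filter_kg(triples, E_V, E_P, theta=0.8):
    """triples: LongTensor (n, 3) of (h, r, t) ids; E_V: (|V|, d) entity
    embeddings; E_P: (|P|, d) intent embeddings. Returns the preserved KG G'."""
    proj = E_V @ E_P.t()                                  # Eq. (3): project into intent space
    h, t = triples[:, 0], triples[:, 2]
    sim = F.cosine_similarity(proj[h], proj[t], dim=-1)   # Eq. (4)
    sim = (1.0 + sim) / 2.0                               # Eq. (5): map to [0, 1]
    return triples[sim <= theta]                          # Eq. (6): keep low-similarity triples
```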

GNN information aggregation

Following LightGCN-style GNN information aggregation, the representation of user u is generated by recursively integrating the intent-aware information from interacted items. Based on the interactions in the matrix Y, the representation of u is recursively aggregated as follows:

$${\mathbf{e}}_{u}^{{(l+1)}}=\frac{1}{{\left| {{N_u}} \right|}}\sum\nolimits_{{i \in {N_u}}} {\omega (u,p){{\mathbf{e}}_p} \odot {\mathbf{e}}_{i}^{{(l)}}}$$
(7)

where l∈{0, 1, …, L} indexes the aggregation layer, eu, ep and ei are the embedding representations of u, p and i, respectively, \(\odot\) is the elementwise product, \({N_u}=\{ i|{y_{ui}} \in {\mathbf{Y}},{y_{ui}}=1\}\) is the set of items that u has interacted with, and \(\omega (u,p)\) represents the importance of p to u, which is computed as follows:

$$\omega (u,p)=\frac{{\exp ({{\mathbf{e}}_p} \cdot {\mathbf{e}}_{u}^{{\text{T}}})}}{{\sum\nolimits_{{p^\prime \in {\mathbf{P}}}} {\exp ({{\mathbf{e}}_{p^\prime}} \cdot {\mathbf{e}}_{u}^{{\text{T}}})} }}$$
(8)

where \({\mathbf{e}}_{u}^{{\text{T}}}\) is the transpose of eu.

The representation of item i is recursively aggregated as follows:

$${\mathbf{e}}_{i}^{{(l+1)}}=\frac{1}{{\left| {N_{i}^{{\text{g}}}} \right|}}\sum\nolimits_{{v \in N_{i}^{{\text{g}}}}} {{{\mathbf{e}}_r} \odot {\mathbf{e}}_{v}^{{(l)}}}$$
(9)

where \(N_{i}^{{\text{g}}}=\{ v|(i,r,v) \in {\mathbf{G}}^\prime\}\) is the neighbor set of i in the filtered KG, and r is the relation linking i and v in the corresponding triple.

Finally, the embedding of user and item under the global intent view is represented by the summation of the representation of all layers as follows:

$${\mathbf{e}}_{u}^{{\text{g}}}=\sum\limits_{{l=0}}^{L} {{\mathbf{e}}_{u}^{{(l)}}}$$
(10)
$${\mathbf{e}}_{i}^{{\text{g}}}=\sum\limits_{{l=0}}^{L} {{\mathbf{e}}_{i}^{{(l)}}}$$
(11)

where \({\mathbf{e}}_{u}^{{\text{g}}}\) is the embedding of user u under the global intent view, and \({\mathbf{e}}_{i}^{{\text{g}}}\) is the embedding of item i under the global intent view.

It should be noted that \({\mathbf{e}}_{u}^{{(0)}}\), \({\mathbf{e}}_{i}^{{(0)}}\) and \({{\mathbf{e}}_r}\) are trainable weight vectors.
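To make the aggregation concrete, the following is a minimal dense PyTorch sketch of one layer of Eqs. (7)-(9) for a single user and item, assuming the intent term in Eq. (7) denotes the ω-weighted combination of intent embeddings; a practical implementation would use sparse scatter operations over the whole graph, and all names and neighborhoods here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def aggregate_user(e_u, e_item_layer, E_P, N_u):
    """One layer of Eq. (7) for one user; N_u is a LongTensor of item ids."""
    omega = F.softmax(E_P @ e_u, dim=0)   # Eq. (8): importance of each intent to u
    intent_vec = omega @ E_P              # omega-weighted combination of intents
    return (intent_vec * e_item_layer[N_u]).mean(dim=0)   # average over N_u

def aggregate_item(e_entity_layer, e_rel, pairs):
    """One layer of Eq. (9) for one item; pairs is a LongTensor (m, 2) of
    (relation id, tail entity id) drawn from the filtered KG G'."""
    return (e_rel[pairs[:, 0]] * e_entity_layer[pairs[:, 1]]).mean(dim=0)

# Eqs. (10)-(11): the final global-view embeddings sum the outputs of all layers.
```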

Local intent view encoder

The local intent view encoder forms the user and item embeddings from the IG. It undergoes intent injection and GNN information aggregation but not KG filtering, which applies only to the global intent view encoder.

In the intent injection component, an interaction pair (u, i) is decomposed into a set of triples {(u, p, i)|p∈P}, as in the global intent view. Here, the intent embedding ep is a trainable weight vector, since the IG contains only the interaction relation.

The representation of u is recursively aggregated as Eq. (7), and the representation of i is recursively aggregated as follows:

$${\mathbf{e}}_{i}^{{(l+1)}}=\sum\nolimits_{{u \in N_{i}^{{\text{c}}}}} {\frac{1}{{\left| {N_{i}^{{\text{c}}}} \right|}}{\mathbf{e}}_{u}^{{(l)}}}$$
(12)

where \(N_{i}^{{\text{c}}}=\{ u|{y_{ui}} \in {\mathbf{Y}},{y_{ui}}=1\}\) is the set of users who have interacted with i.

Finally, the embedding of user and item under the local intent view is represented by the summation of the representation of all layers as follows:

$${\mathbf{e}}_{u}^{{\text{c}}}=\sum\limits_{{l=0}}^{L} {{\mathbf{e}}_{u}^{{(l)}}}$$
(13)
$${\mathbf{e}}_{i}^{{\text{c}}}=\sum\limits_{{l=0}}^{L} {{\mathbf{e}}_{i}^{{(l)}}}$$
(14)

where \({\mathbf{e}}_{u}^{{(0)}}\) and \({\mathbf{e}}_{i}^{{(0)}}\) are trainable weight vectors.

Contrastive learning

In the training process, the intents injected into the IG converge slowly owing to the single relational constraint, whereas the intents injected into the IG + KG are prone to premature convergence owing to multiple relational constraints. Therefore, \({\mathbf{e}}_{u}^{{\text{g}}}\), \({\mathbf{e}}_{i}^{{\text{g}}}\), \({\mathbf{e}}_{u}^{{\text{c}}}\) and \({\mathbf{e}}_{i}^{{\text{c}}}\) can all be improved by contrastive learning between the global and local intent views.

Inspired by6, sample pairs are defined across the two views. Given i∈I, j∈I, i ≠ j, the pair \((e_{i}^{c},e_{i}^{g})\) is a positive item pair, since its elements are the two embedding vectors of one item under different views; the pair \((e_{i}^{c},e_{j}^{g})\) is a negative item pair, since its elements are the embedding vectors of two distinct items under different views. The same rules apply to user pairs.

The contrastive loss of item i is defined as follows:

$$L_{i}^{c}= - \log \frac{{\exp (\operatorname{sim} (e_{i}^{c},e_{i}^{g})/\tau )}}{{\exp (\operatorname{sim} (e_{i}^{c},e_{i}^{g})/\tau )+\sum\nolimits_{{j \ne i}} {\exp (\operatorname{sim} (e_{i}^{c},e_{j}^{g})/\tau )} }}$$
(15)

where \(L_{i}^{c}\) is the contrastive loss under the local intent view, sim(·) is the cosine similarity, and \(\tau \in {\mathbb{R}}\) is a temperature hyperparameter. The symmetric loss under the global intent view is defined as follows:

$$L_{i}^{g}= - \log \frac{{\exp (\operatorname{sim} (e_{i}^{g},e_{i}^{c})/\tau )}}{{\exp (\operatorname{sim} (e_{i}^{g},e_{i}^{c})/\tau )+\sum\nolimits_{{j \ne i}} {\exp (\operatorname{sim} (e_{i}^{g},e_{j}^{c})/\tau )} }}$$
(16)

where \(L_{i}^{{\text{g}}}\) is the contrastive loss under the global intent view.

The contrastive loss of user u is defined analogously to Eqs. (15) and (16), simply replacing i with u. The total contrastive loss sums the item and user losses under both views, as follows:

$${L_{CG}}=\frac{1}{{\left| {\mathbf{I}} \right|}}\sum\nolimits_{{i \in {\mathbf{I}}}} {(L_{i}^{c}+L_{i}^{g})} +\frac{1}{{\left| {\mathbf{U}} \right|}}\sum\nolimits_{{u \in {\mathbf{U}}}} {(L_{u}^{c}+L_{u}^{g})}$$
(17)

where \({L_{CG}}\) is the total contrastive loss.
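As a batched sketch, and assuming in-batch negatives (a common implementation choice not stated explicitly above), Eqs. (15)-(17) reduce to a symmetric InfoNCE loss over the two views:

```python
import torch
import torch.nn.functional as F

def cross_view_loss(z_c, z_g, tau=0.2):
    """z_c, z_g: (n, d) local-/global-view embeddings of the same n nodes."""
    z_c = F.normalize(z_c, dim=-1)       # so dot products become cosine similarities
    z_g = F.normalize(z_g, dim=-1)
    logits = z_c @ z_g.t() / tau         # sim(z_c[i], z_g[j]) / tau
    labels = torch.arange(z_c.size(0))   # positives sit on the diagonal
    # Row-wise cross-entropy realizes Eq. (15); the transposed term is Eq. (16).
    return F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)

# Eq. (17): L_CG = cross_view_loss(items_c, items_g) + cross_view_loss(users_c, users_g)
```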

Model prediction

The final embedding vector of user u (or item i) is the concatenation of its embedding vectors under the local and global intent views. Thereafter, how likely user u is to adopt item i is evaluated by the inner product of the final embedding vectors of u and i, as follows:

$${\hat {y}_{ui}}={({\mathbf{e}}_{u}^{g} \oplus {\mathbf{e}}_{u}^{{\text{c}}})^{\text{T}}} \bullet ({\mathbf{e}}_{i}^{g} \oplus {\mathbf{e}}_{i}^{{\text{c}}})$$
(18)

where \(\oplus\) denotes vector concatenation.

It is noted that \({\hat {y}_{ui}}\) is not a probability but a score, used to compare which of two items is more likely to be adopted by a user.
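A minimal sketch of Eq. (18), with hypothetical variable names:

```python
import torch

def score(e_u_g, e_u_c, e_i_g, e_i_c):
    """Eq. (18): concatenate both views and take the inner product."""
    u = torch.cat([e_u_g, e_u_c], dim=-1)   # final user embedding
    i = torch.cat([e_i_g, e_i_c], dim=-1)   # final item embedding
    return (u * i).sum(dim=-1)              # ranking score y_hat_ui
```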

Optimizer

The BPR56 loss is used to train the model and is defined as follows:

$${L_{{\text{BPR}}}}=\sum\nolimits_{{\left( {u,i,j} \right) \in O}} { - \ln \sigma \left( {{{\hat {y}}_{ui}} - {{\hat {y}}_{uj}}} \right)}$$
(19)

where \(O=\{ (u,i,j)|(u,i) \in {O^+},(u,j) \in {O^-}\}\) is the training dataset consisting of the observed interactions \({O^+}\) and unobserved counterparts \({O^-}\), and σ(·) is the sigmoid function.

By combining the BPR loss and the total contrastive loss, the following objective function is minimized to learn the model parameter:

$$L={L_{{\text{BPR}}}}+\beta {L_{{\text{CG}}}}+\lambda \left\| \Theta \right\|_{2}^{2}$$
(20)

where \(\Theta =\{ {\mathbf{e}}_{u}^{{(0)}},{\mathbf{e}}_{v}^{{(0)}},{{\mathbf{e}}_r},{{\mathbf{e}}_p},{\mathbf{w}}|u \in {\mathbf{U}},v \in {\mathbf{V}},r \in {\mathbf{Re}},p \in {\mathbf{P}}\}\) is the set of model parameters (note that \({\mathbf{I}} \subset {\mathbf{V}}\), and \({\mathbf{e}}_{u}^{{(0)}}\), \({\mathbf{e}}_{i}^{{(0)}}\) and \({{\mathbf{e}}_p}\) are duplicated in the local and global intent view encoders), and \(\beta\) and \(\lambda\) are two hyperparameters that control the contrastive loss and the L2 regularization term, respectively.
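The full objective of Eqs. (19)-(20) can be sketched as follows, with `beta` and `lam` standing for the hyperparameters β and λ; the function and argument names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def total_loss(y_pos, y_neg, l_cg, params, beta=0.1, lam=1e-5):
    """y_pos / y_neg: scores for observed / sampled unobserved pairs;
    l_cg: the total contrastive loss of Eq. (17); params: the set Theta."""
    l_bpr = -F.logsigmoid(y_pos - y_neg).sum()   # Eq. (19): pairwise BPR loss
    l2 = sum((p ** 2).sum() for p in params)     # squared L2 norm of Theta
    return l_bpr + beta * l_cg + lam * l2        # Eq. (20)
```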

Experiments

The experiments are designed to answer the following research questions:

  • RQ1: How does the proposed DIVCL model perform compared to state-of-the-art recommender models?

  • RQ2: What is the impact of the KG filtering and contrastive learning strategies on the performance improvement of the DIVCL model?

  • RQ3: Does the proposed DIVCL model significantly increase the computational load compared to baseline models?

Experimental settings

Dataset description

Three benchmark datasets for movie, book, and fabric recommendations are used to evaluate the effectiveness of the proposed DIVCL model. The movie and book datasets are MovieLens-1M and Amazon-Book, both released by Recbole (https://recbole.io/dataset_list.html), which are widely used for validating recommendation models. To ensure the quality of interactions, the movie dataset is filtered by removing interactions with user ratings below 3. Similarly, the book dataset is filtered by removing interactions with ratings below 3, users with fewer than 10 interactions, and entities with fewer than 5 relations.

The fabric dataset is self-built and has been released at https://github.com/yzxx667/DIVCL along with the DIVCL code. This dataset was collected from a large fabric wholesale mall, which serves as the motivation for this study. The item classes primarily consist of fabrics, linings, and accessories, while the entity classes include shops, materials, patterns, colors, and weaving techniques. The relation classes correspond to these entity categories. Unlike the movie and book datasets, interactions in the fabric dataset are mostly based on browsing rather than purchases or ratings.

The sparsity of a dataset is calculated as follows:

$$S=\left( {1 - \frac{{\left| {{\mathbf{IG}}} \right|}}{{\left| {\mathbf{U}} \right| \times \left| {\mathbf{I}} \right|}}} \right) \times 100\%$$
(21)

where S is the sparsity of the dataset (a higher value indicates sparser supervised signals), and |IG| is the number of interactions.
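Eq. (21) amounts to one line of code; the numbers below are illustrative, not the dataset statistics of Table 2.

```python
def sparsity(num_interactions, num_users, num_items):
    """Eq. (21): percentage of unobserved user-item pairs."""
    return (1 - num_interactions / (num_users * num_items)) * 100

print(sparsity(1_000_000, 6_000, 4_000))   # -> 95.83...
```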

The statistics of the three datasets are summarized in Table 2. It is evident that the supervised signals are highly sparse across all datasets, particularly in the book and fabric datasets. Additionally, the datasets contain multiple classes of relations, some of which may be redundant, potentially affecting the model’s performance.

Table 2 Statistics of the benchmark datasets.

During the training phase, each observed user-item interaction is treated as a positive instance. For negative sampling, an item that the user has not interacted with is randomly selected and paired with the user. For each dataset, 80% of the data is randomly allocated to the training set, 10% to the validation set, and the remaining 10% to testing. This split ensures a robust evaluation of the model’s performance.

Parameter settings

The proposed DIVCL model is implemented in Python using the PyTorch framework. The experiments were conducted on a workstation equipped with an Intel i7-13700 CPU @ 2.5 GHz, an NVIDIA GeForce RTX 4090 GPU with 24 GB of memory, and 32 GB of DDR RAM, running Ubuntu Linux.

For all approaches, the size of the graph embedding was fixed at 64, the dropout rate was set to 0.6, the optimizer used was Adam, the batch size was 4096, the number of training epochs was 500, and the testing period was set at every 10 epochs. To fine-tune the model, a grid search was performed to determine the optimal hyperparameter settings for DIVCL. The range of hyperparameters and their optimal values are listed in Table 3.

Table 3 Parameter settings.

Baselines

To verify the effectiveness of the proposed DIVCL model, both non-knowledge-aware and knowledge-aware approaches are employed as baselines for comparison. The non-knowledge-aware approaches build collaborative filtering models based solely on the interaction graph (IG), as follows:

  • BPR56: This is a typical CF-based approach that uses pairwise matrix factorization for implicit feedback optimized by the BPR loss.

  • DMF57: This presents a deep neural network that learns a common low-dimensional space for user and item representations, which is then used to predict how likely a user is to adopt an item.

  • DGCF19: This considers user intents as fine-grained representations of user-item interactions and generates disentangled representations using a GNN model.

The knowledge-aware approaches are on the basis of both the IG and KG as follows:

  • CKE25: This is an embedding-based approach that combines structural, textual, and visual knowledge in one framework.

  • KGAT3: This is a GNN-based approach which iteratively integrates neighbors over IG + KG with an attention mechanism to get user/item representations.

  • KGIN7: This is a GNN-based approach with intents, which disentangles user-item interactions at the granularity of user intents, and performs GNN on the IG + KG with intents.

  • KGNN-LS32: This is a GNN-based model that enriches item embeddings with GNN aggregation and label smoothness regularization.

  • MCCLK6: This is a GNN-based approach with contrastive learning, which considers the IG as a collaborative view, the KG as a semantic view, and the IG + KG as a structural view, and then performs contrastive learning across three views, capturing comprehensive graph feature and structure information in a self-supervised manner.

  • VRKG5: This is a GNN-based approach with the consideration of redundant relations, which constructs virtual relational graphs by clustering relations to alleviate the negative impact of long-tail relations on information aggregation.

Evaluation metrics

The all-ranking strategy is employed during the evaluation phase. Positive and negative instances are defined consistently with those in the training phase. For each user, all items are ranked based on their predicted scores. To assess the top-K recommendation performance58, evaluation metrics such as Recall@K and HR@K (Hit Ratio) are used, where K is set to 20 and 50. The average values of these metrics across all users in the test set are reported to provide a comprehensive performance comparison.
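Under the all-ranking protocol, the per-user metrics can be sketched as follows; `ranked` and `pos` are hypothetical names for a user's full item ranking and held-out positives.

```python
def recall_at_k(ranked, pos, k):
    """Fraction of a user's held-out positives that appear in the top-K."""
    hits = len(set(ranked[:k]) & set(pos))
    return hits / len(pos) if pos else 0.0

def hr_at_k(ranked, pos, k):
    """1 if at least one held-out positive appears in the top-K, else 0."""
    return 1.0 if set(ranked[:k]) & set(pos) else 0.0

# Reported numbers average these per-user values over the test set, K in {20, 50}.
```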

Performance comparison (RQ1)

The overall performance over the three datasets is shown in Tables 4 and 5 for different K values. The best results are in boldface and the second-best results are in italics.

Table 4 Overall performance comparison over the three datasets for K = 20.
Table 5 Overall performance comparison over the three datasets for K = 50.

The key observations from these results are as follows:

  • The proposed DIVCL consistently outperforms the baseline models on Recall, which is widely regarded as the most crucial metric in recommendation systems. In addition, DIVCL achieves optimal results on half of the HR measures, another important metric. The results demonstrate the effectiveness of the DIVCL model, with its performance improvements largely attributable to the integration of intent injection, KG filtering, and contrastive learning.

  • The knowledge-aware approaches (CKE, KGAT, KGNNLS, KGIN, MCCLK, VRKG, and DIVCL) do not consistently outperform the non-knowledge-aware approaches (BPR, DMF and DGCF). In fact, BPR achieved the best performance on HR of Fabric@20. This finding suggests that while knowledge graphs (KGs) can provide valuable information for recommendation, they also introduce noise that may negatively affect performance. This further underscores the importance of the KG filtering strategy employed in the DIVCL model, as it helps mitigate the negative impact of irrelevant or redundant relations in the KG.

  • DIVCL outperforms KGIN on all measures. Both use intent injection; the main difference is the KG filtering and contrastive learning added in DIVCL. This result again verifies the benefit of these two strategies.

  • DIVCL does not outperform VRKG and MCCLK on the HR of Book@50 and Fabric@50. Owing to the maturity of research and the complexity of the datasets, it is very difficult to improve the overall performance of recommendation. VRKG retains some features of relations with low intent relevance by clustering relations, and thereby achieved better performance on HR at higher K values. MCCLK addresses sparse supervised signals by multi-level cross-view contrastive learning, and thereby achieved better performance on the extremely sparse Fabric dataset at high K values. Despite these cases, DIVCL still shows superiority in recall metrics and time complexity.

Ablation studies (RQ2)

To evaluate the effectiveness of the dual-intent-view contrastive learning structure and the KG filtering strategy, the DIVCL was compared with the following two variants:

  • DIVCL w/o CL: In this variant, the contrastive learning component is removed, and the other components remain unchanged.

  • DIVCL w/o KG: In this variant, the KG filtering strategy in the global intent view encoder is removed, and the other components remain unchanged.

The comparison among the DIVCL and two variants is shown in Fig. 2.

Fig. 2
figure 2

Comparison among the DIVCL and the two variants.

The following observations further support the effectiveness of the proposed DIVCL model:

  • DIVCL outperforms its two variants across all measures. This result underscores the effectiveness of the dual-intent-view contrastive learning structure combined with the KG filtering strategy. Together, these components significantly enhance model performance.

  • DIVCL without contrastive learning (w/o CL) outperforms DIVCL without KG filtering (w/o KG) across all measures. This can be explained by the respective importance of the interaction graph (IG) and the knowledge graph (KG) for extracting user intent. The IG, as the source of supervised signals, is inherently closer to user intent than the KG, which serves as auxiliary information. The dual-intent-view contrastive learning module primarily addresses the sparse supervised signals, making it more impactful than KG filtering. However, KG filtering remains crucial, as it has also contributed to notable performance improvements.

  • The advantage of DIVCL is most pronounced on the fabric dataset. This is explained by the unique characteristics of the dataset. The interactions in the fabric dataset are predominantly browsing-based, which are less reliable compared to purchase and rating interactions. As a result, the sparse supervised signal problem is more severe in the fabric dataset than in the other two datasets, allowing the dual-intent-view contrastive learning component to yield more substantial improvements. Additionally, the KG for the fabric dataset has a higher density compared to the other datasets, exacerbating the problem of redundant entity relations. Consequently, the KG filtering strategy contributes more significantly to the performance gains on this dataset. These results further demonstrate the combined effectiveness of the dual-intent-view contrastive learning and KG filtering strategies in addressing both sparse signals and redundant entity relations.

Computation load studies (RQ3)

The time complexity of a learning network primarily depends on its structure. However, clarifying the network structure of all baseline models is quite challenging. Therefore, an experimental approach is used to assess the computational load of the models. Setting K = 50, both the training and inference times were recorded during the experiments described in Sect. 5.2, and the average times per epoch are presented in Table 6.

Table 6 The training and inference time (unit: seconds/epoch).

Although DIVCL ranks relatively low in training and inference speed, its times remain of the same order as those of baselines with strong recommendation results, such as KGIN and VRKG. Considering the balance between recommendation performance and computational load, the computational overhead of DIVCL is deemed acceptable.

Comparing DIVCL to KGIN across all datasets, the training time for DIVCL is less than double that of KGIN, while their inference times are quite similar. This difference can be attributed to the number of intent view encoders and the inclusion of the KG filtering strategy in DIVCL. While DIVCL contains two parallel intent view encoders, KGIN only has one. However, the KG filtering strategy in DIVCL simplifies the complexity of information aggregation, which helps mitigate the computational load.

Similarly, when comparing DIVCL to VRKG across the datasets, a similar pattern emerges. This difference can be attributed to the different KG filtering strategies employed. DIVCL removes irrelevant relations based on their influence over the intent space, which only requires projection and similarity calculations. On the other hand, VRKG clusters and merges similar relations, a process that is far more complex. As a result, the computational cost of DIVCL is less than twice that of VRKG.

When comparing DIVCL to MCCLK, the computational load of MCCLK is significantly higher, by an order of magnitude, rendering it impractical under the experimental settings. MCCLK utilizes three parallel intent view encoders and three separate contrastive learning processes, theoretically making its computational complexity two to three times that of DIVCL. However, MCCLK requires information aggregation from three unfiltered graphs, consuming memory well beyond the physical capacity of the experimental machine. This results in frequent internal and external memory swapping, severely slowing down the computation speed.

Conclusion and future work

In this work, a Dual-Intent-View Contrastive Learning network (DIVCL) is proposed to address the challenges of sparse supervised signals and redundant entity relations. DIVCL enhances user and item representation learning by fully utilizing the role of intent. First, supervised signals are represented in a fine-grained manner by inserting a set of intents into each user-item interaction. Second, redundant entity relations are filtered by defining the influence of relations within the intent space. Third, the distributions of user and item embeddings are more aligned with user preferences by incorporating intent into the contrastive learning process. Additionally, the computational load of DIVCL remains manageable by adopting dual-view contrastive learning instead of multi-view contrastive learning, ensuring an efficient balance between performance and resource consumption. The effectiveness of the proposed DIVCL model has been demonstrated through experimental results on three benchmark datasets, particularly the self-built fabric dataset, which highlights its strengths. Furthermore, the experimental results reveal that noise exists not only within the KG but also within the user-item interactions themselves. This finding suggests that future work should explore joint denoising strategies at both the interaction and KG levels to further improve the performance of recommendation systems.