
Towards Robust Recommendation via Decision Boundary-aware Graph Contrastive Learning

Jiakai Tang, Sunhao Dai, Zexu Sun. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China. tangjiakai5704@ruc.edu.cn; sunhaodai,sunzexu21@ruc.edu.cn
Xu Chen, Jun Xu. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China. xu.chen,junxu@ruc.edu.cn
Wenhui Yu, Lantao Hu, Peng Jiang, and Han Li. Kuaishou Technology, Beijing, China. yuwenhui07, hulantao@kuaishou.com; jiangpeng, lihan08@kuaishou.com
(2024)
Abstract.

In recent years, graph contrastive learning (GCL) has received increasing attention in recommender systems due to its effectiveness in reducing bias caused by data sparsity. However, most existing GCL models rely on heuristic approaches and usually assume entity independence when constructing contrastive views. We argue that these methods struggle to strike a balance between semantic invariance and view hardness across the dynamic training process, both of which are critical factors in graph contrastive learning.

To address the above issues, we propose a novel GCL-based recommendation framework, RGCL, which effectively maintains the semantic invariance of contrastive pairs and dynamically adapts as the model capability evolves through the training process. Specifically, RGCL first introduces decision boundary-aware adversarial perturbations to constrain the exploration space of contrastive augmented views, avoiding the loss of task-specific information. Furthermore, to incorporate global user-user and item-item collaboration relationships that guide the generation of hard contrastive views, we propose an adversarial-contrastive learning objective to construct a relation-aware view-generator. Besides, considering that unsupervised GCL could potentially narrow the margins between data points and the decision boundary, resulting in decreased model robustness, we introduce adversarial examples based on maximum perturbations to achieve margin maximization. We also provide theoretical analyses on the effectiveness of our designs. Through extensive experiments on five public datasets, we demonstrate the superiority of RGCL over twelve baseline models. To benefit the research community, we have released our project at https://cl4rec.github.io/RGCL/.

Recommender Robustness; Graph Contrastive Learning; Adversarial Learning
* Corresponding author
journalyear: 2024
copyright: acmlicensed
conference: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; August 25–29, 2024; Barcelona, Spain
booktitle: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), August 25–29, 2024, Barcelona, Spain
doi: 10.1145/3637528.3671661
isbn: 979-8-4007-0490-1/24/08
ccs: Information systems Recommender systems

1. Introduction

Figure 1. An overview of two types of representative GCL-based recommenders. To facilitate the presentation, we only show a single user and item with injected noise. However, in practice, the semantic-aware GCL-based methods should integrate perturbations to all graph nodes.

Recently, the intersection of graph neural networks (GNNs) and recommender systems has emerged as a focal point of research attention in both academia and industry (Jin et al., 2023). While GNNs have demonstrated remarkable efficacy in capturing high-order connectivity relationships between users and items through their potent message propagation mechanism (Jiao et al., 2023; Wu et al., 2022), the inherent data sparsity within recommendation scenarios introduces unexpected bias into user (e.g., non-active vs. active users) and item (e.g., long-tail vs. popular items) representations, thereby impairing the overall model performance (Cai et al., 2023; Lin et al., 2022).

To mitigate the issue of data sparsity, and drawing inspiration from self-supervised learning (SSL), recent works have introduced Graph Contrastive Learning (GCL) into GNN-based algorithms (Lu et al., 2023; Zhang et al., 2023; Wei et al., 2022). GCL represents a new learning paradigm that integrates contrastive learning (Jaiswal et al., 2020) with GNN-based recommenders, simultaneously enhancing the alignment of positive embedding pairs and minimizing the similarity to augmented negative instances. In this way, GCL can effectively alleviate the problem of representation degradation among low-degree nodes. In general, GCL-based recommenders can be classified into two categories based on how they build the contrastive samples: (1) Hardness-driven methods. These methods aim to construct samples that are hard enough to challenge the original recommender model and to expose it to more difficult knowledge. The methods in this line mainly differ in how they define hardness and how they build sufficiently hard samples. For example, SGL (Wu et al., 2021) generates challenging views using various strategies, such as node dropout and edge dropout. (2) Rationality-driven methods. These methods aim to maintain the rationality of the constructed samples, that is, the augmented features and original labels should form reasonable samples. For example, SimGCL (Yu et al., 2022) makes slight changes to the original features, such that the augmented feature-label pairs remain reasonable (i.e., semantically invariant).

Although the aforementioned GCL-based recommenders have shown impressive performance to some extent, we argue that these methods still suffer from several significant limitations. As depicted in Figure 1, on the one hand, hardness-driven models blindly pursue example hardness in contrastive augmentations through manually designed heuristic strategies. Unfortunately, these models may inadvertently remove certain crucial nodes or edges, neglecting how to maintain task-specific semantics. This oversight makes it challenging for recommenders to accurately capture user preferences and item characteristics. On the other hand, rationality-driven methods introduce slight feature perturbations to retain the underlying semantic structure but may overlook the benefits of hard samples for providing more diverse knowledge.

Notably, both challenging positive pairs and hard negative pairs are essential to the success of GCL-based recommenders (Robinson et al., 2020; Wang and Liu, 2021). In extreme cases, the zero-noise version of contrastive learning may not yield significant performance gains, as verified by prior research (Xia et al., 2022; Yu et al., 2022). In summary, achieving an adaptive and ideal balance between the hardness and rationality of contrastive augmentations for GCL-based recommenders poses a highly intricate challenge.

In this work, we aim to leverage the idea of adversarial robustness (Moosavi-Dezfooli et al., 2016) to facilitate the construction of optimal contrastive augmented data. To be specific, the goal of adversarial robustness is to promote feature invariance with respect to task-relevant information, ensuring that neural networks are not fooled by imperceptible data perturbations. More importantly, it specifies the maximum perturbation boundary that the current model can tolerate, which explicitly defines a feasible exploration space for conducting example augmentation. Therefore, grounded in this idea, graph contrastive learning can effectively balance example hardness and rationality, both of which are crucial for high-quality representations. While this idea is intuitive and holds intriguing potential, its implementation still faces several challenges. C1: prevalent contrastive augmentation approaches, which assume entity independence, struggle to maintain inherent structural features because they overlook the important user-user and item-item connections. C2: as an unsupervised learning algorithm, GCL, in blindly pursuing representation uniformity, might unintentionally compromise the robustness requirement; that is, it may narrow the margins between data points and the model decision boundary, risking unexpected decreases in model robustness.

To realize our idea and overcome the above challenges, this paper proposes a novel Robust Graph Contrastive Learning-based recommendation framework, named RGCL. Specifically, we first calculate the maximum perturbation magnitudes for different users and items at each graph layer, while preserving core semantic information for both the user and item sides (Rationality). In contrast to manually designed heuristic graph contrastive learning methods, we propose an adversarial-contrastive objective that adaptively generates challenging positive pairs and hard negative pairs based on the global user-user and item-item relationships (Hardness), which simultaneously overcomes the limitations of the entity independence assumption (C1). Finally, we optimize the joint loss of the adversarial and contrastive components to concurrently increase the dissimilarity between different users (items) and maximize the distances between user-item inputs and the model decision boundary, further improving the robustness of the recommendation model (C2). In summary, our contributions are as follows:

  • We propose a model-agnostic graph contrastive learning framework, which utilizes dynamic decision boundary-aware adversarial perturbations to constrain the perturbation space of contrastive augmented view, achieving a better balance between contrastive hardness and sample rationality.

  • We develop a joint learning algorithm based on multi-view contrastive learning and margin maximum adversarial learning to optimize RGCL, empowering better representation uniformity while improving model robustness.

  • We give theoretical analyses to underscore the importance of hard contrastive views in model optimization and elucidate the insights behind the efficacy of RGCL in enhancing robustness.

  • Extensive experiments on five real-world datasets demonstrate the superior performance of our proposed RGCL framework.

2. Preliminaries

2.1. GNN-based Recommendation

Formally, let $\mathcal{U}=\{u_{1},u_{2},\dots,u_{M}\}$ and $\mathcal{I}=\{i_{1},i_{2},\dots,i_{N}\}$ denote the sets of users and items, respectively, where $M$ and $N$ are the numbers of users and items. Considering a recommendation scenario with implicit feedback, a binary matrix $\mathbf{R}\in\mathbb{R}^{M\times N}$ is typically used to record user-item interactions (e.g., clicks or purchases), where $r_{u,i}=1$ indicates that user $u$ has interacted with item $i$, and $r_{u,i}=0$ otherwise. Following most GNN-based recommendation works (He et al., 2020, 2023; Huang et al., 2021), we formulate the interaction behaviors between users and items as a standard bipartite graph $\mathcal{G}=\{\mathcal{V},\mathbf{A}\}$, where $\mathcal{V}=\mathcal{U}\cup\mathcal{I}$ contains all graph nodes, and the adjacency matrix $\mathbf{A}$ is defined as follows:

$$\mathbf{A}=\left[\begin{array}{cc}\mathbf{0}^{M\times M} & \mathbf{R}\\ \mathbf{R}^{T} & \mathbf{0}^{N\times N}\end{array}\right].$$
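As an illustration of the block structure above, the following minimal sketch assembles $\mathbf{A}$ from a small interaction matrix using plain Python lists. The function name `build_adjacency` and the toy matrix `R` are assumptions made for this example and are not part of the original work; a practical implementation would use a sparse matrix format.

```python
def build_adjacency(R):
    """Assemble the (M+N) x (M+N) bipartite adjacency matrix [[0, R], [R^T, 0]]."""
    M = len(R)
    N = len(R[0]) if M else 0
    size = M + N
    A = [[0] * size for _ in range(size)]
    for u in range(M):
        for i in range(N):
            if R[u][i]:
                A[u][M + i] = 1  # upper-right block: R
                A[M + i][u] = 1  # lower-left block: R^T
    return A


# Toy interaction matrix (illustrative values only): 2 users, 3 items.
R = [[1, 0, 1],
     [0, 1, 0]]
A = build_adjacency(R)
```

Users occupy the first $M$ rows and columns and items the remaining $N$, so the user-user and item-item blocks stay zero and the matrix is symmetric.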

Following the common practice (He et al., 2020; Cai et al., 2023), we encode the user $u$ and item $i$ as $d$-dimensional latent vectors $\mathbf{e}_{u}\in\mathbb{R}^{d}$ and $\mathbf{e}_{i}\in\mathbb{R}^{d}$, respectively. Besides, $\mathbf{E}=\{\mathbf{e}_{u}\mid u\in\mathcal{U}\}\cup\{\mathbf{e}_{i}\mid i\in\mathcal{I}\}$ is defined as the overall learnable embedding matrix for all nodes.

Similar to other GCL-based works (Yu et al., 2022; Wu et al., 2021; Wei et al., 2022), this paper adopts LightGCN (He et al., 2020) as the model backbone. Specifically, the comprehensive graph representations $\mathbf{z}_{u}$ and $\mathbf{z}_{i}$ for user $u$ and item $i$ in LightGCN are calculated by

$$\begin{aligned}
\mathbf{z}_{u} &= \sum_{l=0}^{L}\mathbf{h}_{u}^{(l)},\quad \mathbf{h}_{u}^{(l)}=\sum_{j\in\mathcal{N}_{u}}\frac{1}{\sqrt{|\mathcal{N}_{u}||\mathcal{N}_{j}|}}\mathbf{h}_{j}^{(l-1)},\quad l\geq 1,\\
\mathbf{z}_{i} &= \sum_{l=0}^{L}\mathbf{h}_{i}^{(l)},\quad \mathbf{h}_{i}^{(l)}=\sum_{v\in\mathcal{N}_{i}}\frac{1}{\sqrt{|\mathcal{N}_{i}||\mathcal{N}_{v}|}}\mathbf{h}_{v}^{(l-1)},\quad l\geq 1,
\end{aligned}$$

where $\mathcal{N}_{u}$ and $\mathcal{N}_{i}$ indicate the neighboring nodes of user $u$ and item $i$, respectively. $\mathbf{h}^{(l)}_{u}$ and $\mathbf{h}^{(l)}_{i}$ denote the $l$-th layer graph representations of user $u$ and item $i$, respectively. Here, $\mathbf{h}^{(0)}_{u}$ and $\mathbf{h}^{(0)}_{i}$ are initialized with the learnable embeddings $\mathbf{e}_{u}$ and $\mathbf{e}_{i}$, respectively. The predicted score $\hat{r}_{u,i}$ for the $(u,i)$ pair is computed as the inner product of their graph representations, i.e., $\hat{r}_{u,i}=\langle\mathbf{z}_{u},\mathbf{z}_{i}\rangle$. Finally, the BPR (Rendle et al., 2012) loss is adopted as the optimization objective:

$$\mathcal{L}_{BPR}=-\sum_{u\in\mathcal{U}}\sum_{i^{+}\in\mathcal{I}_{u}^{+}}\sum_{i^{-}\in\mathcal{I}_{u}^{-}}\ln\sigma(\hat{r}_{u,i^{+}}-\hat{r}_{u,i^{-}}),\tag{1}$$

where $\sigma(x)=1/(1+e^{-x})$, and $\mathcal{I}_{u}^{+}$ and $\mathcal{I}_{u}^{-}$ represent the positive item set and the unobserved item set for user $u$, respectively.
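The propagation rule and the BPR objective can be sketched in a few lines of dependency-free Python. This is an illustrative toy under stated assumptions, not the authors' implementation: it uses a dense adjacency list-of-lists, the helper names (`lightgcn_propagate`, `score`, `bpr_loss`) are hypothetical, and the loss is evaluated on an explicit list of (user, positive item, negative item) triples rather than the full triple sum of Eq. (1).

```python
import math


def lightgcn_propagate(A, E, num_layers):
    """Return z = sum_{l=0..L} h^(l), where h^(l) aggregates degree-normalized neighbors."""
    n, d = len(E), len(E[0])
    deg = [sum(row) for row in A]
    h = [row[:] for row in E]  # h^(0): the learnable embeddings
    z = [row[:] for row in E]  # running sum over layers
    for _ in range(num_layers):
        nxt = [[0.0] * d for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if A[a][b]:  # b is a neighbor of a, so both degrees are >= 1
                    w = 1.0 / math.sqrt(deg[a] * deg[b])
                    for k in range(d):
                        nxt[a][k] += w * h[b][k]
        h = nxt
        for a in range(n):
            for k in range(d):
                z[a][k] += h[a][k]
    return z


def score(z_u, z_i):
    """Predicted score: inner product of the final representations."""
    return sum(a * b for a, b in zip(z_u, z_i))


def bpr_loss(z, triples, num_users):
    """BPR loss over (u, i_pos, i_neg) triples; item indices are local to the item set."""
    total = 0.0
    for u, i_pos, i_neg in triples:
        diff = score(z[u], z[num_users + i_pos]) - score(z[u], z[num_users + i_neg])
        total -= math.log(1.0 / (1.0 + math.exp(-diff)))
    return total


# Toy graph (illustrative): 1 user, 2 items; the user interacted with item 0 only.
A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
E = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
z = lightgcn_propagate(A, E, num_layers=1)
loss = bpr_loss(z, [(0, 0, 1)], num_users=1)
```

In this toy case the user and the interacted item reinforce each other after one layer, so the positive score exceeds the negative one and the loss is small.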

2.2. GCL-based Recommenders

In real-world scenarios, interaction behaviors between users and items are highly sparse, which can lead to severe overfitting and bias problems (Wu et al., 2021; jing2023contrastive). Graph contrastive learning (GCL), as a novel learning paradigm, helps mitigate these problems (Yu et al., 2022; Cai et al., 2023). Specifically, GCL first generates diverse graph views for each user and item (e.g., via node dropout and feature masking). Then, different views of the same user (item) are treated as positive pairs, while views of different instances are treated as negative pairs. Finally, a contrastive learning loss is used to optimize the model parameters with the paired users and items, where InfoNCE (Oord et al., 2018) is the most commonly adopted loss. Formally, the contrastive learning loss for the user side can be defined as follows:

$$\mathcal{L}_{CL}^{U}(\mathbf{x}_{u},\mathbf{y}_{u})=\sum_{u\in\mathcal{U}}-\log\frac{\exp(\mathrm{sim}(\mathbf{x}_{u},\mathbf{y}_{u})/\tau)}{\sum_{v\in\mathcal{U}}\exp(\mathrm{sim}(\mathbf{x}_{u},\mathbf{y}_{v})/\tau)},\tag{2}$$

where $\mathbf{x}_{u}$ and $\mathbf{y}_{u}$ denote the two different augmented views of user $u$, and $\mathrm{sim}(\cdot,\cdot)$ and $\tau$ represent the cosine similarity function and the temperature hyper-parameter, respectively. Similarly, the contrastive learning loss of the item side is formulated as follows:

$$\mathcal{L}_{CL}^{I}(\mathbf{x}_{i},\mathbf{y}_{i})=\sum_{i\in\mathcal{I}}-\log\frac{\exp(\mathrm{sim}(\mathbf{x}_{i},\mathbf{y}_{i})/\tau)}{\sum_{j\in\mathcal{I}}\exp(\mathrm{sim}(\mathbf{x}_{i},\mathbf{y}_{j})/\tau)},\tag{3}$$

where $\mathbf{x}_{i}$ and $\mathbf{y}_{i}$ denote the two different views of item $i$.
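Both contrastive losses share the same form, so a single routine can serve the user and item sides. The sketch below is a plain-Python illustration with hypothetical names (`cosine`, `info_nce`); it assumes all view vectors are non-zero and uses the standard log-sum-exp trick for numerical stability, which does not change the value of the loss.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(X, Y, tau):
    """Sum over u of -log(exp(sim(x_u, y_u)/tau) / sum_v exp(sim(x_u, y_v)/tau))."""
    loss = 0.0
    for u, x_u in enumerate(X):
        logits = [cosine(x_u, y_v) / tau for y_v in Y]
        m = max(logits)  # subtract the max before exponentiating (stability only)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        loss += log_denominator - logits[u]
    return loss


# Toy views (illustrative): two instances with orthogonal representations.
X = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(X, [[1.0, 0.0], [0.0, 1.0]], tau=0.5)     # positives agree
mismatched = info_nce(X, [[0.0, 1.0], [1.0, 0.0]], tau=0.5)  # positives disagree
```

The loss is lower when each instance's two views agree with each other and differ from the views of other instances, which is the behavior the positive/negative pairing is meant to reward.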

2.3. Adversarial Robustness

Adversarial training (AT) stands out as one of the most promising approaches for bolstering adversarial robustness (Moosavi-Dezfooli et al., 2016; Madry et al., 2017; Goodfellow et al., 2014). The goal of AT is to increase model robustness by generating adversarial examples through well-designed perturbations, which purposefully induce the neural network to make errors. Formally, the optimal perturbation for a data sample $(x,y)$ is found by maximizing the loss function $\mathcal{L}(\cdot)$: $\delta^{*}=\arg\max\mathcal{L}(x+\delta,y;\bm{\theta})$, where $\delta$ represents an adversarial perturbation whose $\ell_{p}$ norm is smaller than $\epsilon$. Then, the model is trained on a mixture of both original clean examples and generated adversarial examples to enhance its robustness.
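The text above does not fix how the maximization is solved. One common first-order approximation is the fast gradient sign method of Goodfellow et al. (2014), which takes a single step of size $\epsilon$ along the sign of the input gradient and therefore yields a perturbation whose $\ell_{\infty}$ norm is at most $\epsilon$. The sketch below applies it to a hypothetical linear logistic model; the model, the function names, and the toy numbers are all assumptions chosen for illustration.

```python
import math


def logistic_loss(w, x, y):
    """Loss log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log(1.0 + math.exp(-margin))


def fgsm_perturbation(w, x, y, eps):
    """One-step sign-gradient perturbation: delta = eps * sign(dL/dx)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))  # derivative of the loss w.r.t. <w, x>
    grad = [coeff * wi for wi in w]        # chain rule: dL/dx = coeff * w
    return [eps * ((g > 0) - (g < 0)) for g in grad]


# Toy example (illustrative values only).
w, x, y, eps = [2.0, -1.0], [1.0, 1.0], 1, 0.1
delta = fgsm_perturbation(w, x, y, eps)
x_adv = [xi + di for xi, di in zip(x, delta)]
clean_loss = logistic_loss(w, x, y)
adv_loss = logistic_loss(w, x_adv, y)
```

Training on both `x` and `x_adv` with the same label is the mixture of clean and adversarial examples described above.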

Discussion. Adversarial robustness uncovers the root cause of the model’s adversarial vulnerability, that is, the non-smooth feature space near data samples (Jiang et al., 2020). In other words, small input perturbations are likely to cause large changes in the latent semantics and, in turn, in the model output, which is the basic challenge that adversarial defense algorithms strive to resolve. This fits particularly well with graph contrastive learning, which aims to maximize the consistency of a given instance under different augmented views. More importantly, adversarial robustness provides the maximum boundary of feature perturbations that the model can tolerate (cf. Sec 3.2), which effectively restrains the exploration space for contrastive augmentation and guides the construction of an optimal view-generator.

3. Our Approach: RGCL

Figure 2. Overall framework of our proposed dynamic decision boundary-aware graph contrastive learning framework RGCL.

3.1. Overall Framework

The overall framework of RGCL is presented in Figure 2. Specifically, we calculate the maximum feature perturbations to guide the subsequent generation of both contrastive examples and adversarial examples. For contrastive examples, we first generate two randomly augmented views $\mathbf{Z}^{\prime}$ and $\mathbf{Z}^{\prime\prime}$ using random perturbations. In addition, a third view $\mathbf{Z}^{ac}$, which we refer to as the adversarial-contrastive view, is generated by maximizing a relation-aware contrastive function. Building on these contrastive samples, we employ multi-view contrastive learning to promote high-quality representations. Furthermore, to safeguard model robustness against potential compromises arising from the uniformity optimization of graph contrastive learning, we generate adversarial examples using the maximum perturbation to enlarge the distances between data points and the decision boundary. Finally, the model is updated with a joint optimization objective over the augmented contrastive and adversarial data.

3.2. Decision Boundary-aware Perturbation

To build our contrastive samples, we first derive the perturbations that the original samples can maximally tolerate while maintaining user preferences. Ideally, the perturbations should satisfy two conditions: (1) the perturbations should be as large as possible, such that the obtained contrastive samples are hard enough (hardness requirement); (2) the augmented samples after incorporating the perturbations should still be aligned with the user’s original preferences (rationality requirement).

Different from traditional adversarial learning problems based on classification settings, a recommender system is basically a ranking problem, and the perturbations should be learned to maintain user preference rankings. To this end, we propose to learn the maximum perturbations that can maintain item pair-wise rankings. Furthermore, different orders of graph representations possess different levels of expressive capacity; that is, higher-layer representations aggregate richer structure information and reflect more complex connectivity patterns. Consequently, we tailor the maximum perturbation for each high-order graph representation independently. Specifically, for each user $u$ and a positive-negative item pair $(i^{+},i^{-})$, suppose their original representations are $\mathbf{z}_{u}=\sum_{l=0}^{L}\mathbf{h}_{u}^{(l)}$, $\mathbf{z}_{i^{+}}^{\top}=\sum_{l=0}^{L}\mathbf{h}_{i^{+}}^{(l)}$, and $\mathbf{z}_{i^{-}}^{\top}=\sum_{l=0}^{L}\mathbf{h}_{i^{-}}^{(l)}$, respectively. We define the pair-wise ranking function as $g(u,i^{+},i^{-})=\langle\tilde{\mathbf{z}}_{u}^{(k)},\mathbf{z}_{i^{+}}\rangle-\langle\tilde{\mathbf{z}}_{u}^{(k)},\mathbf{z}_{i^{-}}\rangle$, where $\tilde{\mathbf{z}}_{u}^{(k)}=\sum_{l=0,l\neq k}^{L}\mathbf{h}_{u}^{(l)}+(\mathbf{h}_{u}^{(k)}+\bm{\Delta})$ is the user embedding after incorporating the perturbation $\bm{\Delta}\in\mathbb{R}^{d}$ into the $k$-th layer graph representation $\mathbf{h}_{u}^{(k)}$, and $\langle\cdot,\cdot\rangle$ denotes the inner product. Then, the learning objective of perturbation $\bm{\Delta}$ is designed as follows:

$$\bm{\Delta}_{u}^{(k)}=\arg\max_{\bm{\Delta}}\|\bm{\Delta}\|_{p}\quad\text{s.t. }g(u,i^{+},i^{-})>0,\tag{4}$$

where $\|\cdot\|_{p}$ denotes the vector $p$-norm. Here, the pair-wise ranking function $g(\cdot)$ is linearized around the $k$-th layer representation $\mathbf{h}_{u}^{(k)}$, so the maximum perturbation $\bm{\Delta}_{u}^{(k)}$ corresponds exactly to the orthogonal projection of $\mathbf{h}_{u}^{(k)}$ onto the model decision hyperplane.

For the sake of simplicity and better interpretation, we denote $f(\mathbf{h}_{u}^{(k)})=\partial g(u,i^{+},i^{-})/\partial\mathbf{h}_{u}^{(k)}$. Finding the maximum perturbation $\bm{\Delta}_{u}^{(k)}$ is equivalent to solving for the directional vector from $\mathbf{h}_{u}^{(k)}$ to the decision boundary, which is formally given as follows:

$$\bm{\Delta}_{u}^{(k)}=-\frac{g(u,i^{+},i^{-})}{\|f(\mathbf{h}_{u}^{(k)})\|_{q}^{q}}\cdot\text{sign}(f(\mathbf{h}_{u}^{(k)}))\odot\|f(\mathbf{h}_{u}^{(k)})\|^{q-1},\tag{5}$$

where $\text{sign}(\cdot)$ is the sign function and $\odot$ denotes the element-wise product. The value of $q$ depends on the choice of perturbation norm $\ell_{p}$ ($1\leq p\leq\infty$) and satisfies $\frac{1}{p}+\frac{1}{q}=1$, following the constraint of Hölder's inequality (Moosavi-Dezfooli et al., 2016). In our work, $p$ is set to $\infty$ and $q$ is set to 1, as we empirically found that perturbation constraints under the $\ell_{\infty}$ norm yield better model performance.
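As a quick sanity check of the $p=\infty$, $q=1$ setting, the short derivation below substitutes $q=1$ into Eq. (5). It is only a sketch: it assumes that $g$ is exactly linear in the perturbation (the linearization stated above), and it abbreviates $g=g(u,i^{+},i^{-})$ and $f=f(\mathbf{h}_{u}^{(k)})\neq\mathbf{0}$.

```latex
\begin{aligned}
\bm{\Delta}_{u}^{(k)} &= -\frac{g}{\|f\|_{1}}\,\text{sign}(f), \\
g + f^{\top}\bm{\Delta}_{u}^{(k)}
  &= g - \frac{g}{\|f\|_{1}}\, f^{\top}\text{sign}(f)
   = g - \frac{g}{\|f\|_{1}}\,\|f\|_{1} = 0, \\
\|\bm{\Delta}_{u}^{(k)}\|_{\infty} &= \frac{|g|}{\|f\|_{1}} .
\end{aligned}
```

Under this assumption the perturbed representation lands exactly on the boundary $g=0$, so the constraint $g>0$ in Eq. (4) holds for any strictly smaller step along the same direction.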

Following that, since users often interact with multiple items in real-world recommendation scenarios, we extend the above method to all interactions of user $u$ to derive the final optimal perturbation constraint, which can be rewritten as follows:

$$\begin{aligned}\bm{\Delta}_{u}^{(k)}&=-\frac{g(u,i^{+},i^{-})}{\|f(\mathbf{h}_{u}^{(k)})\|_{1}}\cdot\text{sign}(f(\mathbf{h}_{u}^{(k)})),\\ \text{where}\ i^{+},i^{-}&=\underset{i^{+}\in\mathcal{I}_{u}^{+},\,i^{-}\in\mathcal{I}_{u}^{-}}{\arg\min}\left|\frac{g(u,i^{+},i^{-})}{\|f(\mathbf{h}_{u}^{(k)})\|_{1}}\right|.\end{aligned}\tag{6}$$
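The selection rule in Eq. (6) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the released implementation: it assumes the item embeddings are held fixed and that the perturbation enters only through the sum readout $\tilde{\mathbf{z}}_{u}^{(k)}$, in which case $f=\mathbf{z}_{i^{+}}-\mathbf{z}_{i^{-}}$. The function name `closest_boundary_perturbation` and the list-based vectors are hypothetical choices made for this example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sign(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)


def closest_boundary_perturbation(z_user, pos_items, neg_items):
    """Eq. (6) sketch: pick the (i+, i-) pair nearest to the decision boundary.

    Assumption (not from the paper's code): f = z_pos - z_neg, i.e. the
    gradient of the linear ranking function with item embeddings held fixed.
    """
    best = None
    for z_pos in pos_items:
        for z_neg in neg_items:
            f = [p - n for p, n in zip(z_pos, z_neg)]
            l1 = sum(abs(x) for x in f)
            if l1 == 0.0:
                continue  # identical items give no usable direction
            g = dot(z_user, f)
            distance = abs(g) / l1
            if best is None or distance < best[0]:
                best = (distance, g, f, l1)
    if best is None:
        raise ValueError("no valid (positive, negative) item pair")
    _, g, f, l1 = best
    return [-(g / l1) * sign(x) for x in f]
```

For example, with `z_user = [2.0, 1.0]`, one positive item `[1.0, 0.0]`, and one negative item `[0.0, 0.5]`, the ranking score is 1.5 and the returned perturbation moves the user embedding onto the boundary where the score is zero.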

Note that we only perturb the high-order graph representations of users and items, i.e., $1\leq k\leq L$, and skip the initial features. This is because the original features contain the most abundant semantic information, and polluting them could lead to a severe performance decrease. In contrast, by perturbing higher-order representations, we subtly and implicitly disrupt the latent semantic and structural characteristics. Intuitively, this effectively simulates the noise encountered in real-world scenarios, thereby further enhancing model robustness. Similarly, we can obtain the graph perturbations of item nodes from a dual perspective.

3.3. Relation-aware Contrastive Learning with Perturbation Constraints

As highlighted in Sec. 1, existing GCL-based recommenders struggle to balance contrastive hardness and rationality, both of which are pivotal for acquiring high-quality user (item) representations. To this end, in this subsection, we design a relation-aware adversarial-contrastive objective, which utilizes the global user-user and item-item relationships to create more challenging positive and hard negative pairs under perturbation constraints. Finally, we optimize the representations through multi-view contrastive learning.

3.3.1. Perturbation-constrained Contrastive Augmentation

Following previous works (Yu et al., 2022, 2023), we adopt the random perturbations $\{\mathbf{r}_{u}^{(l)}:l=1,2,\cdots,L\}$ for user $u$ to generate the first random contrastive view $\mathbf{z}_{u}^{\prime}$ as follows:

$$\begin{aligned}\mathbf{z}_{u}^{\prime}&=\frac{1}{L+1}\left(\mathbf{h}_{u}^{(0)}+\sum_{l=1}^{L}\left(\mathbf{h}_{u}^{(l)}+\mathbf{r}_{u}^{(l)}\right)\right),\\ \text{where}\quad\mathbf{r}_{u}^{(l)}&=\epsilon\cdot\frac{\mathbf{r}\odot\text{sign}(\mathbf{h}_{u}^{(l)})}{\|\mathbf{r}\odot\text{sign}(\mathbf{h}_{u}^{(l)})\|_{2}}.\end{aligned}\tag{7}$$

Here, $\mathbf{r}\in\mathbb{R}^{d}$ follows a uniform distribution $U(0,1)$, and $\epsilon$ is a hyper-parameter that controls the initial perturbation magnitude. Similarly, we can obtain the augmented view $\mathbf{z}_{i}^{\prime}$ for item $i$.
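To make Eq. (7) concrete, the following is a minimal sketch of the random view generator using only the Python standard library. The function names, the list-of-layers input format, and the explicit `rng` argument are assumptions introduced for illustration and do not come from the released code.

```python
import math
import random


def sign(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)


def random_layer_perturbation(h_layer, eps, rng):
    """r_u^(l) in Eq. (7): uniform noise aligned with sign(h), scaled to L2 norm eps."""
    noise = [rng.random() for _ in h_layer]          # samples from U(0, 1)
    raw = [n * sign(h) for n, h in zip(noise, h_layer)]
    norm = math.sqrt(sum(x * x for x in raw))
    if norm == 0.0:
        return [0.0] * len(h_layer)                  # degenerate all-zero layer
    return [eps * x / norm for x in raw]


def random_view(layers, eps, rng):
    """z'_u in Eq. (7): mean over L+1 layers, perturbing layers 1..L only.

    `layers[0]` is h^(0) and stays clean; `layers[1:]` are h^(1)..h^(L).
    """
    total = list(layers[0])
    for h in layers[1:]:
        r = random_layer_perturbation(h, eps, rng)
        for j in range(len(total)):
            total[j] += h[j] + r[j]
    return [x / len(layers) for x in total]
```

Calling `random_view` twice with independently seeded generators (for instance `random.Random(0)` and `random.Random(1)`) yields two different views of the same node, which is how a second random view could be drawn in this sketch.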

Following that, we obtain the second augmented representations $\mathbf{z}_{u}^{\prime\prime}$ and $\mathbf{z}_{i}^{\prime\prime}$ in the same way, but using perturbations $\mathbf{r}$ with a different random initialization for more diverse contrastive effects.

However, different users and items have unique intrinsic robustness, which means that even imperceptible perturbations may result in large semantic changes for fragile instances. In turn, such perturbations unintentionally produce erroneous feature-label examples, an issue that is largely overlooked by existing GCL methods. Therefore, we propose to employ instance-wise perturbation constraints to guide the generation of contrastive samples, aiming to avoid losing task-relevant semantic information and to build a rational view-generator. Specifically, for the $l$-th layer augmentation perturbation $\mathbf{r}_{u}^{(l)}$, we constrain its exploration space by using the following projection operation $\Pi(\cdot)$ to obtain the constrained perturbation $\tilde{\mathbf{r}}_{u}^{(l)}$:

$$\tilde{\mathbf{r}}_{u}^{(l)}=\Pi(\mathbf{r}_{u}^{(l)})=\min\left(\text{abs}(\bm{\Delta}_{u}^{(l)}),\max\left(-\text{abs}(\bm{\Delta}_{u}^{(l)}),\mathbf{r}_{u}^{(l)}\right)\right),\tag{8}$$

where $\max(\cdot,\cdot)$ and $\min(\cdot,\cdot)$ are both element-wise operations, and $\text{abs}(\cdot)$ computes the absolute value of each element of the given vector. Here, we conservatively constrain the magnitude of the random perturbation $\tilde{\mathbf{r}}_{u}^{(l)}$ within a bounded $\delta_{u}^{(l)}$-ball, where we define $\delta_{u}^{(l)}$ as $\|\bm{\Delta}_{u}^{(l)}\|_{\infty}$. The main motivation behind Eq. (8) is that $\bm{\Delta}_{u}^{(l)}$ is the maximum perturbation along the most adversarial direction, so our conservative strategy ensures that perturbations in any other direction bounded within the ball also safely maintain semantic invariance. Consequently, we replace $\mathbf{r}_{u}^{(l)}$ in Eq. (7) with the constrained perturbation $\tilde{\mathbf{r}}_{u}^{(l)}$ to achieve contrastive rationality.
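The projection $\Pi(\cdot)$ in Eq. (8) amounts to an element-wise clamp. A minimal sketch follows; the function name `project_perturbation` and the list-based vector format are hypothetical and chosen only for this example.

```python
def project_perturbation(r, delta):
    """Eq. (8): clamp each entry of r to the interval [-|delta_j|, |delta_j|]."""
    if len(r) != len(delta):
        raise ValueError("r and delta must have the same dimension")
    return [min(abs(d), max(-abs(d), x)) for x, d in zip(r, delta)]


# Entries beyond the bound are clipped; entries inside it pass through unchanged.
print(project_perturbation([0.5, -0.5, 0.05], [-0.1, 0.2, 0.3]))  # [0.1, -0.2, 0.05]
```

Only the magnitude of each entry of the boundary-aware perturbation is used in this clamp. The random perturbation keeps its own sign, which is what leaves its direction free while bounding its size.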

3.3.2. Relation-aware Adversarial-Contrastive Augmentation

To break the assumption of instance independence in traditional GCL-based algorithms and simultaneously further enhance the hardness of contrastive examples, RGCL generates relation-aware adversarial-contrastive perturbations to fool the model by confusing the identities among different users and items. To be specific, we propose to maximize the following contrastive loss for generating instance-specific perturbations $\bm{\eta}$:

$$\begin{aligned}&\underset{\bm{\eta}}{\max}\sum_{u\in\mathcal{U}}-\log\frac{\exp(sim(\ddot{\mathbf{z}}_{u},\mathbf{z}_{u}^{\prime\prime})/\tau)}{\exp(sim(\ddot{\mathbf{z}}_{u},\mathbf{z}_{u}^{\prime\prime})/\tau)+\sum_{v\in\mathcal{U}\setminus\{u\}}\exp(sim(\ddot{\mathbf{z}}_{u},\mathbf{z}_{v}^{\prime\prime})/\tau)},\\ &\text{where}\ \ddot{\mathbf{z}}_{u}=\frac{1}{L+1}\left(\mathbf{h}_{u}^{(0)}+\sum_{l=1}^{L}\left(\mathbf{h}_{u}^{(l)}+\tilde{\mathbf{r}}_{u}^{(l)}+\bm{\eta}_{u}^{(l)}\right)\right),\end{aligned}\tag{9}$$

and $\bm{\eta}=\{\|\bm{\eta}_{u}^{(l)}\|_{\infty}\leq\delta_{u}^{(l)}:u\in\mathcal{U},l\in\{1,2,\dots,L\}\}$ denotes the perturbation set of user $u$. However, as general GNN-based recommenders involve nonlinear transformations, it is extremely challenging to find a closed-form solution for the above optimization problem. We therefore draw inspiration from the fast gradient sign method (FGSM) proposed by Goodfellow et al. (Goodfellow et al., 2014), which assumes that the objective function is approximately linear around the current model parameters. Building on this approximation, we can obtain an optimal max-norm constrained perturbation as follows:

$$\bm{\eta}_{u}^{(l)}=\delta_{u}^{(l)}\cdot\text{sign}\left(\partial\mathcal{L}_{CL}^{U}(\ddot{\mathbf{z}}_{u},\mathbf{z}_{u}^{\prime\prime})/\partial\bm{\eta}_{u}^{(l)}\right).\tag{10}$$
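The FGSM-style step in Eq. (10) can be illustrated with a small standard-library sketch. Two simplifications are assumptions of this example and not claims about the paper: `sim` is taken to be a plain inner product so that the gradient has a closed form, and the gradient is computed with respect to $\ddot{\mathbf{z}}_{u}$. The latter does not change the result, because $\ddot{\mathbf{z}}_{u}$ depends on each $\bm{\eta}_{u}^{(l)}$ through the positive factor $1/(L+1)$ in Eq. (9), so both gradients share the same sign. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def info_nce(anchor, positive, negatives, tau):
    """Per-user term of Eq. (9) with sim(a, b) = <a, b> (an assumption)."""
    logits = [dot(anchor, c) / tau for c in [positive] + negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def info_nce_grad(anchor, positive, negatives, tau):
    """Closed-form gradient of info_nce with respect to the anchor."""
    candidates = [positive] + negatives
    probs = softmax([dot(anchor, c) / tau for c in candidates])
    return [
        (sum(p * c[j] for p, c in zip(probs, candidates)) - positive[j]) / tau
        for j in range(len(anchor))
    ]


def fgsm_perturbation(grad, delta):
    """Eq. (10): step of size delta along the sign of the loss gradient."""
    return [delta * ((g > 0) - (g < 0)) for g in grad]
```

With the inner-product similarity the loss is convex in the anchor, so a step along the gradient sign cannot decrease it. This is the sense in which the perturbation makes the view harder in this sketch.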

Similarly, we can compute the relation-aware perturbations for items. Due to space limitation, the detailed derivation steps are omitted here. After that, we generate the relation-aware adversarial-contrastive views for users and items as follows:

$$\begin{aligned}\mathbf{z}_{u}^{ac}&=\frac{1}{L+1}\left(\mathbf{h}_{u}^{(0)}+\sum_{l=1}^{L}\left(\mathbf{h}_{u}^{(l)}+\tilde{\mathbf{r}}_{u}^{(l)}\odot\text{sign}(\bm{\eta}_{u}^{(l)})\right)\right),\\ \mathbf{z}_{i}^{ac}&=\frac{1}{L+1}\left(\mathbf{h}_{i}^{(0)}+\sum_{l=1}^{L}\left(\mathbf{h}_{i}^{(l)}+\tilde{\mathbf{r}}_{i}^{(l)}\odot\text{sign}(\bm{\eta}_{i}^{(l)})\right)\right),\end{aligned}\tag{11}$$

where $\tilde{\mathbf{r}}_{u}^{(l)}$ and $\tilde{\mathbf{r}}_{i}^{(l)}$ are defined in Eq. (8); note that they are initialized with different random values.
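The assembly of the adversarial-contrastive view in Eq. (11) can be sketched as follows. The constrained random perturbations and the adversarial perturbations are taken as given inputs, and the function name and argument layout are assumptions made for this illustration.

```python
def sign(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)


def adversarial_contrastive_view(layers, r_tilde, eta):
    """Eq. (11): average L+1 layers after re-orienting r_tilde by sign(eta).

    layers  : [h^(0), h^(1), ..., h^(L)], each a list of d floats
    r_tilde : constrained random perturbations for layers 1..L
    eta     : adversarial perturbations for layers 1..L (only the sign is used)
    """
    num_perturbed = len(layers) - 1
    if len(r_tilde) != num_perturbed or len(eta) != num_perturbed:
        raise ValueError("need one perturbation per layer 1..L")
    total = list(layers[0])               # h^(0) is left unperturbed
    for h, r, e in zip(layers[1:], r_tilde, eta):
        for j in range(len(total)):
            total[j] += h[j] + r[j] * sign(e[j])
    return [x / len(layers) for x in total]
```

In this formulation the adversarial perturbation contributes only a direction, while the magnitude still comes from the boundary-constrained random perturbation.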

Compared to the random-augmented view, adversarial-contrastive augmentation has two main advantages: (1) The optimization objective integrates global users (items) to confuse their identities, so the view generation process is essentially guided by the user-user and item-item relationships, resulting in relation-aware and more challenging contrastive representations. (2) Considering the different intrinsic vulnerability among instances, our proposed adversarial-contrastive perturbations are instance-specific and dynamically adapted along with the model training process, thereby further improving model robustness and adaptability.

3.3.3. Multi-View Contrastive Learning

In summary, based on the above discussion, we have obtained the view triplets $(\mathbf{z}_{u}^{\prime},\mathbf{z}_{u}^{\prime\prime},\mathbf{z}_{u}^{ac})$ and $(\mathbf{z}_{i}^{\prime},\mathbf{z}_{i}^{\prime\prime},\mathbf{z}_{i}^{ac})$ for user $u$ and item $i$, respectively.
Then, we employ a multi-view contrastive learning objective over different views of the same instance, i.e., $\{\mathbf{z}_{u}^{\prime}\leftrightarrow\mathbf{z}_{u}^{\prime\prime},\ \mathbf{z}_{u}^{ac}\leftrightarrow\mathbf{z}_{u}^{\prime},\ \text{and}\ \mathbf{z}_{u}^{ac}\leftrightarrow\mathbf{z}_{u}^{\prime\prime}\}$ for user $u$, and $\{\mathbf{z}_{i}^{\prime}\leftrightarrow\mathbf{z}_{i}^{\prime\prime},\ \mathbf{z}_{i}^{ac}\leftrightarrow\mathbf{z}_{i}^{\prime},\ \text{and}\ \mathbf{z}_{i}^{ac}\leftrightarrow\mathbf{z}_{i}^{\prime\prime}\}$ for item $i$.

The complete contrastive loss function is formulated as follows:

$$\begin{aligned}\mathcal{L}_{CL}=\ &\mathcal{L}_{CL}^{U}(\mathbf{z}_{u}^{\prime},\mathbf{z}_{u}^{\prime\prime})+\mathcal{L}_{CL}^{U}(\mathbf{z}_{u}^{ac},\mathbf{z}_{u}^{\prime})+\mathcal{L}_{CL}^{U}(\mathbf{z}_{u}^{ac},\mathbf{z}_{u}^{\prime\prime})\\ +\ &\mathcal{L}_{CL}^{I}(\mathbf{z}_{i}^{\prime},\mathbf{z}_{i}^{\prime\prime})+\mathcal{L}_{CL}^{I}(\mathbf{z}_{i}^{ac},\mathbf{z}_{i}^{\prime})+\mathcal{L}_{CL}^{I}(\mathbf{z}_{i}^{ac},\mathbf{z}_{i}^{\prime\prime}),\end{aligned}\tag{12}$$

where $\mathcal{L}_{CL}^{U}(\cdot)$ and $\mathcal{L}_{CL}^{I}(\cdot)$ are defined in Eq. (2) and (3), respectively. Through the multi-view contrastive learning approach, the model is able to learn from hard yet rational contrastive pairs, mitigating recommendation biases and preventing the overfitting that results from sparse supervised data.
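The structure of Eq. (12) can be sketched as a sum of three pairwise contrastive terms per side. Since Eq. (2) and (3) are not restated here, the InfoNCE form below (cosine similarity, in-batch negatives, summed over instances) is an assumption of this example, as are the function names and the default temperature.

```python
import math


def cosine(a, b):
    """Cosine similarity with a small constant guarding against zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b + 1e-12)


def info_nce(view_a, view_b, tau):
    """Assumed InfoNCE: row n of view_a should match row n of view_b."""
    loss = 0.0
    for idx, a in enumerate(view_a):
        logits = [cosine(a, b) / tau for b in view_b]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_sum_exp - logits[idx]
    return loss


def multi_view_loss(user_views, item_views, tau=0.2):
    """Eq. (12): three view pairs for users plus three view pairs for items.

    Each argument is a triplet (z_prime, z_double_prime, z_ac) of embedding tables.
    """
    total = 0.0
    for z1, z2, z_ac in (user_views, item_views):
        total += info_nce(z1, z2, tau)
        total += info_nce(z_ac, z1, tau)
        total += info_nce(z_ac, z2, tau)
    return total
```

In practice the six terms would be computed on mini-batches with an automatic differentiation framework; the plain-list version above is only meant to show which pairs of views are contrasted.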

3.4. Towards Margin Maximization via Adversarial Optimization

However, excessive pursuit of representation uniformity in GCL may reduce the distances between data points and the decision boundary, potentially compromising model robustness. We attribute this dilemma to an inherent deficiency of GCL: it is in essence an unsupervised learning paradigm, which pushes all different instances apart while ignoring task-specific semantic relations (Wang and Liu, 2021). To tackle the above issue, we propose to use adversarial examples to achieve margin maximization. Specifically, we generate adversarial examples using the maximum adversarial perturbation defined in Eq. (6), which can be formulated as follows:

$$\begin{aligned}
\mathbf{z}_{u}^{adv} &= \frac{1}{L+1}\left(\mathbf{h}_{u}^{(0)}+\sum_{l=1}^{L}\left(\mathbf{h}_{u}^{(l)}+\mathbf{\Delta}_{u}^{(l)}\right)\right),\\
\mathbf{z}_{i}^{adv} &= \frac{1}{L+1}\left(\mathbf{h}_{i}^{(0)}+\sum_{l=1}^{L}\left(\mathbf{h}_{i}^{(l)}+\mathbf{\Delta}_{i}^{(l)}\right)\right).
\end{aligned}\tag{13}$$

We then utilize the generated adversarial examples to optimize the BPR objective (i.e., Eq. (1)), which is given as follows:

$$\begin{aligned}
\mathcal{L}_{ADV} &= -\sum_{u\in\mathcal{U}}\sum_{i^{+}\in\mathcal{I}_{u}^{+}}\sum_{i^{-}\in\mathcal{I}_{u}^{-}}\ln\sigma(\hat{r}_{u,i}^{adv}-\hat{r}_{u,j}^{adv}),\\
&\text{where}\ \hat{r}_{u,i}^{adv}=\langle\mathbf{z}_{u}^{adv},\mathbf{z}_{i}^{adv}\rangle,\ \hat{r}_{u,j}^{adv}=\langle\mathbf{z}_{u}^{adv},\mathbf{z}_{j}^{adv}\rangle.
\end{aligned}\tag{14}$$

By explicitly creating adversarial examples around the model’s decision boundary and optimizing on both original and adversarial data, the model increases its prediction confidence on the input data, thereby enhancing its overall robustness.
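Eq. (13) and Eq. (14) can be sketched in a few lines of plain Python. This is a minimal illustration, not the released implementation: the layer embeddings and the perturbations $\mathbf{\Delta}^{(l)}$ (which come from Eq. (6)) are assumed to be given as lists of floats, and all function names and numbers are hypothetical.

```python
import math

def layer_mean_with_perturbation(layers, deltas):
    """Eq. (13): average h^(0) with the perturbed h^(l) + Delta^(l), l = 1..L.

    layers: L+1 vectors h^(0), ..., h^(L); deltas: L vectors Delta^(1), ..., Delta^(L).
    """
    assert len(deltas) == len(layers) - 1
    total = list(layers[0])  # layer 0 is left unperturbed
    for h, delta in zip(layers[1:], deltas):
        for k in range(len(total)):
            total[k] += h[k] + delta[k]
    return [value / len(layers) for value in total]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_sigmoid(x):
    # Numerically stable ln(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def adversarial_bpr_loss(z_user, z_pos, z_neg):
    """One summand of Eq. (14) for a (user, positive item, negative item) triple."""
    return -log_sigmoid(dot(z_user, z_pos) - dot(z_user, z_neg))

# Hypothetical two-layer example (L = 2, embedding size 2).
z_u_adv = layer_mean_with_perturbation(
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],  # h_u^(0), h_u^(1), h_u^(2)
    [[0.3, 0.0], [0.0, -0.3]],             # Delta_u^(1), Delta_u^(2)
)
print(z_u_adv, adversarial_bpr_loss(z_u_adv, [1.0, 0.0], [0.0, 1.0]))
```

Summing `adversarial_bpr_loss` over all sampled triples gives $\mathcal{L}_{ADV}$; in practice this would be vectorized over a batch.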

3.5. Model Training

3.5.1. Joint Optimization Objective

In the training stage, we propose to optimize the model parameters with the joint learning objective, which is formulated as follows:

$$\mathcal{L}=\mathcal{L}_{BPR}+\mu\mathcal{L}_{ADV}+\alpha\mathcal{L}_{CL},\tag{15}$$

where $\mu$ and $\alpha$ are the hyper-parameters for different loss terms.
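The joint objective in Eq. (15) is a weighted sum of three scalar losses. The short sketch below only illustrates how the weights act; the function name and the numeric values are hypothetical.

```python
def joint_loss(loss_bpr, loss_adv, loss_cl, mu, alpha):
    """Eq. (15): L = L_BPR + mu * L_ADV + alpha * L_CL."""
    return loss_bpr + mu * loss_adv + alpha * loss_cl

# Hypothetical per-batch loss values.
full = joint_loss(1.0, 2.0, 4.0, mu=0.5, alpha=0.25)
without_adv = joint_loss(1.0, 2.0, 4.0, mu=0.0, alpha=0.25)  # drops the L_ADV term
print(full, without_adv)
```

Setting $\mu=0$ removes the adversarial term, and setting both weights to zero recovers the plain BPR objective.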

3.5.2. Complexity Analysis

Since RGCL does not introduce any additional trainable parameters, the space complexity and the inference time complexity of the model remain the same as those of the GNN backbone. Besides, the total training time complexity of RGCL is $O((L|\mathcal{E}|+B^{2})d)$, where $B$ and $\mathcal{E}$ denote the batch size and edge set, respectively. Thus, our method retains the same order of computational complexity as other state-of-the-art GCL-based methods, such as SimGCL (Yu et al., 2022) and RocSE (Ye et al., 2023). Due to limited space, please refer to Appendix A for a more detailed analysis.

4. Theoretical Analysis

4.1. Hardness-aware Contrastive Learning

The core motivation of this paper is to construct a semantics-preserving and hardness-enhancing view-generator for contrastive learning. For the former, we capitalize on the decision boundary-aware constraint to help build rationality-aware views. For the latter, we carefully construct more challenging contrastive pairs because their hardness significantly affects the optimization of model parameters.

To explain further, we show that the contrastive loss is essentially a hardness-aware learning mechanism. Specifically, taking the user side as an example, given a set of users $\mathcal{U}=\{u_{1},u_{2},\dots,u_{M}\}$, we denote the similarity of user $u_{i}$ under different augmented views (e.g., random-augmented view or adversarial-contrastive view) as $s_{i,i}$, and the similarity between users $u_{i}$ and $u_{j}$ as $s_{i,j}$. The probability of $u_{i}$ being identified as $u_{j}$ is formulated as:

$$P_{i,j}=\frac{\exp(s_{i,j}/\tau)}{\exp(s_{i,i}/\tau)+\sum_{k\neq i}\exp(s_{i,k}/\tau)}.$$

Thus, the objective of contrastive learning is rewritten as follows:

$$\varphi(u_{i})=-\log\frac{\exp(s_{i,i}/\tau)}{\exp(s_{i,i}/\tau)+\sum_{k\neq i}\exp(s_{i,k}/\tau)}.$$

Then, the gradient used to update the model parameters $\bm{\theta}$ is

$$\frac{\partial\varphi(u_{i})}{\partial\bm{\theta}}=\frac{\partial\varphi(u_{i})}{\partial s_{i,i}}\frac{\partial s_{i,i}}{\partial\bm{\theta}}+\sum_{j\neq i}\frac{\partial\varphi(u_{i})}{\partial s_{i,j}}\frac{\partial s_{i,j}}{\partial\bm{\theta}},$$

where we give the derivation results for $\frac{\partial\varphi(u_{i})}{\partial s_{i,i}}$ and $\frac{\partial\varphi(u_{i})}{\partial s_{i,j}}$:

$$\begin{aligned}
\frac{\partial\varphi(u_{i})}{\partial s_{i,i}} &= \frac{1}{\tau}(P_{i,i}-1)\propto\exp(s_{i,i}/\tau),\\
\frac{\partial\varphi(u_{i})}{\partial s_{i,j}} &= \frac{1}{\tau}P_{i,j}\propto\exp(s_{i,j}/\tau),
\end{aligned}\tag{16}$$

where we can observe that the gradients of the contrastive loss w.r.t. both positive and negative pairs are proportional to the corresponding exponential form of their similarity scores. This means that a smaller positive-pair similarity $s_{i,i}$ and a larger negative-pair similarity $s_{i,j}$ have a greater impact on the optimization of model parameters. Therefore, our proposed RGCL can learn high-quality representations by constructing challenging positive pairs and hard negative pairs, which is well suited to guiding model optimization through hardness-aware contrastive learning.
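The gradient expressions in Eq. (16) can be checked numerically. The following minimal sketch, which is an illustration rather than the RGCL implementation, compares the analytic gradients with central finite differences for a single user. The similarity values, the temperature, and all function names are hypothetical.

```python
import math

def softmax_probs(sims, tau):
    """P_{i,k} for every k, where sims[k] = s_{i,k}."""
    m = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - m) / tau) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

def contrastive_loss(sims, i, tau):
    """phi(u_i) = -log(exp(s_ii / tau) / sum_k exp(s_ik / tau))."""
    m = max(sims)
    log_denominator = m / tau + math.log(sum(math.exp((s - m) / tau) for s in sims))
    return log_denominator - sims[i] / tau

def analytic_grads(sims, i, tau):
    """Eq. (16): (P_ii - 1) / tau for the positive pair, P_ij / tau for negatives."""
    probs = softmax_probs(sims, tau)
    return [(p - 1.0) / tau if k == i else p / tau for k, p in enumerate(probs)]

def numeric_grads(sims, i, tau, eps=1e-6):
    """Central finite differences of the loss w.r.t. each similarity score."""
    grads = []
    for k in range(len(sims)):
        up, down = list(sims), list(sims)
        up[k] += eps
        down[k] -= eps
        grads.append((contrastive_loss(up, i, tau) - contrastive_loss(down, i, tau)) / (2 * eps))
    return grads

# Index 0 is the positive pair; indices 1..3 are negatives of varying hardness.
sims, tau = [0.2, 0.9, -0.3, 0.5], 0.2
print(analytic_grads(sims, 0, tau))
print(numeric_grads(sims, 0, tau))
```

In this toy setting, the hardest negative (similarity 0.9) receives the largest gradient among the negatives, and the magnitude of the positive-pair gradient, $(1-P_{i,i})/\tau$, grows as $s_{i,i}$ decreases.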

4.2. Theoretical Analysis of Model Robustness

Although contrastive learning can improve representation uniformity and reduce recommendation bias, it may push data points closer to the model’s decision boundary and eventually decrease model robustness, because it is a task-unrelated unsupervised learning process. To compensate for this, RGCL explicitly maximizes the margin by constructing adversarial examples based on decision boundary-aware perturbations. In this subsection, we explain the rationale of our method.

For notational simplicity, we denote an input example as $x$. The goal of a recommendation algorithm is to make the preference probabilities of user $u$’s positive items higher than those of negative items, which is denoted as $g(x;\bm{\theta})>0$. Inspired by (Ding et al., 2020), we denote the margin between a data point and the decision boundary as $d(x;\bm{\theta})$, which is defined as follows:

$$d(x;\bm{\theta})=\|\mathbf{\Delta}^{*}\|=\max\|\mathbf{\Delta}\|\quad s.t.\ \mathbf{\Delta}:g(x+\mathbf{\Delta};\bm{\theta})>0.\tag{17}$$

We denote the BPR loss function as $\psi(\cdot)$, then we have the theorem:

Theorem 1.

Gradient descent on $\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))$ w.r.t. $\bm{\theta}$ with a proper step size increases $d(x;\bm{\theta})$, where $\mathbf{\Delta}^{*}=\arg\max_{g(x+\mathbf{\Delta};\bm{\theta})>0}\|\mathbf{\Delta}\|$ is the maximum perturbation given the current $\bm{\theta}$.

Proof.

Let $\rho(\mathbf{\Delta})=\|\mathbf{\Delta}\|$ and assume $\rho(\mathbf{\Delta})$ and $\psi(g(x;\bm{\theta}))$ are functions with twice continuous derivatives in a neighborhood of $(\mathbf{\Delta}^{*},\bm{\theta})$, $c$ is a constant, and the matrix

$$\begin{pmatrix}\frac{\partial^{2}\rho(\mathbf{\Delta}^{*})}{\partial\mathbf{\Delta}^{2}}+c\cdot\frac{\partial^{2}\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))}{\partial\mathbf{\Delta}^{2}} & \frac{\partial\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))}{\partial\mathbf{\Delta}}\\ \left(\frac{\partial\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))}{\partial\mathbf{\Delta}}\right)^{T} & 0\end{pmatrix}$$

is full rank, then we have

$$\nabla d(x;\bm{\theta})=C(x,\bm{\theta})\frac{\partial\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))}{\partial\bm{\theta}},$$

where

$$C(x,\bm{\theta})=\frac{\left\langle\frac{\partial\rho(\mathbf{\Delta}^{*})}{\partial\mathbf{\Delta}},\frac{\partial\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))}{\partial\mathbf{\Delta}}\right\rangle}{\left\|\frac{\partial\psi(g(x+\mathbf{\Delta}^{*};\bm{\theta}))}{\partial\mathbf{\Delta}}\right\|_{2}^{2}}$$

is a scalar. ∎

The above proof demonstrates that, under proper perturbations, our method can maximize the margin by minimizing the adversarial loss. Therefore, by generating adversarial examples with the maximum perturbations defined in Sec. 3.2, our proposed method can enlarge the margin between data points and the model’s decision boundary, thereby effectively improving model robustness. Besides, we give an additional robustness analysis of our method from the perspective of the connection between the sharpness of the loss landscape and PAC-Bayes theory, which further elaborates on the model’s tolerance to parameter perturbations. The detailed analysis is presented in Appendix B.
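As a toy illustration of Theorem 1 (not part of the original analysis), consider a linear scorer $g(x;\mathbf{w})=\langle\mathbf{w},x\rangle$, for which the margin is the Euclidean distance from $x$ to the hyperplane $g=0$, and take $\psi(g)=-\ln\sigma(g)$. These choices, the function names, and the numbers are assumptions made only for this sketch, and $\mathbf{\Delta}^{*}$ is held fixed during the parameter update.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def margin(w, x):
    """Distance from x to the boundary {y : <w, y> = 0} of the linear scorer."""
    return dot(w, x) / math.sqrt(dot(w, w))

def boundary_point(w, x):
    """x + Delta*, where Delta* is the shortest perturbation that reaches g = 0."""
    scale = dot(w, x) / dot(w, w)
    return [xi - scale * wi for xi, wi in zip(x, w)]

def adversarial_step(w, x, lr):
    """One gradient-descent step on psi(g) = -ln(sigmoid(g)) evaluated at x + Delta*."""
    x_adv = boundary_point(w, x)
    g = dot(w, x_adv)                      # zero (up to rounding) by construction
    dpsi_dg = -1.0 / (1.0 + math.exp(g))   # derivative of -ln(sigmoid(g)) w.r.t. g
    # Chain rule: d psi / d w = dpsi_dg * x_adv, with Delta* treated as a constant.
    return [wi - lr * dpsi_dg * xi for wi, xi in zip(w, x_adv)]

w, x = [1.0, 0.0], [1.0, 1.0]
w_new = adversarial_step(w, x, lr=0.2)
print(margin(w, x), margin(w_new, x))
```

With these values the update rotates $\mathbf{w}$ toward the boundary point, and the margin of $x$ grows after the step, which matches the direction predicted by the theorem for a sufficiently small step size.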

Table 1. Overall performance comparison among baseline and our models. We use bold fonts to label the best performance and use underlines to label the second. The NDCG and Recall metrics are abbreviated as ‘N’ and ‘R’, respectively.
| Dataset | Metric | BPRMF | NeuMF | GCMC | NGCF | GCCF | LightGCN | GraphCL | SGL | LightGCL | RocSE | CGI | SimGCL | RGCL | Improv. | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ML-1M | R@10 | 0.1702 | 0.1553 | 0.1676 | 0.1763 | 0.1753 | 0.1774 | 0.1837 | 0.1828 | 0.1796 | 0.1786 | 0.1797 | 0.1866 | 0.1934 | +3.91% | 2.67e-4 |
| ML-1M | N@10 | 0.2485 | 0.2291 | 0.2480 | 0.2544 | 0.2624 | 0.2581 | 0.2617 | 0.2625 | 0.2591 | 0.2577 | 0.2613 | 0.2657 | 0.2694 | +1.58% | 7.52e-4 |
| ML-1M | R@20 | 0.2582 | 0.2400 | 0.2526 | 0.2673 | 0.2611 | 0.2680 | 0.2749 | 0.2745 | 0.2722 | 0.2699 | 0.2703 | 0.2798 | 0.2901 | +3.69% | 7.50e-4 |
| ML-1M | N@20 | 0.2576 | 0.2393 | 0.2551 | 0.2647 | 0.2677 | 0.2670 | 0.2721 | 0.2725 | 0.2693 | 0.2676 | 0.2699 | 0.2758 | 0.2821 | +2.29% | 2.26e-3 |
| ML-1M | R@50 | 0.4174 | 0.3952 | 0.4073 | 0.4297 | 0.4171 | 0.4310 | 0.4379 | 0.4381 | 0.4343 | 0.4333 | 0.4308 | 0.4468 | 0.4581 | +2.53% | 4.42e-4 |
| ML-1M | N@50 | 0.3038 | 0.2848 | 0.2985 | 0.3121 | 0.3109 | 0.3137 | 0.3196 | 0.3202 | 0.3162 | 0.3149 | 0.3158 | 0.3242 | 0.3321 | +2.42% | 4.08e-4 |
| Alibaba | R@10 | 0.0682 | 0.0450 | 0.0503 | 0.0700 | 0.0707 | 0.0734 | 0.0741 | 0.0769 | 0.0747 | 0.0767 | 0.0740 | 0.0791 | 0.0824 | +4.20% | 1.69e-3 |
| Alibaba | N@10 | 0.0435 | 0.0284 | 0.0308 | 0.0446 | 0.0446 | 0.0461 | 0.0473 | 0.0486 | 0.0469 | 0.0485 | 0.0466 | 0.0502 | 0.0528 | +5.00% | 1.57e-4 |
| Alibaba | R@20 | 0.1070 | 0.0718 | 0.0805 | 0.1101 | 0.1104 | 0.1138 | 0.1151 | 0.1187 | 0.1158 | 0.1166 | 0.1146 | 0.1218 | 0.1267 | +4.00% | 4.02e-4 |
| Alibaba | N@20 | 0.0553 | 0.0365 | 0.0399 | 0.0568 | 0.0567 | 0.0584 | 0.0598 | 0.0613 | 0.0594 | 0.0607 | 0.0589 | 0.0632 | 0.0663 | +4.85% | 1.54e-6 |
| Alibaba | R@50 | 0.1875 | 0.1282 | 0.1454 | 0.1920 | 0.1931 | 0.1975 | 0.1944 | 0.2020 | 0.2010 | 0.1937 | 0.1967 | 0.2059 | 0.2129 | +3.40% | 4.63e-4 |
| Alibaba | N@50 | 0.0746 | 0.0501 | 0.0554 | 0.0764 | 0.0765 | 0.0784 | 0.0787 | 0.0812 | 0.0798 | 0.0792 | 0.0786 | 0.0834 | 0.0869 | +4.29% | 1.12e-4 |
| Kuaishou | R@10 | 0.0565 | 0.0588 | 0.0645 | 0.0663 | 0.0787 | 0.0730 | 0.0738 | 0.0748 | 0.0775 | 0.0714 | 0.0726 | 0.0788 | 0.0899 | +14.14% | 5.05e-6 |
| Kuaishou | N@10 | 0.0326 | 0.0351 | 0.0375 | 0.0370 | 0.0441 | 0.0413 | 0.0436 | 0.0450 | 0.0461 | 0.0409 | 0.0417 | 0.0451 | 0.0498 | +8.00% | 6.99e-4 |
| Kuaishou | R@20 | 0.0992 | 0.1095 | 0.1193 | 0.1266 | 0.1327 | 0.1269 | 0.1225 | 0.1282 | 0.1430 | 0.1242 | 0.1316 | 0.1325 | 0.1529 | +6.88% | 4.03e-4 |
| Kuaishou | N@20 | 0.0457 | 0.0504 | 0.0541 | 0.0551 | 0.0603 | 0.0573 | 0.0584 | 0.0609 | 0.0660 | 0.0571 | 0.0596 | 0.0613 | 0.0687 | +4.09% | 3.89e-3 |
| Kuaishou | R@50 | 0.2027 | 0.2172 | 0.2203 | 0.2562 | 0.2477 | 0.2388 | 0.2366 | 0.2522 | 0.2788 | 0.2489 | 0.2565 | 0.2503 | 0.2865 | +2.79% | 8.94e-3 |
| Kuaishou | N@50 | 0.0702 | 0.0760 | 0.0782 | 0.0857 | 0.0879 | 0.0840 | 0.0854 | 0.0902 | 0.0980 | 0.0866 | 0.0891 | 0.0897 | 0.1005 | +2.54% | 9.41e-3 |
| Gowalla | R@10 | 0.1330 | 0.1205 | 0.1185 | 0.1296 | 0.1319 | 0.1419 | 0.1540 | 0.1470 | 0.1448 | 0.1461 | 0.1447 | 0.1564 | 0.1606 | +2.66% | 7.69e-4 |
| Gowalla | N@10 | 0.1162 | 0.1038 | 0.1013 | 0.1136 | 0.1150 | 0.1257 | 0.1363 | 0.1305 | 0.1277 | 0.1271 | 0.1280 | 0.1379 | 0.1419 | +2.89% | 1.84e-3 |
| Gowalla | R@20 | 0.1894 | 0.1783 | 0.1749 | 0.1878 | 0.1924 | 0.2041 | 0.2178 | 0.2123 | 0.2085 | 0.2117 | 0.2059 | 0.2245 | 0.2272 | +1.18% | 1.83e-2 |
| Gowalla | N@20 | 0.1355 | 0.1238 | 0.1205 | 0.1333 | 0.1356 | 0.1470 | 0.1579 | 0.1527 | 0.1493 | 0.1495 | 0.1487 | 0.1610 | 0.1646 | +2.22% | 4.59e-3 |
| Gowalla | R@50 | 0.3003 | 0.2888 | 0.2832 | 0.3009 | 0.3057 | 0.3194 | 0.3335 | 0.3273 | 0.3240 | 0.3297 | 0.3205 | 0.3460 | 0.3468 | +0.23% | 1.31e-1 |
| Gowalla | N@50 | 0.1682 | 0.1563 | 0.1524 | 0.1667 | 0.1691 | 0.1810 | 0.1922 | 0.1867 | 0.1835 | 0.1845 | 0.1826 | 0.1969 | 0.2000 | +1.55% | 1.58e-3 |
| Yelp | R@10 | 0.0509 | 0.0407 | 0.0520 | 0.0506 | 0.0512 | 0.0612 | 0.0663 | 0.0681 | 0.0626 | 0.0656 | 0.0579 | 0.0740 | 0.0753 | +1.75% | 1.16e-2 |
| Yelp | N@10 | 0.0392 | 0.0309 | 0.0400 | 0.0390 | 0.0399 | 0.0479 | 0.0518 | 0.0532 | 0.0487 | 0.0512 | 0.0449 | 0.0582 | 0.0591 | +1.58% | 6.58e-3 |
| Yelp | R@20 | 0.0844 | 0.0691 | 0.0867 | 0.0842 | 0.0851 | 0.1001 | 0.1067 | 0.1098 | 0.1021 | 0.1052 | 0.0940 | 0.1182 | 0.1191 | +0.78% | 1.52e-3 |
| Yelp | N@20 | 0.0509 | 0.0408 | 0.0520 | 0.0507 | 0.0517 | 0.0614 | 0.0658 | 0.0677 | 0.0624 | 0.0650 | 0.0574 | 0.0736 | 0.0744 | +1.09% | 2.83e-3 |
| Yelp | R@50 | 0.1571 | 0.1339 | 0.1623 | 0.1570 | 0.1582 | 0.1814 | 0.1909 | 0.1950 | 0.1852 | 0.1871 | 0.1704 | 0.2075 | 0.2108 | +1.58% | 2.36e-3 |
| Yelp | N@50 | 0.0720 | 0.0596 | 0.0740 | 0.0718 | 0.0730 | 0.0850 | 0.0903 | 0.0925 | 0.0865 | 0.0888 | 0.0796 | 0.0995 | 0.1010 | +1.46% | 2.03e-3 |
Figure 3. Model convergence analysis w.r.t. training epochs on the ML-1M and Yelp datasets.

5. Experiments

In this section, we conduct extensive experiments to validate the effectiveness of RGCL, and our goal is to answer the following research questions:

  • RQ1: How does RGCL perform compared with state-of-the-art recommendation models?

  • RQ2: How do different designs of RGCL contribute to the final recommendation performance?

  • RQ3: How does RGCL perform under different levels of data sparsity and item popularity?

  • RQ4: How do different hyper-parameters affect the recommendation performance of RGCL?

5.1. Experimental Setup

Datasets. We conduct extensive experiments on the following public recommendation datasets: MovieLens (ML)-1M (Harper and Konstan, 2015), Alibaba (Chen et al., 2019), Kuaishou (Gao et al., 2022), Gowalla (Cho et al., 2011), and Yelp. For detailed introductions and preprocessing details of these datasets, please refer to Appendix C.1.

Baseline Models. We compare RGCL with a range of state-of-the-art recommendation models, including traditional recommenders (BPRMF (Rendle et al., 2012) and NeuMF (He et al., 2017)), GNN-based recommenders (GCMC (Berg et al., 2017), NGCF (Wang et al., 2019), GCCF (Chen et al., 2020), and LightGCN (He et al., 2020)), and GCL-based recommenders (GraphCL (You et al., 2020), SGL (Wu et al., 2021), LightGCL (Cai et al., 2023), CGI (Wei et al., 2022), RocSE (Ye et al., 2023), and SimGCL (Yu et al., 2022)). Detailed introductions of all these baseline models are provided in Appendix C.2.

Evaluation Metrics. To ensure reliable evaluation, following standard practice (Wei et al., 2022; Wu et al., 2021; Yang et al., 2023), we adopt the full-ranking strategy, which ranks all items that the test user has not interacted with as the candidate item pool, to mitigate the evaluation bias introduced by random negative sampling. For evaluation metrics, we adopt Normalized Discounted Cumulative Gain@$K$ (NDCG@$K$) and Recall@$K$, where $K\in\{10,20,50\}$.
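The full-ranking protocol and the two metrics can be sketched as follows. This is a minimal illustration under common conventions (binary relevance, Recall normalized by the number of held-out items, a $\log_2$ discount for NDCG); the exact conventions used in the released code are not shown here, and the item ids, scores, and function names are hypothetical.

```python
import math

def rank_candidates(scores, train_items):
    """Full ranking: order every item the user has not interacted with in training."""
    candidates = [item for item in scores if item not in train_items]
    return sorted(candidates, key=lambda item: scores[item], reverse=True)

def recall_at_k(ranked, test_items, k):
    """Fraction of held-out items that appear in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in test_items)
    return hits / len(test_items)

def ndcg_at_k(ranked, test_items, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in test_items)
    ideal_hits = min(k, len(test_items))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg

# Hypothetical predicted scores for one test user.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
ranked = rank_candidates(scores, train_items={"a"})
print(ranked, recall_at_k(ranked, {"c", "e"}, 2), ndcg_at_k(ranked, {"c", "e"}, 2))
```

Reported numbers would be the average of these per-user values over all test users; both functions assume that each test user has at least one held-out item.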

For better reproducibility, more implementation details are provided in Appendix C.3 and https://cl4rec.github.io/RGCL/.

5.2. Overall Performance (RQ1)

The results of different methods on all datasets are shown in Table 1. Based on the results, we have the following observations:

  • Compared to traditional baselines, such as BPRMF and NeuMF, GNN-based models generally perform better on most datasets, which agrees with previous work and confirms the effectiveness of GNNs (He et al., 2020; Wang et al., 2019). Among the GNN-based methods, LightGCN usually achieves excellent performance due to its simple yet effective linear convolution structure. Furthermore, most GCL-based recommenders outperform the GNN-based methods, indicating the desirable property of GCL for alleviating the bias introduced by high-degree nodes. However, these GCL-based models fail to explicitly define task-relevant semantic rationality and contrastive hardness, and thus achieve an inferior balance between the two when constructing augmented views.

  • Comparing our approach with all state-of-the-art baselines, we see that RGCL yields a consistent boost across all datasets. Besides, most of the $p$-values are below 0.01, which supports the statistical significance of the improvements. We attribute the marked enhancement in performance to the excellent balance between preserving semantic information and bolstering the hardness of contrastive examples, which further raises the performance upper bound of GCL-based recommenders. Besides, we increase the distance between sample points and the decision boundary through enhanced adversarial examples, avoiding the compromise in robustness caused by contrastive learning.

Training Efficiency. Moreover, to verify the convergence behavior of RGCL, we track the Recall@20 and NDCG@20 curves of different models w.r.t. the training epochs in Figure 3. From the results, we observe that RGCL converges significantly faster than SimGCL and LightGCN. Although LightGCL also converges quickly, its accuracy is worse than that of RGCL, as seen in Table 1. One possible reason is that its static SVD contrastive view fails to keep pace with the evolving model capability during training, eventually limiting the improvement of representation quality. Different from these baselines, RGCL adopts the decision boundary-aware perturbation to guide example generation, which adaptively adjusts the augmentation strength to reduce the inconsistency between representation quality and contrastive hardness. As a result, RGCL shows both significantly greater efficiency and efficacy.

5.3. Ablation Study (RQ2)

To further validate the importance and contribution of each component in RGCL, we devise multiple simplified variants. Specifically, we compare the following four variants: (1) In w/o cons, we drop the decision boundary-aware perturbation constraints on contrastive views. (2) In w/o rand, we do not introduce randomly initialized perturbations (i.e., we set $\mathbf{r}$ to an all-one vector). (3) In w/o ac, we drop the relation-aware view-generator and retain only the two random augmented views. (4) In w/o adv, we drop the adversarial regularization term $\mathcal{L}_{ADV}$ from the final loss. The experiments are conducted on the ML-1M and Yelp datasets; the observations and conclusions on the other datasets are similar and are omitted.

We present the results in Table 2, from which we make the following observations. For the w/o cons variant, unconstrained perturbations result in a significant performance decrease, suggesting that a uniform perturbation cannot effectively preserve semantic information because instances differ in their intrinsic robustness. The w/o rand variant performs much worse than RGCL, which demonstrates that introducing some variance into the augmented views is necessary. Furthermore, our method improves over the w/o ac variant, which reveals the importance of challenging positive pairs and hard negative pairs. However, only optimizing contrastive learning is still sub-optimal, as evidenced by the lower performance of the w/o adv variant compared with RGCL. We speculate that over-optimizing contrastive learning for representation uniformity may reduce the distance between data points and the model’s decision boundary, eventually deteriorating robustness. In summary, the above observations demonstrate that all the designs are crucial to the final performance improvement.

Table 2. Ablation Study on ML-1M and Yelp datasets.
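| Model | ML-1M R@20 | ML-1M N@20 | ML-1M R@50 | ML-1M N@50 | Yelp R@20 | Yelp N@20 | Yelp R@50 | Yelp N@50 |
|---|---|---|---|---|---|---|---|---|
| w/o cons | 0.2882 | 0.2798 | 0.4566 | 0.3302 | 0.1185 | 0.0733 | 0.2086 | 0.0995 |
| w/o rand | 0.2838 | 0.2793 | 0.4470 | 0.3265 | 0.1183 | 0.0736 | 0.2080 | 0.0996 |
| w/o ac | 0.2872 | 0.2813 | 0.4570 | 0.3315 | 0.1182 | 0.0737 | 0.2085 | 0.1000 |
| w/o adv | 0.2832 | 0.2801 | 0.4470 | 0.3276 | 0.1180 | 0.0737 | 0.2083 | 0.1000 |
| RGCL | 0.2901 | 0.2821 | 0.4581 | 0.3321 | 0.1191 | 0.0744 | 0.2108 | 0.1010 |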

5.4. Robustness Evaluation (RQ3)

Figure 4. Recommendation performance at different levels of data sparsity and item popularity. The black dashed line represents no performance improvement or decline.

To validate the model robustness, we conduct experimental analysis across different levels of user activity and item popularity. For detailed user and item grouping approaches, please refer to Appendix C.4. The experimental results are presented in Figure 4, where we can observe that RGCL demonstrates more significant performance improvements in user (item) groups with sparse interactions. This implies that RGCL effectively captures the interest preferences of inactive users and the characteristics of long-tailed items. Note that the performance trends on the item side differ between the ML-1M and Yelp datasets. We speculate that one possible reason is that the proportion of long-tailed items in ML-1M is much higher than in Yelp, so that low-degree item groups contribute most of the overall performance in ML-1M.

5.5. Further Analysis of RGCL (RQ4)

In this subsection, we conduct more detailed experiments on RGCL to confirm its effectiveness. Due to space limitations, we only show the results on the ML-1M and Yelp datasets; similar conclusions can be derived from the other datasets.

Figure 5. Hyper-parameter analysis w.r.t. $\alpha$, $L$, $\tau$. The top shows the experimental results on ML-1M and the bottom shows the results on Yelp.
Figure 6. The model tolerance to hyper-parameter $\epsilon$ in terms of Recall@20 and NDCG@20 on ML-1M and Yelp datasets. The bars represent the accuracy metrics of different models (w.r.t. NDCG@20 and Recall@20), while the lines show the relative improvement of RGCL compared to SimGCL.

5.5.1. Analysis of the model tolerance to hyper-parameter $\epsilon$

To validate the robustness of our method to the perturbation hyper-parameter $\epsilon$, we compare its performance with the SimGCL baseline under different values of $\epsilon$. Specifically, we set the search range to $\{0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0\}$. As shown in Figure 6, SimGCL exhibits obvious performance fluctuations as $\epsilon$ changes. We speculate that there are two reasons: (1) Different instances have different levels of intrinsic robustness, so uniform and unconstrained perturbations may destroy the semantic structure of fragile instances, ultimately leading to erroneous contrastive views. (2) For instances with better intrinsic robustness, the hardness of the contrastive examples is insufficient, which hinders the full exploitation of contrastive learning. In contrast, RGCL adopts decision boundary-aware perturbation constraints to guide the generation of both random and adversarial contrastive examples, leading to stable and superior performance. This demonstrates the insensitivity of RGCL to the perturbation hyper-parameter $\epsilon$.
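The contrast between a uniform perturbation and a per-instance constrained one can be illustrated with a minimal sketch. The first function follows the commonly described SimGCL-style noise (fixed L2 norm $\epsilon$, same orthant as the embedding). The second caps the magnitude by a per-instance budget `max_norm`, which is a hypothetical stand-in for whatever bound a decision boundary-aware constraint would produce; it is not the constraint defined in Eq. (6).

```python
import math
import random

def uniform_noise_view(embedding, eps, rng):
    """SimGCL-style view: add noise with fixed L2 norm eps in the embedding's orthant."""
    noise = [rng.random() for _ in embedding]
    norm = math.sqrt(sum(n * n for n in noise)) or 1.0
    # copysign treats a zero coordinate as positive (an assumption of this sketch).
    return [e + math.copysign(1.0, e) * (n / norm) * eps
            for e, n in zip(embedding, noise)]

def bounded_noise_view(embedding, eps, max_norm, rng):
    """Same noise, but its magnitude is capped by a per-instance budget max_norm."""
    return uniform_noise_view(embedding, min(eps, max_norm), rng)

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(0)
fragile = [0.10, -0.20, 0.05]           # hypothetical embedding of a fragile instance
view_a = uniform_noise_view(fragile, eps=0.5, rng=rng)
view_b = bounded_noise_view(fragile, eps=0.5, max_norm=0.05, rng=rng)
print(dist(fragile, view_a), dist(fragile, view_b))
```

With a shared $\epsilon$, every instance is moved by the same distance regardless of how close it sits to the decision boundary, whereas the capped variant moves a fragile instance only as far as its own budget allows.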

5.5.2. Impact of the coefficient $\alpha$

We vary $\alpha$ over the set of representative values shown in Figure 5(a). The recommendation performance of RGCL gradually improves as $\alpha$ increases, which suggests that contrastive learning facilitates the uniformity of node representations and helps learn high-quality features. Together with the results in Figures 7 and 8, this also suggests that our algorithm better captures the personalized characteristics of low-degree users and items.

5.5.3. Impact of the layer number $L$

To investigate the impact of the number of GNN layers on model performance, we vary the hyper-parameter $L$ in the range $\{1,2,3\}$. From Figure 5(b), we can observe that the performance trend of RGCL differs across datasets. For example, on the ML-1M dataset, the over-smoothing issue occurs even with a small value of $L$, while on the Yelp dataset, the model shows a significant performance improvement as the number of graph layers $L$ increases.
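To make the role of $L$ concrete, the following is a minimal sketch of LightGCN-style propagation on a user-item bipartite graph: each layer replaces a node's embedding with the symmetrically normalized sum of its neighbors' embeddings, and the final representation averages all layers. The function name, node indexing, and toy graph are assumptions for illustration; this is not the released implementation.

```python
import math

def propagate(edges, num_users, num_items, embeddings, num_layers):
    """LightGCN-style propagation; nodes 0..num_users-1 are users, items follow."""
    n = num_users + num_items
    neighbors = [[] for _ in range(n)]
    for u, i in edges:
        neighbors[u].append(num_users + i)
        neighbors[num_users + i].append(u)
    degree = [len(nb) for nb in neighbors]
    dim = len(embeddings[0])

    layer = [list(e) for e in embeddings]   # layer-0 embeddings
    total = [list(e) for e in embeddings]   # running sum over layers
    for _ in range(num_layers):
        nxt = [[0.0] * dim for _ in range(n)]
        for v in range(n):
            for w in neighbors[v]:
                coeff = 1.0 / math.sqrt(degree[v] * degree[w])
                for d in range(dim):
                    nxt[v][d] += coeff * layer[w][d]
        layer = nxt
        for v in range(n):
            for d in range(dim):
                total[v][d] += layer[v][d]
    # Mean over layers 0..num_layers.
    return [[x / (num_layers + 1) for x in row] for row in total]

# Toy graph (hypothetical): 2 users, 2 items, 3 interactions.
toy_edges = [(0, 0), (0, 1), (1, 1)]
toy_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
out = propagate(toy_edges, 2, 2, toy_emb, num_layers=1)
print(out[0])
```

Each additional layer mixes in information from one more hop, which is the mechanism behind both the gains on sparse graphs and the over-smoothing risk discussed above.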

5.5.4. Impact of the temperature $\tau$

The temperature $\tau$ plays an important role in contrastive learning (Wang and Liu, 2021). Figure 5(c) shows how model performance varies with $\tau$. We can see that the performance fluctuates severely across different values of $\tau$. Specifically, overly large values of $\tau$ lead to poor performance, which is consistent with previous work (Wu et al., 2021). Conversely, overly small temperature values also fail to achieve optimal model performance. One possible reason is that a too-small $\tau$ forces the model to concentrate on the few hardest examples, which then dominate the optimization process and hinder satisfactory generalization. Therefore, a suitable temperature is essential to maximize the benefits of graph contrastive learning.
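The concentration effect of a small temperature can be seen in a short sketch. In an InfoNCE-style loss, the relative weight that each negative receives in the gradient is proportional to $\exp(s/\tau)$, where $s$ is its similarity to the anchor. The similarity values below are hypothetical and chosen only for illustration.

```python
import math

def negative_weights(similarities, tau):
    """Softmax of similarities / tau: relative weight of each negative sample."""
    scaled = [s / tau for s in similarities]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

sims = [0.9, 0.5, 0.1, -0.3]                 # hypothetical anchor-negative similarities
weights = {tau: negative_weights(sims, tau) for tau in (0.05, 0.2, 1.0, 10.0)}
for tau, w in weights.items():
    print(tau, [round(x, 3) for x in w])
```

At $\tau=0.05$ nearly all of the weight falls on the hardest negative, while at $\tau=10$ the weights are close to uniform, so hard negatives are barely emphasized.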

More Analysis. To comprehensively evaluate the superiority of RGCL, we conduct further experiments in the Appendix to answer the following research questions:

  • RQ5: What is the effect of RGCL on improving the representation uniformity of users and items? (cf. Appendix D.1)

  • RQ6: How does the RGCL framework perform when applied to other GNN backbones? (cf. Appendix D.2)

  • RQ7: How does RGCL maintain the semantic information of contrastive examples? (cf. Appendix D.3)

6. Related Work

Graph Neural Network in Recommendation. In recent years, the application of GNN models in recommender systems has achieved remarkable success (Wang et al., 2019; He et al., 2020; Berg et al., 2017; Chen et al., 2020). For example, NGCF (Wang et al., 2019) models the higher-order connectivity in the user-item graph by explicitly injecting collaborative signals into the embedding process. Compared with NGCF, LightGCN (He et al., 2020) simplifies the design of GCN by removing the redundant feature transformation and nonlinear activation function. However, GNN-based recommenders suffer from the sparsity of user-item interactions. Although external data sources (e.g., multi-behavior data and knowledge graphs) help mitigate this issue, obtaining such data is often challenging or even infeasible due to high cost or privacy protection. In contrast, graph contrastive learning, as a popular self-supervised learning paradigm, effectively overcomes the challenge of data sparsity.

GCL-based Recommendation Models. Graph contrastive learning (GCL) bridges the advantages of GNN models with contrastive learning, effectively alleviating recommendation bias and simultaneously modeling high-order connectivity. Generally, GCL methods can be classified into hardness-driven and rationality-driven methods. The key task of hardness-driven methods is to construct diverse and challenging augmented views. For example, GraphCL (You et al., 2020) and SGL (Wu et al., 2021) both devise multiple heuristic strategies to generate different contrastive views, such as edge dropout and feature masking. However, these methods are prone to losing important semantic features, since the augmentation operations are unrelated to the downstream task and rely simply on human-designed heuristics. In contrast, rationality-driven GCL methods, such as SimGCL (Yu et al., 2022) and RocSE (Ye et al., 2023), alleviate this issue by introducing slight feature perturbations to maintain semantic consistency. However, these methods still suffer from potential issues, such as insufficient contrastive hardness and tedious trial-and-error hyper-parameter tuning, resulting in suboptimal performance and poor flexibility. Compared with these methods, our method achieves a better balance between the rationality and hardness of contrastive examples via well-designed decision boundary-aware perturbations and an adversarial-contrastive view-generator.
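As a concrete picture of the heuristic augmentations mentioned above, the following is a minimal sketch of edge dropout and feature masking. The function names and rates are illustrative assumptions and do not reproduce the exact procedures of GraphCL or SGL.

```python
import random

def edge_dropout(edges, drop_rate, rng):
    """Keep each edge independently with probability 1 - drop_rate."""
    return [e for e in edges if rng.random() >= drop_rate]

def feature_masking(features, mask_rate, rng):
    """Zero out each feature entry independently with probability mask_rate."""
    return [[0.0 if rng.random() < mask_rate else x for x in row]
            for row in features]

rng = random.Random(42)
graph_edges = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 2)]   # hypothetical (user, item) pairs
node_features = [[0.3, 0.7], [0.1, 0.9], [0.5, 0.5]]

view_edges = edge_dropout(graph_edges, drop_rate=0.2, rng=rng)
view_features = feature_masking(node_features, mask_rate=0.2, rng=rng)
print(view_edges, view_features)
```

Both operations are applied uniformly at random, without regard to which edges or features matter for the recommendation task, which is why such views may discard task-relevant semantics.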

7. Conclusion

In this paper, we propose a novel graph contrastive learning framework, named RGCL, which aims to strike a better trade-off between rationality and hardness for the contrastive view-generator. Specifically, we propose decision boundary-aware perturbation constraints and a relation-aware adversarial-contrastive augmentation to generate contrastive examples. In addition, RGCL generates adversarial examples based on the adversarial perturbations to achieve margin maximization between data points and the decision boundary, further improving model robustness. Finally, we design a joint optimization objective to optimize the model parameters.

Acknowledgments

This work is supported in part by National Key R&D Program of China (2023YFF0905402), National Natural Science Foundation of China (No. 62102420), Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), Intelligent Social Governance Platform, Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative, Renmin University of China, Public Computing Cloud, Renmin University of China, fund for building world-class universities (disciplines) of Renmin University of China, Intelligent Social Governance Platform. The work is sponsored by KuaiShou Technology Programs (No. 2022020091).

References

  • Berg et al. (2017) Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).
  • Botev et al. (2010) Zdravko I Botev, Joseph F Grotowski, and Dirk P Kroese. 2010. Kernel density estimation via diffusion. (2010).
  • Cai et al. (2023) Xuheng Cai, Chao Huang, Lianghao Xia, and Xubin Ren. 2023. LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. arXiv preprint arXiv:2302.08191 (2023).
  • Chen et al. (2020) Lei Chen, Le Wu, Richang Hong, Kun Zhang, and Meng Wang. 2020. Revisiting graph based collaborative filtering: A linear residual graph convolutional network approach. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 27–34.
  • Chen et al. (2019) Wen Chen, Pipei Huang, Jiaming Xu, Xin Guo, Cheng Guo, Fei Sun, Chao Li, Andreas Pfadler, Huan Zhao, and Binqiang Zhao. 2019. POG: personalized outfit generation for fashion recommendation at Alibaba iFashion. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2662–2670.
  • Cho et al. (2011) Eunjoon Cho, Seth A Myers, and Jure Leskovec. 2011. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 1082–1090.
  • Ding et al. (2020) Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. 2020. MMA Training: Direct Input Space Margin Maximization through Adversarial Training. In International Conference on Learning Representations.
  • Gao et al. (2022) Chongming Gao, Shijun Li, Yuan Zhang, Jiawei Chen, Biao Li, Wenqiang Lei, Peng Jiang, and Xiangnan He. 2022. KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3953–3957.
  • Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 249–256.
  • Goodfellow et al. (2014) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
  • Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 4 (2015), 1–19.
  • He et al. (2023) Wei He, Guohao Sun, Jinhu Lu, and Xiu Susie Fang. 2023. Candidate-aware Graph Contrastive Learning for Recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1670–1679.
  • He et al. (2020) Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 639–648.
  • He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182.
  • Huang et al. (2021) Tinglin Huang, Yuxiao Dong, Ming Ding, Zhen Yang, Wenzheng Feng, Xinyu Wang, and Jie Tang. 2021. Mixgcf: An improved training method for graph neural network-based recommender systems. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 665–674.
  • Jaiswal et al. (2020) Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. 2020. A survey on contrastive self-supervised learning. Technologies 9, 1 (2020), 2.
  • Jiang et al. (2020) Ziyu Jiang, Tianlong Chen, Ting Chen, and Zhangyang Wang. 2020. Robust pre-training by adversarial contrastive learning. Advances in neural information processing systems 33 (2020), 16199–16210.
  • Jiao et al. (2023) Xuewu Jiao, Weibin Li, Xinxuan Wu, Wei Hu, Miao Li, Jiang Bian, Siming Dai, Xinsheng Luo, Mingqing Hu, Zhengjie Huang, et al. 2023. PGLBox: Multi-GPU Graph Learning Framework for Web-Scale Recommendation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4262–4272.
  • Jin et al. (2023) Di Jin, Luzhi Wang, Yizhen Zheng, Guojie Song, Fei Jiang, Xiang Li, Wei Lin, and Shirui Pan. 2023. Dual Intent Enhanced Graph Neural Network for Session-based New Item Recommendation. In Proceedings of the ACM Web Conference 2023. 684–693.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Lin et al. (2022) Zihan Lin, Changxin Tian, Yupeng Hou, and Wayne Xin Zhao. 2022. Improving graph collaborative filtering with neighborhood-enriched contrastive learning. In Proceedings of the ACM Web Conference 2022. 2320–2329.
  • Lu et al. (2023) Lingyun Lu, Bang Wang, Zizhuo Zhang, Shenghao Liu, and Han Xu. 2023. VRKG4Rec: Virtual Relational Knowledge Graph for Recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 526–534.
  • Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
  • Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2574–2582.
  • Neyshabur et al. (2017) Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nati Srebro. 2017. Exploring generalization in deep learning. Advances in neural information processing systems 30 (2017).
  • Oord et al. (2018) Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019).
  • Rendle et al. (2012) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618 (2012).
  • Robinson et al. (2020) Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, and Stefanie Jegelka. 2020. Contrastive learning with hard negative samples. arXiv preprint arXiv:2010.04592 (2020).
  • Van der Maaten and Hinton (2008) Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, 11 (2008).
  • Wang and Liu (2021) Feng Wang and Huaping Liu. 2021. Understanding the behaviour of contrastive loss. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2495–2504.
  • Wang et al. (2019) Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval. 165–174.
  • Wei et al. (2022) Chunyu Wei, Jian Liang, Di Liu, and Fei Wang. 2022. Contrastive Graph Structure Learning via Information Bottleneck for Recommendation. Advances in Neural Information Processing Systems 35 (2022), 20407–20420.
  • Wen et al. (2020) Yuxin Wen, Shuai Li, and Kui Jia. 2020. Towards understanding the regularization of adversarial robustness on neural networks. In International Conference on Machine Learning. PMLR, 10225–10235.
  • Wu et al. (2021) Jiancan Wu, Xiang Wang, Fuli Feng, Xiangnan He, Liang Chen, Jianxun Lian, and Xing Xie. 2021. Self-supervised graph learning for recommendation. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. 726–735.
  • Wu et al. (2022) Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, and Bin Cui. 2022. Graph neural networks in recommender systems: a survey. Comput. Surveys 55, 5 (2022), 1–37.
  • Xia et al. (2022) Jun Xia, Lirong Wu, Jintao Chen, Bozhen Hu, and Stan Z Li. 2022. Simgrace: A simple framework for graph contrastive learning without data augmentation. In Proceedings of the ACM Web Conference 2022. 1070–1079.
  • Yang et al. (2023) Yonghui Yang, Zhengwei Wu, Le Wu, Kun Zhang, Richang Hong, Zhiqiang Zhang, Jun Zhou, and Meng Wang. 2023. Generative-Contrastive Graph Learning for Recommendation. (2023).
  • Ye et al. (2023) Haibo Ye, Xinjie Li, Yuan Yao, and Hanghang Tong. 2023. Towards robust neural graph collaborative filtering via structure denoising and embedding perturbation. ACM Transactions on Information Systems 41, 3 (2023), 1–28.
  • You et al. (2020) Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. 2020. Graph contrastive learning with augmentations. Advances in neural information processing systems 33 (2020), 5812–5823.
  • Yu et al. (2023) Junliang Yu, Xin Xia, Tong Chen, Lizhen Cui, Nguyen Quoc Viet Hung, and Hongzhi Yin. 2023. XSimGCL: Towards extremely simple graph contrastive learning for recommendation. IEEE Transactions on Knowledge and Data Engineering (2023).
  • Yu et al. (2022) Junliang Yu, Hongzhi Yin, Xin Xia, Tong Chen, Lizhen Cui, and Quoc Viet Hung Nguyen. 2022. Are graph augmentations necessary? simple graph contrastive learning for recommendation. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval. 1294–1303.
  • Zhang et al. (2023) Chi Zhang, Rui Chen, Xiangyu Zhao, Qilong Han, and Li Li. 2023. Denoising and Prompt-Tuning for Multi-Behavior Recommendation. In Proceedings of the ACM Web Conference 2023. 1355–1363.

Content of Appendix

Appendix A Analysis of Training Time Complexity

The extra training time complexity of RGCL comes from the loss terms of the contrastive and adversarial components. Suppose the numbers of nodes and edges are $|\mathcal{V}|$ and $|\mathcal{E}|$, respectively. Let $B$ denote the batch size, $d$ the embedding dimension, and $L$ the total number of layers. We analyze the time complexity of each component as follows:

  • Original loss. The time complexity of the original LightGCN model comes from adjacency matrix construction, graph convolution computation and BPR calculation. Their time complexities are $O(|\mathcal{E}|)$, $O(L|\mathcal{E}|d)$ and $O(Bd)$, respectively. Therefore, the total time complexity is $O((L|\mathcal{E}|+B)d)$.

  • Contrastive loss. To begin with, solving for the perturbation constraints in contrastive learning needs one pass of forward and backward propagation, whose time complexity is $O(L|\mathcal{E}|d)$. Then, constructing the two random-augmented views requires two passes of forward propagation. The adversarial-contrastive view needs one extra pass of forward and backward propagation, and the time complexity of computing the contrastive loss is $O(B^{2}d)$. Therefore, the total time complexity of the contrastive learning component is $O((L|\mathcal{E}|+B^{2})d)$.

  • Adversarial loss. The adversarial perturbations for generating adversarial examples have already been accounted for in the contrastive loss part. Thus, in this part, we only consider the time complexity of forward propagation and the BPR loss, which are $O(L|\mathcal{E}|d)$ and $O(Bd)$, respectively. Therefore, the total time complexity of the adversarial loss is $O((L|\mathcal{E}|+B)d)$.

In summary, the total time complexity of the proposed RGCL is $O((L|\mathcal{E}|+B^{2})d)$, which is of the same order as that of other graph contrastive learning algorithms (Yu et al., 2022; Ye et al., 2023). However, the experimental results in Figure 3 demonstrate that our algorithm achieves better convergence and accuracy.
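To give a feel for the two leading terms of this bound, the following sketch plugs in illustrative values: the ML-1M interaction count from Table 3 as $|\mathcal{E}|$ (the training split is smaller), the batch size and embedding dimension from Appendix C.3, and $L=3$ as an assumed layer number. Constant factors are ignored, so the numbers indicate orders of magnitude only.

```python
def rgcl_cost_terms(num_edges, num_layers, batch_size, dim):
    """Leading terms of the O((L|E| + B^2) d) bound, with constants ignored."""
    propagation = num_layers * num_edges * dim   # L * |E| * d
    contrastive = batch_size ** 2 * dim          # B^2 * d
    return propagation, contrastive

prop_term, con_term = rgcl_cost_terms(
    num_edges=820_336,   # ML-1M interactions (Table 3); an upper bound on training edges
    num_layers=3,        # assumed value within the searched range {1, 2, 3}
    batch_size=4096,
    dim=64,
)
print(prop_term, con_term)
```

Which term dominates in practice depends on how often each is evaluated during training, which the asymptotic bound does not capture.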

Appendix B Further Robustness Analysis

Inspired by previous work (Neyshabur et al., 2017; Xia et al., 2022), we provide a robustness analysis from the perspective of the connection between the sharpness of the loss landscape and PAC-Bayes theory. Generally, a smoother feature space can avoid large feature variations caused by input perturbations (Wen et al., 2020). Meanwhile, from the perspective of model optimization, a flatter loss landscape can bring better model robustness. Specifically, given a prior distribution $\mathcal{Q}$ over the model parameters, with probability at least $1-\xi$ over the draw of the training data, the expected error of $\mathcal{L}_{BPR}$ can be bounded as follows:

(18) $$\mathbb{E}_{\mathbf{\Delta}}\left[\widetilde{\mathcal{L}}_{BPR}\right]\leq\mathbb{E}_{\mathbf{\Delta}}\left[\mathcal{L}_{BPR}\right]+4\sqrt{\frac{\mathrm{KL}(\bm{\theta}+\mathbf{\Delta}\,\|\,\mathcal{Q})+\ln\frac{2m}{\xi}}{m}},$$

where $\widetilde{\mathcal{L}}_{BPR}$ represents the expected error, $m$ is the size of the training data, and $\mathbf{\Delta}$ denotes the perturbation of the model parameters. Then, we rewrite the above bound as follows:

(19) $$\begin{aligned}\mathbb{E}_{\mathbf{\Delta}}\left[\widetilde{\mathcal{L}}_{BPR}\right]\leq{}&\mathbb{E}\left[\mathcal{L}_{BPR}\right]+\underbrace{\mathbb{E}_{\mathbf{\Delta}}\left[\mathcal{L}_{BPR}\right]-\mathbb{E}\left[\mathcal{L}_{BPR}\right]}_{\text{Expected sharpness}}\\&+4\sqrt{\frac{\mathrm{KL}(\bm{\theta}+\mathbf{\Delta}\,\|\,\mathcal{Q})+\ln\frac{2m}{\xi}}{m}},\end{aligned}$$

where the expected sharpness $\mathbb{E}_{\mathbf{\Delta}}\left[\mathcal{L}_{BPR}\right]-\mathbb{E}\left[\mathcal{L}_{BPR}\right]$ reflects the sensitivity of the loss to variations of the model parameters, which our method aims to reduce while increasing the smoothness of the feature space. Therefore, the proposed perturbation-based augmentation examples can achieve more robust and well-generalized model performance.
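The expected sharpness term can be estimated numerically. The sketch below is a Monte Carlo estimate on a toy BPR loss, under the assumption that the parameter perturbation $\mathbf{\Delta}$ is isotropic Gaussian noise with standard deviation `sigma`; the embeddings, triples, and function names are hypothetical, and the perturbation distribution is a choice made for this illustration only.

```python
import math
import random

def bpr_loss(user_emb, item_emb, triples):
    """Mean of -log(sigmoid(score(u, i) - score(u, j))) over (user, pos, neg) triples."""
    total = 0.0
    for u, i, j in triples:
        diff = sum(a * (b - c)
                   for a, b, c in zip(user_emb[u], item_emb[i], item_emb[j]))
        total += math.log1p(math.exp(-diff))
    return total / len(triples)

def expected_sharpness(user_emb, item_emb, triples, sigma, num_samples, rng):
    """Monte Carlo estimate of E_Delta[L(theta + Delta)] - L(theta)."""
    base = bpr_loss(user_emb, item_emb, triples)

    def perturb(table):
        return [[x + rng.gauss(0.0, sigma) for x in row] for row in table]

    losses = [bpr_loss(perturb(user_emb), perturb(item_emb), triples)
              for _ in range(num_samples)]
    return sum(losses) / num_samples - base

# Toy parameters (hypothetical): 2 users, 3 items, 2-dimensional embeddings.
users = [[0.3, -0.1], [0.2, 0.4]]
items = [[0.5, 0.1], [-0.2, 0.3], [0.1, -0.4]]
train_triples = [(0, 0, 1), (1, 1, 2), (0, 2, 1)]

sharpness = expected_sharpness(users, items, train_triples,
                               sigma=0.1, num_samples=200, rng=random.Random(0))
print(sharpness)
```

A value close to zero indicates that the loss changes little under parameter perturbations, i.e., a flatter neighborhood around the current parameters.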

Table 3. Statistics of the datasets.
| Dataset | #Users | #Items | #Interactions | Sparsity |
| ML-1M | 6,038 | 3,489 | 820,336 | 96.1059% |
| Alibaba | 12,265 | 6,145 | 193,120 | 99.7437% |
| Kuaishou | 2,457 | 1,042 | 35,795 | 98.6019% |
| Gowalla | 13,149 | 14,009 | 535,650 | 99.7092% |
| Yelp | 42,324 | 28,748 | 1,611,965 | 99.8675% |
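The Sparsity column is consistent with the usual definition, one minus the ratio of observed interactions to all possible user-item pairs. The following sketch recomputes it from the other columns of Table 3; small differences in the last digit come from rounding.

```python
def sparsity_percent(num_users, num_items, num_interactions):
    """Percentage of user-item pairs with no observed interaction."""
    return 100.0 * (1.0 - num_interactions / (num_users * num_items))

# (users, items, interactions, reported sparsity in %) copied from Table 3.
table3 = {
    "ML-1M": (6038, 3489, 820336, 96.1059),
    "Alibaba": (12265, 6145, 193120, 99.7437),
    "Kuaishou": (2457, 1042, 35795, 98.6019),
    "Gowalla": (13149, 14009, 535650, 99.7092),
    "Yelp": (42324, 28748, 1611965, 99.8675),
}

computed = {name: sparsity_percent(u, i, n) for name, (u, i, n, _) in table3.items()}
for name, value in computed.items():
    print(f"{name}: {value:.4f}%")
```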

Appendix C Experiment Details

C.1. Recommendation Datasets

We conduct extensive experiments on the following five publicly available recommendation datasets in this paper: (1) MovieLens (ML)-1M (https://grouplens.org/datasets/movielens/) is a widely adopted movie recommendation dataset, containing one million movie ratings provided by users, ranging from 1 to 5 stars. (2) Alibaba (https://github.com/wenyuer/POG) is a fashion-related dataset that provides user behaviors related to both outfits and fashion items. (3) Kuaishou (https://kuairand.com/) contains user interactions on exposed short videos, collected from the video-sharing mobile app. (4) Gowalla (https://snap.stanford.edu/data/loc-gowalla.html) is a check-in dataset for item recommendation, collected from a location-based social networking website. (5) Yelp (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/2?resource=download) is a widely used business recommendation dataset collected from the Yelp website, where the business venues visited by users are viewed as the items.

To transform the explicit user ratings into implicit interaction behavior, interactions with ratings above three are viewed as positive examples for the rating-based datasets (i.e., ML-1M and Yelp). For the Yelp and Gowalla datasets, we filter out users and items that have fewer than fifteen interactions to ensure data quality. For all datasets, we randomly divide the data into training, validation and testing sets using a ratio of 8:1:1. For the negative samples used in the BPR objective, we uniformly sample one negative item for each positive interaction. All experiments are repeated five times with different random seeds for the significance test of model performance. The statistics of the five recommendation datasets are shown in Table 3.
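The preprocessing steps above can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the text: the interaction filter is applied in a single pass rather than iteratively, the 8:1:1 split is taken over all interactions rather than per user, and negative sampling rejects items the user has already interacted with. All function names are hypothetical.

```python
import random
from collections import Counter

def to_implicit(ratings, threshold=3):
    """Keep (user, item) pairs whose rating is strictly above the threshold."""
    return [(u, i) for u, i, r in ratings if r > threshold]

def filter_min_interactions(pairs, min_count=15):
    """Single-pass filter: drop users and items with fewer than min_count interactions."""
    user_count = Counter(u for u, _ in pairs)
    item_count = Counter(i for _, i in pairs)
    return [(u, i) for u, i in pairs
            if user_count[u] >= min_count and item_count[i] >= min_count]

def split_8_1_1(pairs, rng):
    """Randomly split interactions into train / validation / test with ratio 8:1:1."""
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_valid = int(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def sample_negative(user, all_items, interacted, rng):
    """Uniformly draw one item that the user has not interacted with."""
    while True:
        item = rng.choice(all_items)
        if item not in interacted[user]:
            return item

rng = random.Random(2024)
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 2)]   # (user, item, rating)
positives = to_implicit(toy_ratings)

toy_pairs = [(u, i) for u in range(2) for i in range(5)]
train, valid, test = split_8_1_1(toy_pairs, rng)
negative = sample_negative(0, list(range(5)), {0: {0, 1, 2, 3}}, rng)
print(positives, len(train), len(valid), len(test), negative)
```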

C.2. Baselines

Traditional Recommenders:

  • BPRMF (Rendle et al., 2012) is a well-known matrix factorization model that optimizes the BPR loss function.

  • NeuMF (He et al., 2017) is a deep recommendation model, which aims to capture the non-linear correlations between users and items.

GNN-based Recommenders:

  • GCMC (Berg et al., 2017) is a graph auto-encoder framework to learn complex patterns and dependencies within the user-item interaction graph by differentiable message passing.

  • NGCF (Wang et al., 2019) is a collaborative filtering model that integrates the interactions of the user-item bipartite graph into the embedding process to model high-order connectivity.

  • GCCF (Chen et al., 2020) is a linear graph recommendation model, which alleviates the over-smoothing problem by removing non-linearity and introducing a residual network structure.

  • LightGCN (He et al., 2020) is a graph-based recommender model, which enhances the collaborative filtering information by abandoning the feature transformation and nonlinear activation.

GCL-based Recommenders:

  • GraphCL (You et al., 2020) is a graph contrastive learning framework, which designs various types of graph augmentations to incorporate transformation randomness (e.g., attribute masking).

  • SGL (Wu et al., 2021) is a self-supervised learning method based on the user-item bipartite interaction graph, which devises three augmentation strategies, namely node dropout, edge dropout and random walk.

  • LightGCL (Cai et al., 2023) is a simple graph contrastive paradigm that utilizes the SVD for contrastive augmentation to integrate the global collaborative relation without structural refinement.

  • RocSE (Ye et al., 2023) is a robust graph collaborative filtering model, which adds in-distribution perturbations to construct a contrastive view-generator, mimicking the behavior of adversarial attacks.

  • CGI (Wei et al., 2022) is a graph contrastive model that designs learnable graph augmentation to adaptively learn whether to drop an edge or node, and leverages the information bottleneck technique to guide the contrastive learning process.

  • SimGCL (Yu et al., 2022) is a GCL-based recommendation model, which discards sophisticated graph augmentation and instead adds uniform noise to the embedding space to create contrastive views.

C.3. Implementation Details

We implement RGCL with the PyTorch (Paszke et al., 2019) framework. For a fair comparison, all models are initialized with the Xavier method (Glorot and Bengio, 2010) and optimized by the Adam optimizer (Kingma and Ba, 2014). All hyper-parameters of the baseline models are searched following the suggestions of the original papers. The batch size and embedding dimension are fixed to 4,096 and 64, respectively. The learning rate is searched from $\{0.0005, 0.001, 0.005, 0.01, 0.05\}$. The number of graph neural network layers is searched from $\{1,2,3\}$. We set $\mu=0.1$ in Equation (15). The loss weight $\alpha$ is tuned from $\{1e-5, 5e-5, \dots, 1e-2\}$. The initial hyper-parameter used for the perturbation magnitude is chosen from $\{0.005, 0.01, \cdots, 1.0\}$. The search range of the temperature coefficient $\tau$ is $\{0.05, 0.1, 0.2, 0.5, 1.0, 5.0, 10.0\}$. Early stopping is used as the convergence criterion. Specifically, we evaluate the performance on the validation set after each epoch, and stop the training process once there is no accuracy improvement for 10 consecutive epochs.
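The early-stopping criterion described above can be sketched as a small training-loop wrapper. The callables `run_epoch` and `evaluate` and the `max_epochs` cap are hypothetical placeholders for the actual training and validation routines, and the validation curve below is simulated.

```python
def train_with_early_stopping(run_epoch, evaluate, patience=10, max_epochs=1000):
    """Stop once the validation score has not improved for `patience` consecutive epochs."""
    best_score, best_epoch, stale = float("-inf"), -1, 0
    for epoch in range(max_epochs):
        run_epoch()
        score = evaluate()
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_score

# Simulated validation curve (hypothetical): peaks at epoch 2, then degrades.
curve = iter([0.10, 0.20, 0.30, 0.25, 0.24, 0.23, 0.50])
epochs_run = []
result = train_with_early_stopping(
    run_epoch=lambda: epochs_run.append(1),
    evaluate=lambda: next(curve),
    patience=3,
)
print(result, len(epochs_run))
```

In this toy run training stops after three stale epochs, so the later spike in the simulated curve is never reached; a larger patience trades extra training time for a lower chance of stopping too early.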

Table 4. Generalization evaluation on different GNN-based backbones.
| Model | ML-1M R@10 | ML-1M N@10 | ML-1M R@20 | ML-1M N@20 | ML-1M R@50 | ML-1M N@50 | Yelp R@10 | Yelp N@10 | Yelp R@20 | Yelp N@20 | Yelp R@50 | Yelp N@50 |
| GCMC | 0.1676 | 0.2480 | 0.2526 | 0.2551 | 0.4073 | 0.2985 | 0.0520 | 0.0400 | 0.0867 | 0.0520 | 0.1623 | 0.0740 |
| GCMC + RGCL | 0.1807 | 0.2608 | 0.2714 | 0.2707 | 0.4351 | 0.3176 | 0.0596 | 0.0463 | 0.0980 | 0.0596 | 0.1802 | 0.0835 |
| Improv. | +7.86% | +5.15% | +7.42% | +6.11% | +6.82% | +6.42% | +14.60% | +15.65% | +13.02% | +14.44% | +11.03% | +12.82% |
| NGCF | 0.1763 | 0.2544 | 0.2673 | 0.2647 | 0.4297 | 0.3121 | 0.0506 | 0.0390 | 0.0842 | 0.0507 | 0.1570 | 0.0718 |
| NGCF + RGCL | 0.1813 | 0.2565 | 0.2744 | 0.2683 | 0.4378 | 0.3165 | 0.0530 | 0.0405 | 0.0878 | 0.0526 | 0.1662 | 0.0752 |
| Improv. | +2.83% | +0.81% | +2.67% | +1.36% | +1.89% | +1.41% | +4.87% | +3.86% | +4.23% | +3.71% | +5.82% | +4.72% |
| GCCF | 0.1753 | 0.2624 | 0.2611 | 0.2677 | 0.4171 | 0.3109 | 0.0512 | 0.0399 | 0.0851 | 0.0517 | 0.1582 | 0.0730 |
| GCCF + RGCL | 0.1838 | 0.2679 | 0.2722 | 0.2747 | 0.4315 | 0.3195 | 0.0575 | 0.0451 | 0.0937 | 0.0576 | 0.1701 | 0.0798 |
| Improv. | +4.84% | +2.09% | +4.25% | +2.61% | +3.47% | +2.76% | +12.34% | +12.98% | +10.15% | +11.49% | +7.54% | +9.32% |
| LightGCN | 0.1774 | 0.2581 | 0.2680 | 0.2670 | 0.4310 | 0.3137 | 0.0612 | 0.0479 | 0.1001 | 0.0614 | 0.1814 | 0.0850 |
| LightGCN + RGCL | 0.1934 | 0.2694 | 0.2901 | 0.2821 | 0.4581 | 0.3321 | 0.0753 | 0.0591 | 0.1191 | 0.0744 | 0.2108 | 0.1010 |
| Improv. | +9.02% | +4.39% | +8.26% | +5.65% | +6.29% | +5.86% | +22.89% | +23.39% | +19.05% | +21.19% | +16.20% | +18.84% |
Algorithm 1 Learning Algorithm of RGCL
Input: User-item bipartite graph $\mathcal{G}=\{\mathcal{V},\mathbf{A}\}$, adversarial loss weight $\mu$, contrastive loss weight $\alpha$, initialized perturbation magnitude $\epsilon$, temperature coefficient $\tau$, layer number $K$, batch size $B$, learning rate $lr$; learnable parameters $\bm{\theta}=\mathbf{E}$.
Output: RGCL model.
while the model has not converged do
     // Calculate the decision boundary-aware perturbation
     Calculate the perturbation $\mathbf{\Delta}_{u}^{(k)}, \mathbf{\Delta}_{i}^{(k)}$ using Eq. (6);
     // Calculate the contrastive loss
     Generate perturbation-constrained random views $\mathbf{z}_{u}^{\prime}$, $\mathbf{z}_{u}^{\prime\prime}$, $\mathbf{z}_{i}^{\prime}$, $\mathbf{z}_{i}^{\prime\prime}$ using Eq. (6) and (7);
     Generate relation-aware adversarial-contrastive views $\mathbf{z}_{u}^{ac}, \mathbf{z}_{i}^{ac}$ using Eq. (10) and (11);
     Calculate multi-view contrastive loss $\mathcal{L}_{CL}$ using Eq. (12);
     // Calculate the adversarial loss
     Generate adversarial examples $\mathbf{z}_{u}^{adv}$ and $\mathbf{z}_{i}^{adv}$ using Eq. (13);
     Calculate adversarial loss $\mathcal{L}_{ADV}$ using Eq. (14);
     // Calculate the BPR loss
     Calculate the BPR loss $\mathcal{L}_{BPR}$ using Eq. (1);
     // Model optimization
     Calculate total loss $\mathcal{L}$ using Eq. (15);
     Update model parameter $\bm{\theta}$ using SGD;
end while
return $\bm{\theta}$;

C.4. Details on User and Item Grouping

In the following, we provide the specific details of partitioning the user and item groups in Experiment 5.4:

  • USER: we split all users into five groups based on their number of interactions while keeping the total number of each user group the same; the groups are denoted as $[G_0, G_1, G_2, G_3, G_4]$ in ascending order of interaction count.

  • ITEM: we group all items into five groups based on their popularity and, similarly, keep the total number of each item group the same. Specifically, we adopt the decomposed Recall and NDCG metrics defined as follows:

    $$\text{Recall}(\text{G}_i)=\frac{1}{M}\sum_{u\in\mathcal{U}}\frac{|\hat{l}_{u}\cap l_{u}^{\text{G}_i}|}{|\hat{l}_{u}|},$$
    $$\text{NDCG}(\text{G}_i)=\frac{1}{M}\sum_{u\in\mathcal{U}}\frac{\sum_{j=1}^{|\hat{l}_{u}|}\mathbb{I}(\hat{l}_{u}(j)\in l_{u}^{\text{G}_i})(\log_{2}(j+1))^{-1}}{\sum_{t=1}^{|\hat{l}_{u}|}(\log_{2}(t+1))^{-1}},$$

    where $\hat{l}_{u}$ and $l_{u}$ represent the predicted and real Top-N recommendation lists of user $u$, respectively, and $\mathbb{I}(\cdot)$ is the indicator function. We use $l_{u}^{\text{G}_i}$ to denote the item recommendation list within the group $\text{G}_i$. Here, we set $|\hat{l}_{u}|=\min(|l_{u}|, K)$.
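The grouping and the decomposed metrics above can be sketched as follows. This is a rough illustration rather than the authors' evaluation code, and it rests on several assumptions: `equal_size_groups` and `group_recall_ndcg` are hypothetical names; "keeping the total number the same" is read here as equal group sizes (equal total interactions per group is another possible reading); $M$ is taken to be the number of evaluated users; and $l_u^{\text{G}_i}$ is taken to be the ground-truth items of user $u$ that fall in group $\text{G}_i$.

```python
import math
from typing import Dict, Hashable, List, Set, Tuple


def equal_size_groups(counts: Dict[Hashable, int], n_groups: int = 5) -> Dict[Hashable, int]:
    """Sort ids by count (ascending) and cut them into n_groups equal-size bins."""
    ordered = sorted(counts, key=counts.get)
    n = len(ordered)
    return {key: rank * n_groups // n for rank, key in enumerate(ordered)}


def group_recall_ndcg(
    predicted: Dict[Hashable, List[Hashable]],   # ranked predictions per user
    truth: Dict[Hashable, Set[Hashable]],        # ground-truth items per user
    item_group: Dict[Hashable, int],             # item id -> group index
    group: int,
    k: int,
) -> Tuple[float, float]:
    """Decomposed Recall(G_i) and NDCG(G_i), averaged over all users in `truth`."""
    m = len(truth)
    if m == 0:
        return 0.0, 0.0
    recall_sum, ndcg_sum = 0.0, 0.0
    for user, true_items in truth.items():
        n = min(len(true_items), k)              # |l_hat_u| = min(|l_u|, K)
        if n == 0:
            continue
        top = predicted.get(user, [])[:n]
        in_group = {i for i in true_items if item_group.get(i) == group}
        hits = [1.0 if item in in_group else 0.0 for item in top]
        recall_sum += sum(hits) / n
        # Position j is 1-indexed in the formula, hence log2(j + 2) for 0-indexed j.
        dcg = sum(h / math.log2(j + 2) for j, h in enumerate(hits))
        idcg = sum(1.0 / math.log2(t + 2) for t in range(n))
        ndcg_sum += dcg / idcg
    return recall_sum / m, ndcg_sum / m
```

Because every hit belongs to exactly one group, summing either metric over all five groups recovers the corresponding overall metric under the same normalization, which is what makes the decomposition convenient for per-group comparison.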

Figure 7. Visualization of item representation and degree on ML-1M and Yelp datasets. Darker colors indicate more points falling within the region.

C.5. Learning Algorithm of RGCL

The overall learning algorithm of the proposed RGCL framework is summarized in Algorithm 1.
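As a rough illustration of the final steps of each iteration in Algorithm 1, the sketch below combines three loss terms into one objective. It makes assumptions that this appendix does not confirm: Equation (15) is taken to be a weighted sum $\mathcal{L}_{BPR}+\alpha\mathcal{L}_{CL}+\mu\mathcal{L}_{ADV}$ (suggested only by the description of $\alpha$ and $\mu$ as loss weights), and Equation (1) is taken to be the standard BPR loss without a regularization term. The function names are hypothetical.

```python
import math
from typing import Sequence


def bpr_loss(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Standard BPR loss: mean of -log(sigmoid(pos - neg)) over sampled pairs."""
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    losses = [math.log1p(math.exp(-(p - n))) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


def total_loss(l_bpr: float, l_cl: float, l_adv: float, alpha: float, mu: float = 0.1) -> float:
    """Assumed form of the overall objective: weighted sum of the three terms."""
    return l_bpr + alpha * l_cl + mu * l_adv
```

For example, `total_loss(bpr_loss([2.0], [0.5]), l_cl=1.3, l_adv=0.7, alpha=1e-3)` would give the scalar that the optimizer minimizes in one step; the contrastive and adversarial values here are placeholders, not results.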

Appendix D More Experimental Analysis

D.1. Visualization of Representation (RQ4)

To better understand how RGCL promotes the uniformity of representations and thereby preserves personalized node information, we visualize the learned item embeddings and user embeddings in Figure 7 and Figure 8, respectively. Specifically, we first map the learned node representations to 2-dimensional normalized vectors using t-SNE (Van der Maaten and Hinton, 2008). Then, we use Kernel Density Estimation (KDE) (Botev et al., 2010) to visualize the distribution of the transformed feature representations. Moreover, for a clearer demonstration, we also visualize the density estimates of their angles, where the angle of each instance $(x,y)$ is computed as $\text{arctan2}(y,x)$. We observe that RGCL yields a more uniform distribution for both users and items. This shows that RGCL can effectively learn high-quality representations by avoiding the bias caused by the dominance of advantaged users and items. Besides, together with the results in Table 1, RGCL improves both representation uniformity and recommendation performance compared with other baselines, suggesting the superiority of our designs.
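The angle-density step can be sketched in plain Python as below. This assumes the 2-dimensional points have already been produced by a projection such as t-SNE, and it swaps in a simple fixed-bandwidth Gaussian kernel estimate instead of the estimator of Botev et al. (2010); the function names and the bandwidth value are illustrative choices, and the estimate ignores the wrap-around of angles at $\pm\pi$.

```python
import math
from typing import List, Sequence, Tuple


def angles_of(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Project each 2-D point onto the unit circle and return arctan2(y, x)."""
    out = []
    for x, y in points:
        norm = math.hypot(x, y)
        if norm == 0.0:
            continue  # the origin has no direction
        out.append(math.atan2(y / norm, x / norm))
    return out


def gaussian_kde_1d(samples: Sequence[float], grid: Sequence[float], bandwidth: float = 0.2) -> List[float]:
    """Fixed-bandwidth Gaussian kernel density estimate evaluated on `grid`."""
    coef = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        coef * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]
```

A flatter density curve over $[-\pi, \pi]$ corresponds to embeddings that are spread more evenly around the circle, which is the notion of uniformity the figures are meant to convey.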

Figure 8. Visualization of user representation and degree on ML-1M and Yelp datasets. Darker colors indicate more points falling within the region.

D.2. Generalization Evaluation (RQ5)

To verify the generalization of our proposed model-agnostic framework, we apply RGCL to three other commonly used GNN-based backbones, i.e., GCMC (Berg et al., 2017), NGCF (Wang et al., 2019) and GCCF (Chen et al., 2020). We summarize the experimental results in Table 4. From the table, we can see that RGCL generalizes well across different GNN-based backbones, further demonstrating the effectiveness and flexibility of our method. The improvement on the NGCF backbone is comparatively small (between 0.81% and 5.82% in Table 4), which we attribute to the redundant weight parameters and unnecessary nonlinear feature transformations of the NGCF model, which make the model harder to train.

D.3. Case Study (RQ7)

In this section, we present a case study to show intuitively how our model preserves the important semantic information of the recommendation task. From Figure 9, we can observe that user #315 prefers horror, action, and science fiction movies while showing less interest in comedy movies. Comparing SimGCL and RGCL, although the original ranking results of both methods order positive and negative items correctly, the noise perturbation introduced by the SimGCL baseline reverses the predicted scores of movie #757 (liked) and movie #642 (disliked). This indicates that the SimGCL baseline cannot reasonably control perturbations to preserve task-relevant information, resulting in irrational contrastive samples. In contrast, our proposed RGCL generates rational contrastive pairs and thus effectively improves model robustness and recommendation performance.
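The kind of check behind this case study can be sketched as a small function that tests whether a perturbation keeps the relative order of a liked and a disliked item. This is a toy illustration only: it assumes inner-product scoring, `ranking_preserved` is a hypothetical helper, and the vectors in the usage note are made-up values rather than embeddings from the paper.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product used as the (assumed) preference score."""
    return sum(x * y for x, y in zip(a, b))


def add(v: Sequence[float], d: Sequence[float]) -> list:
    """Element-wise sum of an embedding and its perturbation."""
    return [x + y for x, y in zip(v, d)]


def ranking_preserved(user, liked, disliked, d_user, d_liked, d_disliked) -> bool:
    """True if the liked/disliked score order is the same before and after perturbation."""
    before = dot(user, liked) - dot(user, disliked)
    u = add(user, d_user)
    after = dot(u, add(liked, d_liked)) - dot(u, add(disliked, d_disliked))
    return (before > 0) == (after > 0)
```

With toy vectors `user=[1.0, 0.0]`, `liked=[0.6, 0.0]`, `disliked=[0.5, 0.0]`, a perturbation of `[-0.05, 0.0]` on the liked item keeps the order, whereas `[-0.2, 0.0]` flips it; the latter is the situation the case study attributes to unconstrained noise.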

Figure 9. Case study on the ML-1M dataset. "Score (Origin.)" and "Score (Pert)" indicate the predicted scores based on the original and the contrastive augmented user and item embeddings, respectively. Best viewed in color.