
Privacy-preserving Cross-domain Recommendation with Federated Graph Learning

Published: 13 May 2024

Abstract

    As people inevitably interact with items across multiple domains or various platforms, cross-domain recommendation (CDR) has gained increasing attention. However, the rising privacy concerns limit the practical applications of existing CDR models, since they assume that full or partial data are accessible among different domains. Recent studies on privacy-aware CDR models neglect the heterogeneity from multiple-domain data and fail to achieve consistent improvements in cross-domain recommendation; thus, it remains a challenging task to conduct effective CDR in a privacy-preserving way.
In this article, we propose a novel federated graph learning approach for Privacy-Preserving Cross-Domain Recommendation (PPCDR) to capture users’ preferences based on distributed multi-domain data and improve recommendation performance for all domains without privacy leakage. The main idea of PPCDR is to model both the global preference among multiple domains and the local preference at a specific domain for a given user, which together characterize the user’s shared and domain-specific tastes toward items. Specifically, in the private update process of PPCDR, we design a graph transfer module for each domain to fuse global and local user preferences and update them based on local domain data. In the federated update process, we apply the local differential privacy technique for privacy protection and collaboratively learn global user preferences based on multi-domain data, adapting these global preferences to heterogeneous domain data via personalized aggregation. In this way, PPCDR effectively approximates, in a privacy-preserving manner, the multi-domain training process that directly shares local interaction data. Extensive experiments on three CDR datasets demonstrate that PPCDR consistently outperforms competitive single- and cross-domain baselines and effectively protects domain privacy.

    1 Introduction

In modern recommender systems, providing multi-domain recommendation services to satisfy diverse user needs has become a clear trend. For example, Amazon maintains 24 different product categories on its e-commerce platform. Such multi-domain cases become even more common when considering the recommendation services provided by the different apps a user has installed. To improve multi-domain services of information systems, the topic of cross-domain recommendation (CDR) [67, 74] has gained increasing attention in both research and industry; it aims to enhance the recommendation quality in a single domain by leveraging useful information from other domains. Typical CDR approaches [67, 74] mainly establish linkages among different domains through overlapping users/items or cluster-level patterns and transfer useful information across these domains, such as collective matrix factorization [48, 70], codebook transfer [20, 29], and cross-domain graph transfer [12, 32].
Despite this remarkable success, existing CDR methods usually rely on a strong assumption that full or partial user–item interaction data are accessible among different domains. However, this assumption may not hold in practice due to commercial competition [25, 74] and privacy concerns [16, 17, 33]. For example, app data from different domains often belong to different companies or departments and cannot be easily shared across data boundaries [25, 74]. In addition, recent data protection regulations, such as GDPR,1 severely restrict the storage and sharing of privacy-sensitive domain data. These privacy and security issues significantly limit the cross-domain storage and sharing of highly sensitive data, such as interaction data, thereby constraining the practical application of CDR models. Although several studies [16, 17, 33, 54, 66] propose privacy-aware CDR models, these methods either neglect the heterogeneity of multi-domain data [16, 17, 66] or fail to achieve consistent improvements for both normal and cold-start users [33]. Thus, there remains a pressing need for a privacy-preserving cross-domain recommendation method that balances privacy concerns with the quality of cross-domain recommendations.
Inspired by the recent progress of federated learning in data privacy and security [64], we propose to enhance the privacy-preserving capacity of cross-domain recommender systems by limiting the global use of local-domain data. More specifically, we require that the interaction data from different domains be used only privately in local domains. Federated learning serves as a distributed machine learning framework that ensures data privacy while maintaining data utility for analysis. However, existing federated recommender systems mainly focus on learning a unified central model or shared modules for recommendation based on distributed clients [3, 8, 32, 42, 54, 55, 58]. In contrast, CDR aims to improve the recommendation performance of local domains by leveraging globally learned information. Thus, it is non-trivial to adapt federated learning to the CDR task under the setting of privacy protection. A major challenge is that interaction data in different domains are heterogeneous and private, so we must address distributed information fusion and domain-specific knowledge adaptation given limited data access or sharing.
To tackle this challenge, we propose a novel federated graph learning approach for Privacy-Preserving Cross-Domain Recommendation (PPCDR). We consider a CDR setting with shared users across domains, where local interaction data can only be used in private domain space. The main idea of PPCDR is to model both the global preference among all domains and the local preference at a specific domain for a given user. These two kinds of preferences characterize users’ shared and domain-specific tastes toward items and can be effectively integrated through Graph Neural Networks (GNNs). For example, a user’s global preference across the movie and book domains may be the suspense genre, while the local preference in the book domain may be the author Keigo Higashino. In our PPCDR approach, the global user preference plays the role of an information surrogate that connects different domains (i.e., enables information communication across domains), and we apply the privacy protection strategy to it. The local user preference syncs with the global preference (extracting and injecting information) and characterizes domain-specific tastes based on private domain data. This global–local preference modeling improves single-domain performance while ensuring data privacy.
Specifically, for each domain, we construct a domain-specific user–item interaction graph augmented with links that connect global and local user nodes (corresponding to global and local user preferences), as shown in Figure 1, and then devise a federated graph learning method based on GNNs. To learn cross-domain knowledge for recommendation in a privacy-preserving way, each training iteration of PPCDR consists of a private update process within a local domain and a federated update process across multiple domains. We design a graph transfer module for each domain to perform bi-directional message exchange and propagation, where the message-passing mechanism of GNNs in each domain efficiently fuses global and local user preferences and captures the collaborative signals inherent in local user–item interactions. Then, in the federated update process, each domain applies a privacy protection technique (i.e., local differential privacy [45]) to the learned global user preferences and shares them with other domains. Meanwhile, each domain receives the global user preferences from other domains and locally updates its own global user preferences through a personalized aggregation strategy for domain-specific adaptation. In this way, PPCDR can effectively approximate, in a privacy-preserving manner, the multi-domain training process [12, 75] that directly shares local interaction data. Besides, we propose a periodic synchronization mechanism to reduce the communication cost of maintaining the global preferences across domains.
Fig. 1. An illustration of cross-domain preference modeling. We introduce a virtual global user to capture global user preferences across domains in a privacy-preserving way.
    Our main contributions are summarized as follows:
To the best of our knowledge, this is the first decentralized federated learning framework for cross-domain recommendation. This idea makes traditional cross-domain recommendation methods more practical in the presence of commercial competition and privacy concerns.
    We propose PPCDR, a novel privacy-preserving CDR method, which captures users’ general preferences based on distributed multi-domain data and improves recommendation performance for all domains without privacy leakage. Besides, PPCDR adopts the periodic synchronization mechanism to reduce the communication cost brought by capturing the global preferences across domains.
    Extensive experiments conducted on three real-world datasets and three cross-domain scenarios demonstrate that our approach is consistently better than several single- and cross-domain baselines and effectively protects domain privacy.

    2 Problem Formulation

    Following the setting in the literature [32, 73], we consider a cross-domain recommendation setting: a user u from the user set \(\mathcal {U}\) interacts with items from a set of domains \(\mathcal {D}\) (size \(|\mathcal {D}|\) ) corresponding to multiple item categories (e.g., Books, Movies, and Music). Each domain \(d \in \mathcal {D}\) is associated with an item set \(\mathcal {I}^d=\lbrace i\rbrace\) , where the superscript d denotes that the item set belongs to domain d.
    In our setting, we assume that the user set \(\mathcal {U}\) is shared across domains, while item sets \(\lbrace \mathcal {I}^1, \ldots , \mathcal {I}^{|\mathcal {D}|} \rbrace\) have no overlap between any two domains in \(\mathcal {D}\) . Given the item set \(\mathcal {I}^d=\lbrace i\rbrace\) of domain d, the observed interaction data can be denoted as a matrix \(\mathbf {R}^d \in \mathbb {R}^{|\mathcal {U}| \times |\mathcal {I}^d|}\) , where an element \(r_{u,i}=1\) if there exists an interaction between the user u and item \(i \in \mathcal {I}^d\) ; otherwise, \(r_{u,i}=0\) . Different from previous cross-domain recommendation settings [12, 32, 74], the item set \(\mathcal {I}^d\) and the interaction data \(\mathbf {R}^d\) are locally protected data in domain d, which cannot be directly transferred across domains due to privacy concerns [16, 17, 33]. To transfer protected information across domains, we consider setting a virtual global user \(\widetilde{u}\) to pair with each user u, where \(\widetilde{u}\) plays a global role in inter-domain information communication under privacy-protection, forming a virtual user set \(\widetilde{\mathcal {U}}\) . We enforce privacy protection through these virtual users when sharing cross-domain information.
We characterize the multi-domain setting via a graph-based representation: For each domain d, we construct a user–item interaction graph \(\mathcal {G}^d = \lbrace \mathcal {V}^d, \mathcal {E}^d \rbrace\) according to the interaction data \(\mathbf {R}^d\) in domain d, where \(\mathcal {V}^d = \mathcal {U} \cup \widetilde{\mathcal {U}} \cup \mathcal {I}^d\) denotes the nodes and \(\mathcal {E}^d = \lbrace (u,i)\;|\;u \in \mathcal {U},\; i\in \mathcal {I}^d,\; r_{u,i} = 1 \rbrace \cup \lbrace (u,\widetilde{u})\;|\;u \in \mathcal {U}, \;\widetilde{u} \in \widetilde{\mathcal {U}} \rbrace\) denotes the edges. To discriminate between the two kinds of user nodes (i.e., \(\widetilde{u}\) and u), we call them global user nodes and local user nodes, respectively. In our setting, a global user node \(\widetilde{u}\) is only connected with its local user node u, and user–item interactions only exist for local user nodes, as shown in Figure 1.
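To make this construction concrete, the following minimal sketch (our illustration with a hypothetical node-indexing scheme, not code from the paper) builds the edge set \(\mathcal {E}^d\) from \(\mathbf {R}^d\), pairing each local user with its own virtual global user:

```python
import numpy as np

def build_domain_graph(R_d: np.ndarray):
    """Build the edge list of the augmented graph G^d for one domain.

    R_d: binary interaction matrix of shape (|U|, |I^d|).
    Node indexing (hypothetical): local users 0..|U|-1, virtual global
    users |U|..2|U|-1, items 2|U|..2|U|+|I^d|-1.
    """
    num_users, num_items = R_d.shape
    edges = []
    # user-item edges for every observed interaction (r_{u,i} = 1)
    for u, i in zip(*np.nonzero(R_d)):
        edges.append((int(u), 2 * num_users + int(i)))
    # one edge linking each local user u to its virtual global user
    for u in range(num_users):
        edges.append((u, num_users + u))
    return edges
```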
    Based on the above notations, our task is to predict user preference and recommend suitable items for each local domain by sharing protected information learned from all the domains in \(\mathcal {D}\) . Formally, for user u and item i in domain d, we compute its preference score via a function \(s(\cdot)\) as
\(\begin{equation*} \hat{r}_{u,i} = s(u; i \;|\; \mathcal {G}^d), \end{equation*}\)
    where only the local data from domain d are directly accessible and the information from the other domains (i.e., inter-domain data) is used through virtual nodes under privacy protection.

    3 Methodology

In this section, we introduce the proposed federated graph learning approach, PPCDR.

    3.1 Overview

    The overall architecture of PPCDR is illustrated in Figure 2. With the assumption that domain data are stored locally, PPCDR applies the decentralized federated learning framework to utilize cross-domain knowledge for recommendation without privacy leakage. As introduced in Section 2, for each domain, we set both global and local user nodes and further model global and local user preferences for both types of nodes, respectively. The local user preferences are learned based on within-domain data, while global user preferences are collaboratively updated across multiple domains. The two kinds of user preferences are associated via the global user nodes.
Fig. 2. The overall architecture of PPCDR. Under the decentralized federated learning framework, PPCDR adopts the graph transfer module to fuse global and local user preferences and updates global user preferences across multiple domains via a personalized aggregation strategy and local differential privacy (LDP) for domain-specific adaptation and privacy preservation.
    Given the specially constructed user–item graph \(\mathcal {G}^{d}\) , we develop our approach based on GNN [21, 26], which can learn both global and local user preferences and capture their associations. For each learning iteration of PPCDR, we consider both a private update process within each local domain, and a federated update process globally across multiple domains.
In the private update process, each domain locally maintains the global and local user preferences according to within-domain data. In the federated update process, each domain applies the local differential privacy technique to the learned global user preferences and shares this protected global information with other domains. During this process, each domain also receives the global user preferences learned via the private update processes of other domains. We then design a personalized aggregation strategy for domain-specific adaptation of the received global user preferences. In this way, PPCDR approximates training with centralized multi-domain data as in existing studies [12, 32, 75] and learns cross-domain knowledge for recommendation in a privacy-preserving way. Furthermore, we propose a periodic synchronization mechanism to reduce the communication cost caused by sharing the global user preferences.

    3.2 Private Update within Single Domain

    In this section, we present the private update process within every single domain. The main idea of the private update process is to locally fuse global and local user preferences and update them based on within-domain data for capturing domain-specific knowledge.

    3.2.1 Modeling Local–Global Information Transfer.

    To efficiently model users’ global and local preferences and capture the collaborative signals inherent in local user–item interactions, we construct a local user–item graph for each domain based on within-domain data and incorporate virtual global user nodes to pair with the original local user nodes. These global user nodes and original user nodes are associated with learnable embeddings \(\mathbf {e}_{u} \in \mathbb {R}^m\) and \(\mathbf {e}_{\widetilde{u}} \in \mathbb {R}^m\) as their representations, respectively, and linked via edges in the interaction graph, as shown in Figure 1. Based on the constructed user–item graph, we design a graph transfer module by extending the traditional message-passing scheme of GNNs to facilitate (1) message exchange between global and local user preferences and (2) message propagation within the user–item graph.
    The graph transfer module can be conceptualized as an L-layer transformation. At the lth layer, our approach involves initiating a bi-directional embedding transfer to facilitate the exchange of messages between local and global user preferences. This process is formally expressed as follows:
    \(\begin{equation} \begin{aligned}{{\mathbf {h}}^\prime }_{u}^{(l)} &=f_{\mathrm{T}}\left(\mathbf {h}_{u}^{(l)}, {\mathbf {h}}_{\widetilde{u}}^{(l)} \; \big | \; \beta _1 \right), \quad \\ {\mathbf {h}}_{\widetilde{u}}^{(l+1)} &=f_{\mathrm{T}}\left({\mathbf {h}}_{\widetilde{u}}^{(l)}, \mathbf {h}_{u}^{(l)} \; \big | \; \beta _2 \right), \\ \end{aligned} \end{equation}\)
    (1)
where \(\mathbf {h}_{u}^{(0)} = \mathbf {e}_{u}\) and \({\mathbf {h}}_{\widetilde{u}}^{(0)}={\mathbf {e}}_{\widetilde{u}}\) , \(f_{\mathrm{T}}(\cdot ,\cdot | \beta)\) is the transfer function, and \(\beta _1\) and \(\beta _2\) are hyper-parameters in the range of [0, 1] to control the retention ratio in transfer. Corresponding to the local and global user preferences, \({\mathbf {h}}_{u}^{(l)}\) and \({\mathbf {h}}_{\widetilde{u}}^{(l)}\) denote the local and global representations of user u before transferring, respectively, while \({{\mathbf {h}}^\prime }_{u}^{(l)}\) and \({\mathbf {h}}_{\widetilde{u}}^{(l+1)}\) are the transferred ones. Note that the local user representations \({{\mathbf {h}}^\prime }_{u}^{(l)}\) will be further updated in message propagation before being fed into the \((l+1)\) -th layer. The local–global transfer function \(f_{\mathrm{T}}\) in Equation (1) is defined as
    \(\begin{equation} \begin{aligned}&f_{\mathrm{T}}\left(\mathbf {h}_{u}^{(l)}, {\mathbf {h}}_{\widetilde{u}}^{(l)} \; \big | \; \beta _1 \right) \\ = &\frac{1}{2}\left(\beta _1 \; \mathbf {h}_{u}^{(l)} + (1-\beta _1)\;{\mathbf {h}}_{\widetilde{u}}^{(l)}\right) + \frac{1}{2}\left(\dfrac{\left|\mathcal {N}_{u}\right|}{\left|\mathcal {N}_{u}\right|+1} \mathbf {h}_{u}^{(l)} + \dfrac{1}{\left|\mathcal {N}_{u}\right|+1} {\mathbf {h}}_{\widetilde{u}}^{(l)}\right), \\ \end{aligned} \end{equation}\)
    (2)
where \(\mathcal {N}_{u}\) denotes the neighbors of user u in the graph \(\mathcal {G}^d\) , and \(f_{\mathrm{T}}({\mathbf {h}}_{\widetilde{u}}^{(l)}, \mathbf {h}_{u}^{(l)} \; \big | \; \beta _2)\) can be computed in a similar way. Here, we control the information transfer in two ways, i.e., by hyperparameter (the first term of Equation (2)) and by link structure (the second term of Equation (2)). For the second term, we assume that the fewer items a user has interacted with within a domain, the more information should be obtained from the global representations to model user preferences for recommendation.
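To illustrate, here is a minimal NumPy sketch of the transfer function in Equation (2); it reflects our reading of the formula rather than the exact implementation, and the reverse (global-to-local) direction is assumed to reuse the same neighbor count, as the text only states that it is computed analogously:

```python
import numpy as np

def transfer(h_local, h_global, beta, num_neighbors):
    """Local-global transfer f_T of Equation (2): average of a
    hyperparameter-controlled mix (retention ratio beta) and a
    structure-controlled mix, where users with fewer interactions
    draw more information from the global representation."""
    mix_hyper = beta * h_local + (1.0 - beta) * h_global
    w = num_neighbors / (num_neighbors + 1.0)
    mix_struct = w * h_local + (1.0 - w) * h_global
    return 0.5 * (mix_hyper + mix_struct)

# Bi-directional exchange of Equation (1):
# h_u_prime = transfer(h_u, h_gu, beta_1, len(N_u))   # local <- global
# h_gu_next = transfer(h_gu, h_u, beta_2, len(N_u))   # global <- local
```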
Note that in this work we assume the user set is shared across domains, whereas in reality some users may interact in only a single domain. We can adapt our method to this scenario by assigning a virtual global representation to each user who interacts in only one domain and applying the bi-directional transfer between the virtual global representation and local representations with Equations (1)–(3). However, these virtual global representations are learned locally based on within-domain data rather than cross-domain data, which improves the model’s ability to capture the local preferences of such users.

    3.2.2 Message Propagation on Domain-specific Interaction Graph.

    Following the local–global information transfer, we proceed to utilize a standard message propagation mechanism to capture high-order connections within the interaction graph \(\mathcal {G}^d\) . In line with the approach introduced in LightGCN [21], we implement a lightweight GNN by omitting transformation matrices and nonlinear activation functions in the propagation operation. This can be expressed as
    \(\begin{equation} \begin{aligned}\mathbf {h}_{u}^{(l+1)} &=\sum _{i \in \mathcal {N}_{u}} \dfrac{\mathbf {h}_{i}^{(l)}}{\sqrt {|\mathcal {N}_{u}| \cdot |\mathcal {N}_{i}|}}, \quad \\ \mathbf {h}_{i}^{(l+1)} &=\sum _{u \in \mathcal {N}_{i}} \dfrac{{{\mathbf {h}^\prime }}_{u}^{(l)}}{\sqrt {|\mathcal {N}_{i}| \cdot |\mathcal {N}_{u}|}} , \end{aligned} \end{equation}\)
    (3)
    where \(\mathcal {N}_{i}\) denotes the neighbors of item i in the graph \(\mathcal {G}^d\) , and the item representation \(\mathbf {h}_{i}^{(0)}\) is initialized by learnable embedding \(\mathbf {e}_{i} \in \mathbb {R}^m\) . Since we involve virtual global user nodes in the interaction graph, this step also enhances the fusion between global and local user preferences.
    After L-layer transformation in the graph transfer module, we concatenate the representations produced by all L layers to obtain the final user and item representations as follows:
    \(\begin{equation} \begin{aligned}{\mathbf {h}}_{u} &= \operatorname{Concat}\left({\mathbf {h}}_{u}^{(0)}, \ldots , {\mathbf {h}}_{u}^{(L)} \right), \quad \\ {\mathbf {h}}_{i} &= \operatorname{Concat}\left({\mathbf {h}}_{i}^{(0)}, \ldots , {\mathbf {h}}_{i}^{(L)} \right), \end{aligned} \end{equation}\)
    (4)
    where \(\operatorname{Concat}(\cdots)\) denotes the concatenation operation. The ultimate user representations \(\mathbf {h}_{u}\) encapsulate both global and local user preferences, along with high-order information extracted from the user–item graph.
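As a concrete reference for Equations (3)–(4), the following dense-loop sketch performs one propagation layer and the final layer concatenation (illustrative only; an efficient implementation would use sparse matrix products):

```python
import numpy as np

def propagate(h_user_prime, h_item, user_nbrs, item_nbrs):
    """One LightGCN-style propagation layer (Equation (3)) with
    symmetric sqrt-degree normalization and no transformation
    matrices or nonlinear activations.

    h_user_prime: (|U|, m) transferred local user embeddings h'_u.
    h_item:       (|I|, m) item embeddings.
    user_nbrs[u]: items user u interacted with; item_nbrs[i]: users of item i.
    """
    new_user = np.zeros_like(h_user_prime)
    new_item = np.zeros_like(h_item)
    for u, items in enumerate(user_nbrs):
        for i in items:
            new_user[u] += h_item[i] / np.sqrt(len(user_nbrs[u]) * len(item_nbrs[i]))
    for i, users in enumerate(item_nbrs):
        for u in users:
            new_item[i] += h_user_prime[u] / np.sqrt(len(item_nbrs[i]) * len(user_nbrs[u]))
    return new_user, new_item

# Final representations (Equation (4)):
# h_u = np.concatenate([h_u_layer0, ..., h_u_layerL], axis=-1)
```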

    3.2.3 Learning with Local User–Item Interaction Data.

    Given the above user and item representations, we adopt the inner product operation to generate the scores to predict how likely a user u will interact with item i from domain d,
    \(\begin{equation} \hat{r}_{u,i} = \mathbf {h}_{u}^\top \mathbf {h}_{i} , \end{equation}\)
    (5)
    where \(\hat{r}_{u,i}\) is the prediction score of user u and item i. Then we adopt Bayesian Personalized Ranking (BPR) loss [46] to update the local and global user embeddings based on single-domain interaction data, which is defined as
\(\begin{equation} \mathcal {L}_{d} = \sum _{(u,i,j)\in \mathcal {O}^d} -\log \sigma (\hat{r}_{u,i}-\hat{r}_{u,j}) + \lambda ||\Theta ^d||_2 , \end{equation}\)
    (6)
    where \(\sigma (\cdot)\) is the sigmoid function, \(\lambda\) is set to control the strength of \(L_2\) regularization, \(\Theta ^d\) is the set of model parameters (e.g., local and global user embeddings) in domain d, \(\mathcal {O}^d=\lbrace (u,i,j)|r_{u,i} = 1, r_{u,j} = 0 \rbrace\) denotes the training data, and j denotes a sampled negative item in domain d that user u has not interacted with.
    Local user embeddings and domain-specific item embeddings are learned locally for each domain based on within-domain data, while the global user embeddings are collaboratively updated based on cross-domain data. We utilize the global user preferences as the bridge to fuse and utilize multi-domain information of users. In the next section, we will describe how the global user embeddings are learned based on cross-domain data.
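For reference, a NumPy sketch of the batched objective in Equations (5)–(6) (a sketch under the stated definitions, not the actual training code; in practice the loss would be minimized with an automatic-differentiation framework):

```python
import numpy as np

def bpr_loss(h_u, h_i, h_j, lam, params):
    """BPR loss of Equation (6) for a batch of (u, i, j) triples.

    h_u, h_i, h_j: (B, m') final user / positive-item / negative-item
    representations; scores are inner products (Equation (5)).
    params: parameter arrays entering the L2 regularization term.
    """
    pos = np.sum(h_u * h_i, axis=1)           # \hat{r}_{u,i}
    neg = np.sum(h_u * h_j, axis=1)           # \hat{r}_{u,j}
    sig = 1.0 / (1.0 + np.exp(-(pos - neg)))  # sigma(r_ui - r_uj)
    loss = -np.sum(np.log(sig + 1e-10))
    reg = lam * sum(np.linalg.norm(p) for p in params)
    return loss + reg
```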
Summary of Privacy Protection. Note that in the private update process, the within-domain data are locally stored and never directly used by other domains. Although each domain utilizes the global user preferences to update its local user preferences, we enforce information protection on the global user preferences (as will be shown in Section 3.3).

    3.3 Federated Update across Multiple Domains

    After applying the private update, we adopt a federated update process to collaboratively learn global user preferences based on multi-domain data and adapt these global preferences to heterogeneous domain data via personalized aggregation.

    3.3.1 Privacy-preserved Preference Sharing.

    In the private update process, each domain d maintains the global user embedding \(\mathbf {e}_{\widetilde{u}}\) for global user \(\widetilde{u}\) , which is locally updated with the within-domain data. To characterize more comprehensive user preferences, we need to learn cross-domain knowledge for enhancing the local user preferences in a privacy-preserving way. For this purpose, the proposed PPCDR provides fundamental privacy protection (no direct sharing of data) with the help of the federated learning framework. Meanwhile, privacy protection is further enhanced by local differential privacy to satisfy the flexible protection requirements of recommendation scenarios.
Specifically, we adopt decentralized federated learning to collaboratively update these global user embeddings on data from multiple domains, i.e., the global user embeddings are shared across domains for federated aggregation. However, as pointed out by previous studies [8, 59], these user embeddings encode private information about user behaviors and cannot be directly shared outside the domain due to privacy concerns. Therefore, we apply the local differential privacy (LDP) technique [45] to the global user embeddings before sharing them. LDP theoretically guarantees that the leakage of private information is bounded by applying a randomized mechanism \(\mathcal {M}(\cdot)\) to a private value x. The randomized mechanism [1, 4] satisfies ( \(\epsilon\) , \(\delta\) )-LDP if for any two adjacent private values x, \(x^{\prime }\) and any possible output s:
\(\begin{equation} \operatorname{Pr}\left[\mathcal {M}\left(x\right) = s\right] \le e^{\epsilon }\operatorname{Pr}\left[\mathcal {M}\left(x^{\prime }\right) = s\right]+\delta , \end{equation}\)
    (7)
    where the privacy budget \(\epsilon\) controls the tradeoff of utility–privacy and smaller \(\epsilon\) means better private information protection. The failure probability \(\delta\) allows for a small probability of failure for the privacy preservation mechanism. In previous works [42, 43, 58], the randomized mechanism \(\mathcal {M}(\cdot)\) is implemented by adding noise to the private value. We adopt the Gaussian mechanism [13] to guarantee ( \(\epsilon\) , \(\delta\) )-LDP by adding artificial Gaussian noise, where the amount of noise necessary to ensure differential privacy depends on the sensitivity of the individual examples. Specifically, for each global user embedding \({\mathbf {e}}_{\widetilde{u}}\) in domain d, we first clip it in \(L_2\) norm to limit the sensitivity and then add zero-mean Gaussian noise on it to obtain protected embeddings \({\mathbf {e}}_{\widetilde{u}}^d\) as follows:
    \(\begin{equation} \begin{aligned}\overline{\mathbf {e}}_{\widetilde{u}} &= {\mathbf {e}}_{\widetilde{u}} \;/\; \operatorname{max}\left(1, \dfrac{||{\mathbf {e}}_{\widetilde{u}}||}{C}\right), \\ {\mathbf {e}}_{\widetilde{u}}^d &= \overline{\mathbf {e}}_{\widetilde{u}} + \mathcal {N}(0, \sigma ^2), \end{aligned} \end{equation}\)
    (8)
    where C is the clipping threshold and \(\sigma = \sqrt {2\log (1.25/\delta)} C/\epsilon\) is the standard deviation of the Gaussian distribution. After that, each domain d sends out the protected embeddings \({\mathbf {e}}_{\widetilde{u}}^d\) for \(\widetilde{u} \in \widetilde{\mathcal {U}}\) and receives the shared global user embeddings \(\lbrace {\mathbf {e}}_{\widetilde{u}}^{d^\prime }|d^\prime \in \mathcal {D}, {d^\prime } \ne d\rbrace\) from other domains. We provide empirical analysis on the utility–privacy tradeoff of LDP in Section 4.4.
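A minimal sketch of this clip-and-noise step (our transcription of Equation (8)):

```python
import numpy as np

def protect_embedding(e, clip_c, epsilon, delta, rng):
    """Clip a global user embedding in L2 norm, then add zero-mean
    Gaussian noise (Equation (8)) so that the shared embedding
    satisfies (epsilon, delta)-LDP via the Gaussian mechanism."""
    e_clipped = e / max(1.0, np.linalg.norm(e) / clip_c)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * clip_c / epsilon
    return e_clipped + rng.normal(0.0, sigma, size=e.shape)

# e.g., protect_embedding(e_gu, clip_c=1.0, epsilon=1.0, delta=1e-5,
#                         rng=np.random.default_rng(0))
```

Note that in our experiments (Section 4.1.4), the clipping threshold is fixed to 1 and the noise standard deviation to 0.1.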

    3.3.2 Personalized Aggregation for Heterogeneous Data Fusion.

    Standard federated learning advocates employing unified parameters across multiple clients [35, 64]. However, in the context of cross-domain recommendation, it becomes crucial to incorporate domain-specific adaptation to enhance cross-domain knowledge utilization. For example, recommendations in the book domain may place a higher emphasis on information transferred from the movie domain than from the clothing domain. Consequently, we implement a personalized aggregation strategy, aiming to generate domain-specific global user embeddings for each domain. This approach ensures a more tailored and effective adaptation to the unique characteristics of each domain in CDR scenarios.
    In particular, we design an attention mechanism [51, 73] to perform the personalized preference aggregation in each domain. When domain d receives user u’s protected embeddings \(\lbrace {\mathbf {e}}_{\widetilde{u}}^{d^\prime } \;|\; d^\prime \in \mathcal {D}, {d^\prime } \ne d\rbrace\) shared by other domains, a self-attention mechanism with learnable transformation matrix \(\mathbf {W}\in \mathbb {R}^{2m}\) is applied to calculate the attention coefficients,
    \(\begin{equation} \alpha _{d, {d^\prime }}=\operatorname{softmax} \left(\operatorname{LeakyReLU}\left(\operatorname{Concat}\left({\mathbf {e}}_{\widetilde{u}}^d, {\mathbf {e}}_{\widetilde{u}}^{d^\prime } \right) \cdot \mathbf {W} \right)\right), \end{equation}\)
    (9)
where the softmax function is used to normalize the attention coefficients. The attention coefficient \(\alpha _{d, {d^\prime }}\) quantifies the importance of domain \({d^\prime }\) ’s knowledge to domain d. Used as weights for federated aggregation, these attention coefficients contribute to the generation of the global embedding for user u in domain d as follows:
    \(\begin{equation} {\mathbf {e}}_{\widetilde{u}} = \beta _3 \; {\mathbf {e}}_{\widetilde{u}} + (1-\beta _3)\;\sum _{d^\prime \in \mathcal {D}, {d^\prime } \ne d}\alpha _{d, {d^\prime }} \cdot {\mathbf {e}}_{\widetilde{u}}^{d^\prime } , \end{equation}\)
    (10)
    where \(\beta _3\) is a hyperparameter set in the range of [0, 1] to control the retention ratio, and the second term of the equation adaptively combines knowledge from different domains. The generated global user embedding \({\mathbf {e}}_{\widetilde{u}}\) for user \(\widetilde{u} \in \widetilde{\mathcal {U}}\) will be fed into graph transfer module described in Section 3.2.1 for the private update process in the next training round.
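A per-user NumPy sketch of Equations (9)–(10) follows; the LeakyReLU slope and the exact handling of \(\mathbf {W}\) are assumptions about details the text leaves unstated:

```python
import numpy as np

def personalized_aggregate(e_own, e_others, W, beta3, slope=0.01):
    """Personalized aggregation (Equations (9)-(10)) for one user.

    e_own:    (m,) this domain's global user embedding.
    e_others: (k, m) protected embeddings from the k other domains.
    W:        (2m,) learnable attention vector.
    """
    scores = np.array([
        max(z, slope * z)  # LeakyReLU
        for z in (np.concatenate([e_own, e_p]) @ W for e_p in e_others)
    ])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()   # softmax over domains d'
    return beta3 * e_own + (1.0 - beta3) * (alpha @ e_others)
```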
Summary of Privacy Protection. Note that in the federated update process, instead of directly sharing highly privacy-sensitive domain data, PPCDR shares global user embeddings that are protected by LDP. In this way, the federated update process of PPCDR learns cross-domain knowledge for recommendation in a privacy-preserving way. Besides, different from existing federated privacy-preserving CDR methods [33, 38, 62], PPCDR is designed to protect the data privacy of business partners (e.g., domains) rather than individual customers. Specifically, we assume that each client is the recommendation service provider of a domain, which holds the private data of a batch of users rather than a single user.

    3.4 Communication Optimization

    The federated update process incurs additional communication costs for recommendation algorithms, especially in scenarios involving numerous domains. We suggest a periodic synchronization mechanism to minimize these costs and provide a quantitative analysis of the communication cost.

    3.4.1 Periodic Synchronization.

    In decentralized federated learning, communication bandwidth is a major bottleneck for clients sharing local updates [41, 44, 64]. Building on recent advances in decentralized training [23, 53, 65], we propose a periodic synchronization mechanism to reduce communication costs. This mechanism syncs federated updates across domains after a fixed number of private updates. Each domain processes protected global user embeddings from others, applies personalized preference aggregation, and optimizes the objective function in Equation (6) through private updates before triggering a global update. We summarize the training process of PPCDR in Algorithm 1. Through the use of the periodic synchronization mechanism, the proposed PPCDR can adapt to scenarios with a multitude of domains by extending the synchronization interval. Additionally, we address potential scalability challenges through domain sub-sampling and discuss and experimentally validate these approaches in Sections 3.5 and 4.3.4, respectively.
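The control flow can be sketched as follows (hypothetical helper names; Algorithm 1 gives the authoritative procedure):

```python
# Schematic PPCDR training loop with periodic synchronization.
for round_id in range(R):                  # R = S / T communication rounds
    for domain in domains:
        for _ in range(T):                 # T consecutive private updates
            domain.private_update()        # Sections 3.2.1-3.2.3
    for domain in domains:
        protected = domain.protect_global_embeddings()  # LDP, Equation (8)
        domain.broadcast(protected)        # send to all other domains
    for domain in domains:
        received = domain.receive_all()    # {e^{d'} : d' != d}
        domain.personalized_aggregate(received)         # Equations (9)-(10)
```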

    3.4.2 Communication Cost.

The proposed periodic synchronization mechanism reduces the number of communication rounds among domains, leading to a decrease in the overall communication cost of the training process. For instance, with S total training iterations per domain and T consecutive private updates between synchronizations, only \(R=S/T\) rounds of communication are required, and the total communication cost can be expressed as \(O(mR|\mathcal {U}||\mathcal {D}|(|\mathcal {D}|-1))\) , where m is the dimension of the global user embedding and R is the number of communication rounds. Periodic synchronization thus yields a T-fold reduction in the communication cost of PPCDR, highlighting its effectiveness in improving the communication efficiency of federated updates. However, careful tuning of T is required, as periodic synchronization may affect the convergence and overall efficacy of CDR. A more detailed discussion of the impact of periodic synchronization on communication cost and the effectiveness of CDR is provided in Section 4.3.2.
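As a back-of-the-envelope illustration with assumed values (none of these numbers are reported settings):

```python
# m = 64 embedding dims, 25,000 users, 3 domains,
# S = 200 total iterations, T = 10 private updates per round.
m, U, D, S, T = 64, 25_000, 3, 200, 10
R = S // T                              # 20 communication rounds
floats_sent = m * R * U * D * (D - 1)   # O(mR|U||D|(|D|-1))
print(floats_sent)                      # 192,000,000 floats in total
```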

    3.5 Algorithm Analysis

    In this section, we conduct algorithm analysis on PPCDR, including the privacy preservation capacity and algorithm complexity.
Analysis on Privacy Protection. The proposed PPCDR framework offers foundational privacy protection by never directly sharing data, thanks to the federated learning framework. Additionally, privacy safeguards are bolstered through privacy-preserving algorithms, such as differential privacy, to accommodate diverse protection requirements within the recommendation scenario. In particular, we analyze the privacy preservation capacity of PPCDR in the following aspects. First, the privacy-sensitive interaction data of each domain are locally stored and never directly used by other domains, which effectively alleviates the risk of privacy leakage [35, 42]. Second, the shared global user representations are essentially abstractive and compressive embeddings, containing much less private information than the raw interaction data, according to the data processing inequality [35]. Third, we apply the LDP technique to the global user embeddings, which further guarantees that the leakage of private information is bounded. Note that the training process requires each domain to send out the protected embeddings to other domains over multiple iterations, which leads to an accumulation of privacy cost according to the moments accountant [1]. Therefore, we provide the periodic synchronization mechanism to reduce the number of communication rounds. An empirical study on the tradeoff between privacy protection and model performance is presented in Section 4.4. In a nutshell, with the above privacy protection mechanisms, PPCDR can effectively learn and utilize cross-domain knowledge for recommendation in a privacy-preserving way.
Analysis on Algorithm Complexity. For each domain d, the time complexity of PPCDR is dominated by three components: the graph transfer module, the personalized aggregation, and the loss calculation. For the graph transfer module, the time complexity is \(O(mL|\mathbf {R}^d|+mL|\widetilde{\mathcal {U}}|)\) , where m denotes the embedding dimension, L the number of graph transfer layers, \(|\mathbf {R}^d|\) the number of interactions in domain d, and \(|\widetilde{\mathcal {U}}|=|{\mathcal {U}}|\) the number of virtual global users. For the personalized aggregation, the time complexity is \(O(m|\widetilde{\mathcal {U}}||\mathcal {D}|)\) , where \(|\mathcal {D}|\) denotes the number of domains. With one negative item per positive instance, the computation complexity of the BPR loss is \(O(m|\mathbf {R}^d|)\) . In total, the time complexity of each training epoch is \(O(mL|\mathbf {R}^d| + m|\mathbf {R}^d|+mL|\widetilde{\mathcal {U}}| + m|\widetilde{\mathcal {U}}||\mathcal {D}|)\) , and the inference time complexity is \(O(mL|\mathbf {R}^d|+mL|\widetilde{\mathcal {U}}|)\) . Take LightGCN [21] for example; its training and inference time complexities are \(O(mL|\mathbf {R}^d| + m|\mathbf {R}^d|)\) and \(O(mL|\mathbf {R}^d|)\) , respectively. Considering \(|\mathcal {D}| \ll |\widetilde{\mathcal {U}}| \lt |\mathbf {R}^d|\) , the total time complexity of PPCDR per domain is comparable to that of LightGCN.
    Potential Limitations of Scenario Setting. In this work, we consider a CDR setting with a “user-overlap item-non-overlap” scenario, where the user set is shared across domains, while the item sets do not overlap between any two domains. Indeed, real-world scenarios often involve “partial user overlap” or “partial item overlap,” and our approach can efficiently adapt to such scenarios. As for the scenario with partial user overlap, we can modify our approach by assigning a virtual global representation to each user who exclusively interacts within a single domain. We continue to employ the bidirectional transfer mechanism between the virtual global representation and local representations using Equations (1)–(3). However, virtual global representations are acquired through local learning from within the domain-specific data rather than across-domain data. To demonstrate the generalizability of our approach in this scenario, more experimental analyses are conducted in Section 4.3.5. As for the scenario with partial item overlap, the setting of “no overlap in item sets between any two domains” in our study is a more stringent requirement than partial item overlap. Therefore, our proposed method is suitable for this situation without any modification.
Potential Limitations of Domain Quantity. In real-world applications, the number of domains is often limited, as exemplified by the Amazon dataset, which includes approximately 20 main domains. However, for the rare scenario of an exceedingly large number of domains, we propose two possible solutions to address potential scalability challenges: (1) Optimizing the communication topology: In the standard setting, our approach performs decentralized federated learning over a fully connected communication network. When the number of domains is large, the network can instead be tailored to a ring-like communication topology [28], where each domain exclusively communicates with its neighboring domains for federated updates. In this setup, participating domains or clients update their parameters based on their local datasets and subsequently aggregate these updates along with the model updates from their neighboring domains [61]. (2) Domain subsampling: When dealing with a substantial number of domains, we can limit the number of domains permitted to transmit their updates to other domains in each communication round. In every communication round, all participating clients compute their updates, but only those with “significant” updates are allowed to communicate with others; significance may be evaluated by metrics such as the norm of the update or other relevant criteria [10, 56]. To provide a more detailed empirical analysis, we perform random domain subsampling and explore the impact of subsampling on model performance in Section 4.3.4.

    4 Experiments

    In this section, we conduct a series of experiments to demonstrate the effectiveness and efficiency of PPCDR.

    4.1 Experimental Setup

    We first describe the experimental setup, including adopted datasets, compared methods, evaluation metrics, and implementation details.

    4.1.1 Datasets.

To evaluate our proposed method, we construct datasets for the CDR scenario based on the widely used recommendation datasets Amazon and Douban:
Amazon 2 contains product ratings, reviews, and metadata. Among the largest categories, we choose “Book” (denoted as Book), “Movies and TV” (denoted as Movie), and “CDs and Vinyl” (denoted as Music) as three domains. Based on these domain data, we construct three CDR scenarios, i.e., Book \(\leftrightarrow\) Movie, Book \(\leftrightarrow\) Music, and Book \(\leftrightarrow\) Movie \(\leftrightarrow\) Music, where “ \(\leftrightarrow\) ” denotes bidirectional information propagation between domains. The statistics of the processed Amazon datasets are summarized in Table 1.
Table 1. Statistics of Amazon Datasets

| Scenario | Dataset | #Users | #Items | #Interactions | Density |
|---|---|---|---|---|---|
| Books \(\leftrightarrow\) Movie | Book | 24,776 | 59,496 | 759,210 | 0.052% |
| | Movie | 24,776 | 25,822 | 600,385 | 0.094% |
| Books \(\leftrightarrow\) Music | Book | 7,274 | 22,725 | 239,281 | 0.145% |
| | Music | 7,274 | 15,684 | 180,826 | 0.158% |
| Books \(\leftrightarrow\) Movie \(\leftrightarrow\) Music | Book | 3,598 | 15,579 | 150,062 | 0.268% |
| | Movie | 3,598 | 12,652 | 217,196 | 0.477% |
| | Music | 3,598 | 10,556 | 113,515 | 0.299% |
Douban 3 is crawled from Douban, a popular Chinese online social network. This dataset consists of three domains, “Douban-Movies” (denoted as D-Movie), “Douban-Music” (denoted as D-Music), and “Douban-Books” (denoted as D-Book). Based on these domain data, we construct another three CDR scenarios, D-Book \(\leftrightarrow\) D-Movie, D-Book \(\leftrightarrow\) D-Music, and D-Book \(\leftrightarrow\) D-Movie \(\leftrightarrow\) D-Music, where “ \(\leftrightarrow\) ” denotes bidirectional information propagation between domains. The details of the Douban datasets are shown in Table 2.
Table 2. Statistics of Douban Datasets

| Scenario | Dataset | #Users | #Items | #Interactions | Density |
|---|---|---|---|---|---|
| D-Book \(\leftrightarrow\) D-Movie | D-Book | 15,587 | 33,219 | 796,263 | 0.125% |
| | D-Movie | 15,587 | 26,638 | 2,501,490 | 0.602% |
| D-Book \(\leftrightarrow\) D-Music | D-Book | 12,043 | 29,641 | 676,902 | 0.190% |
| | D-Music | 12,043 | 37,848 | 1,028,820 | 0.226% |
| D-Book \(\leftrightarrow\) D-Movie \(\leftrightarrow\) D-Music | D-Book | 11,154 | 29,360 | 661,766 | 0.268% |
| | D-Movie | 11,154 | 25,762 | 2,229,053 | 0.776% |
| | D-Music | 11,154 | 37,601 | 1,011,267 | 0.241% |
We adopt the five-core version of each dataset, where the users overlapping across domains are extracted and each user or item has at least five interactions. For each dataset, we transform the interactions into implicit data and randomly divide the historical interactions of each user into training, validation, and test sets with a ratio of 8:1:1. We uniformly sample one negative item for each positive instance to form the training set.

    4.1.2 Baseline Models.

    We compare the proposed PPCDR with both single-domain and cross-domain baseline methods. The adopted single-domain baselines include the following:
    BPRMF [46] is a classical collaborative filtering method that adopts matrix factorization to learn the latent representation of users and items. This method conducts inner product between user and item to predict the interaction and is optimized by the pairwise ranking loss in the Bayesian approach.
    LightGCN [21] is a competitive GNN-based method that removes the unnecessarily complicated design of GCNs for collaborative filtering. This method consists of two essential components, i.e., light graph convolution and layer combination, which is more concise and appropriate for recommendation.
Federated Collaborative Filtering (FCF) [3] is a pioneering federated collaborative filtering method for privacy-preserving personalized recommendation. Within the federated learning paradigm, this method federates the standard collaborative filter using a stochastic gradient descent–based approach.
    The adopted cross-domain baselines are listed as follows:
    CMF [48] is a collective MF-based model, which first jointly learns the embeddings in different domains for the overlapping users and then optimizes the target domain. Specifically, this method simultaneously factors several matrices, which share parameters among factors when an entity participates in multiple domains.
MTCDR [75] is a multi-target CDR framework, which improves the dual-target CDR framework (namely DTCDR [73]) with the graph embedding technique. In our experiments, we adopt GA-MTCDR-P as the instantiation of MTCDR. This method leverages personalized training strategies to train recommendation models in different domains.
    BiTGCF [32] is a transfer learning method for cross-domain recommendation, which combines the idea of high-order feature propagation in graph structure with transfer learning. This method consists of two essential modules, i.e., a feature propagation module and a feature transfer module. In our experiments, we use LightGCN as the basic model of BiTGCF.
    FedCT [33] is a federated variational inference framework designed for cross-domain recommendation tasks, which maintains a decentralized user encoding on each personal device. This method protects the private information of each user by restricting the direct transfer of user embeddings and assumes heterogeneous user representations across domains.
    FedCDR [38] is a personalized federated learning framework designed for privacy-preserving cross-domain rating prediction, which consists of rating prediction model and cross-domain recommendation model. This method employs the simplest recommendation model based on MF as the primary network.
    We train the single-domain baselines independently on each dataset for a fair comparison. For single-target cross-domain baseline CMF, we successively use one domain as the target domain and the others as the auxiliary domain to train it separately. Besides, since it is difficult to adapt the dual-target cross-domain method BiTGCF to the scenario of three domains, we do not report its experimental results in the scenario Book \(\leftrightarrow\) Movie \(\leftrightarrow\) Music. We present a comparison of the proposed PPCDR and baseline methods in Table 3.
Table 3. Comparison between the Proposed PPCDR and Baseline Methods

| Methods | Single-Domain | Cross-Domain | Multi-Domain | GNN | Data Storage |
|---|---|---|---|---|---|
| BPRMF [46] | ✓ | | | | Centralized |
| LightGCN [21] | ✓ | | | ✓ | Centralized |
| FCF [3] | ✓ | | | | Local |
| CMF [48] | | ✓ | ✓ | | Centralized |
| MTCDR [75] | | ✓ | ✓ | ✓ | Centralized |
| BiTGCF [32] | | ✓ | | ✓ | Centralized |
| FedCT [33] | | ✓ | ✓ | | Local |
| FedCDR [38] | | ✓ | | | Local |
| PPCDR (ours) | | ✓ | ✓ | ✓ | Local |

    4.1.3 Evaluation Metrics.

    To evaluate the performance of top-N recommendation on each domain, we adopt Recall (Recall@N) and Normalized Discounted Cumulative Gain (NDCG@N), which have been widely used in previous works [21, 32, 50]. Recall measures the percentage of relevant items that were retrieved, which does not consider the actual rank of the items. NDCG takes the ranking of recommended items into consideration,
\(\begin{equation*} \text{Recall@N} =\frac{1}{\left|\mathcal {U}_{\text{test}}\right|} \sum _{u \in \mathcal {U}_{\text{test}}} \frac{1}{\left|\mathcal {I}_u\right|} \sum _{n=1}^N \mathbb {I}\left[\omega (n) \in \mathcal {I}_u\right], \end{equation*}\)
\(\begin{equation*} \mathrm{NDCG}@N =\frac{1}{\left|\mathcal {U}_{\text{test}}\right|} \sum _{u \in \mathcal {U}_{\text{test}}} \frac{1}{\mathrm{IDCG}_u} \sum _{n=1}^N \frac{2^{\mathbb {I}\left[\omega (n) \in \mathcal {I}_u\right]}-1}{\log _2(n+1)}, \quad \mathrm{IDCG} =\sum _{n=1}^N \frac{1}{\log _2(n+1)}, \end{equation*}\)
where IDCG is the ideal discounted cumulative gain, \(\omega (n)\) is the item at rank n, \(\mathcal {U}_{\text{test}}\) is the set of users in the test set, \(\mathcal {I}_u\) is the set of test items that user u interacted with, and \(\mathbb {I}[\cdot ]\) is an indicator function that returns 1 when the condition is true. For these metrics, the larger the value, the better the performance. These metrics are widely used in previous studies [21, 32, 47, 75], and we set N to 10 and 20 in this work. To ensure the reliability of the experimental results, we adopt the all-ranking strategy, i.e., we rank all items that the user has not interacted with and report the average values across all testing users.
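The two metrics can be transcribed directly for a single test user (averaging over \(\mathcal {U}_{\text{test}}\) is done outside); this sketch assumes a full ranking over the candidate items:

```python
import numpy as np

def recall_ndcg_at_n(ranked_items, relevant, N):
    """Recall@N and NDCG@N for one test user, following the formulas above.

    ranked_items: items ranked by predicted score (all-ranking strategy).
    relevant:     set of held-out test items the user interacted with.
    """
    hits = [1 if ranked_items[n] in relevant else 0 for n in range(N)]
    recall = sum(hits) / len(relevant)
    dcg = sum((2 ** h - 1) / np.log2(n + 2) for n, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(n + 2) for n in range(N))
    return recall, dcg / idcg
```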

    4.1.4 Implementation Details.

We implement our method and the baseline methods with the open-source recommendation framework RecBole4 [71, 72]. For a fair comparison, we optimize all methods with Adam and carefully tune the hyper-parameters. We initialize the model parameters with the default Xavier distribution. We set the number of training epochs to 200 and apply early stopping with a patience of 10 epochs. We fix the batch size to 4,096, the clipping threshold to 1, and the standard deviation of the noise to 0.1. We search the learning rate in {1e-4, 2e-4, 5e-4, 1e-3, 2e-3}, the number of graph transfer layers in {2, 3, 4, 5}, the retention ratios \(\lbrace \beta _1, \beta _2, \beta _3\rbrace\) in {0, 0.1, 0.2, ..., 0.9, 1}, and the strength of \({L}_{2}\) regularization in {1e-4, 1e-3}. We tune the hyper-parameters on the validation set and report performance on the test set with the best hyper-parameters.

    4.2 Overall Performance

In this section, we present the overall performance of PPCDR on the Amazon and Douban datasets.

    4.2.1 Performance on Amazon Datasets.

The performance comparison of the proposed PPCDR and the baseline methods on the three CDR scenarios is reported in Tables 4 and 5. From these tables, we observe that among single-domain recommendation methods, GNN-based methods (e.g., LightGCN) achieve better performance than shallow methods (e.g., BPRMF and FCF), which demonstrates the effectiveness of incorporating high-order information from the user–item graphs. Compared to the traditional methods (e.g., BPRMF and LightGCN), federated learning methods (e.g., FCF) can effectively protect user privacy. However, as the cost of privacy preservation, these federated learning methods perform worse due to the differential privacy techniques and decentralized interaction data. In particular, FCF and BPRMF achieve similar performance, while the performance gap between FCF and LightGCN is more significant. These experimental results demonstrate the challenge of extracting high-order information from decentralized domain data.
Table 4. Overall Performance Comparison with Single-domain Methods in Amazon Scenarios

| Scenario | Dataset | Method | NDCG@10 | Recall@10 | NDCG@20 | Recall@20 |
|---|---|---|---|---|---|---|
| Books \(\leftrightarrow\) Movie | Books | BPRMF | 0.0226 | 0.0349 | 0.0274 | 0.0537 |
| | | LightGCN | 0.0296 | 0.0465 | 0.0357 | 0.0693 |
| | | FCF | 0.0224 | 0.0345 | 0.0266 | 0.0523 |
| | | PPCDR | \({\bf 0.0369}^{*}\) | \({\bf 0.0569}^{*}\) | \({\bf 0.0438}^{*}\) | \({\bf 0.0836}^{*}\) |
| | Movie | BPRMF | 0.0276 | 0.0465 | 0.0352 | 0.0754 |
| | | LightGCN | 0.0346 | 0.0574 | 0.0427 | 0.0882 |
| | | FCF | 0.0271 | 0.0460 | 0.0355 | 0.0748 |
| | | PPCDR | \({\bf 0.0401}^{*}\) | \({\bf 0.0642}^{*}\) | \({\bf 0.0487}^{*}\) | \({\bf 0.0970}^{*}\) |
| Books \(\leftrightarrow\) Music | Books | BPRMF | 0.0303 | 0.0509 | 0.0374 | 0.0774 |
| | | LightGCN | 0.0384 | 0.0648 | 0.0469 | 0.0947 |
| | | FCF | 0.0298 | 0.0501 | 0.0361 | 0.0766 |
| | | PPCDR | \({\bf 0.0456}^{*}\) | \({\bf 0.0710}^{*}\) | \({\bf 0.0548}^{*}\) | \({\bf 0.1063}^{*}\) |
| | Music | BPRMF | 0.0391 | 0.0600 | 0.0483 | 0.0951 |
| | | LightGCN | 0.0452 | 0.0783 | 0.0548 | 0.1091 |
| | | FCF | 0.0381 | 0.0589 | 0.0478 | 0.0948 |
| | | PPCDR | \({\bf 0.0520}^{*}\) | \({\bf 0.0844}^{*}\) | \({\bf 0.0629}^{*}\) | \({\bf 0.1262}^{*}\) |
| Books \(\leftrightarrow\) Movie \(\leftrightarrow\) Music | Books | BPRMF | 0.0336 | 0.0470 | 0.0402 | 0.0733 |
| | | LightGCN | 0.0427 | 0.0610 | 0.0490 | 0.0858 |
| | | FCF | 0.0331 | 0.0466 | 0.0405 | 0.0729 |
| | | PPCDR | \({\bf 0.0488}^{*}\) | \({\bf 0.0686}^{*}\) | \({\bf 0.0512}^{*}\) | \({\bf 0.0991}^{*}\) |
| | Movie | BPRMF | 0.0295 | 0.0369 | 0.0350 | 0.0585 |
| | | LightGCN | 0.0382 | 0.0488 | 0.0445 | 0.0739 |
| | | FCF | 0.0289 | 0.0372 | 0.0342 | 0.0569 |
| | | PPCDR | \({\bf 0.0412}^{*}\) | \({\bf 0.0504}^{*}\) | \({\bf 0.0489}^{*}\) | \({\bf 0.0788}^{*}\) |
| | Music | BPRMF | 0.0369 | 0.0588 | 0.0449 | 0.0888 |
| | | LightGCN | 0.0478 | 0.0692 | 0.0576 | 0.1060 |
| | | FCF | 0.0368 | 0.0564 | 0.0441 | 0.0877 |
| | | PPCDR | \({\bf 0.0547}^{*}\) | \({\bf 0.0812}^{*}\) | \({\bf 0.0655}^{*}\) | \({\bf 0.1203}^{*}\) |

The best result is in bold and the runner-up is underlined. The superscript “ \(^{*}\) ” indicates statistical significance for p < 0.01 compared to the best baseline.
Table 5. Overall Performance Comparison with Cross-domain Methods in Amazon Scenarios

| Scenario | Dataset | Method | NDCG@10 | Recall@10 | NDCG@20 | Recall@20 |
|---|---|---|---|---|---|---|
| Books \(\leftrightarrow\) Movie | Books | CMF | 0.0226 | 0.0347 | 0.0279 | 0.0545 |
| | | MTCDR | 0.0267 | 0.0393 | 0.0311 | 0.0603 |
| | | BiTGCF | 0.0352 | 0.0552 | 0.0420 | 0.0812 |
| | | FedCT | 0.0256 | 0.0367 | 0.0298 | 0.0554 |
| | | FedCDR | 0.0328 | 0.0508 | 0.0393 | 0.0757 |
| | | PPCDR | \({\bf 0.0369}^*\) | \({\bf 0.0569}^*\) | \({\bf 0.0438}^*\) | \({\bf 0.0836}^*\) |
| | Movie | CMF | 0.0293 | 0.0490 | 0.0365 | 0.0768 |
| | | MTCDR | 0.0322 | 0.0514 | 0.0387 | 0.0791 |
| | | BiTGCF | 0.0385 | 0.0645 | 0.0470 | 0.0956 |
| | | FedCT | 0.0303 | 0.0488 | 0.0368 | 0.0779 |
| | | FedCDR | 0.0358 | 0.0583 | 0.0438 | 0.0891 |
| | | PPCDR | \({\bf 0.0401}^*\) | 0.0642 | \({\bf 0.0487}^*\) | \({\bf 0.0970}^*\) |
| Books \(\leftrightarrow\) Music | Books | CMF | 0.0337 | 0.0523 | 0.0405 | 0.0779 |
| | | MTCDR | 0.0355 | 0.0568 | 0.0412 | 0.0816 |
| | | BiTGCF | 0.0438 | 0.0703 | 0.0515 | 0.0985 |
| | | FedCT | 0.0346 | 0.0543 | 0.0403 | 0.0811 |
| | | FedCDR | 0.0410 | 0.0645 | 0.0498 | 0.0968 |
| | | PPCDR | \({\bf 0.0456}^*\) | \({\bf 0.0710}^*\) | \({\bf 0.0548}^*\) | \({\bf 0.1063}^*\) |
| | Music | CMF | 0.0365 | 0.0605 | 0.0440 | 0.0888 |
| | | MTCDR | 0.0426 | 0.0655 | 0.0522 | 0.1003 |
| | | BiTGCF | 0.0487 | 0.0784 | 0.0595 | 0.1198 |
| | | FedCT | 0.0431 | 0.0661 | 0.0526 | 0.0997 |
| | | FedCDR | 0.0462 | 0.0729 | 0.0562 | 0.1112 |
| | | PPCDR | \({\bf 0.0520}^*\) | \({\bf 0.0844}^*\) | \({\bf 0.0629}^*\) | \({\bf 0.1262}^*\) |
| Books \(\leftrightarrow\) Movie \(\leftrightarrow\) Music | Books | CMF | 0.0342 | 0.0505 | 0.0402 | 0.0726 |
| | | MTCDR | 0.0381 | 0.0531 | 0.0442 | 0.0758 |
| | | FedCT | 0.0346 | 0.0517 | 0.0412 | 0.0756 |
| | | PPCDR | \({\bf 0.0488}^*\) | \({\bf 0.0686}^*\) | \({\bf 0.0512}^*\) | \({\bf 0.0991}^*\) |
| | Movie | CMF | 0.0323 | 0.0389 | 0.0382 | 0.0600 |
| | | MTCDR | 0.0345 | 0.0452 | 0.0411 | 0.0713 |
| | | FedCT | 0.0334 | 0.0397 | 0.0397 | 0.0642 |
| | | PPCDR | \({\bf 0.0412}^*\) | \({\bf 0.0504}^*\) | \({\bf 0.0489}^*\) | \({\bf 0.0788}^*\) |
| | Music | CMF | 0.0411 | 0.0629 | 0.0486 | 0.0907 |
| | | MTCDR | 0.0422 | 0.0651 | 0.0506 | 0.0968 |
| | | FedCT | 0.0402 | 0.0635 | 0.0485 | 0.0911 |
| | | PPCDR | \({\bf 0.0547}^*\) | \({\bf 0.0812}^*\) | \({\bf 0.0655}^*\) | \({\bf 0.1203}^*\) |

The best result is in bold and the runner-up is underlined.
For cross-domain recommendation baselines, the performance is generally superior to that of the corresponding single-domain methods, which confirms the effectiveness of cross-domain knowledge for recommendation. Specifically, the performance of CMF is better than BPRMF in most cases but slightly worse on a few datasets. In contrast, MTCDR, which aggregates parameters through the attention mechanism, consistently performs better than CMF. These results imply that sharing unified model parameters across domains might not improve the performance of all domains simultaneously, which motivates the personalized aggregation in PPCDR. Moreover, the graph-based method BiTGCF consistently outperforms the best single-domain baseline LightGCN, suggesting that the high-order information of graphs is valuable for cross-domain recommendation.
The proposed PPCDR achieves the best performance among all compared methods, even outperforming centralized cross-domain methods such as CMF. Different from the best cross-domain baseline BiTGCF, which directly transfers information among multiple domains, we selectively transfer cross-domain knowledge by applying domain adaptation to global user preferences with personalized aggregation. In addition, BiTGCF and FedCDR only consider information transfer between two domains and thus cannot handle multi-domain scenarios, while PPCDR achieves improvements in scenarios with more than two domains, such as Book \(\leftrightarrow\) Movie \(\leftrightarrow\) Music in our experiments. Compared to the federated learning baseline FedCT, which learns embeddings in each user’s personal space and optimizes them with an auto-encoder, PPCDR cooperatively trains the parameters of each domain and combines cross-domain knowledge with the high-order information of each domain’s interaction graph to perform recommendation.

    4.2.2 Performance on Douban Datasets.

    To demonstrate the versatility and applicability of our proposed method across different platforms, we select representative single-domain methods (i.e., BPRMF and LightGCN) and a multi-domain method (i.e., MTCDR) and conduct a further performance comparison on the Douban datasets. The experimental results on three CDR scenarios of the Douban datasets are reported in Table 6. From the table, it is evident that our proposed method consistently improves performance across the three Douban scenarios compared to single-domain recommendation methods and notably outperforms the traditional multi-domain recommendation approach. Specifically, the graph-based LightGCN outperforms the traditional collaborative filtering method BPRMF, which further substantiates the effectiveness of incorporating high-order information from the user–item graph in diverse scenarios. In contrast, our proposed method not only leverages graph modeling but also integrates user representation learning from multi-domain data. As a result, it achieves additional performance improvements over LightGCN. Furthermore, the performance of our method consistently surpasses that of the traditional cross-domain recommendation method MTCDR. This attests to the effective balance our method achieves between privacy protection and recommendation performance, even when applied to the Douban dataset.
    Table 6.
    Scenarios  Dataset  Methods  NDCG@10  Recall@10  NDCG@20  Recall@20
    D-Book \(\leftrightarrow\) D-Movie  D-Book  BPRMF  0.0794  0.1133  0.0934  0.1653
     LightGCN  0.0972  0.1323  0.1122  0.1891
     MTCDR  0.0821  0.1143  0.1105  0.1700
     PPCDR  \({\bf 0.1012}\)  \({\bf 0.1340}\)  \({\bf 0.1162}\)  \({\bf 0.1941}\)
     D-Movie  BPRMF  0.1237  0.1089  0.1323  0.1668
     LightGCN  0.1469  0.1237  0.1539  0.1856
     MTCDR  0.1288  0.1150  0.1412  0.1720
     PPCDR  0.1416  \({\bf 0.1258}\)  \(\underline{0.1501}\)  \({\bf 0.1882}\)
    D-Book \(\leftrightarrow\) D-Music  D-Book  BPRMF  0.0808  0.1123  0.0949  0.1645
     LightGCN  0.0967  0.1282  0.1111  0.1836
     MTCDR  0.0872  0.1191  0.1011  0.1700
     PPCDR  \({\bf 0.1028}\)  \({\bf 0.1379}\)  \({\bf 0.1183}\)  \({\bf 0.1972}\)
     D-Music  BPRMF  0.0776  0.0980  0.0895  0.1465
     LightGCN  0.0963  0.1162  0.1090  0.1714
     MTCDR  0.0804  0.1027  0.0918  0.1499
     PPCDR  \({\bf 0.1048}\)  \({\bf 0.1280}\)  \({\bf 0.1178}\)  \({\bf 0.1854}\)
    D-Book \(\leftrightarrow\) D-Movie \(\leftrightarrow\) D-Music  D-Book  BPRMF  0.0776  0.1072  0.0927  0.1635
     LightGCN  0.0903  0.1190  0.1037  0.1720
     MTCDR  0.0867  0.1228  0.1022  0.1779
     PPCDR  \({\bf 0.0992}\)  \({\bf 0.1329}\)  \({\bf 0.1145}\)  \({\bf 0.1908}\)
     D-Movie  BPRMF  0.1283  0.0936  0.1338  0.1484
     LightGCN  0.1566  0.1137  0.1587  0.1690
     MTCDR  0.1230  0.1006  0.1309  0.1550
     PPCDR  \(\underline{0.1554}\)  \({\bf 0.1165}\)  \({\bf 0.1587}\)  \({\bf 0.1728}\)
     D-Music  BPRMF  0.0794  0.0986  0.0921  0.1503
     LightGCN  0.0968  0.1117  0.1086  0.1650
     MTCDR  0.0881  0.1120  0.1017  0.1681
     PPCDR  \({\bf 0.1038}\)  \({\bf 0.1227}\)  \({\bf 0.1170}\)  \({\bf 0.1824}\)
    Table 6. Overall Performance Comparison on Douban Dataset
    The best result is in bold and the runner-up is underlined.

    4.3 Further Analysis

    In this section, we present further analysis of PPCDR.

    4.3.1 Ablation Study.

    To validate the effectiveness of personalized aggregation in PPCDR, we conduct an ablation study to analyze its contribution. The experimental results are reported in Figure 3, where “w/o p-agg.” denotes replacing the personalized aggregation defined in Equation (10) with an average aggregation; a minimal sketch of the two aggregation variants follows the figure. From the figure, we can observe that removing the personalized aggregation consistently hurts model performance on all seven datasets across the three CDR scenarios. For instance, the performance of “w/o p-agg.” on the Movie dataset in Books \(\leftrightarrow\) Movie \(\leftrightarrow\) Music is worse than PPCDR by a noticeable margin and even worse than that of the single-domain method LightGCN. These results indicate that the personalized aggregation in PPCDR helps capture useful cross-domain knowledge and improves recommendation performance for all domains in a privacy-preserving way.
    Fig. 3.
    Fig. 3. Ablation study on three cross-domain recommendation scenarios. “w/o p-agg” denotes the variant of PPCDR that replaces the personalized aggregation with an average aggregation.
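    To make the contrast concrete, the following minimal NumPy sketch illustrates the two variants compared in this ablation: a uniform average versus an attention-style personalized aggregation over the global user embeddings uploaded by each domain. The dot-product scoring and the function names are illustrative assumptions on our part, not the exact form of Equation (10).

    import numpy as np

    def average_aggregation(global_embs):
        # "w/o p-agg." variant: uniform average of the global user
        # embeddings uploaded by all participating domains.
        return np.mean(np.stack(global_embs), axis=0)

    def personalized_aggregation(local_emb, global_embs):
        # Attention-style variant (illustrative): the requesting domain
        # scores each uploaded global embedding against its own local user
        # embedding and returns the attention-weighted sum, so every domain
        # receives a differently adapted global preference.
        stacked = np.stack(global_embs)          # (n_domains, dim)
        scores = stacked @ local_emb             # dot-product attention logits
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over domains
        return weights @ stacked                 # (dim,)

    # Toy usage: three domains upload 64-d global embeddings for one user.
    rng = np.random.default_rng(0)
    uploads = [rng.normal(size=64) for _ in range(3)]
    local = rng.normal(size=64)
    avg = average_aggregation(uploads)
    pers = personalized_aggregation(local, uploads)

    Because the attention weights are computed per domain, each domain can pull the shared global preference toward its own data distribution, which is exactly what the uniform average cannot do.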

    4.3.2 Communication Cost Analysis.

    To reduce communication cost, we perform the private update process several times for each domain and one federated update in each training round of PPCDR (refer to Section 3.4). The number of private update iterations T needs to be carefully selected to trade off communication cost against recommendation accuracy. In this part, we vary T in the range of 1 to 5 to analyze its influence on the number of communication rounds and on recommendation performance; the results are shown in Figure 4, and a schematic sketch of the round structure follows the figure. From the figure, we observe that increasing the number of local iterations T significantly reduces the number of communication rounds in the training process. Meanwhile, setting T too large, such as 5, hurts recommendation performance, because the recommendation model might converge to a suboptimal solution on the local-domain data. Therefore, the hyperparameter T should be carefully tuned to achieve a good balance between communication cost and performance.
    Fig. 4.
    Fig. 4. Analysis of communication round and recommendation performance for different private update iterations T in the scenario “Book \(\leftrightarrow\) Music.”
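    The round structure behind this tradeoff can be sketched as follows. The Domain class and its methods are schematic stand-ins (a hypothetical API), not the actual PPCDR implementation, and the mean aggregation shown here stands in for the personalized aggregation of Equation (10).

    import numpy as np

    rng = np.random.default_rng(0)

    class Domain:
        """Schematic stand-in for one domain's recommender."""
        def __init__(self, dim=16):
            self.global_emb = rng.normal(size=dim)   # global user preference

        def private_update(self):
            # Placeholder for one local training pass over the domain's
            # interaction graph (graph transfer module + SGD in the paper).
            self.global_emb += 0.01 * rng.normal(size=self.global_emb.shape)

        def upload(self, sigma=0.1):
            # Gaussian LDP noise is added before the embedding leaves
            # the domain.
            return self.global_emb + rng.normal(scale=sigma, size=self.global_emb.shape)

    def train(domains, num_rounds=10, T=3):
        # One training round = T private updates per domain + 1 federated
        # update. Larger T means fewer communication rounds for the same
        # number of local steps, at the risk of drifting toward local optima.
        for _ in range(num_rounds):
            for d in domains:
                for _ in range(T):
                    d.private_update()
            uploads = [d.upload() for d in domains]
            agg = np.mean(uploads, axis=0)   # PPCDR uses personalized aggregation
            for d in domains:
                d.global_emb = agg

    train([Domain() for _ in range(3)])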

    4.3.3 Impact of the Data Sparsity Levels.

    To further verify that the proposed cross-domain recommendation method alleviates the sparsity of interaction data in each single domain, we evaluate the performance of PPCDR on users with different sparsity levels. Specifically, we divide users into five groups based on the number of interactions they have in each domain, while keeping the total number of interactions roughly the same for each group. Table 7 shows the details of the split datasets, and a sketch of one plausible grouping procedure follows Figure 5. With these five user groups, we compare the recommendation performance of PPCDR and LightGCN and present the results in Figure 5. The figure demonstrates that PPCDR consistently outperforms LightGCN. Moreover, as the number of interactions decreases, the performance gap between PPCDR and LightGCN widens, indicating that PPCDR can produce high-quality recommendations from sparse interaction data and improve recommendation performance for relatively inactive users.
    Table 7.
    Dataset  Groups  #Users  #Interactions  #Inter. per user \(\le\)  Density
    Books  G1  4,659  42,609  17  0.040%
     G2  1,677  52,866  56  0.139%
     G3  556  48,981  113  0.388%
     G4  270  47,866  207  0.780%
     G5  112  46,959  1,679  1.845%
    Music  G1  3,588  31,779  17  0.056%
     G2  2,620  36,834  33  0.090%
     G3  736  39,041  80  0.338%
     G4  253  37,023  193  0.933%
     G5  77  36,149  1,601  2.993%
    Table 7. Statistics of Split Datasets
    Fig. 5.
    Fig. 5. Analysis of recommendation for different sparse-level users. G1 denotes the group of users with the lowest average number of interactions.
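    One plausible way to produce such equal-interaction groups is to sweep users from least to most active and cut a new group whenever the accumulated interaction count reaches one-fifth of the total. The exact procedure is not specified in the text, so the sketch below is an assumption.

    import numpy as np

    def split_users_by_sparsity(inter_counts, num_groups=5):
        # Sort users by activity (least active first) and greedily cut
        # groups so that each group holds roughly total/num_groups
        # interactions.
        order = np.argsort(inter_counts)
        target = inter_counts.sum() / num_groups
        groups, current, acc = [], [], 0
        for u in order:
            current.append(int(u))
            acc += inter_counts[u]
            if acc >= target and len(groups) < num_groups - 1:
                groups.append(current)
                current, acc = [], 0
        groups.append(current)   # remaining (most active) users -> last group
        return groups

    # Toy usage: a long-tailed interaction distribution over 1,000 users.
    rng = np.random.default_rng(0)
    counts = rng.zipf(2.0, size=1000)
    for i, g in enumerate(split_users_by_sparsity(counts), 1):
        print(f"G{i}: {len(g)} users, {counts[g].sum()} interactions")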

    4.3.4 Impact of Domain Subsampling.

    Device subsampling is a pivotal aspect of federated learning and is widely used in real-world scenarios. In this subsection, we conduct experiments to examine the effect of domain subsampling on model performance. Specifically, at each communication round, we randomly sample domains to participate in the federated update with probability \(\rho\) and then evaluate the recommendation performance in each domain; a minimal sketch of this sampling step follows the figure. Our personalized aggregation strategy, based on the attention mechanism, easily adapts to changes in the number of participating clients across communication rounds. The experimental results on the Book \(\leftrightarrow\) Music datasets are shown in Figure 6. The figure illustrates that as the sampling probability \(\rho\) decreases from 1.0 to 0.2, the recommendation performance of our method declines. Nevertheless, our model consistently outperforms the single-domain method LightGCN, underscoring the robustness of our approach under domain subsampling.
    Fig. 6.
    Fig. 6. Analysis of recommendation performance for different probability of domain subsampling in the scenario “Book \(\leftrightarrow\) Music.”
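    A minimal sketch of the sampling step, assuming each domain joins a round independently with probability \(\rho\) (the text does not state the exact sampling scheme):

    import numpy as np

    def sample_participants(domains, rho, rng):
        # Each domain independently participates in this round's federated
        # update with probability rho; the attention-based aggregation then
        # simply runs over whichever subset participated. If no domain is
        # sampled, the federated update can be skipped for this round.
        return [d for d in domains if rng.random() < rho]

    rng = np.random.default_rng(0)
    print(sample_participants(["Books", "Music"], rho=0.6, rng=rng))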

    4.3.5 Adapting to Scenarios with Partial User Overlap.

    In this study, we mainly explore a CDR scenario with a “user overlap, item non-overlap” setting, although real-world scenarios more commonly exhibit partial user overlap. To assess the adaptability of PPCDR to such scenarios, we randomly designate certain users as “single-domain users” in each domain. We then replace the global nodes of these users in each domain with virtual global nodes that do not participate in federated updates, simulating a scenario where these users possess only single-domain data; a sketch of this masked federated update follows the figure. We adjust the proportion of single-domain users and evaluate the recommendation performance of our approach. The experimental results are presented in Figure 7. From the figure, it can be observed that as the proportion of users interacting with only some of the domains increases, the recommendation performance gradually decreases and approaches that of the single-domain method LightGCN. These results indicate that joint training with multi-domain data enhances recommendation performance and that our method is also well suited to situations where the user sets are not entirely aligned.
    Fig. 7.
    Fig. 7. Analysis of recommendation performance for different proportion of single-domain users in the scenario “Book \(\leftrightarrow\) Music.”
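    One way to implement this simulation is with boolean masks that flag the cross-domain users of each domain; the mask layout and names below are our assumptions, and the mean aggregation again stands in for PPCDR's personalized aggregation.

    import numpy as np

    def masked_federated_update(global_embs, overlap_masks):
        # global_embs: list of (n_users, dim) arrays, one per domain.
        # overlap_masks: list of boolean (n_users,) arrays; False marks a
        # simulated single-domain user whose virtual global node is
        # excluded from the federated update.
        stacked = np.stack(global_embs)                # (n_domains, n_users, dim)
        masks = np.stack(overlap_masks)                # (n_domains, n_users)
        counts = masks.sum(axis=0)                     # participating domains per user
        summed = (stacked * masks[..., None]).sum(axis=0)
        avg = summed / np.maximum(counts, 1)[:, None]  # safe divide
        updated = []
        for emb, m in zip(global_embs, overlap_masks):
            out = emb.copy()
            out[m] = avg[m]    # only cross-domain users receive the update
            updated.append(out)
        return updated

    # Toy usage: 2 domains, 5 users; user 0 is single-domain in domain 1.
    rng = np.random.default_rng(0)
    embs = [rng.normal(size=(5, 8)) for _ in range(2)]
    masks = [np.array([True] * 5), np.array([False, True, True, True, True])]
    new_embs = masked_federated_update(embs, masks)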

    4.4 Study on Privacy Protection

    We apply the LDP technique [45] to global user embeddings before sharing them across domains for privacy preservation (refer to Section 3.3 for more details). To quantitatively evaluate the privacy protection ability of PPCDR, we attempt to predict users’ historical interactions from the protected global user embeddings shared by each domain. Specifically, for each user we randomly sample one positive item from the items that the user has interacted with and nine negative items from the items that the user has not interacted with. Then, we rank all candidate items of each user according to the similarity between the item embedding and the protected global user embedding, computed by a dot product. We regard the item with the highest similarity as the predicted historical interaction, collect the results, and report the prediction accuracy as the privacy protection performance of PPCDR; lower prediction accuracy indicates better privacy protection. The sketch below makes this attack protocol concrete. Next, we study the privacy-preserving ability and recommendation performance of PPCDR under different LDP settings in detail.
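    A minimal sketch of this evaluation protocol follows; the array shapes and names are illustrative.

    import numpy as np

    def interaction_inference_accuracy(user_embs, pos_items, neg_items, item_embs):
        # For each user, score 1 held-out positive against 9 sampled
        # negatives by dot product with the *protected* global user
        # embedding, and count how often the positive ranks first.
        # Random guessing gives 0.1; lower accuracy means better privacy
        # protection.
        hits = 0
        for u, pos, negs in zip(user_embs, pos_items, neg_items):
            candidates = np.concatenate(([pos], negs))   # index 0 = positive
            scores = item_embs[candidates] @ u
            hits += int(np.argmax(scores) == 0)
        return hits / len(user_embs)

    # Toy usage with random embeddings.
    rng = np.random.default_rng(0)
    user_embs = rng.normal(size=(100, 32))
    item_embs = rng.normal(size=(500, 32))
    pos = rng.integers(0, 500, size=100)
    negs = rng.integers(0, 500, size=(100, 9))
    print(interaction_inference_accuracy(user_embs, pos, negs, item_embs))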
    Following previous works [42, 43, 58] on privacy-preserving recommender systems, a randomized mechanism \(\mathcal {M}(\cdot)\) that satisfies (\(\epsilon\), \(\delta\))-LDP can be applied to ensure that the leakage of private information is bounded. Specifically, (\(\epsilon\), \(\delta\))-LDP requires that, for any pair of inputs \(x, x^{\prime }\) and any output \(s\), \(\operatorname{Pr}\left[\mathcal {M}\left(x\right) = s\right] \le e^{\epsilon } \operatorname{Pr}\left[\mathcal {M}\left(x^{\prime }\right) = s\right] + \delta\), so a smaller privacy budget \(\epsilon\) implies stronger privacy guarantees but weaker data utility; here the randomized mechanism \(\mathcal {M}(\cdot)\) is implemented by adding Gaussian noise with standard deviation \(\sigma\), and a minimal sketch of the noise calibration is given below. To study the tradeoff between privacy protection and model performance, we fix \(\delta =0.05\) and vary the value of \(\epsilon\) to calculate the standard deviation \(\sigma\) of the Gaussian distribution, which controls the strength of privacy protection in PPCDR. The experimental results are shown in Figure 8, from which we can observe that as \(\sigma\) increases, the recommendation performance of PPCDR decreases slightly, while the privacy protection performance (measured by AUC, where lower is better) improves significantly. When the global user embeddings are not well protected by LDP with a too-small \(\sigma\), users’ historical interactions can be inferred to a certain extent. This phenomenon, which is consistent with previous studies [43, 59], indicates the potential risk of privacy leakage in CDR. Meanwhile, an overly large \(\sigma\) might degrade the recommendation performance. In conclusion, a moderate privacy budget can balance recommendation performance and privacy protection strength.
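    The text only states that \(\sigma\) is derived from (\(\epsilon\), \(\delta\)); the sketch below uses the classic Gaussian-mechanism calibration \(\sigma = \Delta \sqrt{2 \ln (1.25/\delta)} / \epsilon\) with the embedding norm clipped to bound the sensitivity \(\Delta\) (cf. [13]), which is one standard instantiation rather than the confirmed PPCDR implementation.

    import numpy as np

    def gaussian_ldp_protect(emb, eps, delta=0.05, clip_norm=1.0, rng=None):
        # Clip the embedding so its L2 norm (and thus the sensitivity) is
        # bounded by clip_norm, then add Gaussian noise calibrated with the
        # classic Gaussian mechanism. This is one standard instantiation;
        # the paper only specifies that sigma is computed from (eps, delta).
        rng = rng or np.random.default_rng()
        norm = np.linalg.norm(emb)
        if norm > clip_norm:
            emb = emb * (clip_norm / norm)
        sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        return emb + rng.normal(scale=sigma, size=emb.shape)

    # Smaller eps -> larger sigma -> stronger protection but lower utility.
    for eps in (0.5, 1.0, 2.0):
        protected = gaussian_ldp_protect(np.ones(64), eps=eps)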
    Fig. 8.
    Fig. 8. Study on tradeoff between privacy protection and recommendation performance. The lower AUC indicates the better privacy protection.

    5 Related Work

    In this section, we summarize the related work from three aspects, including cross-domain recommendation, privacy-preserving recommendation, and federated learning.

    5.1 Cross-domain Recommendation

    To address the cold-start and data-sparsity issues, CDR aims to improve recommendation performance with data from multiple domains [6, 25]. According to the type of information used, CDR methods can be divided into two categories, i.e., collaborative approaches and content-based approaches.
    Collaborative CDR tends to apply classical machine learning techniques to directly share or indirectly map user/item embeddings or to capture common patterns across domains [32, 34, 48, 70, 73]. For example, CMF [48] assumes a shared global user factor matrix for overlapping users of all domains and decomposes the interaction matrices of multiple domains simultaneously to capture cross-domain collaborative information for these users. CBT [29] builds a matrix shared by the two domains to represent cluster-level rating patterns, which is called the codebook. EMCDR [34] considers cross-domain recommendation scenarios where two domains share the same users and/or items and explicitly maps user representations from the source domain to the target domain. However, the learned mapping function is biased due to the lack of knowledge of the target domain. Hence, TMCDR [76] provides a Transfer-Meta framework that leverages meta-learning to optimize meta-networks through task-oriented techniques. Subsequently, some studies [7, 73, 75] propose combining and sharing multi-domain embeddings with the help of a multi-task learning framework. For example, DTCDR [73] proposes a general framework to simultaneously improve dual-domain recommendation performance, while GA-DTCDR [75] further extends the framework to multi-domain recommendation tasks. DisenCDR [7] combines multi-task learning and disentangled representation learning to disentangle the domain-shared and domain-specific information. BiTGCF [32] proposes a bi-directional transfer learning method based on graph neural networks to exploit the high-order connectivity in the user–item graph. CCDR [60] designs several auxiliary contrastive learning tasks to capture multiple kinds of information reflecting users’ diverse interests for better cross-domain recommendation performance. Different from collaborative CDR, another kind of approach improves cross-domain recommendation performance with the help of user or item attributes from an auxiliary domain, namely content-based CDR [2, 14, 68]. MV-DSSM [14] encodes user profiles and item attributes as dense vectors and transfers them into a latent space. CKE [68] leverages the textual, structural, and visual knowledge of items from the auxiliary domain to help build item embeddings. Nevertheless, the majority of these methods consider only the recommendation performance of CDR, ignoring that the protection of domain privacy also matters in real-world scenarios. In this study, we focus on collaborative CDR without auxiliary side information, with special attention to privacy issues.

    5.2 Privacy-preserving Recommendation

    To provide more accurate recommendation services, recommender systems require more user behavior records and private user information. However, due to the commercial interests of data owners and the privacy concerns of users, privacy-preserving recommendation has gained increasingly wide attention. Most early work on privacy-preserving recommendation focuses on differential privacy [36] and encryption [24]. For instance, McSherry et al. [36] propose a differentially private recommender system by introducing Gaussian noise into the covariance matrix. Recently, research efforts have applied federated learning to privacy-preserving recommendation [3, 8, 32, 42, 54, 55, 58, 63], because federated learning (FL) can provide secure multi-client collaboration. For example, FCF [3] locally updates the user embeddings and uploads the gradients of item embeddings to a central server that maintains the global item embeddings. Subsequently, some studies have suggested that the model updates sent to the server may contain enough information to reveal the original data, raising privacy concerns [8, 31]. Therefore, FedMF [8] protects the item embeddings with homomorphic encryption techniques to avoid leaking information. For rating prediction with explicit feedback, FedRec [31] designs two strategies, i.e., user averaging and hybrid filling, to protect the private information of each user. FedRec++ [30] further proposes a lossless federated recommendation method that eliminates noise in a privacy-aware manner by assigning a number of denoising clients. Uni-FedRec [43] proposes a privacy-preserving method for news recall and ranking by modeling user interests as a linear combination of base interest vectors. More recently, FedGNN [58] applies federated learning and local differential privacy to GNN-based recommendation, aiming to collectively train GNN models from decentralized user data.
    However, these methods usually focus on improving privacy-preserving recommendation performance in a single domain. In modern recommender systems, it has become common for a user to interact with multiple information-seeking apps or multiple services of one app. To avoid leaking user privacy, some studies [16, 17, 27, 33, 39, 40, 54, 66] propose privacy-aware CDR models. For example, NATR [16] chooses to transfer only the item embeddings across domains. Several CDR models [17, 54, 66] are proposed for specific CDR tasks, such as location recommendation and healthcare wearables recommendation. Besides, some CDR models [39, 40] propose to protect privacy with homomorphic encryption, which introduces additional privacy servers and significant computing costs. More recently, with the success of federated learning, some works [33, 38, 62] propose cross-domain recommendation methods within the federated framework to provide privacy-preserving recommendation. For example, FedCT [33] defines the edge cross-domain recommendation task and learns a user encoding in each user’s personal space via federated learning. FedCDR [38] designs a federated recommendation model based on matrix factorization to alleviate the cold-start problem on participating user devices, which requires agreement from all users to use their local devices. In this work, we study privacy-preserving cross-domain recommendation within the federated learning framework. Different from previous methods, we focus more on protecting the data privacy of business partners rather than individual customers. Therefore, we include the recommendation service providers of each domain as participants and propose a decentralized federated learning framework for privacy-preserving cross-domain recommendation with overlapping users.

    5.3 Federated Learning

    FL is a privacy-preserving machine learning framework for collectively learning intelligent models based on decentralized data [35, 64]. Different from traditional machine learning methods based on centralized storage of user data, federated learning keeps user data on users’ local devices. In a typical federated learning framework, each device maintains a local model and computes local model updates based on its locally stored data. These updates are aggregated into a unified update to maintain a global model, which is then distributed to all user devices. This process is performed iteratively until the model converges. Federated learning can be classified into horizontal federated learning, vertical federated learning, and federated transfer learning according to how the data are distributed among the parties in the feature and sample ID spaces [64]. While federated learning traditionally aims to train a single global model, one unified model may not be suitable for all participating clients [35]. Therefore, recent studies propose personalized federated learning [5, 11, 15, 18, 49]. For instance, FedPer [5] decomposes a model into a series of global modules and local modules, where the global modules are trained collaboratively using federated learning. L2SGD [18] conducts a theoretical analysis and seeks a tradeoff between the global model and the local models. FedRep [11] enhances FedPer with low-dimensional embeddings to accelerate the convergence of the training process. Besides, some works [15, 22] explore the links between federated learning and meta-learning. For example, Jiang et al. [22] interpret federated learning as an instance of a meta-learning algorithm. Per-FedAvg [15] adopts a meta-learning technique to learn a personalized model for each client.
    Recently, federated learning has been applied to graph data mining [19, 37, 41, 52, 57]. For example, GraphFL [52] is a model-agnostic meta-learning approach that solves semi-supervised node classification within the federated learning framework. FedSage [69] leverages federated learning to extend the traditional graph model GraphSage, integrating node features, link structures, and task labels on local subgraphs. FedGL [9] trains a graph model on graph data stored in different clients in a privacy-preserving manner. FedGNN [58] employs federated learning for GNN-based recommendation, aiming to train GNN models from decentralized user data and provide privacy-preserving recommendation. In addition, FedGraphNN [19] provides a benchmark to promote GNN-based federated learning research, and FederatedScope-GNN [57] implements a federated graph learning package to facilitate both the research and application of federated graph learning.

    6 Conclusions and Further Work

    In this article, we proposed PPCDR, a novel (to the best of our knowledge) federated graph learning approach for privacy-preserving cross-domain recommendation, which modeled both the local preference in a single domain and the global preference among multiple domains for a given user. In PPCDR, we first adopted a graph transfer module to fuse the global and local user preferences in the private update process within each single domain. Then, we maintained the global preferences by applying the LDP technique and a personalized aggregation strategy in the federated update process across multiple domains. Meanwhile, we designed a periodic synchronization mechanism to reduce the communication cost of PPCDR. The experimental results showed that the proposed PPCDR outperformed a number of competitive single- and cross-domain baselines while protecting privacy.
    Although PPCDR effectively addresses the privacy-preserving cross-domain recommendation problem, several directions remain for future work. First, we will consider more complex cross-domain recommendation scenarios, where some users interact in only a subset of domains. Second, we will leverage more types of information, such as text, images, and video, to improve the performance of privacy-preserving cross-domain recommendation.

    Acknowledgments

    The authors gratefully appreciate the anonymous reviewers for their valuable and detailed comments, which greatly help to improve the quality of this article.

    References

    [1]
    Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 308–318.
    [2]
    Deepak Agarwal, Bee-Chung Chen, and Bo Long. 2011. Localized factor models for multi-context recommendation. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 609–617.
    [3]
    Muhammad Ammad-Ud-Din, Elena Ivannikova, Suleiman A Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, and Adrian Flanagan. 2019. Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv:1901.09888. Retrieved from https://arxiv.org/abs/1901.09888
    [4]
    Pathum Chamikara Mahawaga Arachchige, Peter Bertok, Ibrahim Khalil, Dongxi Liu, Seyit Camtepe, and Mohammed Atiquzzaman. 2019. Local differential privacy for deep learning. IEEE IoT J. 7, 7 (2019), 5827–5842.
    [5]
    Manoj Ghuhan Arivazhagan, Vinay Aggarwal, Aaditya Kumar Singh, and Sunav Choudhary. 2019. Federated learning with personalization layers. arXiv:1912.00818. Retrieved from https://arxiv.org/abs/1912.00818
    [6]
    Shlomo Berkovsky, Tsvi Kuflik, and Francesco Ricci. 2007. Cross-domain mediation in collaborative filtering. In International Conference on User Modeling. 355–359.
    [7]
    Jiangxia Cao, Xixun Lin, Xin Cong, Jing Ya, Tingwen Liu, and Bin Wang. 2022. DisenCDR: Learning disentangled representations for cross-domain recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 267–277.
    [8]
    Di Chai, Leye Wang, Kai Chen, and Qiang Yang. 2021. Secure federated matrix factorization. IEEE Intell. Syst. 36, 5 (2021), 11–20.
    [9]
    Chuan Chen, Ziyue Xu, Weibo Hu, Zibin Zheng, and Jie Zhang. 2024. FedGL: Federated graph learning framework with global self-supervision. Inf. Sci. 657 (2024), 119976.
    [10]
    Wenlin Chen, Samuel Horvath, and Peter Richtarik. 2022. Optimal client sampling for federated learning. Trans. Mach. Learn. Res. (2022).
    [11]
    Liam Collins, Hamed Hassani, Aryan Mokhtari, and Sanjay Shakkottai. 2021. Exploiting shared representations for personalized federated learning. In International Conference on Machine Learning. PMLR, 2089–2099.
    [12]
    Qiang Cui, Tao Wei, Yafeng Zhang, and Qing Zhang. 2020. HeroGRAPH: A heterogeneous graph framework for multi-target cross-domain recommendation. In Proceedings of the Workshop on Online Recommender Systems and User Modeling co-located with the 16th ACM Conference on Recommender Systems (ORSUM@ RecSys ’20).
    [13]
    Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3-4 (2014), 211–407.
    [14]
    Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. 2015. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proceedings of the 24th International Conference on World Wide Web. 278–288.
    [15]
    Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. 2020. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 3557–3568.
    [16]
    Chen Gao, Xiangning Chen, Fuli Feng, Kai Zhao, Xiangnan He, Yong Li, and Depeng Jin. 2019. Cross-domain recommendation without sharing user-relevant data. In The World Wide Web Conference (WWW ’19). Association for Computing Machinery, New York, NY, 491–502.
    [17]
    Chen Gao, Chao Huang, Yue Yu, Huandong Wang, Yong Li, and Depeng Jin. 2019. Privacy-preserving cross-domain location recommendation. Proc. ACM Interact. Mob. Wear. Ubiq. Technol. 3, 1, Article 11 (Mar. 2019), 21 pages.
    [18]
    Filip Hanzely and Peter Richtárik. 2020. Federated learning of a mixture of global and local models. arXiv:2002.05516. Retrieved from https://arxiv.org/abs/2002.05516
    [19]
    Chaoyang He, Keshav Balasubramanian, Emir Ceyani, Carl Yang, Han Xie, Lichao Sun, Lifang He, Liangwei Yang, Philip S. Yu, Yu Rong, et al. 2021. Fedgraphnn: A federated learning system and benchmark for graph neural networks. arXiv:2104.07145. Retrieved from https://arxiv.org/abs/2104.07145
    [20]
    Ming He, Jiuling Zhang, and Shaozong Zhang. 2019. ACTL: Adaptive codebook transfer learning for cross-domain recommendation. IEEE Access 7 (2019), 19539–19549.
    [21]
    Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20). Association for Computing Machinery, New York, NY, 639–648.
    [22]
    Yihan Jiang, Jakub Konečnỳ, Keith Rush, and Sreeram Kannan. 2019. Improving federated learning personalization via model agnostic meta learning. arXiv:1909.12488. Retrieved from https://arxiv.org/abs/1909.12488
    [23]
    Dongseok Kang and Chang Wook Ahn. 2021. Communication cost reduction with partial structure in federated learning. Electronics 10, 17 (2021), 2081.
    [24]
    Stefan Katzenbeisser and Milan Petkovic. 2008. Privacy-preserving recommendation systems for consumer healthcare services. In Proceedings of the 3rd International Conference on Availability, Reliability and Security. IEEE, 889–895.
    [25]
    Muhammad Murad Khan, Roliana Ibrahim, and Imran Ghani. 2017. Cross domain recommender systems: A systematic literature review. ACM Comput. Surv. 50, 3 (2017), 1–34.
    [26]
    Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
    [27]
    Qi Le, Enmao Diao, Xinran Wang, Ali Anwar, Vahid Tarokh, and Jie Ding. 2022. Personalized federated recommender systems with private and partially federated AutoEncoders. In Proceedings of the 56th Asilomar Conference on Signals, Systems, and Computers. 1157–1163.
    [28]
    Jin-woo Lee, Jaehoon Oh, Sungsu Lim, Se-Young Yun, and Jae-Gil Lee. 2020. Tornadoaggregate: Accurate and scalable federated learning via the ring-based architecture. arXiv:2012.03214. Retrieved from https://arxiv.org/abs/2012.03214
    [29]
    Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09). Morgan Kaufmann, San Francisco, CA, 2052–2057.
    [30]
    Feng Liang, Weike Pan, and Zhong Ming. 2021. Fedrec++: Lossless federated recommendation with explicit feedback. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4224–4231.
    [31]
    Guanyu Lin, Feng Liang, Weike Pan, and Zhong Ming. 2020. Fedrec: Federated recommendation with explicit feedback. IEEE Intell. Syst. 36, 5 (2020), 21–30.
    [32]
    Meng Liu, Jianjun Li, Guohui Li, and Peng Pan. 2020. Cross domain recommendation via bi-directional transfer graph collaborative filtering networks. In Proceedings of the Conference on Information and Knowledge Management (CIKM ’20). Association for Computing Machinery, New York, NY, 885–894.
    [33]
    Shuchang Liu, Shuyuan Xu, Wenhui Yu, Zuohui Fu, Yongfeng Zhang, and Amelie Marian. 2021. FedCT: Federated collaborative transfer for recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21). Association for Computing Machinery, New York, NY, 716–725.
    [34]
    Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-domain recommendation: An embedding and mapping approach. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17). 2464–2470.
    [35]
    Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics. 1273–1282.
    [36]
    Frank McSherry and Ilya Mironov. 2009. Differentially private recommender systems: Building privacy into the netflix prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 627–636.
    [37]
    Guangxu Mei, Ziyu Guo, Shijun Liu, and Li Pan. 2019. SGNN: A graph neural network based federated learning approach by hiding structure. In Proceedings of the IEEE International Conference on Big Data (Big Data ’19). 2560–2568.
    [38]
    Wu Meihan, Li Li, Chang Tao, Eric Rigall, Wang Xiaodong, and Xu Cheng-Zhong. 2022. FedCDR: Federated cross-domain recommendation for privacy-preserving rating prediction. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2179–2188.
    [39]
    Taiwo Blessing Ogunseyi, Cossi Blaise Avoussoukpo, and Yiqiang Jiang. 2021. Privacy-preserving matrix factorization for cross-domain recommendation. IEEE Access (2021).
    [40]
    Taiwo Blessing Ogunseyi, Tang Bo, and Cheng Yang. 2021. A privacy-preserving framework for cross-domain recommender systems. Comput. Electr. Eng. 93 (2021), 107213.
    [41]
    Yang Pei, Renxin Mao, Yang Liu, Chaoran Chen, Shifeng Xu, Feng Qiang, and Blue Elephant Tech. 2021. Decentralized federated graph neural networks. In International Workshop on Federated and Transfer Learning for Data Sparsity and Confidentiality in Conjunction with IJCAI.
    [42]
    Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. 2020. Privacy-preserving news recommendation model learning. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, 1423–1432.
    [43]
    Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. 2021. Uni-FedRec: A unified privacy-preserving news recommendation framework for model training and online serving. In Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, 1438–1448.
    [44]
    Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ali Jadbabaie, and Ramtin Pedarsani. 2020. Fedpaq: A communication-efficient federated learning method with periodic averaging and quantization. In International Conference on Artificial Intelligence and Statistics (AISTATS ’20). 2021–2031.
    [45]
    Xuebin Ren, Chia-Mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie A. McCann, and S. Yu Philip. 2018. High-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forens. Secur. 13, 9 (2018), 2151–2166.
    [46]
    Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI ’09). AUAI Press, Arlington, VA, 452–461.
    [47]
    Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Recommender Systems Handbook. 1–35.
    [48]
    Ajit P. Singh and Geoffrey J. Gordon. 2008. Relational learning via collective matrix factorization. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD ’08). 650–658.
    [49]
    Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet S. Talwalkar. 2017. Federated multi-task learning. In Advances in Neural Information Processing Systems, Vol. 30 (2017).
    [50]
    Changxin Tian, Binbin Hu, Wayne Xin Zhao, Zhiqiang Zhang, and Jun Zhou. 2023. Periodicity may be emanative: Hierarchical contrastive learning for sequential recommendation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2442–2451.
    [51]
    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.
    [52]
    Binghui Wang, Ang Li, Meng Pang, Hai Li, and Yiran Chen. 2022. GraphFL: A federated learning framework for semi-supervised node classification on graphs. In Proceedings of the IEEE International Conference on Data Mining (ICDM ’22). IEEE Computer Society, 498–507.
    [53]
    Jianyu Wang and Gauri Joshi. 2021. Cooperative SGD: A unified framework for the design and analysis of local-update SGD algorithms. J. Mach. Learn. Res. 22, 1, Article 213 (Jan 2021), 50 pages.
    [54]
    Li-e Wang, Yihui Wang, Yan Bai, Peng Liu, and Xianxian Li. 2021. POI recommendation with federated learning and privacy preserving in cross domain recommendation. In Proceedings of the IEEE Conference on Computer Communications Workshops (INFOCOM Wksps ’21). IEEE, 1–6.
    [55]
    Qinyong Wang, Hongzhi Yin, Tong Chen, Junliang Yu, Alexander Zhou, and Xiangliang Zhang. 2021. Fast-adapting and privacy-preserving federated recommender system. VLDB J. 31, 5 (Oct. 2021), 877–896.
    [56]
    Su Wang, Mengyuan Lee, Seyyedali Hosseinalipour, Roberto Morabito, Mung Chiang, and Christopher G Brinton. 2021. Device sampling for heterogeneous federated learning: Theory, algorithms, and implementation. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’21). IEEE, 1–10.
    [57]
    Zhen Wang, Weirui Kuang, Yuexiang Xie, Liuyi Yao, Yaliang Li, Bolin Ding, and Jingren Zhou. 2022. FederatedScope-GNN: Towards a unified, comprehensive and efficient package for federated graph learning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4110–4120.
    [58]
    Chuhan Wu, Fangzhao Wu, Yang Cao, Yongfeng Huang, and Xing Xie. 2021. FedGNN: Federated graph neural network for privacy-preserving recommendation. arXiv:2102.04925. Retrieved from https://arxiv.org/abs/2102.04925
    [59]
    Chuhan Wu, Fangzhao Wu, Lingjuan Lyu, Yongfeng Huang, and Xing Xie. 2022. FedCTR: Federated native Ad CTR prediction with cross-platform user behavior data. ACM Trans. Intell. Syst. Technol. 13, 4, Article 62 (Jun. 2022), 19 pages.
    [60]
    Ruobing Xie, Qi Liu, Liangdong Wang, Shukai Liu, Bo Zhang, and Leyu Lin. 2022. Contrastive cross-domain recommendation in matching. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4226–4236.
    [61]
    Jie Xu, Benjamin S. Glicksberg, Chang Su, Peter Walker, Jiang Bian, and Fei Wang. 2021. Federated learning for healthcare informatics. J. Healthcare Inf. Res. 5 (2021), 1–19.
    [62]
    Dengcheng Yan, Yuchuan Zhao, Zhongxiu Yang, Ying Jin, and Yiwen Zhang. 2022. FedCDR: Privacy-preserving federated cross-domain recommendation. Digit. Commun. Netw. (2022).
    [63]
    Liu Yang, Ben Tan, Vincent W. Zheng, Kai Chen, and Qiang Yang. 2020. Federated recommendation systems. In Federated Learning. Springer, 225–239.
    [64]
    Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 10, 2 (2019), 1–19.
    [65]
    Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, and Wotao Yin. 2021. Exponential graph is provably efficient for decentralized deep training. In Advances in Neural Information Processing Systems, Vol. 34 (2021).
    [66]
    Xu Yu, Dingjia Zhan, Lei Liu, Hongwu Lv, Lingwei Xu, and Junwei Du. 2022. A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion. IEEE J. Biomed. Health Inform. 26, 5 (2022), 1928–1936.
    [67]
    Tianzi Zang, Yanmin Zhu, Haobing Liu, Ruohan Zhang, and Jiadi Yu. 2022. A survey on cross-domain recommendation: Taxonomies, methods, and future directions. ACM Trans. Inf. Syst. 41, 2, Article 42 (Dec. 2022), 39 pages.
    [68]
    Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 353–362.
    [69]
    Ke Zhang, Carl Yang, Xiaoxiao Li, Lichao Sun, and Siu Ming Yiu. 2021. Subgraph federated learning with missing neighbor generation. In Advances in Neural Information Processing Systems, Vol. 34 (2021), 6671–6682.
    [70]
    Yu Zhang, Bin Cao, and Dit-Yan Yeung. 2010. Multi-domain collaborative filtering. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI’10). AUAI Press, Arlington, VA, 725–732.
    [71]
    Wayne Xin Zhao, Yupeng Hou, Xingyu Pan, Chen Yang, Zeyu Zhang, Zihan Lin, Jingsen Zhang, Shuqing Bian, Jiakai Tang, Wenqi Sun, Yushuo Chen, Lanling Xu, Gaowei Zhang, Zhen Tian, Changxin Tian, Shanlei Mu, Xinyan Fan, Xu Chen, and Ji-Rong Wen. 2022. RecBole 2.0: Towards a more up-to-date recommendation library. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM ’22). ACM, 4722–4726.
    [72]
    Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a unified, comprehensive and efficient framework for recommendation algorithms. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM ’21). ACM, 4653–4664.
    [73]
    Feng Zhu, Chaochao Chen, Yan Wang, Guanfeng Liu, and Xiaolin Zheng. 2019. DTCDR: A framework for dual-target cross-domain recommendation. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM ’19). Association for Computing Machinery, New York, NY, 1533–1542.
    [74]
    Feng Zhu, Yan Wang, Chaochao Chen, Jun Zhou, Longfei Li, and Guanfeng Liu. 2021. Cross-domain recommendation: Challenges, progress, and prospects. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI ’21), Zhi-Hua Zhou (Ed.). International Joint Conferences on Artificial Intelligence Organization, 4721–4728. Survey Track.
    [75]
    Feng Zhu, Yan Wang, Jun Zhou, Chaochao Chen, Longfei Li, and Guanfeng Liu. 2023. A unified framework for cross-domain and cross-system recommendations. IEEE Trans. Knowl. Data Eng. 35, 2 (2023), 1171–1184.
    [76]
    Yongchun Zhu, Kaikai Ge, Fuzhen Zhuang, Ruobing Xie, Dongbo Xi, Xu Zhang, Leyu Lin, and Qing He. 2021. Transfer-meta framework for cross-domain recommendation to cold-start users. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1813–1817.
