License: CC BY-NC-ND 4.0
arXiv:2401.03162v1 [cs.IR] 06 Jan 2024

QoS-Aware Graph Contrastive Learning for Web Service Recommendation
thanks: Duksan Ryu is the corresponding author. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF- 2022R1I1A3069233) and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2020-0-01795) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and the Nuclear Safety Research Program through the Korea Foundation Of Nuclear Safety (KoFONS) using the financial resource granted by the Nuclear Safety and Security Commission (NSSC) of the Republic of Korea. (No. 2105030)

Jeongwhan Choi Dept. of Artificial Intelligence
Yonsei University
Seoul, South Korea
jeongwhan.choi@yonsei.ac.kr
   Duksan Ryu Dept. of Software Engineering
Jeonbuk National University
Jeonju, South Korea
duksan.ryu@jbnu.ac.kr
Abstract

With the rapid growth of cloud services driven by advancements in web service technology, selecting a high-quality service from a wide range of options has become a complex task. This study aims to address the challenges of data sparsity and the cold-start problem in web service recommendation using Quality of Service (QoS). We propose a novel approach called QoS-aware graph contrastive learning (QAGCL) for web service recommendation. Our model harnesses the power of graph contrastive learning to handle cold-start problems and improve recommendation accuracy effectively. By constructing contextually augmented graphs with geolocation information and randomness, our model provides diverse views. Through the use of graph convolutional networks and graph contrastive learning techniques, we learn user and service embeddings from these augmented graphs. The learned embeddings are then utilized to seamlessly integrate QoS considerations into the recommendation process. Experimental results demonstrate the superiority of our QAGCL model over several existing models, highlighting its effectiveness in addressing data sparsity and the cold-start problem in QoS-aware service recommendations. Our research contributes to the potential for more accurate recommendations in real-world scenarios, even with limited user-service interaction data.

Index Terms:
Quality of Service, Web Service, Service Recommendation, Graph Contrastive Learning

I Introduction

Quality of Service (QoS) is a crucial aspect of web service technologies. Web services are reusable web components designed to support machine-to-machine interaction through programmable method calls [1]. As reported by ProgrammableWeb, there are 20,525 public web services, and their availability is accelerating with the advancement of cloud computing [2]. Many of these web services offer similar functionalities to the users. QoS represents the quality attributes of web services and is perceived as an essential criterion for distinguishing these services. Predicting QoS is a very popular and active research area [3, 4, 5, 6, 7]. Various approaches have been proposed through QoS prediction, such as service recommendation [8], selection [9], and discovery [10]. As web service technology advances rapidly, the cloud has become a repository of a vast array of service options [11]. Selecting an appropriate service among various options, mainly based on QoS, poses a significant challenge.

Existing research in the QoS domain predominantly focuses on predicting QoS values. However, high prediction accuracy does not necessarily equate to satisfactory recommendation results. Fig. 1 depicts a scenario where two users, $u_1$ and $u_2$, interact with two web services, $s_1$ and $s_2$. User $u_1$ accesses both $s_1$ and $s_2$, yielding observed response times of $t_{11} = 0.4$ and $t_{12} = 0.5$, respectively. Models $M_1$ and $M_2$ predict the QoS values of $s_1$ and $s_2$ as $\hat{t}_{11}^{M_1} = 0.3$, $\hat{t}_{12}^{M_1} = 0.6$ and $\hat{t}_{11}^{M_2} = 0.5$, $\hat{t}_{12}^{M_2} = 0.45$, respectively. Although $M_2$ has better prediction accuracy, recommending a service $s_j$ to users similar to $u_1$ according to $M_2$ would be inappropriate, because $M_2$ predicts $s_2$ to respond faster than $s_1$, which reverses the observed order. This highlights that high prediction accuracy alone does not ensure satisfactory recommendations. Thus, finding similar users or services is essential for better web service recommendations based on QoS.
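The gap between prediction accuracy and ranking quality in this example can be checked numerically. The following is a minimal sketch using only the four predicted values and two observed values quoted above; the helper names (`mean_absolute_error`, `rank_services`) are illustrative and not part of the proposed method.

```python
import numpy as np

# Observed response times of user u1 for services s1 and s2 (values from the text).
observed = np.array([0.4, 0.5])

# Predicted response times of the two models (values from the text).
predicted = {
    "M1": np.array([0.3, 0.6]),
    "M2": np.array([0.5, 0.45]),
}


def mean_absolute_error(y_true, y_pred):
    """Average absolute prediction error over the services."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rank_services(response_times):
    """Service indices ordered from fastest to slowest (lower is better)."""
    return np.argsort(response_times).tolist()


mae = {name: mean_absolute_error(observed, p) for name, p in predicted.items()}
order = {name: rank_services(p) for name, p in predicted.items()}
true_order = rank_services(observed)

for name in predicted:
    print(name, "MAE:", round(mae[name], 3),
          "ranking preserved:", order[name] == true_order)
```

In this toy case $M_2$ attains the lower mean absolute error, yet only $M_1$ preserves the observed ordering of the two services.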

Collaborative Filtering (CF) has emerged as a key solution to this issue. Leveraging historical user-service interactions, CF provides a more personalized approach to service recommendations. However, the method often encounters the cold-start problem and struggles with data sparsity. To address these limitations, we introduce the use of graph contrastive learning, which has shown promising potential in handling cold-start predictions.

Graph contrastive learning enables us to learn representations of users and services by contrasting augmented views within a graph structure. By generating diverse perspectives through augmentation, we can better capture the underlying relationships and characteristics of user-service interactions. In our approach, we augment the graph with contextual information, such as geographical locations, to provide a broader and more accurate representation of the interactions. We also use random edge dropping to incorporate randomness into the augmented graph. This additional randomness simulates the uncertainty and variability present in real-world user-service interactions, which can improve the robustness and adaptability of the model to different interaction patterns.

We propose the QoS-aware graph contrastive learning (QAGCL) with geolocation context for web service recommendations. Our model offers a more effective and comprehensive solution by shifting the focus from QoS prediction to QoS-aware service recommendation, thus improving the quality and accuracy of recommendations.

The main contributions of this paper can be summarized as follows:

  • We propose the QoS-aware graph contrastive learning (QAGCL) model, a novel approach that combines CF, graph contrastive learning, and contextually augmented graphs.

  • The QAGCL model effectively mitigates the cold-start problems and data sparsity issues inherent in CF methods.

  • Our model incorporates contextual information into graph augmentation, which enhances the quality of QoS-based service recommendations.

  • Through extensive experiments, we demonstrate that our model outperforms several existing models in terms of service recommendation accuracy.

Figure 1: Illustration of the QoS predictions of two models. The black edge is the observed response time, which is the QoS value. The blue dashed edge represents the response time predicted by model $M_1$, and the red dashed edge represents the response time predicted by model $M_2$.

II Preliminaries & Related Work

II-A Web Quality of Service (QoS)

QoS properties such as response time and throughput have different values depending on the user [3]. When considering a specific service-based application, for example, a web service providing video, user-dependent QoS data is mainly determined by the user’s calling environment [12]. Therefore, it is generally accepted that if two users have similar past QoS data, they are likely to experience similar QoS in the future due to similar calling environments. From this perspective, a collaborative filtering approach, which essentially works by modeling the similarity between users and services [13, 14], is suitable for QoS prediction [12, 3]. So far, QoS prediction models based on collaborative filtering have made great progress in providing efficient solutions for many service-based applications, such as cloud computing-based applications [15] and multimedia service-based applications [16].

II-B Web Service Recommendation

In the realm of QoS-based web service recommendation, various studies have explored collaborative filtering techniques to improve recommendation performance [8, 17, 12, 18, 19]. Additionally, frameworks utilizing random walks on user-item bipartite graphs have been proposed to predict Web QoS values [4, 5, 6].

However, despite these efforts, data sparsity and the cold-start problem remain significant challenges in web service recommendation [6, 20]. Data sparsity refers to the scarcity of user-service interaction data, where only a fraction of possible user-service pairs are observed. This sparsity limits the effectiveness of collaborative filtering approaches that heavily rely on historical interactions to make recommendations.

The cold-start problem arises when there is insufficient information about new users or services, making it challenging to provide accurate recommendations. In the context of web service recommendation, this problem can occur when new services are introduced to the system, or when new users join and have limited interaction history.

While existing research has made strides in addressing these challenges, it still has limitations in handling data sparsity and the cold-start problem effectively. More advanced techniques are required to overcome these obstacles and improve the accuracy and coverage of web service recommendations. Therefore, this study proposes a graph contrastive learning framework for QoS-based web service recommendation. Graph structures have been used to predict QoS values, but there is no example of a QoS-based web service recommendation model that applies both graph convolutional networks and contrastive learning [21, 22].

II-C Graph-based Collaborative Filtering

Let $\mathbf{R} \in \{0,1\}^{|\mathcal{U}| \times |\mathcal{V}|}$ be an interaction matrix, where $\mathcal{U}$ is a set of users and $\mathcal{V}$ is a set of services. $\mathbf{R}_{u,v}$ is 1 iff an interaction $(u,v)$ is observed in the data, and 0 otherwise. Let $\mathbf{A} \in \mathbb{R}^{N \times N}$ be the adjacency matrix, where $N = |\mathcal{U}| + |\mathcal{V}|$ is the number of nodes. The normalized adjacency matrix is defined as $\tilde{\mathbf{A}} := \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$, where $\mathbf{D} \in \mathbb{R}^{N \times N}$ is the diagonal degree matrix.
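As an illustration of these definitions, the sketch below builds the normalized adjacency matrix from a small interaction matrix. It assumes the block layout commonly used for bipartite user-item graphs (users indexed first, services second, with $\mathbf{R}$ and its transpose in the off-diagonal blocks); the text above does not spell out this layout, so treat it, the function name `normalized_adjacency`, and the handling of isolated nodes as assumptions.

```python
import numpy as np


def normalized_adjacency(R):
    """Return D^{-1/2} A D^{-1/2} for the bipartite graph induced by R.

    Assumption: nodes are ordered as [users, services], so A has R in the
    upper-right block and R^T in the lower-left block.
    """
    n_users, n_services = R.shape
    n = n_users + n_services
    A = np.zeros((n, n))
    A[:n_users, n_users:] = R
    A[n_users:, :n_users] = R.T

    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0                      # isolated nodes keep a zero row (assumption)
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


# Toy interaction matrix: 2 users x 3 services (hypothetical data).
R = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=float)
A_tilde = normalized_adjacency(R)
print(A_tilde.round(3))
```

Each nonzero entry equals $1/\sqrt{d_i d_j}$ for the connected pair of nodes, and the matrix stays symmetric.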

The user-service interactions in QoS datasets are closely related to the field of graph-based CF in recommendation systems. We introduce several graph-based CF methods that are mainly used in general recommender systems and are suitable for QoS-aware service recommendation. The goal of graph-based CF is to predict user-item (e.g., user-service) ratings by leveraging the relationships and interactions captured in an interaction graph. This approach learns embeddings for the users and items in the graph and uses their inner product to compute the predicted rating.

The user-item relationships can be represented by a bipartite graph, and thus it has recently become popular to adopt Graph Convolutional Networks (GCNs) for CF [23, 24, 25, 26, 27, 28, 29, 30, 31, 32]. A GCN is a type of neural network that operates on graphs. Each node embedding in a GCN is updated from its neighbor nodes, so a GCN with $l$ layers can access $l$-hop neighbor nodes.

NGCF [24] applies the GCN to CF largely as it is. LightGCN [25] has emerged as a standard model by introducing a more lightweight design than NGCF. Its linear GCN layer is defined as follows:

$$\mathbf{E}^{(l+1)} = \tilde{\mathbf{A}} \mathbf{E}^{(l)}, \tag{1}$$

where $\mathbf{E}^{(0)} \in \mathbb{R}^{N \times D}$ is the learnable initial embedding matrix with embedding dimension $D$, and $\mathbf{E}^{(l)}$ denotes the embedding matrix at the $l$-th layer. From the message-passing perspective, the GCN layer can be rewritten as follows:

$$\mathbf{e}_i^{(l+1)} = \sum_{j \in \mathcal{N}_i} \frac{1}{\sqrt{d_i d_j}} \mathbf{e}_j^{(l)}, \tag{2}$$

where $\mathbf{e}_i^{(l)}$ is the feature vector of node $i$ at layer $l$, $\mathcal{N}_i$ is the set of neighbors of node $i$, and $d_i$ and $d_j$ are the degrees of nodes $i$ and $j$, respectively. The coefficient $1/\sqrt{d_i d_j}$ follows from the symmetric normalization $\tilde{\mathbf{A}} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$. We continue to use the matrix form of Eq. (1) in the rest of the paper.
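The matrix form and the message-passing form describe the same computation. The sketch below checks this on a toy graph, using the coefficient $1/\sqrt{d_i d_j}$ that follows from the symmetric normalization $\tilde{\mathbf{A}} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$. The graph, the embedding size, and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy bipartite graph: nodes 0-1 are users, nodes 2-3 are services (hypothetical).
A = np.array([[0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 0]], dtype=float)

deg = A.sum(axis=1)
A_tilde = A / np.sqrt(np.outer(deg, deg))   # D^{-1/2} A D^{-1/2}

rng = np.random.default_rng(0)
E0 = rng.normal(size=(4, 3))                # initial embeddings, D = 3


def propagate_matrix(A_tilde, E):
    """One linear GCN layer in matrix form, as in Eq. (1)."""
    return A_tilde @ E


def propagate_message_passing(A, E):
    """The same layer written as a sum over each node's neighbors."""
    deg = A.sum(axis=1)
    out = np.zeros_like(E)
    for i in range(A.shape[0]):
        for j in np.flatnonzero(A[i]):
            out[i] += E[j] / np.sqrt(deg[i] * deg[j])
    return out


E1_matrix = propagate_matrix(A_tilde, E0)
E1_loop = propagate_message_passing(A, E0)
print(np.allclose(E1_matrix, E1_loop))
```

The matrix form is the one to use in practice, since a single (sparse) matrix product replaces the explicit loop over neighbors.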

The predicted rating, denoted as $\hat{r}$, is calculated by taking the inner product between the user embedding, $\mathbf{e}_u$, and the item embedding, $\mathbf{e}_i$:

$$\hat{r} = \mathbf{e}_u^{\mathrm{T}} \mathbf{e}_i. \tag{3}$$
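To show how the inner-product score turns into a recommendation list, the sketch below scores every user-service pair and keeps the top-$k$ services per user. Masking services that a user has already interacted with is a common convention and an assumption here, as are the function name `recommend_top_k` and the toy embeddings.

```python
import numpy as np


def recommend_top_k(user_emb, service_emb, seen, k):
    """Rank services per user by the inner-product score.

    seen is a boolean matrix (users x services); seen pairs are excluded,
    which is an assumption of this sketch rather than a stated rule.
    """
    scores = user_emb @ service_emb.T            # all pairwise inner products
    scores = np.where(seen, -np.inf, scores)     # never recommend seen services
    return np.argsort(-scores, axis=1)[:, :k]


# Hypothetical embeddings: 2 users, 3 services, dimension 2.
user_emb = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
service_emb = np.array([[2.0, 0.0],
                        [0.0, 3.0],
                        [1.0, 1.0]])
seen = np.array([[True, False, False],
                 [False, False, False]])

top2 = recommend_top_k(user_emb, service_emb, seen, k=2)
print(top2)
```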

II-D Contrastive Learning for Recommendation

In the context of QoS-based web service recommendation, graph contrastive learning is employed to leverage the geolocation context and augment the graph. We introduce several graph contrastive learning methods for recommendation systems.

Graph-based CF methods have demonstrated impressive performance but face challenges such as data sparsity and cold-start problems, as they rely heavily on positive user-item interactions as labels [33, 34]. To overcome these challenges, contrastive learning (CL) methods for CF have been proposed to extract valuable information from unlabeled interactions [35, 36]. These methods construct different views and contrast them to align node representations, and they have shown promising results.

SGL [36] utilizes LightGCN as its backbone encoder and employs three operators (node dropouts, edge dropouts, and random walks) to generate augmented views. SimGCL [37] simplifies the graph augmentation process by introducing random noises to perturb node representations. LightGCL [38] proposes a graph augmentation strategy based on singular value decomposition to capture global collaborative signals effectively.

These methods consider the views of the same node as positive pairs and views of different nodes as negative pairs. The positive pairs are defined as $\{(\mathbf{e}_u', \mathbf{e}_u'') \mid u \in \mathcal{U}\}$, and the negative pairs are defined as $\{(\mathbf{e}_u', \mathbf{e}_v'') \mid u, v \in \mathcal{U}, u \neq v\}$. The supervision of positive pairs promotes the similarity between different views of the same user, while the negative pairs encourage the distinction between different nodes. These methods adopt InfoNCE [39], which allows them to learn better user/item representations that preserve node-specific properties and improve generalization ability. The InfoNCE contrastive loss is as follows:

$$\mathcal{L}^{user}_{CL} = \sum_{u \in \mathcal{U}} -\log \frac{\exp(s(\mathbf{e}_u', \mathbf{e}_u'')/\tau)}{\sum_{v \in \mathcal{U}} \exp(s(\mathbf{e}_u', \mathbf{e}_v'')/\tau)}, \tag{4}$$

where $s(\cdot)$ is the cosine similarity function between two vectors, and $\tau$ is the temperature of the softmax, which is a hyper-parameter. The same loss is applied to items, which gives $\mathcal{L}^{item}_{CL}$. These two loss functions are combined as $\mathcal{L}_{CL} = \mathcal{L}^{user}_{CL} + \mathcal{L}^{item}_{CL}$, and $\mathcal{L}_{CL}$ is the objective function for contrastive learning.
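The InfoNCE loss above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code: the function name `info_nce`, the temperature value 0.2, and the random embeddings are assumptions, and the denominator sums over all nodes $v \in \mathcal{U}$, including $v = u$, as in Eq. (4).

```python
import numpy as np


def info_nce(view_a, view_b, tau=0.2):
    """Sum over nodes of -log softmax of cosine similarities, as in Eq. (4).

    view_a[u] and view_b[u] are two views of the same node u.
    tau=0.2 is an arbitrary illustrative temperature.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / tau                               # s(e'_u, e''_v) / tau
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.trace(log_prob))                      # positives lie on the diagonal


rng = np.random.default_rng(0)
user_v1, user_v2 = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
item_v1, item_v2 = rng.normal(size=(7, 8)), rng.normal(size=(7, 8))

# Combined objective: user term plus item term.
loss_cl = info_nce(user_v1, user_v2) + info_nce(item_v1, item_v2)
print(loss_cl)
```

The loss decreases when the two views of each node are more similar to each other than to the views of other nodes, which is the alignment effect described above.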

To incorporate geolocation context into the recommendation process, the graph is augmented for contrastive learning. This approach allows for aligning node representations based on different views, leveraging geolocation information alongside other contextual features. By contrasting the augmented views, the model can capture fine-grained similarities and differences between services, enhancing the accuracy and robustness of the recommendation system. Our proposed method, which utilizes contrastive learning and geolocation-based graph augmentation, offers a valuable framework for improving QoS-based web service recommendations.

III Motivation

We describe the motivation for using GCNs, geolocation-based graph augmentation, and contrastive learning for web service recommendations from three perspectives.

III-A Why are GCNs suitable for CF?

Web QoS-based service recommendation heavily relies on understanding the interactions between users and services. GCN excels in capturing the dependencies and relationships within the graph structure, allowing it to model the complex nature of user-service interactions. By leveraging neighborhood information, GCN can effectively propagate and aggregate QoS-related signals across the graph.

In the context of CF, the basic assumption is that similar users would have similar preferences on items (services). GCNs can capture high-order connectivity patterns in the user-item interaction graph and propagate information from neighbors to learn embeddings of users and items. This is especially advantageous in CF since GCNs can capture the similarity between users by propagating information from similar users and learning their embeddings. Thus, incorporating GCNs into CFs offers a natural way to encode collaborative signals in the graph structure of the interaction network.

The integration of GCNs with CF enhances the ability to capture collaborative signals, as it considers not only the user-item interactions but also the inherent structure and connectivity patterns within the graph. This approach effectively leverages high-order connectivity on the graph to better understand user preferences and item characteristics. By incorporating this information, the recommendation system can improve the quality of web service recommendations and deliver more relevant suggestions to users.

Figure 2: A framework of our proposed QAGCL

III-B Why is context information such as geolocation important?

Contextual information is crucial for QoS-based service recommendation due to the following reasons:

  • Geolocation captures location-specific preferences: Users’ service preferences and requirements may vary based on their locations. Incorporating geolocation information enables the recommendation system to consider regional preferences and recommend services that are relevant to users’ current or intended locations.

  • Environmental factors impact service performance: Geolocation information can reflect environmental factors that may affect service performance. For instance, network conditions, availability of resources, and infrastructure quality can vary across different locations. By considering geolocation, the recommendation system can account for these factors and recommend services that are suitable for specific environments.

III-C Why is it appropriate to perform contrastive learning by creating augmented views in utilizing context information?

To leverage geolocation information effectively, an augmented view can be constructed by incorporating it into the user-service interaction graph. This augmented view integrates geolocation attributes into the graph representation, creating a richer context-aware representation of the user-service interactions.

Contrastive learning techniques can be applied to the augmented view for QoS-based service recommendation. Contrastive learning aims to learn discriminative representations by contrasting positive (similar) and negative (dissimilar) instances. By using the augmented view, the recommendation system can learn contextual embeddings that capture the geolocation-related patterns and preferences in the user-service interaction graph. Contrastive learning enables the system to effectively model the relationships between services and users in the context of geolocation, enhancing the accuracy of recommendations.

IV Methodology

We describe our QAGCL which consists of GCN and a CL framework. We first review its overall architecture and then introduce details.

IV-A Overall Architecture

Fig. 2 shows our web service recommendation framework named QoS-aware graph contrastive learning (QAGCL). Our overall framework is as follows:

  • First, we preprocess the user-service invocation data to build a graph structure. We create interactions according to the Web QoS values of users and services. This graph is the original graph used for the recommendation task.

  • Next, we construct other graph views based on the original graph. We create new graphs based on distance using geolocation information such as latitude and longitude of users and services. We also randomly drop the edges of the graph to create another view of the graph.

  • Finally, the initial user and service embeddings are fed into separate GCNs for the three graph views (the original graph and its two augmentations). The embedding produced by the final layer of each GCN represents a different view. The embedding from the original graph is used for the recommendation task, and the embeddings from the two augmented views are used for contrastive learning.

IV-B Graph Augmentation

We describe the two graph augmentation operators in the following subsubsections.

Figure 3: Illustration of the Haversine distance and the straight-line distance between two points, A and B, on a sphere.

IV-B1 Haversine Distance (HD)

We use Haversine distance for distance-based data augmentation (See Fig. 5). For latitude and longitude points on a sphere, Haversine distance measurements show a high degree of accuracy. Given latitude and longitude coordinates ($a_1$, $b_1$) and ($a_2$, $b_2$), the great circle distance $d$ (km) between the two coordinates can be calculated using the Haversine formula:

$$d = 2r \arcsin \sqrt{\sin^2 \frac{\Delta_a}{2} + \cos a_1 \cdot \cos a_2 \cdot \sin^2 \frac{\Delta_b}{2}}, \tag{5}$$

where $r$ is the radius of the Earth (typically taken as 6371 kilometers), $a_1$ and $a_2$ are the latitudes of the two points in radians, $b_1$ and $b_2$ are the longitudes of the two points in radians, and $\Delta_a$ and $\Delta_b$ denote the difference between $a_1$ and $a_2$ and the difference between $b_1$ and $b_2$, respectively.
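The Haversine formula translates directly into code. The sketch below is illustrative: it assumes coordinates are given in degrees and converted to radians internally, and the function name `haversine_km` and the sample coordinates are hypothetical. Thanks to NumPy broadcasting, the same function also produces the full user-by-service distance matrix.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # radius r used in the formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are in degrees (assumption)."""
    a1, b1, a2, b2 = map(np.radians, (lat1, lon1, lat2, lon2))
    h = (np.sin((a2 - a1) / 2) ** 2
         + np.cos(a1) * np.cos(a2) * np.sin((b2 - b1) / 2) ** 2)
    h = np.clip(h, 0.0, 1.0)  # guard against rounding slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


# Hypothetical coordinates for 2 users and 3 services (degrees).
user_lat = np.array([37.5, 48.9])
user_lon = np.array([127.0, 2.3])
svc_lat = np.array([35.7, 40.7, -33.9])
svc_lon = np.array([139.7, -74.0, 151.2])

# Broadcasting (U, 1) against (1, S) gives the full U x S distance matrix.
dist = haversine_km(user_lat[:, None], user_lon[:, None],
                    svc_lat[None, :], svc_lon[None, :])
print(dist.shape)
```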

We compute the Haversine distance $d_{us}$ between every user $u$ and service $s$. We then normalize each distance by the maximum distance and compare it with a threshold $\kappa$ to configure the masking matrix as follows:

$$\mathbf{M}^{(\text{HD})}_{us} = \begin{cases} 1, & \text{if } \frac{d_{us}}{\max(d_{us})} \leq \kappa \\ 0, & \text{otherwise} \end{cases}, \tag{6}$$

where $\mathbf{M}^{(\text{HD})}_{us}$ is the entry of matrix $\mathbf{M}^{(\text{HD})}$ for user $u$ and service $s$, and $\max(d_{us})$ is the maximum over all user-service distances. The threshold $\kappa$ determines the cutoff point for deciding whether a distance is considered large or small. If the ratio of the distance to the maximum distance is less than or equal to the threshold, the entry of $\mathbf{M}^{(\text{HD})}$ is set to 1, indicating a small distance. Otherwise, it is set to 0, indicating a large distance.

The augmentation operator for Haversine distance is defined as follows:

$$g_{\text{HD}}(\mathbf{A})=\mathbf{M}^{(\text{HD})}\odot\mathbf{A}, \tag{7}$$

where $\odot$ is the Hadamard product and $\mathbf{M}^{(\text{HD})}$, with entries in $\{0,1\}$, is the masking matrix applied to the adjacency matrix. As a result, only a subset of the connections within each neighborhood contributes to the node representations.
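The mask in Eq. (6) and the operator in Eq. (7) can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the toy distance values, and the use of the user-by-service interaction block as $\mathbf{A}$ are assumptions.

```python
import numpy as np


def hd_mask(dist, kappa):
    """Eq. (6): 1 where the max-normalized distance is <= kappa, else 0."""
    dist = np.asarray(dist, dtype=float)
    return (dist / dist.max() <= kappa).astype(int)


def g_hd(adj, dist, kappa):
    """Eq. (7): Hadamard product of the mask with the adjacency matrix."""
    return hd_mask(dist, kappa) * np.asarray(adj)


# Toy example: 2 users x 3 services, distances in km (made-up values).
dist = np.array([[100.0, 5000.0, 10000.0],
                 [2000.0, 300.0, 8000.0]])
adj = np.array([[1, 1, 1],
                [1, 0, 1]])
aug = g_hd(adj, dist, kappa=0.3)  # keeps only edges with d / max(d) <= 0.3
```

In this toy case only the two edges to the first service survive, because every other existing edge spans more than 30% of the maximum distance.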

From a Web QoS perspective, the edge dropout based on Haversine distance offers the following advantages:

  • First, it allows for realistic modeling by considering the influence of physical proximity on service quality. This aligns with real-world scenarios where distance can affect factors such as network latency and bandwidth limitations (see Fig. 4).

  • Second, distance-based edge drop enables localized service recommendations. By focusing on services closer to the user, the system can improve the user experience, especially in applications that require low-latency interactions. In fact, as shown in Fig. 4, the WSDream dataset contains many interactions between users and services that are far apart (for example, on different continents), so shorter-distance interactions need to be considered explicitly.

  • Moreover, incorporating distance in edge drop enhances QoS prediction. By accounting for distance, the recommendation system can better estimate service performance, leading to more accurate predictions and tailored recommendations based on users’ geographical context.

Figure 4: Geolocation information of WSDream. The red (resp. blue) circles are the users’ (resp. services’) locations. The black lines are interactions between users and services.

IV-B2 Edge Dropout (ED)

This operator randomly drops edges in the graph with a dropout ratio $\rho$. The augmentation operator for the edge dropout process is represented as:

$$g_{\text{ED}}(\mathbf{A})=\mathbf{M}^{(\text{ED})}\odot\mathbf{A}, \tag{8}$$

where $\mathbf{M}^{(\text{ED})}$, with entries in $\{0,1\}$, is the masking matrix applied to the adjacency matrix. Random edge drop is performed to simulate noise or uncertainty in the original user-service graph. While the original graph is constructed based on QoS values, it may not account for other contextual information such as distance. By randomly dropping edges, we introduce randomness into the graph, which helps to mimic real-world scenarios where the availability or quality of services can vary.
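A minimal NumPy sketch of the random edge dropout in Eq. (8) is given below. It assumes that each entry of the mask is sampled independently and kept with probability $1-\rho$; the text does not specify the sampling scheme, so this choice, the function names, and the fixed seed are assumptions for illustration.

```python
import numpy as np


def ed_mask(shape, rho, rng):
    """Binary mask that keeps each entry independently with probability 1 - rho."""
    return (rng.random(shape) >= rho).astype(int)


def g_ed(adj, rho, rng):
    """Eq. (8): Hadamard product of a random mask with the adjacency matrix."""
    adj = np.asarray(adj)
    return ed_mask(adj.shape, rho, rng) * adj


rng = np.random.default_rng(0)      # fixed seed only for reproducibility
adj = np.ones((50, 40), dtype=int)  # toy graph in which every edge exists
aug = g_ed(adj, rho=0.2, rng=rng)   # roughly 80% of the edges survive
```

Drawing a fresh mask in every epoch, rather than once, is a common design choice in graph contrastive learning, but whether it is done here is not stated in the text.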

From a Web QoS perspective, random edge drop allows us to model situations where certain services may have intermittent availability or varying performance. In real-world web service environments, factors such as network congestion, server load, or temporary service unavailability can affect the quality of service experienced by users. By randomly dropping edges, we can simulate these scenarios and evaluate the robustness or resilience of recommendation algorithms to such fluctuations in service quality.

Moreover, random edge dropping also helps in assessing the generalization capability of recommendation algorithms. In practice, a recommendation model should be able to provide reasonable recommendations even in the presence of noise or missing information. By introducing random edge drops, we create an environment that tests the ability of the recommendation system to handle incomplete or uncertain data, which is often encountered in real-world web service scenarios.

IV-C Graph Convolutional Networks

Mining inherent patterns in graphs is helpful for representation learning. To this end, we use GCNs to better capture user-service interactions, devising GCNs for the Haversine distance-based edge drop and the random edge drop to create different views of the nodes. These GCNs can be represented as follows:

$$\mathbf{E}^{(l)}=\tilde{\mathbf{A}}\mathbf{E}^{(l-1)}, \tag{9}$$

$$\mathbf{E}^{\prime(l)}=\tilde{\mathbf{A}}_{\text{HD}}\mathbf{E}^{\prime(l-1)}, \tag{10}$$

$$\mathbf{E}^{\prime\prime(l)}=\tilde{\mathbf{A}}_{\text{ED}}\mathbf{E}^{\prime\prime(l-1)}, \tag{11}$$

where $\mathbf{E}^{(l)}=[\mathbf{E}_u^{(l)},\mathbf{E}_s^{(l)}]$ is the input feature matrix at layer $l$ and $\tilde{\mathbf{A}}$ is calculated from the original bipartite graph $\mathbf{A}$. $\tilde{\mathbf{A}}_{\text{HD}}$ and $\tilde{\mathbf{A}}_{\text{ED}}$ are calculated by the augmentation operators $g_{\text{HD}}(\cdot)$ and $g_{\text{ED}}(\cdot)$, respectively. Each of the three graphs thus yields its own view.
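The linear propagation in Eqs. (9) to (11) can be sketched as below. The symmetric normalization $D^{-1/2} A D^{-1/2}$ used to obtain the normalized adjacency is an assumption (the normalization is not given in this section), as are the function names and the row layout with users first and services second.

```python
import numpy as np


def normalize_adj(r):
    """Build the bipartite adjacency from the user-by-service block r and
    apply symmetric normalization D^{-1/2} A D^{-1/2} (assumed here)."""
    n_u, n_s = r.shape
    a = np.zeros((n_u + n_s, n_u + n_s))
    a[:n_u, n_u:] = r
    a[n_u:, :n_u] = r.T
    deg = a.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0                      # isolated nodes keep a zero row
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def propagate(a_norm, e0, num_layers):
    """E^(l) = A_norm E^(l-1); returns the list [E^(0), ..., E^(L)]."""
    layers = [e0]
    for _ in range(num_layers):
        layers.append(a_norm @ layers[-1])
    return layers


r = np.array([[1.0, 0.0],
              [1.0, 1.0]])                      # 2 users x 2 services (toy)
e0 = np.arange(8, dtype=float).reshape(4, 2)    # stacked [E_u; E_s], D = 2
a_norm = normalize_adj(r)
layers = propagate(a_norm, e0, num_layers=3)
```

The same `propagate` call would be repeated with the normalized versions of the two augmented graphs to obtain the other two views.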

IV-D Prediction Layer

The prediction layer constructs the final embedding after propagation through all $L$ layers. The final embedding is the weighted sum of the embeddings of each layer:

$$\mathbf{E}_u^{(final)}=\sum_{i=0}^{L}w_i\mathbf{E}_s^{(i)},\;\mathbf{E}_s^{(final)}=\sum_{i=0}^{L}w_i\mathbf{E}_u^{(i)}, \tag{12}$$

where $w_i$ is the weight of layer $i$. If $w_i$ is the same for all $i$, the average of the embeddings over all layers is used. The weighted sum can achieve good performance because it uses not only the last layer’s embedding but also the embeddings of the previous layers. After calculating $\mathbf{E}_u^{(final)}$ and $\mathbf{E}_s^{(final)}$, the rating of user $u$ for service $s$ is predicted as the dot product of the corresponding final embeddings:

$$\hat{r}_{u,s}=\mathbf{e}_u^{(final)\mathsf{T}}\mathbf{e}_s^{(final)}, \tag{13}$$

where $\hat{r}_{u,s}$ is the predicted rating value.
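The layer combination and the dot-product prediction can be sketched as follows. The sketch applies the weighted sum to the stacked matrix of user and service rows and assumes that each node's final embedding aggregates that node's own layer-wise embeddings, as in LightGCN-style layer combination; this reading, the function names, and the users-first row layout are assumptions.

```python
import numpy as np


def final_embedding(layers, weights=None):
    """Weighted sum over the layer-wise embeddings E^(0), ..., E^(L)."""
    if weights is None:  # equal weights give the plain average over layers
        weights = [1.0 / len(layers)] * len(layers)
    return sum(w * e for w, e in zip(weights, layers))


def predict(e_final, n_users):
    """Dot product between every user and every service embedding."""
    e_u, e_s = e_final[:n_users], e_final[n_users:]
    return e_u @ e_s.T  # shape: (n_users, n_services)


# Toy input: 1 user + 2 services, embedding size 2, two "layers".
layers = [np.ones((3, 2)), 3 * np.ones((3, 2))]
e_final = final_embedding(layers)        # average of 1 and 3 is 2 everywhere
scores = predict(e_final, n_users=1)     # each score is 2*2 + 2*2 = 8
```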

Algorithm 1: Training algorithm of QAGCL
Input: Web QoS normalized adjacency matrix $\tilde{\mathbf{A}}$, the total number of layers $L$
Initialize $\mathbf{E}_u^{(0)}$ and $\mathbf{E}_s^{(0)}$;
Generate augmented graphs $\tilde{\mathbf{A}}_{\text{HD}}$ and $\tilde{\mathbf{A}}_{\text{ED}}$;
while the joint loss $\mathcal{L}$ has not converged do
    for $l \leftarrow 1$ to $L$ do
        $\mathbf{E}_u^{(l)}, \mathbf{E}_s^{(l)}$ = Eq. (9) with $\tilde{\mathbf{A}}$;
        $\mathbf{E}_u^{\prime(l)}, \mathbf{E}_s^{\prime(l)}$ = Eq. (10) with $\tilde{\mathbf{A}}_{\text{HD}}$;
        $\mathbf{E}_u^{\prime\prime(l)}, \mathbf{E}_s^{\prime\prime(l)}$ = Eq. (11) with $\tilde{\mathbf{A}}_{\text{ED}}$;
    end
    $\mathbf{E}_u^{(final)}, \mathbf{E}_s^{(final)}$ = Eq. (12);
    $\mathbf{E}_u^{\prime(final)}, \mathbf{E}_s^{\prime(final)}$ = Eq. (14);
    $\mathbf{E}_u^{\prime\prime(final)}, \mathbf{E}_s^{\prime\prime(final)}$ = Eq. (15);
    $\hat{r}_{u,s}$ = Eq. (13);
    Compute the joint objective loss $\mathcal{L}$;
    Update $\mathbf{E}_u^{(0)}$ and $\mathbf{E}_s^{(0)}$ with the joint loss;
end
return $\mathbf{E}_u^{(0)}$ and $\mathbf{E}_s^{(0)}$;

IV-E Contrastive Learning

We use the two views generated from Eqs. (10) and (11) as follows:

$$\mathbf{E}_u^{\prime(final)}=\sum_{i=0}^{L}w_i\mathbf{E}_s^{\prime(i)},\;\mathbf{E}_s^{\prime(final)}=\sum_{i=0}^{L}w_i\mathbf{E}_u^{\prime(i)}, \tag{14}$$

$$\mathbf{E}_u^{\prime\prime(final)}=\sum_{i=0}^{L}w_i\mathbf{E}_s^{\prime\prime(i)},\;\mathbf{E}_s^{\prime\prime(final)}=\sum_{i=0}^{L}w_i\mathbf{E}_u^{\prime\prime(i)}. \tag{15}$$

After generating the two different views, we employ a contrastive objective that enforces the filtered representations of each node in the two views to agree with each other. We perform the CL training by directly contrasting the distance-based augmented view $\mathbf{e}^{\prime(final)}$ with the random edge drop-based augmented view $\mathbf{e}^{\prime\prime(final)}$ using the InfoNCE [39] loss:

$$\mathcal{L}_{CL}=\sum_{i\in\mathcal{B}}-\log\frac{\exp(\text{sim}(\mathbf{e}^{\prime(final)}_i,\mathbf{e}^{\prime\prime(final)}_i)/\tau)}{\sum_{j\in\mathcal{B}}\exp(\text{sim}(\mathbf{e}^{\prime(final)}_i,\mathbf{e}^{\prime\prime(final)}_j)/\tau)}, \tag{16}$$

where $i$ and $j$ are a user and an item in a sampled batch $\mathcal{B}$, $\text{sim}(\cdot,\cdot)$ is a similarity function, $\tau$ is the temperature, and $\mathbf{e}^{\prime(final)}_i$, $\mathbf{e}^{\prime\prime(final)}_i$, and $\mathbf{e}^{\prime\prime(final)}_j$ are node representations from Eqs. (14) and (15).
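The contrastive loss in Eq. (16) can be sketched with NumPy as follows. The sketch assumes cosine similarity for $\text{sim}(\cdot,\cdot)$ and uses $\tau = 0.2$ purely as an illustrative value; neither is specified in this section, and the function name is hypothetical.

```python
import numpy as np


def info_nce(view1, view2, tau=0.2):
    """InfoNCE over a batch: row i of view1 should match row i of view2.
    Cosine similarity is assumed for sim(., .)."""
    z1 = view1 / np.linalg.norm(view1, axis=1, keepdims=True)
    z2 = view2 / np.linalg.norm(view2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                              # sim(e'_i, e''_j) / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.trace(log_prob)                            # sum_i of -log p(i | i)


batch = np.eye(4)                                # four orthogonal toy embeddings
loss_aligned = info_nce(batch, batch)            # both views agree
loss_shuffled = info_nce(batch, batch[::-1])     # views are mismatched
```

The aligned pair yields a much smaller loss than the shuffled pair, which is the behavior the contrastive objective relies on.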

IV-F How to Train

We use the Bayesian Personalized Ranking (BPR) loss function [40] together with the contrastive loss in Eq. (16). Our joint learning objective is therefore:

$$\mathcal{L}=\mathcal{L}_{main}+\lambda_1\cdot\mathcal{L}_{CL}+\lambda_2\cdot\|\mathbf{\Theta}\|^2_2, \tag{17}$$

which consists of the BPR loss $\mathcal{L}_{main}$, the CL loss $\mathcal{L}_{CL}$, and a regularization term. The hyperparameters $\lambda_1$ and $\lambda_2$ control the trade-off among the two loss functions and the regularization term. $\mathbf{\Theta}$ denotes the embeddings to learn, i.e., $\mathbf{\Theta}=\mathbf{E}^{(0)}$ in our framework. $\mathcal{L}_{main}$ is defined as:

$$\mathcal{L}_{main}=-\sum_{(u,i,j)\in\mathcal{B}}\log(\sigma(\hat{r}_{ui}-\hat{r}_{uj})), \tag{18}$$

where $\sigma$ is the sigmoid function, and $\hat{r}_{ui}$ and $\hat{r}_{uj}$ denote the predicted rating scores for a positive service $i$ and a negative service $j$ of user $u$.
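The BPR loss in Eq. (18) and the joint objective in Eq. (17) can be sketched in plain Python as follows. The function names and the scalar inputs are hypothetical; in practice these quantities would come from an automatic differentiation framework rather than Python floats.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Eq. (18): -sum of log(sigmoid(r_ui - r_uj)) over sampled triples."""
    total = 0.0
    for r_ui, r_uj in zip(pos_scores, neg_scores):
        x = r_ui - r_uj
        # -log(sigmoid(x)) = log(1 + exp(-x)), written in an overflow-safe form
        total += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total


def joint_loss(l_main, l_cl, theta_sq_norm, lambda1, lambda2):
    """Eq. (17): L = L_main + lambda1 * L_CL + lambda2 * ||Theta||_2^2."""
    return l_main + lambda1 * l_cl + lambda2 * theta_sq_norm


good = bpr_loss([5.0], [0.0])   # positive service ranked above the negative
bad = bpr_loss([0.0], [5.0])    # negative service ranked above the positive
total = joint_loss(1.0, 2.0, 4.0, lambda1=0.5, lambda2=0.25)
```

The loss is small when the positive service scores higher than the negative one and grows roughly linearly with the margin when the order is reversed.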

After minimizing the joint loss in Eq. (17), we use the output embeddings of the GCN, i.e., $\mathbf{E}_u^{(final)}$ and $\mathbf{E}_s^{(final)}$, as the final representations. The exact training method is described in Alg. 1.

V Experiments

To justify the superiority of QAGCL and reveal the reasons for its effectiveness, we conduct extensive experiments to answer the following research questions:

  1. RQ1: How does QAGCL perform w.r.t. top-K recommendation as compared with the CF models?

  2. RQ2: How effectively does the QAGCL model mitigate the cold-start problem in comparison to the existing methodologies?

  3. RQ3: How does the QAGCL model affect the quality of QoS-based service recommendations based on graph augmentation techniques?

  4. RQ4: How does varying the number of graph layers affect the performance of the proposed QAGCL?

V-A Experimental Settings

V-A1 Datasets

The web service QoS dataset used in the experiments is WSDream [8] (available for download at http://wsdream.github.io/dataset), and response time is used as the QoS value. The datasets are summarized in Table I, and the test set ratio is 20%. To construct the graph, we assume that a user and a service are connected when the response time $t_{ij}$ is lower than a threshold $\gamma$; that is, if $t_{ij}$ is less than $\gamma$, the pair is regarded as a positive interaction. We set the threshold $\gamma = 0.05$ s for the warm-start dataset (Table I lists the values for the cold-start datasets). The warm-start environment contains 57,727 interactions, and the two datasets for the cold-start environment contain 8,490 and 1,036 interactions, respectively. Cold-start-ex has a density of 2.75% and is therefore more extreme than Cold-start, whose density is 5.36%. In the warm-start environment, users were filtered to have more than 10 interactions so that training and test sets can be constructed. In the cold-start environments, filtering was performed so that users have more than 2 interactions.
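The thresholding step described above can be sketched as follows. The encoding of missing measurements (negative values or NaN) is an assumption about the data file rather than something stated in the text, and the function names and toy values are hypothetical.

```python
import numpy as np


def build_interactions(resp_time, gamma):
    """Mark (user, service) as a positive interaction when t_ij < gamma.
    Missing measurements are assumed to be negative or NaN and are skipped."""
    t = np.nan_to_num(np.asarray(resp_time, dtype=float), nan=-1.0)
    return ((t >= 0) & (t < gamma)).astype(int)


def density(adj):
    """Fraction of user-service pairs that are positive interactions."""
    return adj.sum() / adj.size


# Toy response-time matrix in seconds: 2 users x 3 services (made-up values).
resp_time = np.array([[0.01, 0.20, -1.0],
                      [0.04, np.nan, 0.05]])
adj = build_interactions(resp_time, gamma=0.05)
```

Note that a response time exactly equal to $\gamma$ is not counted as positive under the strict inequality used here.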

V-A2 Evaluation Metrics

For each user in the test set, all non-interacted services are considered negative samples, and the model computes the user’s rating for all services except the positive samples used in the training dataset. To compare the performance of our model with the baseline models for top-$K$ recommendation, we use Recall@$K$ and NDCG@$K$, which are commonly used in rank-based evaluation. Recall@$K$ measures how many of the services relevant to a user appear among the top-$K$ recommended services and is defined as follows:

$$\text{Recall@}K=\frac{\text{rel}_K}{\min(K,\text{rel})}, \tag{19}$$

where $\text{rel}_K$ is the number of relevant items in the top-$K$ results and rel is the total number of relevant items for the user. Normalized Discounted Cumulative Gain (NDCG) evaluates the difference between a list of recommended items and a list of optimally ranked items and is defined as:
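A minimal sketch of Recall@$K$ as defined in Eq. (19) is shown below; the function name and the toy ranking are illustrative assumptions.

```python
def recall_at_k(ranked, relevant, k):
    """Eq. (19): hits in the top-k divided by min(k, number of relevant items)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # no relevant items: define recall as 0 to avoid dividing by 0
    hits = sum(1 for s in ranked[:k] if s in relevant)
    return hits / min(k, len(relevant))


ranked = [3, 1, 4, 2, 5]     # service ids sorted by predicted score (toy)
relevant = {1, 2, 9}         # held-out positive services of the user (toy)
r_at_2 = recall_at_k(ranked, relevant, k=2)  # one hit out of min(2, 3) = 2
```

Dividing by $\min(K,\text{rel})$ rather than by rel means that a perfect top-$K$ list scores 1 even when the user has more than $K$ relevant services.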

$$\text{NDCG@}K=\frac{\text{DCG@}K}{\text{IDCG@}K}. \tag{20}$$

NDCG@$K$ evaluates performance by weighting the order of recommendations. DCG@$K$ and IDCG@$K$ are the DCG (Discounted Cumulative Gain) of the top $K$ items of the predicted ranking and the ideal ranking, respectively. DCG@$K$ is calculated as:

$$\text{DCG@}K=\sum_{i=1}^{K}\frac{2^{\text{rel}_i}-1}{\log_2(i+1)}, \tag{21}$$

where $\text{rel}_i$ is the relevance value of the item at rank position $i$. The value of NDCG lies between 0 and 1; higher values indicate a better ranking, and 1 corresponds to the ideal ranking.
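NDCG@$K$ can be sketched as follows, using the common gain $2^{\text{rel}_i}-1$ with a $\log_2(i+1)$ discount and assuming binary relevance (1 if the service is a held-out positive, 0 otherwise). The binary relevance and the function names are assumptions for illustration.

```python
import math


def dcg_at_k(rels, k):
    """DCG with gain 2^rel - 1 and discount log2(i + 1) for ranks i = 1..k."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(ranked, relevant, k):
    """Eq. (20) with binary relevance."""
    relevant = set(relevant)
    rels = [1 if s in relevant else 0 for s in ranked]
    ideal = [1] * min(k, len(relevant))   # best case: relevant items first
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0


ranked = [3, 1, 4, 2, 5]     # service ids sorted by predicted score (toy)
relevant = {1, 2, 9}         # held-out positive services of the user (toy)
score = ndcg_at_k(ranked, relevant, k=5)
```

With binary relevance the gain reduces to 1 for a relevant service and 0 otherwise, so only the positions of the hits affect the score.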

TABLE I: Dataset information of WSDream

| Dataset | WSDream (Warm-start) | WSDream (Cold-start) | WSDream (Cold-start-ex) |
|---|---|---|---|
| # Users | 338 | 275 | 172 |
| # Services | 5,824 | 575 | 219 |
| $\gamma$ (s) | 0.05 | 0.02 | 0.01 |
| Core | 10 | 2 | 2 |
| Density | 13.64% | 5.36% | 2.75% |
| # Interactions | 57,727 | 8,490 | 1,036 |

TABLE II: Comparison of Recall@K and NDCG@K recommendation performance of each model for WSDream (Warm-start). Boldface indicates the best performance and underline the second-best performance. Improvement is the improvement over the second-best recommendation performance.

| Model | Recall@20 | NDCG@20 | Recall@40 | NDCG@40 |
|---|---|---|---|---|
| UMEAN | 0.0943 | 0.2134 | 0.0989 | 0.2193 |
| IMEAN | 0.0884 | 0.2009 | 0.0919 | 0.2074 |
| BPR-MF | 0.2012 | 0.4170 | 0.3390 | 0.4483 |
| NeuMF | 0.1950 | 0.4104 | 0.3041 | 0.4198 |
| NGCF | 0.2095 | 0.4294 | 0.3409 | 0.4511 |
| LightGCN | 0.2113 | 0.4325 | 0.3419 | 0.4595 |
| SGL | \underline{0.2158} | \underline{0.4628} | \underline{0.3569} | \underline{0.4958} |
| SimGCL | 0.2150 | 0.4563 | 0.3566 | 0.4910 |
| LightGCL | 0.1946 | 0.4268 | 0.2974 | 0.4271 |
| QAGCL | \textbf{0.2212} | \textbf{0.4751} | \textbf{0.3825} | \textbf{0.5149} |
| Improvement | 2.50% | 2.66% | 7.17% | 3.85% |

TABLE III: Comparison of Recall@K and NDCG@K recommendation performance of each model for WSDream (Cold-start)

| Model | Recall@20 | NDCG@20 | Recall@40 | NDCG@40 |
|---|---|---|---|---|
| UMEAN | 0.2193 | 0.1843 | 0.2443 | 0.2022 |
| IMEAN | 0.2034 | 0.1792 | 0.2431 | 0.1984 |
| BPR-MF | 0.4376 | 0.3767 | 0.6321 | 0.4363 |
| NeuMF | 0.4012 | 0.3551 | 0.6104 | 0.4111 |
| NGCF | 0.5015 | 0.4532 | 0.6459 | 0.4913 |
| LightGCN | 0.5751 | 0.5009 | 0.7274 | 0.5513 |
| SGL | 0.6123 | \underline{0.5717} | 0.7897 | \underline{0.6251} |
| SimGCL | \underline{0.6388} | 0.5702 | \underline{0.8002} | 0.6249 |
| LightGCL | 0.6077 | 0.4985 | 0.7516 | 0.5466 |
| QAGCL | \textbf{0.6426} | \textbf{0.5845} | \textbf{0.8300} | \textbf{0.6450} |
| Improvement | 0.59% | 2.24% | 3.72% | 3.18% |

TABLE IV: Comparison of Recall@K and NDCG@K recommendation performance of each model for WSDream (Cold-start-ex)

| Model | Recall@20 | NDCG@20 | Recall@40 | NDCG@40 |
|---|---|---|---|---|
| UMEAN | 0.2012 | 0.1204 | 0.2444 | 0.1455 |
| IMEAN | 0.1994 | 0.1195 | 0.2402 | 0.1412 |
| BPR-MF | 0.4284 | 0.2585 | 0.5800 | 0.2999 |
| NeuMF | 0.4020 | 0.2498 | 0.5712 | 0.2901 |
| NGCF | 0.7412 | 0.5066 | 0.7865 | 0.5150 |
| LightGCN | 0.9026 | \underline{0.7204} | 0.9404 | \underline{0.7337} |
| SGL | 0.8997 | 0.7155 | \underline{0.9471} | 0.7303 |
| SimGCL | \underline{0.9139} | 0.7185 | 0.9410 | 0.7291 |
| LightGCL | 0.8083 | 0.6493 | 0.8940 | 0.6749 |
| QAGCL | \textbf{0.9178} | \textbf{0.7430} | \textbf{0.9491} | \textbf{0.7554} |
| Improvement | 0.42% | 3.14% | 0.21% | 2.96% |

V-A3 Compared Baselines

We compare our model against 9 baselines with different learning paradigms:

  • Traditional baseline: UMEAN predicts missing values by averaging the available QoS values based on the target user. IMEAN predicts missing values by averaging the available QoS values based on the target Web service.

  • Matrix factorization: BPR-MF [40] is a classical collaborative filtering algorithm that minimizes a pair-wise loss function to learn from implicit feedback. In BPR, MF is used to initialize the embeddings of users and items. NeuMF [41] is a collaborative filtering algorithm that applies non-linear hidden layers to user and item embeddings to capture their interactions.

  • Graph-based collaborative filtering: NGCF [24] and LightGCN [25].

  • Contrastive learning for collaborative filtering: SGL [36], SimGCL [37], and LightGCL [38].

V-A4 Hyperparameters

For fair comparison with previous studies, we set the size of embedding D𝐷Ditalic_D as 64, the number of epochs as 100, and the same test split ratio as 0.2. We also further search the best hyperparameters for baselines based on their recommended settings. For our method, we test the following hyperparameters:

  • The learning rate is in $\{1.0\times10^{-4}, 5.0\times10^{-4}, 1.0\times10^{-3}, 5.0\times10^{-3}, 1.0\times10^{-2}\}$;

  • The regularization weight for the InfoNCE loss $\lambda_{1}$ is in $\{0.1, 0.2, \cdots, 1.0\}$;

  • The regularization weight $\lambda_{2}$ is in $\{1.0\times10^{-7}, 1.0\times10^{-6}, 1.0\times10^{-5}\}$;

  • The ratio $\rho$ for $g_{\text{ED}}$ is in $\{0.1, 0.2, \cdots, 0.5\}$;

  • The ratio $\kappa$ for $g_{\text{ED}}$ is in $\{0.1, 0.2, \cdots, 0.5\}$;

  • The number of layers $L$ is in $\{1, 2, 3, 4\}$.
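
To give a sense of the size of this search space, the ranges listed above can be enumerated as a grid. This is only a sketch: the text does not state whether the search was exhaustive, and the dictionary keys (`lr`, `lambda1`, `lambda2`, `rho`, `kappa`, `num_layers`) are names we chose for illustration.

```python
from itertools import product

# Candidate values copied from the list above; key names are hypothetical.
grid = {
    "lr": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "lambda1": [round(0.1 * i, 1) for i in range(1, 11)],  # 0.1, 0.2, ..., 1.0
    "lambda2": [1e-7, 1e-6, 1e-5],
    "rho": [round(0.1 * i, 1) for i in range(1, 6)],       # 0.1, ..., 0.5
    "kappa": [round(0.1 * i, 1) for i in range(1, 6)],     # 0.1, ..., 0.5
    "num_layers": [1, 2, 3, 4],
}

# Fixed settings stated in the text: embedding size 64, 100 epochs, test ratio 0.2.
fixed = {"embedding_dim": 64, "epochs": 100, "test_ratio": 0.2}

# Cartesian product of all candidate values, one dict per configuration.
configs = [dict(zip(grid, values), **fixed) for values in product(*grid.values())]
n_configs = len(configs)
```

An exhaustive sweep over these ranges would cover 5 × 10 × 3 × 5 × 5 × 4 configurations, so in practice a coordinate-wise or random search over the same ranges may be a more economical choice.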

V-B Evaluation of Top-K Recommendation Performance (RQ1)

Table II compares the recommendation performance of our proposed QAGCL with the baseline models on the WSDream (Warm-start) dataset. QAGCL achieves the best performance on all evaluation metrics. On Recall@40, QAGCL improves by 7.17% over SGL, and on NDCG@40 it improves by 3.85% over BPR-MF. Among the baseline models, those based on graph contrastive learning generally show better results, which suggests that contrastive learning is effective for web service recommendation on the user-service interaction dataset.

In addition, graph convolution-based CF methods show superior recommendation performance compared to BPR-MF and NeuMF. This difference can be attributed to the limitations of the MF-based methods in utilizing neighbor information and the high-order connectivity of users and services. NGCF and LightGCN, on the other hand, exhibit slightly improved performance because they explicitly explore higher-order connections within the neighborhood. UMEAN and IMEAN show notably lower recommendation performance than the other baselines.
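
For readers unfamiliar with the two metrics used in this comparison, the following sketch computes Recall@K and NDCG@K for a single user. It assumes the common binary-relevance definitions; the exact evaluation protocol behind the reported numbers may differ, and the function names and toy service IDs are hypothetical.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant services that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for s in ranked[:k] if s in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, s in enumerate(ranked[:k]) if s in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the model ranks four services; three held-out services are relevant.
ranked = ["s3", "s1", "s7", "s2"]
relevant = {"s1", "s2", "s9"}
recall = recall_at_k(ranked, relevant, 4)
ndcg = ndcg_at_k(ranked, relevant, 4)
```

Recall@K only counts how many relevant services are retrieved, while NDCG@K also rewards placing them near the top of the list, which is why a model can rank differently under the two metrics.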

V-C Cold-Start Problem Mitigation (RQ2)

We test how effectively QAGCL and the baseline models mitigate the cold-start problem by constructing cold-start environments through tuning the sparsity of the WSDream dataset. Tables III and IV report the results on WSDream (Cold-start), the cold-start setting of WSDream, and on WSDream (Cold-start-ex), the extreme cold-start setting.

V-C1 Result of WSDream (Cold-start)

In Table III, QAGCL outperforms all baselines. On Recall@40, QAGCL improves by 3.72% over SimGCL on the cold-start dataset. In this cold-start environment, contrastive learning-based models perform better than graph convolution-based CF methods.

V-C2 Result of WSDream (Cold-start-ex)

Table IV also shows that QAGCL achieves the highest recommendation performance on all evaluation metrics, improving by 3.14% over LightGCN and 3.41% over SimGCL on NDCG@20. In addition, LightGCN has the second-best Recall@40 of 0.7337, while QAGCL reaches 0.7554, an improvement of 2.96%.
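
The percentages in this section read as relative gains over the baseline score. The short check below reproduces the Recall@40 figure from the two values quoted above, under that assumption; the helper name `relative_improvement` is ours.

```python
def relative_improvement(ours, baseline):
    """Relative gain of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# Recall@40 on WSDream (Cold-start-ex): QAGCL 0.7554 vs. LightGCN 0.7337.
gain = relative_improvement(0.7554, 0.7337)
print(f"{gain:.2f}%")
```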

For RQ2, this experiment shows that our proposed QAGCL remains the better method even in cold-start environments with sparse interactions. The necessity of our proposed design is therefore confirmed in both warm-start and cold-start environments.

V-D Impact of Graph Augmentation Techniques on QoS-Based Service Recommendations (RQ3)

As shown in Table V, using a combination of the HD and ED augmentation operators for QoS-based service recommendations is effective. HD takes into account the physical proximity of services. ED reflects the unpredictable nature of service usage patterns and incorporates diverse perspectives. The randomness of ED allows QAGCL to adapt to different user behaviors and environmental factors, enhancing the robustness and generalization ability of the recommendation system. By combining these approaches, the augmented graph benefits from localized proximity information and adaptability to diverse user preferences and network conditions. This hybrid approach enhances recommendation performance from a Web QoS perspective.

TABLE V: Graph augmentation technique comparison experiments on WSDream (Cold-start). ND stands for a random node drop.
Graph Augmentation | Recall@20 | NDCG@20
HD & ED            | 0.6426    | 0.5845
HD & ND            | 0.5958    | 0.4977
ED & ED            | 0.6105    | 0.5698
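
As an illustration of the random component, the sketch below shows one common way to implement an edge-dropout view: each user-service edge is removed independently with probability $\rho$. This is an assumption about the form of ED rather than the exact operator evaluated in Table V, and the function name, seed handling, and toy edges are hypothetical.

```python
import random

def edge_dropout(edges, rho, seed=None):
    """Return a view of the graph in which each edge is dropped with probability rho."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so rho=0.0 keeps every edge and rho=1.0 drops all.
    return [edge for edge in edges if rng.random() >= rho]

# Toy user-service interaction edges (made-up).
edges = [("u1", "s1"), ("u1", "s2"), ("u2", "s1"), ("u2", "s3"), ("u3", "s2")]

# Two different seeds give two stochastic views of the same graph,
# which a contrastive objective can then compare.
view_a = edge_dropout(edges, rho=0.3, seed=1)
view_b = edge_dropout(edges, rho=0.3, seed=2)
```

Because the mask is resampled per view, the same user or service is seen with different neighborhoods, which is the source of the diversity that the contrastive objective relies on.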

V-E Sensitivity Study on the Number of Graph Layers (RQ4)

To understand how different settings influence the effectiveness of QAGCL, we vary the number of graph convolution layers $L$. As shown in Table VI, $L=3$ gives the best Recall@40, NDCG@40, and NDCG@20, whereas Recall@20 is best when $L=2$. When $L=4$, recommendation performance tends to drop. The best performance is therefore obtained at a moderate depth rather than with the deepest model.

TABLE VI: Sensitivity analysis of the number of graph layers for WSDream (Cold-start)
$L$ | Recall@20 | NDCG@20 | Recall@40 | NDCG@40
1   | 0.5394    | 0.4718  | 0.7223    | 0.5303
2   | 0.6524    | 0.5783  | 0.8109    | 0.6318
3   | 0.6426    | 0.5845  | 0.8300    | 0.6450
4   | 0.6342    | 0.5756  | 0.8274    | 0.6404
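
The best depth per metric can be read off Table VI mechanically. The snippet below does so using the values copied from the table; the variable names are ours.

```python
# Values copied from Table VI (WSDream, Cold-start), keyed by number of layers L.
table_vi = {
    1: {"Recall@20": 0.5394, "NDCG@20": 0.4718, "Recall@40": 0.7223, "NDCG@40": 0.5303},
    2: {"Recall@20": 0.6524, "NDCG@20": 0.5783, "Recall@40": 0.8109, "NDCG@40": 0.6318},
    3: {"Recall@20": 0.6426, "NDCG@20": 0.5845, "Recall@40": 0.8300, "NDCG@40": 0.6450},
    4: {"Recall@20": 0.6342, "NDCG@20": 0.5756, "Recall@40": 0.8274, "NDCG@40": 0.6404},
}

metrics = ["Recall@20", "NDCG@20", "Recall@40", "NDCG@40"]

# For each metric, pick the L whose score is highest.
best_layer = {m: max(table_vi, key=lambda L: table_vi[L][m]) for m in metrics}
print(best_layer)
```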

VI Threats to Validity

The threat to construct validity lies in the data preprocessing step that builds the bipartite graph. For graph construction, we assume that an interaction exists between a user and a service when the response time is lower than the threshold $\gamma$. Encoding the presence or absence of an interaction as 1 or 0 may limit the rich use of web service QoS data. To overcome this limitation, we will design a method that uses weighted bipartite graphs in the future.
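
The binarization step described above can be sketched as follows. This is a minimal illustration, assuming a strict comparison against the threshold and a dictionary of response times keyed by (user, service); the function name, the threshold value, and the data are all hypothetical.

```python
def build_interaction_edges(response_times, gamma):
    """Keep a user-service edge only when the response time is below gamma.

    response_times: dict mapping (user, service) -> response time in seconds.
    Returns a sorted list of (user, service) edges of the binary bipartite graph.
    """
    return sorted(pair for pair, rt in response_times.items() if rt < gamma)

# Toy response times in seconds (made-up values).
response_times = {
    ("u1", "s1"): 0.25,
    ("u1", "s2"): 3.10,
    ("u2", "s1"): 0.90,
    ("u2", "s3"): 1.50,
}

edges = build_interaction_edges(response_times, gamma=1.0)
```

After this step, 0.25 s and 0.90 s are indistinguishable, since both become a plain edge. That loss of magnitude information is exactly the limitation the paragraph above refers to.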

The threat to internal validity is that we do not use other geolocation-based recommendation techniques as comparison models. However, the goal of our study is to explore whether contrastive learning with graph augmentation is an effective design for web service recommendation. Thus, we focused on comparing with contrastive learning techniques, and we plan to conduct additional experiments that compare various distance-based models in the future.

VII Conclusion and Future Work

In this paper, we proposed the QoS-aware graph contrastive learning (QAGCL) model for web service recommendation. Our model addressed the limitations of CF methods by incorporating graph contrastive learning, contextual augmentation, and random edge dropout. Through extensive experiments, we demonstrated the effectiveness of the QAGCL model in improving web service recommendation accuracy. By leveraging graph contrastive learning, our model effectively handled the cold-start problem and data sparsity issues, providing more accurate recommendations. The incorporation of contextual augmentation, including geographical information, allowed for a broader perspective of user-service interactions. The results of our experiments showed that the QAGCL model outperformed several existing models in terms of recommendation accuracy.

Future work could explore further enhancements to the QAGCL model, such as incorporating additional contextual information or exploring different techniques for graph augmentation and contrastive learning.

References

  • [1] L.-J. Zhang, J. Zhang, and H. Cai, “Services computing. 2007.”
  • [2] Q. Duan, Y. Yan, and A. V. Vasilakos, “A survey on service-oriented network virtualization toward convergence of networking and cloud computing,” IEEE Transactions on Network and Service Management, vol. 9, no. 4, pp. 373–392, 2012.
  • [3] S. H. Ghafouri, S. M. Hashemi, and P. C. K. Hung, “A survey on web service qos prediction methods,” IEEE Transactions on Services Computing, vol. 15, no. 4, pp. 2439–2454, 2022.
  • [4] M. Tang, X. Dai, B. Cao, and J. Liu, “Wswalker: A random walk method for qos-aware web service recommendation,” in 2015 IEEE International Conference on Web Services, pp. 591–598, 2015.
  • [5] K. Lee, J. Park, and J. Baik, “Location-based web service qos prediction via preference propagation for improving cold start problem,” in 2015 IEEE International Conference on Web Services, pp. 177–184, 2015.
  • [6] D. Ryu, K. Lee, and J. Baik, “Location-based web service qos prediction via preference propagation to address cold start problem,” IEEE Transactions on Services Computing, vol. 14, no. 3, pp. 736–746, 2018.
  • [7] J. Choi, J. Lee, D. Ryu, S. Kim, and J. Baik, “Gain-qos: A novel qos prediction model for edge computing,” Journal of Web Engineering, vol. 21, no. 1, pp. 27–51, 2022.
  • [8] Z. Zheng, H. Ma, M. R. Lyu, and I. King, “Collaborative web service qos prediction via neighborhood integrated matrix factorization,” IEEE Transactions on Services Computing, vol. 6, no. 3, pp. 289–299, 2012.
  • [9] T. Yu, Y. Zhang, and K.-J. Lin, “Efficient algorithms for web services selection with end-to-end qos constraints,” ACM Transactions on the Web (TWEB), vol. 1, no. 1, pp. 6–es, 2007.
  • [10] R. Phalnikar and P. A. Khutade, “Survey of qos based web service discovery,” in 2012 World Congress on Information and Communication Technologies, pp. 657–661, IEEE, 2012.
  • [11] Y. Zhang, Z. Zheng, and M. R. Lyu, “Wsexpress: A qos-aware search engine for web services,” in 2010 IEEE International Conference on Web Services, pp. 91–98, 2010.
  • [12] Z. Zheng, L. Xiaoli, M. Tang, F. Xie, and M. R. Lyu, “Web service qos prediction via collaborative filtering: A survey,” IEEE Transactions on Services Computing, 2020.
  • [13] H. Shin, S. Kim, J. Shin, and X. Xiao, “Privacy enhanced matrix factorization for recommendation with local differential privacy,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 9, pp. 1770–1782, 2018.
  • [14] Y. He, C. Wang, and C. Jiang, “Correlated matrix factorization for recommendation with implicit feedback,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 3, pp. 451–464, 2019.
  • [15] Z. ur Rehman, O. K. Hussain, F. K. Hussain, E. J. Chang, and T. S. Dillon, “User-side qos forecasting and management of cloud services,” World Wide Web, vol. 18, pp. 1677–1716, 2015.
  • [16] M. S. Hossain, “Qos in web service-based collaborative multimedia environment,” in 16th International Conference on Advanced Communication Technology, pp. 881–884, 2014.
  • [17] M. Chen, Y. Ma, B. Hu, and L.-J. Zhang, “A ranking-oriented hybrid approach to qos-aware web service recommendation,” in 2015 IEEE International Conference on Services Computing, pp. 578–585, IEEE, 2015.
  • [18] Y. Yin, W. Zhang, Y. Xu, H. Zhang, Z. Mai, and L. Yu, “Qos prediction for mobile edge service recommendation with auto-encoder,” IEEE Access, vol. 7, pp. 62312–62324, 2019.
  • [19] S. Wang, Y. Zhao, L. Huang, J. Xu, and C.-H. Hsu, “Qos prediction for service recommendations in mobile edge computing,” Journal of Parallel and Distributed Computing, vol. 127, pp. 134–144, 2019.
  • [20] J. Zhu, B. Li, J. Wang, D. Li, Y. Liu, and Z. Zhang, “Bgcl: Bi-subgraph network based on graph contrastive learning for cold-start qos prediction,” Knowledge-Based Systems, vol. 263, p. 110296, 2023.
  • [21] T. E. Trueman, P. Narayanasamy, and J. Ashok Kumar, “A graph-based method for ranking of cloud service providers,” The Journal of Supercomputing, vol. 78, no. 5, pp. 7260–7277, 2022.
  • [22] Z. Chang, D. Ding, and Y. Xia, “A graph-based qos prediction approach for web service recommendation,” Applied Intelligence, vol. 51, no. 10, pp. 6728–6742, 2021.
  • [23] R. van den Berg, T. N. Kipf, and M. Welling, “Graph convolutional matrix completion,” in KDD, 2017.
  • [24] X. Wang, X. He, M. Wang, F. Feng, and T.-S. Chua, “Neural graph collaborative filtering,” in SIGIR, 2019.
  • [25] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, “LightGCN: Simplifying and powering graph convolution network for recommendation,” in SIGIR, 2020.
  • [26] K. Mao, J. Zhu, J. Wang, Q. Dai, Z. Dong, X. Xiao, and X. He, “Simplex: A simple and strong baseline for collaborative filtering,” in CIKM, pp. 1243–1252, 2021.
  • [27] K. Mao, J. Zhu, X. Xiao, B. Lu, Z. Wang, and X. He, “Ultragcn: Ultra simplification of graph convolutional networks for recommendation,” in CIKM, 2021.
  • [28] J. Choi, J. Jeon, and N. Park, “LT-OCF: Learnable-time ode-based collaborative filtering,” in CIKM, 2021.
  • [29] L. Chen, L. Wu, R. Hong, K. Zhang, and M. Wang, “Revisiting graph based collaborative filtering: A linear residual graph convolutional network approach,” in AAAI, 2020.
  • [30] T. Kong, T. Kim, J. Jeon, J. Choi, Y.-C. Lee, N. Park, and S.-W. Kim, “Linear, or non-linear, that is the question!,” in WSDM, pp. 517–525, 2022.
  • [31] F. Liu, Z. Cheng, L. Zhu, Z. Gao, and L. Nie, “Interest-aware message-passing gcn for recommendation,” in TheWebConf (former WWW), pp. 1296–1305, 2021.
  • [32] J. Choi, S. Hong, N. Park, and S.-B. Cho, “Blurring-sharpening process models for collaborative filtering,” in SIGIR, 2023.
  • [33] J. Yu, M. Gao, J. Li, H. Yin, and H. Liu, “Adaptive implicit friends identification over heterogeneous network for social recommendation,” in CIKM, pp. 357–366, 2018.
  • [34] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph contrastive learning with augmentations,” NeurIPS, vol. 33, pp. 5812–5823, 2020.
  • [35] M. Jing, Y. Zhu, T. Zang, and K. Wang, “Contrastive self-supervised learning in recommender systems: A survey,” arXiv preprint arXiv:2303.09902, 2023.
  • [36] J. Wu, X. Wang, F. Feng, X. He, L. Chen, J. Lian, and X. Xie, “Self-supervised graph learning for recommendation,” in SIGIR, pp. 726–735, 2021.
  • [37] J. Yu, H. Yin, X. Xia, T. Chen, L. Cui, and Q. V. H. Nguyen, “Are graph augmentations necessary? simple graph contrastive learning for recommendation,” in SIGIR, pp. 1294–1303, 2022.
  • [38] X. Cai, C. Huang, L. Xia, and X. Ren, “LightGCL: Simple yet effective graph contrastive learning for recommendation,” in ICLR, 2023.
  • [39] A. v. d. Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018.
  • [40] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “Bpr: Bayesian personalized ranking from implicit feedback,” in UAI, 2009.
  • [41] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in TheWebConf (former WWW), 2017.