1 Introduction
Accurate and large-scale user profiles are required for providing personalized services [1, 2, 3], such as customized search, recommendations, and advertisements, among others. However, in practice, user profiles are usually unknown and hard to obtain because of privacy settings. Therefore, user profiling, which aims to infer individual personality traits from user-generated data, is significant for real-world applications. In this article, we focus on inferring user profiles based on their spatiotemporal mobile app usage behavior. In particular, compared with other data sources, mobile app usage data has the following three advantages for the task of user profiling. First, the prevalence of smartphones makes it possible for service providers to automatically collect large-scale and fine-grained mobile app usage data [4, 5, 6]. Such large-scale datasets allow us to adopt more advanced models, such as deep neural networks, to improve the robustness and accuracy of user profiling. Second, users choose which apps to use based on their individual needs and preferences, which are heavily influenced by their personal attributes [7, 8], including gender, income, age, and occupation, among others. Hence, users' mobile app usage behavior can correspondingly reveal their profiles. Third, mobile app usage behavior also contains rich spatiotemporal features of users, that is, the location and time information of app usage records. Such spatiotemporal features are also helpful for inferring user personality traits.
Previous studies in the scope of mobile app usage behavior analysis for user profiling can be categorized into two types: (1) descriptive analysis, where researchers apply statistical methods to describe how user profile traits, such as gender, affect app usage behavior [9], and (2) predictive analysis, where researchers recognize distinct patterns from mobile app usage traces and use classification models, such as Support Vector Machine (SVM), to predict users' profile labels [10]. Nevertheless, previous studies have the following two limitations. First, previous studies principally rely on handcrafted features [10, 11]: they empirically defined descriptive rules based on small-scale datasets, which lack generalization capability when dealing with large-scale and noisy mobile app usage datasets. Second, previous studies did not explore the spatiotemporal features of mobile app usage behavior [7, 9]. They only considered the app adoptions of mobile users (i.e., what apps were used) while ignoring where and when the apps were used. Such a single type of input data limits the performance of existing app usage behavior-based user profiling models.
Alternatively, in recent years, graph-based representation learning, such as Graph Convolutional Networks (GCNs), has shown great potential for automatic behavior profiling [12]. Recent studies have shown that a graph data structure can provide a general representation to integrate multiple types of data [13, 14, 15]. Using a graph structure to represent spatiotemporal mobile app usage behavior can overcome the limits of previous studies: introducing user, app, location, and time nodes into a graph encodes the spatiotemporal features of mobile users' app usage behavior. Then, by using the embeddings of user nodes, we can perform individual-level user profiling.
Consequently, the combination of spatiotemporal mobile app usage data and graph-based representation learning is promising for the task of user profiling. However, three unique challenges arise in achieving this goal:
(1) Each mobile app usage record involves four types of entities. Thus, the app usage graph is heterogeneous and has four node types: users, apps, locations, and time. Because different node types have different semantics, the graph model must be able to distinguish neighbor node types and select informative neighbors.
(2) Generally, mobile users use many apps and visit a large number of locations. Therefore, app and location nodes have dense connections with user nodes in the app usage graph. Such dense connections among nodes cause severe neighborhood expansion and oversmoothing issues for graph-based representation learning methods [16].
(3) Spatiotemporal mobile app usage data carry various relations among users, apps, locations, and time. Therefore, the app usage graph is undirected and has multiple relational edge types: user-app, user-location, user-time, app-time, app-location, and location-time edges. Different relational edges have different semantics; therefore, fusing the diverse semantic information into node representations is also challenging.
To overcome the preceding challenges, we propose a new framework, named Multi-Relational Heterogeneous Graph Attention Network (MRel-HGAN), to infer user profiles from their spatiotemporal mobile app usage behavior. First, to cope with the heterogeneity of the mobile app usage graph, we leverage a relational graph convolutional operation consisting of relation-specific propagation and aggregation phases, which can distinguish the types of neighbors during operations and learn multiple relation-specific representations (with different semantics) for a single node. Second, to address the high density of the app usage graph, we design a neighbor sampling strategy that samples strongly correlated neighbors of a fixed size for each node. The sampling operation makes the graph sparse and mitigates the issues of neighborhood expansion and oversmoothing. Third, to fuse the different semantic information from multiple relational edges, we leverage a multi-relational attention operation to learn the importance of each relation-specific representation and assign proper weights to them. By doing so, for each node in the mobile app usage graph, we fuse its multiple relation-specific representations into one feature vector.
In summary, we present the main contributions as follows:
• We introduce a promising graph learning based framework for the problem of user profiling based on spatiotemporal mobile app usage data. By exploring the co-occurrence of users, locations, time, and apps in usage records, we construct a multi-relational heterogeneous mobile app usage graph. We also extract node features by utilizing side information, such as app categories and Points of Interest (POIs).
• We develop MRel-HGAN to learn the node embeddings of the mobile app usage graph. By employing a relational graph convolutional operation and a multi-relational attention operation, MRel-HGAN can adequately leverage the multi-relational graph structure and heterogeneous node features to label user profiles.
• We conduct extensive experiments based on large-scale real-world mobile app usage datasets. The experimental results exhibit the superiority of MRel-HGAN over State-of-the-Art (SOTA) models for the task of user profiling for the attributes of gender and age.
We present the preliminaries of user profiling from users' spatiotemporal app usage behavior in Section 2. In Section 3, we detail how to construct the heterogeneous app usage graph and determine the initial features of vertices. In Section 4, we elaborate on the network design of our proposed MRel-HGAN. We then evaluate MRel-HGAN by comparing it with other SOTA models in Section 5. Related work and study limitations are presented in Section 6. We conclude the article briefly in Section 7.
4 Representation Learning for Users
This section formally presents the design of MRel-HGAN and shows how to employ MRel-HGAN to learn representations of user nodes. Specifically, MRel-HGAN consists of three parts: (1) heterogeneous neighbor sampling, (2) a relational graph convolutional operation, and (3) a multi-relational attention operation.
4.1 Neighbor Sampling
The critical idea of Graph Neural Networks (GNNs) is to aggregate features from a node's neighbors [28]. Typically, the computation of a GNN is carried out in two phases: (1) the message passing phase and (2) the aggregating and updating phase. Specifically, in the message passing phase, a node passes its representation vector to its first-order neighbors. In the aggregating and updating phase, a node first aggregates the received representation vectors with its own representation. Then, the node updates its own representation vector with the aggregated one. By increasing the number of network layers, each node can incorporate information from higher-order neighbors and thus learn richer node features. For example, as shown in Figure 3(b), in layer 1, nodes C, D, and F will pass their representation vectors to their first-order neighbors (e.g., node E). In layer 2, node E will then pass the aggregated representation to node D. In this way, node D can incorporate the feature information from its second-order neighbors (i.e., nodes C and F). However, applying this approach to a mobile app usage graph may raise several issues:
• Neighborhood expansion: For a given node, computing its hidden representation requires considering its first-order neighbors. In turn, its first-order neighbors must consider their own first-order neighbors, and so on. This recursive neighborhood expansion grows with each additional layer. Since mobile users usually use a large number of apps and visit many locations, the mobile app usage graph is large scale and dense. Therefore, the issue of neighborhood expansion is quite severe for the app usage graph.
• Oversmoothing: As mentioned earlier, the app usage graph is dense, which could lead to another severe issue, that is, oversmoothing [16]. The dense connections between nodes make the learned representations indistinguishable, which hurts the profiling accuracy.
• Various neighbor sizes: Apps and locations have varying popularity, and thus nodes have varying degrees. For example, as one of the most popular apps, Facebook is used by millions of people, whereas some apps only have a few users. Hence, the representations of nodes with high degrees could be impaired by weakly connected neighbors, and nodes with low degrees may not adequately learn their representations.
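Before turning to our remedy, the two-phase computation above can be sketched in code; the toy graph mirrors Figure 3(b), and the mean aggregator, feature dimensions, and names here are illustrative assumptions rather than the exact operator used in MRel-HGAN:

```python
# Toy sketch of one GNN layer as the two phases described above:
# (1) message passing, (2) aggregating and updating.
graph = {  # adjacency list of a small undirected toy graph
    "C": ["E"], "D": ["E"], "F": ["E"],
    "E": ["C", "D", "F"],
}
feats = {"C": [1.0, 0.0], "D": [0.0, 1.0], "F": [1.0, 1.0], "E": [0.5, 0.5]}

def gnn_layer(graph, feats):
    new_feats = {}
    for node, h in feats.items():
        # Phase 1: each neighbor passes its current representation vector.
        messages = [feats[nb] for nb in graph[node]]
        # Phase 2: aggregate the received vectors together with the node's
        # own representation (element-wise mean here), then update.
        stacked = messages + [h]
        new_feats[node] = [sum(col) / len(stacked) for col in zip(*stacked)]
    return new_feats

layer1 = gnn_layer(graph, feats)   # E now holds info from C, D, F
layer2 = gnn_layer(graph, layer1)  # D now holds second-order info via E
```

Stacking two such layers lets node D absorb features of its second-order neighbors C and F, exactly as in the figure description.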
To solve these issues, we apply a heterogeneous neighbor sampling strategy based on a bootstrapping approach. Specifically, for each node, we randomly sample a fixed-size set of neighbors with probability proportional to the edge weights. Mathematically, we use \(\hat{\mathcal {N}}(v)\) to denote the sampled fixed-size neighbors of node v, drawn from the set \(\lbrace u \in V: e(v,u) \in E \rbrace\). We then use the sampled neighbors to approximate the aggregation over all neighbors. For example, as depicted in Figure 3(c), we set the sample size to 2. Therefore, by dropping the edge \(e(D, A)\), node D only passes its representation vector to nodes B and E in layer 1. Correspondingly, in layer 2, node D only aggregates the representations from nodes B and E as well. We also note that the app usage graph is heterogeneous, having various node types, and that the degree distributions of different node types vary greatly. Thus, we implement the sampling strategy separately for each node type. In other words, given a node v, we sample its user-node, app-node, location-node, and time-node neighbors as fixed-size sets \(\hat{\mathcal {N}_u}(v)\), \(\hat{\mathcal {N}_a}(v)\), \(\hat{\mathcal {N}_l}(v)\), and \(\hat{\mathcal {N}_t}(v)\), respectively.
The heterogeneous neighbor sampling strategy can avoid the issues mentioned earlier for two principal reasons. First, only a small set of the most relevant neighbors with large edge weights is selected, thus mitigating the issues of neighborhood expansion and oversmoothing. Second, for each node, all types of neighbors are collected, and the sample size is fixed by leveraging bootstrapping, which solves the issue of varying node degrees.
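As a rough sketch of this strategy (the toy weights, `sample_size`, and helper name are our own; the paper samples each node type separately with its own fixed size):

```python
import random

def sample_neighbors(weighted_nbrs, sample_size, rng):
    """Bootstrap-sample `sample_size` neighbors with probability
    proportional to edge weight. Sampling is with replacement, so even
    low-degree nodes yield a fixed-size neighbor set."""
    nodes = list(weighted_nbrs)
    weights = [weighted_nbrs[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=sample_size)

rng = random.Random(0)
# Node D's neighbors with edge weights; A is only weakly connected.
nbrs_of_D = {"A": 0.1, "B": 5.0, "E": 4.9}
sampled = sample_neighbors(nbrs_of_D, sample_size=2, rng=rng)
```

With weights like these, the weakly connected neighbor A is rarely drawn, which is how the strategy keeps only strongly correlated neighbors. In the heterogeneous setting, this routine would be called once per neighbor node type (user, app, location, time), each with its own fixed size.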
4.2 Relational Graph Convolutional Operation
Because of the heterogeneity of nodes, as illustrated in Section 3.2, nodes of different types have different feature spaces. Hence, we first project the features of different node types into the same space using a type-specific transformation. Mathematically, the projection process can be expressed as follows:
\[
\hat{{\bf h}}_v = \boldsymbol {M}_{o_v} \boldsymbol {h}_v,
\]
where \(\boldsymbol {M}_{o_v}\) denotes the transformation matrix and \(o_v\) represents the type of node v. \(\boldsymbol {h}_v\) and \(\hat{{\bf h}}_v\) denote the original and projected feature vectors of node v, respectively. After the node type-specific projection operation, the downstream operations (e.g., graph convolution) can cope with arbitrary types of nodes.
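A minimal sketch of this type-specific projection, with toy dimensions and matrices of our choosing (not learned parameters):

```python
# Each node type o_v has its own matrix M[o_v] mapping that type's
# feature space into a shared space; here app features are 3-d and
# user features 2-d, both projected to a common 2-d space.
def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

M = {
    "app":  [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],  # 3-d app -> 2-d shared
    "user": [[0.5, 0.5], [1.0, -1.0]],           # 2-d user -> 2-d shared
}
h_app = [1.0, 2.0, 3.0]               # original app-node feature h_v
h_hat = matvec(M["app"], h_app)       # projected feature \hat{h}_v
```

After this step, features of all node types live in the same space, so a single convolution operator can consume any of them.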
The app usage graph has different edge types, revealing different relations between nodes and having different semantics. For example, user-app edges can describe the app co-using relationships between users, whereas user-location edges can depict the relationships between users visiting the same location. However, the conventional graph convolutional operation treats graph edges equally and cannot explore the different semantics of various edge types. As a result, it cannot be applied to the mobile app usage graph directly.
In this work, we leverage a relational graph convolutional operation consisting of relation-specific propagation and aggregation phases. For each edge type, we implement a corresponding layer. Therefore, the relational graph convolutional operation on the app usage graph exploits six layers: the user-app, user-location, user-time, app-location, app-time, and location-time layers. These layers use the edges of the corresponding relations, which is equivalent to information propagation and aggregation in the respective bipartite subgraphs. Given a node i, by the relational graph convolutional operation of relation \(r_e \in R_E\), we obtain a learned relation-specific feature vector \(\boldsymbol {h}^{r_e}_i\) of node i, which can be calculated as follows:
\[
\boldsymbol {h}^{r_e}_i = \sigma \Bigg (\sum _{j \in \hat{\mathcal {N}}_{r_e}(i)} \hat{\omega }(j,i)\, {\bf W}_{r_e}\, \hat{\boldsymbol {h}}_j \Bigg), \tag{7}
\]
where \(\hat{\mathcal {N}}_{r_e}(i)\) denotes the set of sampled neighbors of node i under relation \(r_e \in R_E\), \(\hat{\omega }(j,i)\) is the normalized edge weight of edge \(e(j, i)\), \({\bf W}_{r_e}\) represents the relation-specific transformation weight matrix of relation \(r_e\), \(\hat{\boldsymbol {h}}_j\) stands for the projected feature vector of node j, and \(\sigma (\cdot)\) is an activation function.
Intuitively, Equation (7) accumulates the transformed feature vectors of neighbor nodes through a set of relation-specific edges, which have homogeneous semantics. Due to the symmetry of the app usage graph, all types of nodes have three types of edges. Given an arbitrary node i, after feeding the node features into the relational graph convolutional layer, we obtain a group of relation-specific node feature vectors, denoted as \(\lbrace \boldsymbol {h}^{r_e^0}_i, \boldsymbol {h}^{r_e^1}_i, \boldsymbol {h}^{r_e^2}_i\rbrace\). To better understand the relational graph convolutional operation, we briefly explain the processes in Figure 4.
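A compact sketch of the relation-specific convolution in Equation (7); the identity weight matrix, tanh activation, and toy neighbor features are illustrative assumptions:

```python
import math

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def rel_conv(nbr_feats, nbr_weights, W_re, sigma=math.tanh):
    """Sum the sampled neighbors' projected features under one relation,
    scaled by normalized edge weights and transformed by the
    relation-specific matrix W_re, then apply a nonlinearity."""
    acc = [0.0] * len(W_re)
    for h_j, w in zip(nbr_feats, nbr_weights):
        t = matvec(W_re, h_j)
        acc = [a + w * t_k for a, t_k in zip(acc, t)]
    return [sigma(a) for a in acc]

W_user_app = [[1.0, 0.0], [0.0, 1.0]]      # toy W_{r_e} (identity)
h_re = rel_conv([[0.2, 0.4], [0.6, 0.0]],  # two sampled neighbors' \hat{h}_j
                [0.5, 0.5],                # normalized edge weights
                W_user_app)
```

Running this once per relation (user-app, user-location, ...) yields the group of relation-specific vectors described above.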
4.3 Multi-Relational Attention Operation
We next fuse together the multiple relation-specific feature vectors to update new features of nodes. Importantly, we need to ensure that the new feature vector of a node can also be informed by the corresponding previous feature vector. Therefore, as depicted in Figure
4, we add a self-loop of a specific relation type to each node. Given a node
i, we can calculate its self-relation feature vector
\(\boldsymbol {h}^{s}_i\) as
where
\({\bf W}_{0}\) is the weight matrix of self-relation and
\(\hat{\boldsymbol {h}}_i\) is the projected feature vector of node
i. In this way, for an arbitrary node
i, it has four relation-specific feature vectors—that is,
\({\bf H}_i = \lbrace \boldsymbol {h}^{r_e^0}_i, \boldsymbol {h}^{r_e^1}_i, \boldsymbol {h}^{r_e^2}_i, \boldsymbol {h}^{s}_i\rbrace\) .
Inspired by the vanilla attention approach [29], we first use a one-layer Multi-Layer Perceptron (MLP) with the \(\tanh\) activation function to transform the relation-specific feature vectors. We then compute the importance of the different relation-specific feature vectors by multiplying by an attention vector \(\boldsymbol {c}\). Given a node i, the importance of relation \(r_e \in R_E\) can be calculated as
\[
w_{i, r_e} = \boldsymbol {c}^{T} \tanh \left({\bf W}\, \boldsymbol {h}^{r_e}_i + \boldsymbol {b}\right),
\]
where \(\bf W\) is the weight matrix, \(\boldsymbol {b}\) is the bias vector, and \(\boldsymbol {c}^{T}\) is the attention vector. Note that we have added the self-relation into the set \(R_E\). Next, we normalize the importance across the different relation-specific features with the softmax function. Denoting the normalized weight as \(\beta _{i, r_e}\), we can compute \(\beta _{i, r_e}\) as
\[
\beta _{i, r_e} = \frac{\exp (w_{i, r_e})}{\sum _{r \in R_E} \exp (w_{i, r})}.
\]
The higher \(\beta _{i, r_e}\) is, the higher the contribution of \(\boldsymbol {h}^{r_e}_i\) toward the new feature vector of node i. Therefore, by using the learned weights as coefficients, we can fuse these relation-specific feature vectors and update the new feature vector \(\boldsymbol {h}^{\prime }_i\) as follows:
\[
\boldsymbol {h}^{\prime }_i = \sum _{r_e \in R_E} \beta _{i, r_e}\, \boldsymbol {h}^{r_e}_i.
\]
In this way, the updated feature vectors of nodes aggregate all of the semantics hidden in the multiple relations.
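The scoring, normalization, and fusion steps of this attention operation can be sketched as follows; `W`, `b`, `c`, and the toy relation-specific vectors are placeholder values, not learned parameters:

```python
import math

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def fuse(rel_vectors, W, b, c):
    """Score each relation-specific vector with a shared one-layer MLP
    and attention vector c, softmax the scores, and fuse by the
    weighted sum."""
    scores = []
    for h in rel_vectors:
        z = [math.tanh(u + bi) for u, bi in zip(matvec(W, h), b)]
        scores.append(sum(ci * zi for ci, zi in zip(c, z)))
    mx = max(scores)                              # stabilized softmax
    exps = [math.exp(s - mx) for s in scores]
    betas = [e / sum(exps) for e in exps]         # normalized importances
    dim = len(rel_vectors[0])
    fused = [sum(b_r * h[k] for b_r, h in zip(betas, rel_vectors))
             for k in range(dim)]
    return fused, betas

H_i = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy relation-specific vectors
W, b, c = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0]
h_new, betas = fuse(H_i, W, b, c)
```

The returned `betas` sum to one, and the fused vector is their convex combination of the relation-specific vectors.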
4.4 User Profiling
Since we formulate user profiling as a multi-label classification task, the last layer of our model is responsible for predicting the labels of users based on the representations of user nodes. Given the set of user profile labels Y, we apply the softmax function to the users' representation matrix and obtain \({\bf Z} \in \mathbb {R} ^{|U|\times |Y|}\) (i.e., the predicted probability distribution of users' labels). We then adopt the cross entropy as the loss function to carry out end-to-end training of the model. Mathematically, the loss function over all of the labeled users is defined as
\[
\mathcal {L} = - \sum _{u \in U_L} \boldsymbol {y}_u^{T} \ln \boldsymbol {z}_u,
\]
where \(\boldsymbol {y}_u\) and \(\boldsymbol {z}_u\) are the ground truth and the predicted probability distribution of user u, respectively, and \(U_L\) denotes the set of labeled users. Guided by the labeled data, we can optimize our proposed model through the back-propagation method.
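For a single user, the softmax head and cross-entropy loss reduce to the following sketch (the logits and one-hot label are toy values of ours):

```python
import math

def softmax(logits):
    """Stabilized softmax turning a user's logits into z_u."""
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(y_u, z_u):
    """Cross entropy of predicted distribution z_u against the one-hot
    ground truth y_u; only the true class contributes."""
    return -sum(y * math.log(z) for y, z in zip(y_u, z_u) if y > 0)

z_u = softmax([2.0, 0.5, -1.0])   # predicted label distribution
y_u = [1.0, 0.0, 0.0]             # ground-truth one-hot label
loss = cross_entropy(y_u, z_u)
```

Summing this quantity over all labeled users gives the training loss minimized by back-propagation.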
5 Experiments
This section presents extensive experiments conducted on large-scale real-world app usage datasets. We first exhibit the experiment setup, including the datasets, compared baselines, evaluation metrics, and implementation details. We next compare the performance of our model with baselines and discuss the results. Last, to show the effectiveness of modules in our system, we compare several variants.
5.1 Experiment Setup
5.1.1 Data Collection.
To evaluate our proposed system, we explore two real-world anonymized mobile app usage datasets: one collected by a Mobile Network Operator (MNO) and the other collected by the TalkingData platform.
Dataset Collected by an MNO. The MNO dataset was collected from one of the largest cities in the world, Shanghai, covering 1 week in April 2016. The dataset covers more than 10,000 users. In practice, a systematic approach called SAMPLES is used to identify mobile app usage based on users' network access data. SAMPLES builds conjunctive rules and can detect more than 90% of applications based on a limited collection of manually labeled data samples, obtaining a 99% average accuracy [30]. To create the conjunctive criteria, the operator crawled the 2,000 most popular apps from app stores and manually created data samples. The gathered network access records were then matched with particular applications. According to the ISP's data, up to 90% of network traffic can be traced to individual apps. Each mobile app usage record includes the following fields: anonymized user ID u, app ID a, location ID l, and timestamp t. In particular, the locations refer to the associated base stations of users, as the dataset is collected from mobile networks. The user profiles are gender labels provided by the ISP. The app category information is obtained from app stores [17].
We also crawled 782,528 POIs of Shanghai via the Baidu Map service to create a POI dataset. There are 15 POI categories: restaurant, hotel, entertainment, industry, residence, education, hospital, fitness center, shopping mall, scenic spot, transportation facility, financial service, life service, corporation & business, and government & organization. The statistics of the dataset are detailed in Table 2.
Dataset Collected by TalkingData. The TalkingData software development kit (SDK), which is embedded into mobile apps and operates in the background, collects the app usage data automatically. Individual users of such apps have given their explicit acknowledgment and approval, and the necessary anonymization has been carried out to safeguard their privacy during the collection process. Each data sample contains a list of the applications being used, the time, the location (latitude and longitude), and a device-specific identifier. To maintain spatial consistency with the MNO dataset, we filter out the app usage traces from locations other than Shanghai. Then, we map each position to its nearest base station according to latitude and longitude. By doing so, for the TalkingData dataset, we also utilize base station IDs as locations, as in the MNO dataset. For TalkingData, the user profiles are age labels provided by the platform. The statistics of the dataset are detailed in Table 3.
5.1.2 Baselines.
We select 10 models as the baselines to compare with our framework, MRel-HGAN. Specifically, they can be classified into two categories: classic models and graph-based models.
We first introduce four classic models for the user profiling task, which are commonly used in previous studies [7, 10, 22, 31]. These models take users' feature vectors as input and output the profile labels of users. They are introduced as follows.
Logistic Regression [32]. Logistic Regression (LR) is widely used in user profiling. LR uses a logistic function to model the probability of a certain profile label for each user.
Support Vector Machine [33]. SVM is also widely used to solve classification problems. An SVM model is trained to find the maximum-margin hyperplane separating users with different labels; it is a non-probabilistic linear classifier.
Random Forest [34]. Random Forest (RF) is a classic ensemble learning method for classification. RF works by constructing a multitude of simple decision trees.
Multi-Layer Perceptron [35]. The MLP is trained using a supervised learning approach called back-propagation. The hidden layer size was set to 64 in our case.
As we use the graph structure to model mobile app usage behavior, we compare our method with six SOTA graph-based models. The graph-based models take a graph as input and output the learned embeddings of vertices by exploring local graph structures and node features.
DeepWalk [36]. DeepWalk extends the word2vec [37] model to graph representation learning. DeepWalk uses truncated random walks to gather local structural information, then feeds the walk paths into the skip-gram model to learn node embeddings. We set the number of random walks per node to 50, the embedding size to 64, the walk length to 30, and the window size to 10.
Node2vec [38]. Node2vec applies a biased random walk procedure, controlled by two parameters p and q, to produce node embeddings. The parameters add flexibility in exploring neighborhoods. In the experiments, we set \(p=0.25\) and \(q=0.25\).
Metapath2vec [39]. Metapath2vec employs meta-path-based random walks to cope with the heterogeneity of the graph and leverages the skip-gram model to learn node embeddings. In the experiments, the meta-path schemes used are UAU, UALAU, and UATAU.
Graph Convolutional Network [40]. GCN is an end-to-end supervised learning algorithm for homogeneous graph-structured data, which performs convolutional operations in the graph Fourier domain. In the experiments, we add a feature projection layer before the conventional GCN model to project the features of heterogeneous nodes into the same feature space.
Graph Attention Network [41]. The Graph Attention Network (GAT) is an end-to-end supervised learning algorithm for homogeneous graph-structured data, which uses the attention mechanism for the aggregation of node features. Similarly, we add a feature projection layer before the conventional GAT model to project the features of heterogeneous nodes into the same feature space. Additionally, we set the number of attention heads to 4.
Heterogeneous Graph Attention Network [42]. The Heterogeneous Graph Attention Network (HAN) performs graph attention operations on heterogeneous graph-structured data by leveraging meta-paths. It first learns meta-path-specific node embeddings from multiple meta-path-based homogeneous graphs and then employs the attention mechanism to combine them. In the experiments, the meta-path schemes used are UAU, UALAU, and UATAU. It is worth noting that the conventional HAN is hard to apply directly to the mobile app usage graph due to the graph's high density. For example, for the meta-path UALAU, the corresponding meta-path-based homogeneous graph has 12,777 nodes but 110,031,748 edges. Hence, we add our proposed heterogeneous neighbor sampling module before the conventional HAN to sparsify the graph and decrease computational costs.
5.1.3 Implementation Details and Evaluation Metrics.
In the experiments, 80% of labeled users are randomly selected for training, 10% are selected for testing, and the remaining 10% compose the validation set used to determine the optimal hyperparameters. In the neighbor sampling procedure, the sizes of the sampled neighbor sets are 1,000, 50, 50, and 24 for the user (\(\hat{\mathcal {N}_u}\)), app (\(\hat{\mathcal {N}_a}\)), location (\(\hat{\mathcal {N}_l}\)), and time (\(\hat{\mathcal {N}_t}\)) neighbors, respectively. The dimension of the attention vector \(\boldsymbol {c}\) is 128. The dimension of the node embeddings is 64. In the training procedure, we randomly initialize the parameters and use Adam [43] to optimize the model with an initial learning rate of 0.01.
We adopt four metrics that are generally used in user profiling to evaluate the performance of the models: AUC (area under the curve), PRE (precision), Macro-F1, and Micro-F1. To reduce the variance of the results, we train all models 10 times and report the averaged evaluation metrics of each model. Additionally, grid search is used to find the optimal hyperparameters of each model.
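As a concrete reading of the last two metrics, here is a minimal sketch of Macro-F1 versus Micro-F1 on toy single-label predictions (our own labels, not the datasets'); for single-label classification, Micro-F1 coincides with accuracy:

```python
def f1_scores(y_true, y_pred):
    """Macro-F1 averages per-class F1 scores; Micro-F1 pools
    true/false positives over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    macro = sum(per_class) / len(per_class)
    # For single-label data, pooled (micro) F1 reduces to accuracy.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return macro, micro

y_true = ["M", "M", "F", "F", "F", "M"]   # toy gender labels
y_pred = ["M", "F", "F", "F", "M", "M"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

Macro-F1 weights rare and common classes equally, whereas Micro-F1 is dominated by the common classes, which is why reporting both is informative for imbalanced profile labels.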
5.2 Performance Comparisons with Baselines
We first evaluate the classic models in three scenarios: the app-based scenario, the location-based scenario, and the app-location-joint scenario. For the app-based scenario, we exploit the used apps for gender prediction on the MNO dataset and age group prediction on TalkingData. We treat each app as a dimension and represent the input feature of each user as an app-based vector; for each app, the corresponding value is the normalized frequency of usage. Similarly, for the location-based scenario, each user is represented as a location-based vector indicating the normalized frequency with which the user visits the corresponding location. For the app-location-joint scenario, we jointly explore users' used apps and visited locations for user profiling; in detail, for each user, we concatenate his or her app-based and location-based feature vectors. The performance of the classic models for user profiling under the three scenarios is presented in Tables 4 and 5. Additionally, we evaluate the graph-based models on the heterogeneous app usage graph and report the results in Tables 6 and 7. GCN-Sampling and GAT-Sampling refer to the GCN and GAT models with our proposed neighbor sampling operation. From the results, we have the following key observations.
First, MRel-HGAN performs best among all methods, including both the classic models and the graph-based models. Specifically, as shown in Table 6, MRel-HGAN outperforms the best baseline by 5.26%, 4.38%, 3.99%, and 3.49% in terms of AUC, PRE, Macro-F1, and Micro-F1 on the MNO dataset for predicting users' gender. In addition, MRel-HGAN outperforms the best baseline by 4.22%, 2.48%, and 2.98% in terms of PRE, Macro-F1, and Micro-F1 on TalkingData for predicting users' age groups.
Second, compared with visited location information, the used app information is more useful for both gender prediction and age group prediction. As shown in Tables 4 and 5, all classic models perform better in the app-based scenario than in the location-based scenario. Moreover, the classic models perform poorly in the app-location-joint scenario, implying that a simple concatenation operation is insufficient for coping with heterogeneous features and cannot explore the hidden relationships between different types of features.
Third, graph-based models generally outperform classic models. The main reason is that the graph structure can capture spatiotemporal app usage behavior across various users very well. By traversing local graph structures, graph-based models can learn the hidden relations between users, apps, locations, and time, which are helpful for user profiling.
Fourth, GCN performs the worst for the task of gender prediction compared to the other graph-based models. As stated in Section 4.1, the reason may be twofold: oversmoothing and varying node degrees, both of which can be addressed by the neighbor sampling mechanism. By applying our proposed neighbor sampling operation, GCN-Sampling achieves satisfactory performance.
Fifth, GAT outperforms GCN because GAT applies the attention mechanism to automatically estimate the importance of neighbors, which mitigates the issues of varying node degrees and oversmoothing. In addition, GAT still obtains a performance gain from the neighbor sampling mechanism.
Sixth, HAN achieves the best performance among all baselines. This is because HAN formalizes meta-path-based homogeneous graphs. Such a meta-path-based structure can leverage the semantics of different types of edges to enhance the performance of learned embeddings of nodes. However, HAN discards intermediate nodes along the meta-path when constructing meta-path-based homogeneous graphs. Therefore, compared with MRel-HGAN, HAN cannot leverage the node features of intermediate nodes, which leads to information loss and performance degradation.
5.3 Ablation Study
In this section, we compare the performance of MRel-HGAN with the following four variants:
MRel-HGAN(L): MRel-HGAN(L) only uses the subgraph \(G_{ul}\) . The feature of a user is the average embeddings of locations the user has visited.
MRel-HGAN(A): MRel-HGAN(A) only uses the subgraph \(G_{ua}\) . The feature of a user is the average embeddings of apps used by that user.
MRel-HGAN(No-S.): MRel-HGAN(No-S.) does not use the heterogeneous neighbor sampling operation.
MRel-HGAN(No-A.): MRel-HGAN(No-A.) does not use the multi-relational attention operation. Instead, we fuse multiple relation-specific feature vectors of one node by averaging.
The results of MRel-HGAN(L), MRel-HGAN(A), MRel-HGAN(No-S.), MRel-HGAN(No-A.), and MRel-HGAN are presented in Figures 5 and 6. The following can be observed:
(1) MRel-HGAN(L) performs slightly better than the classic models in the location-based scenario for predicting users' age groups and gender. This is because we utilize POI information as the feature of location nodes in the subgraph \(G_{ul}\), which improves performance.
(2) MRel-HGAN(A) outperforms MRel-HGAN(L), implying that the used app information is more valuable than the visited location information for predicting gender and age groups, which corresponds to the results of the classic models.
(3) The prediction performance is improved when we introduce the spatiotemporal features into the app usage graph by adding location and time nodes.
(4) MRel-HGAN(No-S.) performs better than GCN and GAT, demonstrating the effectiveness of the relational graph convolutional operation on the heterogeneous graph.
(5) MRel-HGAN(No-A.) performs better than MRel-HGAN(No-S.). The main reason is that the neighbor sampling operation can overcome the oversmoothing and varying-neighbor-size issues in the app usage graph.
(6) MRel-HGAN performs better than MRel-HGAN(No-A.) because the multi-relational attention operation can automatically learn the importance of the different relation-specific features for different types of nodes.
The results demonstrate that the modules we design, including heterogeneous neighbor sampling, relational graph convolutional operation, and multi-relational attention operation, are necessary to integrate the multi-relational data in the heterogeneous network.
7 Conclusion
The problem of user profiling based on spatiotemporal app usage behavior was investigated in this article. We proposed MRel-HGAN, a graph learning based model that integrates user, app, location, and time entities into a single low-dimensional latent space. By applying a bootstrapping-based heterogeneous neighbor sampling strategy, MRel-HGAN can overcome the oversmoothing issue caused by the high density of the mobile app usage graph. We then designed a relational graph convolutional operation and a multi-relational attention operation to explore the rich semantic information of the various relations among apps, users, locations, and time. MRel-HGAN outperforms SOTA baselines for user profiling in experiments conducted on large-scale real-world datasets. Additionally, we verified the effectiveness of the components of MRel-HGAN. This study paves the way for a slew of app usage related applications, including personalized app recommendations, app usage analysis, and app service optimization.