1 Introduction

With the rapid development of mobile applications, users are increasingly inclined to share their experiences and preferences on social media such as Twitter, Instagram, Facebook, and Foursquare. These check-in records reflect users’ POI transitions and POI preferences within a cyclic time period. Using such check-in records to provide POI recommendation services for users has become an important task [4, 5, 14, 15].

Recent work on successive POI recommendation has mostly considered three relationships. First, the relationship between POIs (POI-POI) [1, 6, 7, 8, 12, 13]: users usually tend to visit POIs that are closer to them. Past work typically used word2vec to map geographically close POIs to nearby points in Euclidean space, but this cannot address spatial asymmetry. For example, as shown in Figure 1, users tend to go to the gym first and then to the restaurant. The relative distance between the gym and the restaurant is unchanged, but from the user’s point of view, when the order of the two is swapped, the probability of visiting the latter changes greatly. Second, the relationship between POIs and time (POI-time) [6, 7, 9, 24]: although users visit POIs with periodic patterns, most past work only predicts where the user will go, ignoring how long it will be before the visit takes place. This deficiency means that most models are not applicable in real-life scenarios; for example, if the user’s next predicted check-in occurs in the dataset a week or even a few months later, the recommended next POI is of little use to the user. Third, the relationship between POIs and users (POI-user) [3, 10, 11, 13]: previous studies typically use collaborative filtering to compare the similarity of users or items before providing recommendations, but this approach cannot cope with data sparsity.

Figure 1: Example of user check-in history on mobile-based social media platforms.

In this paper, we model a user’s behavior sequence within a day, which is a more time-sensitive task. However, this involves two major challenges: 1) Time patterns. In addition to preference variance across users, the model must understand how a user’s interests and preferences change within a day’s behavioral pattern in order to make effective suggestions. Therefore, determining the influence of the POIs a user checks in at during a day is the key to making accurate recommendations. 2) High sparsity. The user’s check-in records within one day are very sparse. With such sparse check-in data, it is very challenging to accurately capture the user’s preferences and recommend a POI for the user to visit in the subsequent hours.

Several successive POI recommendation methods take the temporal impact of user check-ins into account to better capture user preferences. However, these methods cannot capture sequential information from users’ check-in behavior because they model only the relationship between two consecutive POIs. Recurrent neural networks (RNNs) have shown good performance in modeling the correlations among the POIs a user has recently checked in to, so using an RNN to model a user’s sequential check-in data is a natural choice. However, RNNs adapt poorly to data sparsity, lack the ability to accurately mine the time patterns described above, and their serial input format requires considerable resources to train. The goal of our work is to properly capture the user’s daily cycle patterns. To this end, we model the user’s check-in sequence within a day and learn the influence of each POI in the sequence to determine where the user will go next in the day.

To replace the traditional serial input of data and address the long training time it incurs, InfAM geographically encodes the points of interest at which a user checks in within a day. In addition, a multihead attention mechanism focuses on the short-term POI sequence to obtain the influence of historical POIs. The sequence encoding is concatenated with the user embedding to learn the preferences of different users. Through the above methods, we address three problems: 1) POI-POI: InfAM can adaptively embed POIs, alleviating the spatial asymmetry problem of traditional methods, and it can detect the influence of short-term POIs. 2) POI-time: We input data into the model on a daily basis to fully learn the periodicity of the user’s POI visits and predict where the user will go next within the day, ensuring the timeliness of the successive POI recommendation. 3) POI-user: The user embedding and the input sequence encoding are concatenated to learn user preferences directly and effectively.

The contributions of this paper are summarized as follows:

  • We propose InfAM, which can introduce the influence of each POI in short-term sequence fragments.

  • In the experimental evaluation, InfAM achieves state-of-the-art successive POI recommendation performance on real-world datasets.

  • We also show the usefulness of POI influence awareness for successive POI recommendations through qualitative analysis.

The rest of this article is organized as follows. In the second section, we summarize previous research related to our work. In the third section, we describe InfAM in detail. In the fourth section, we compare InfAM with existing successive POI recommendation models and analyze the experimental results. Finally, in the fifth section, we present our concluding remarks.

2 Related work

2.1 Conventional point-of-interest recommendation

Conventional POI recommendation mainly focuses on geographic features and temporal factors. Ye et al. [16] developed a method based on collaborative filtering that uses a power-law probability model to capture the geographic influence between POIs and a naive Bayes method to perform collaborative POI recommendation based on that influence. Cheng et al. [17] used a multicenter Gaussian model to capture geographic influences and integrated social information and geographic influences into a generalized matrix factorization framework. Lian et al. [14] used the user’s activity area vector and the influence vector of POIs to capture the spatial clustering phenomenon and incorporated it into matrix factorization. Yuan et al. [18, 19] developed a time-aware point-of-interest recommendation framework using the temporal and spatial effects of check-in data.

2.2 Successive point-of-interest recommendation

Unlike conventional POI recommendation, successive POI recommendation models attach importance to the user’s recent check-ins, consider the user’s POI transitions within a check-in sequence, and provide users with more accurate POI recommendations. Cheng et al. [2] first proposed the successive POI recommendation problem and, on the basis of FPMC [5], which was originally used for next-basket prediction, added local region constraints to form the FPMC-LR model. They incorporated geographic influence and used Markov chains to infer user preferences for POIs. By analyzing the POI transitions in the user’s check-in sequence, they captured how user preferences change and how recently checked-in POIs affect visits to the target POI.

At present, recurrent neural networks (RNNs) are also widely used in successive POI recommendation and have shown good performance in mining sequential information. The user’s check-in records form sequences that carry spatial information, temporal information, and interaction information between POIs, and this information can be fully exploited through recurrent neural networks. For example, Zhao et al. [22] proposed a spatiotemporal recurrent neural network (ST-RNN) to model local spatiotemporal and geographic influences. This method uses time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographic distances to model the local temporal and spatial context at each layer. Manotumruksa et al. [20] proposed a deep recurrent collaborative filtering framework (DRCF), which uses the dot product of latent factors to model user-POI interactions and extends the Neuf framework to model static and dynamic user preferences. Zhao et al. [21] proposed a spatiotemporal gated network (STGN), which enhances the long short-term memory network with spatiotemporal gates to capture the spatiotemporal relationship between successive check-ins. Convolutional neural networks have also been applied to successive POI recommendation. Among them, Tang et al. [11] proposed a convolutional sequence embedding recommendation model (Caser), which embeds a sequence of recent POIs into an “image” in time and latent space and uses convolutional filters to learn sequential patterns as local features of the image.

However, a user has limited energy in a day, so every POI visited within a period of time affects subsequent POI visits. For example, in Figure 1, the user checks in at the gym partway through the day, which makes it less likely that the user will continue to visit POIs that consume energy. This is of great significance for analyzing user behavior patterns. When enough user sequences are available, we can use them to identify the points of interest that exert a greater influence on user behavior and user behavior patterns. This information helps us better recommend subsequent POIs to users.

Although sequence information can now be mined from the user’s recent check-ins, it is still unclear how to mine time patterns accurately. Successive POI recommendation is a time-sensitive task, since different POIs have different popularities at different timeslots [25]. For example, a user’s demand in the evening is usually for a hotel, and POI popularity repeats with a minimum cycle of one day, i.e., it is periodic. To better capture the behavior patterns of users visiting points of interest within this time period, our research focuses on time patterns. Unlike previous work, we process the data as sequences with a period of one day and input them into the model to improve the timeliness of model prediction.

In this paper, our goal is to capture the influence of POIs in a continuous sequence and to capture the user’s behavior characteristics within the cycle time. In particular, we use a multihead attention mechanism to mine the influence of POIs in user behavior patterns from multiple perspectives and to improve the accuracy of the recommendation results.

3 Our approach

In this section, we describe the InfAM model in detail; its architecture is shown in Figure 2. We first define the successive POI recommendation task. Second, we describe how the encoding layer learns the influence of the points of interest in the sequence. Finally, we discuss how InfAM predicts the user’s subsequent behavior from the influence of historical points of interest.

Figure 2: Architecture of InfAM.

3.1 Problem Statement

Let U and L denote the set of users and the set of POIs in the dataset, respectively. Unlike conventional POI recommendation models, successive POI recommendation models recommend POIs based on the recent check-in history of users. Given the previous T check-ins \(\left( l_1,l_2,...,l_T \right)\) of user \(u\in U\), a successive POI recommendation model predicts the POI \(l_{T+1}\) of the (T+1)-th check-in of user u. This task is formulated as follows:

$$\begin{aligned} \hat{y}_i=f\left( l_i \mid l_1,l_2,...,l_T,u \right) \end{aligned}$$
(1)

where \(\hat{y}_i\) is the score of a POI \(l_i\in {L}\) computed by the recommendation model. Finally, the successive POI recommendation model generates a list of POIs ranked by the computed scores \(\hat{y}\). Like conventional successive point-of-interest recommendation models, our model uses the check-in history \(I=\left( l_1,l_2,...,l_T \right)\) as the user’s input to capture the user’s behavior pattern.
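To make this formulation concrete, the following is a minimal Python sketch of how the ranked list can be obtained by sorting the candidate POIs by their scores \(\hat{y}_i\); the variable names are hypothetical and not part of the original model description.

```python
# Minimal sketch: turn the scores from Eq. (1) into a ranked recommendation list.
# `scores` is a hypothetical dict mapping each candidate POI id to its score y_hat_i.
def rank_pois(scores: dict) -> list:
    """Return POI ids sorted from the highest to the lowest predicted score."""
    return sorted(scores, key=scores.get, reverse=True)

# Example: rank_pois({"gym": 0.7, "restaurant": 0.2, "hotel": 0.1})
# -> ["gym", "restaurant", "hotel"]
```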

3.2 Influence-aware encoder

3.2.1 Geographical encoder

The sequence data composed of a user’s check-ins contain the user’s behavior patterns within a day and the location information of the visited POIs. To address the spatial asymmetry problem of conventional successive POI recommendation, we adopt geographic location coding to form the embedding vectors of the POIs. This method enables diverse embeddings of POIs: after the relative position information between points of interest is added, the same POI obtains different embeddings at different positions in a sequence. These position-dependent embeddings allow our model to process data in parallel and to perceive influence more effectively.

The geographic location coding method we use ensures that each geographic location has a unique code, and the relationship between two locations can be modeled by an affine transformation between their codes. Similar to other sequence transduction models, we use learned embeddings to convert the input and output tokens to vectors of dimension \(d_{m}\). After position coding is applied, the given embedding vector is:

$$\begin{aligned} GE_{\left( i,2t \right) }=\sin \left( i/10000^{2t/d_{m}} \right) \end{aligned}$$
(2)
$$\begin{aligned} GE_{\left( i,2t+1 \right) }=\cos \left( i/10000^{2t/d_{m}} \right) \end{aligned}$$
(3)

where i is the position and t is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid [15]. In this way, we obtain different codes for POIs at different positions and for different visiting orders, which alleviates spatial asymmetry and better captures the influence of POIs at different positions.
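As an illustration, the following is a minimal NumPy sketch of the geographical encoding in Eqs. (2)-(3); the sequence length and the value of \(d_m\) passed in the example call are illustrative assumptions, not values fixed by the model.

```python
# Minimal NumPy sketch of the geographical encoder in Eqs. (2)-(3).
import numpy as np

def geographical_encoding(seq_len: int, d_m: int) -> np.ndarray:
    """Return a (seq_len, d_m) matrix with GE[i, 2t] = sin(i / 10000^(2t/d_m))
    and GE[i, 2t+1] = cos(i / 10000^(2t/d_m))."""
    positions = np.arange(seq_len)[:, np.newaxis]      # position index i
    dims = np.arange(0, d_m, 2)[np.newaxis, :]         # even dimension index 2t
    angles = positions / np.power(10000.0, dims / d_m)
    ge = np.zeros((seq_len, d_m))
    ge[:, 0::2] = np.sin(angles)                       # even dimensions: sine
    ge[:, 1::2] = np.cos(angles)                       # odd dimensions: cosine
    return ge

ge = geographical_encoding(seq_len=8, d_m=300)         # e.g., one day's check-in sequence
```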

3.2.2 Input layer

The input of our model is the sum of the POI geographical encoding and the POI embedding. This captures both the relative position information between the POIs and the embedding information of the POIs themselves. The formula is as follows:

$$\begin{aligned} I_i=GE_i+l_i \end{aligned}$$
(4)

where \(GE_i\) is the geographical encoding and \(l_i\) is the POI embedding.
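A small sketch of the input layer in Eq. (4) follows, assuming a learned Keras embedding table for the POIs; the vocabulary size and dimension shown are illustrative assumptions.

```python
# Hedged sketch of the input layer in Eq. (4): the model input is the sum of the
# geographical encoding and a learned POI embedding.
import tensorflow as tf

num_pois, d_m = 13187, 300                               # illustrative sizes
poi_embedding = tf.keras.layers.Embedding(num_pois, d_m)

def input_layer(poi_ids, ge):
    """poi_ids: (batch, T) integer POI indices; ge: (T, d_m) geographical encoding."""
    ge = tf.cast(ge, tf.float32)
    return poi_embedding(poi_ids) + ge[tf.newaxis, ...]  # Eq. (4): I_i = GE_i + l_i
```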

3.2.3 Add and normalization

The encoder is composed of a stack of \(N = 4\) identical layers. Each layer has two sublayers. The first is a multihead self-attention mechanism, and the second is a simple, positionwise fully connected feed-forward network. We employ a residual connection around each of the two sublayers, followed by layer normalization. The formula is as follows:

$$\begin{aligned} Sublayers_{i}(X) = LayerNorm(X + SubLayer_{i-1}(X)) \end{aligned}$$
(5)

where X is the input to the sublayer, i.e., the output of the previous sublayer.
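The following is a minimal sketch of the residual connection and layer normalization in Eq. (5); the epsilon value is an assumption.

```python
# Minimal sketch of Eq. (5): wrap each sublayer output in a residual connection
# followed by layer normalization.
import tensorflow as tf

layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-6)

def add_and_norm(x, sublayer_output):
    """LayerNorm(X + SubLayer(X)), as in Eq. (5)."""
    return layer_norm(x + sublayer_output)
```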

3.2.4 Multihead attention

To reflect the influence of POIs in a short sequence from multiple perspectives, we use a multihead attention mechanism to learn in multiple subspaces. Multihead attention expands the model’s ability to focus on different positions. In the example above, although every encoded POI contributes, the attention may be dominated by the POI itself; if we want to know which POIs are most affected by a certain POI, the multihead attention mechanism plays this role. For multihead attention, we have multiple sets of query/key/value weight matrices, one set per attention head, each initialized randomly. After training, each set projects the input point-of-interest embeddings into a different representation subspace. The formula is as follows:

$$\begin{aligned} MultiHead\left( I \right) =\left\| \left( head_1,...,head_i \right) W^O \right. \end{aligned}$$
(6)
$$\begin{aligned} head_i = Attention\left( IW_{i}^{Q}, IW_{i}^{K}, IW_{i}^{V} \right) \end{aligned}$$
(7)

where Q is the query vector, K is the key vector, and V is the value vector; in our experiments, Q, K, and V are all set to the input I. The projection matrices are \(W_{i}^{Q}\in R^{d_{m}\times d_k}\), \(W_{i}^{K}\in R^{d_{m}\times d_k}\), \(W_{i}^{V}\in R^{d_{m}\times d_k}\) and \(W^O\in R^{id_v \times d_m}\). We set the number of heads to \(i=4\) and, for each head, use \(d_k = d_v = \frac{d_m}{i}=64\).
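A hedged sketch of the multihead self-attention of Eqs. (6)-(7) follows. It uses plain Dense projections because tf.keras.layers.MultiHeadAttention is not available in TensorFlow 2.1.0, and the per-head dimension is derived here as \(d_m\) divided by the number of heads, which may differ from the reported \(d_k = d_v = 64\); treat it as an illustrative implementation, not the exact one.

```python
# Hedged sketch of multihead self-attention over the input sequence I (Eqs. (6)-(7)).
import tensorflow as tf

class MultiHeadSelfAttention(tf.keras.layers.Layer):
    def __init__(self, d_m=300, num_heads=4):
        super().__init__()
        self.num_heads, self.d_k = num_heads, d_m // num_heads
        self.wq = tf.keras.layers.Dense(d_m)  # W^Q for all heads, fused
        self.wk = tf.keras.layers.Dense(d_m)  # W^K
        self.wv = tf.keras.layers.Dense(d_m)  # W^V
        self.wo = tf.keras.layers.Dense(d_m)  # W^O

    def _split(self, x, batch):
        # (batch, T, d_m) -> (batch, heads, T, d_k)
        x = tf.reshape(x, (batch, -1, self.num_heads, self.d_k))
        return tf.transpose(x, [0, 2, 1, 3])

    def call(self, I):                                    # Q = K = V = I, as in the text
        batch = tf.shape(I)[0]
        q = self._split(self.wq(I), batch)                # I W^Q
        k = self._split(self.wk(I), batch)                # I W^K
        v = self._split(self.wv(I), batch)                # I W^V
        scores = tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(tf.cast(self.d_k, tf.float32))
        heads = tf.matmul(tf.nn.softmax(scores, axis=-1), v)      # Eq. (7), per head
        heads = tf.transpose(heads, [0, 2, 1, 3])                 # (batch, T, heads, d_k)
        concat = tf.reshape(heads, (batch, -1, self.num_heads * self.d_k))
        return self.wo(concat)                                    # Eq. (6): concatenate, then W^O
```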

3.2.5 Position-wise feed forward network

The position-wise feed-forward network computes the output for each POI position separately, achieving the effect of processing the sequence in parallel. This layer is composed of two fully connected layers with a ReLU activation in between, and its formula is expressed as follows:

$$\begin{aligned} Z=SubLayer\left( MultiHead\left( I \right) \right) \end{aligned}$$
(8)
$$\begin{aligned} q_c=SubLayer\left( ReLu \left( ZW_1+b_1 \right) W_2+b_2 \right) \end{aligned}$$
(9)

where \(q_c\) is the output of the second sublayer, \(q_c\in R^{d_c}\), and Z is the output of the first sublayer.
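The following is a minimal sketch of the position-wise feed-forward sublayer in Eqs. (8)-(9); the inner width d_ff is an assumption, as it is not reported in the text.

```python
# Minimal sketch of the position-wise feed-forward network in Eqs. (8)-(9):
# two dense layers with a ReLU in between, applied to each position independently.
import tensorflow as tf

def feed_forward(d_m=300, d_ff=1024):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(d_ff, activation="relu"),  # ReLU(Z W_1 + b_1)
        tf.keras.layers.Dense(d_m),                      # (...) W_2 + b_2
    ])
```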

3.2.6 Output layer

The output layer of InfAM is based on a multilayer perceptron (MLP). To capture users’ general preferences for POIs, we assign a user embedding \(q_u\in R^{d_u}\) to each user. The input \(x\in R^{d_c+d_u}\) of the MLP is represented as follows.

$$\begin{aligned} x=\left\| \left( q_c,q_u \right) \right. \end{aligned}$$
(10)

The input is fed forward through the MLP as follows.

$$\begin{aligned} \bar{y}=\sigma \left( W_{o}^{\left( 2 \right) }\left( \sigma \left( W_{o}^{\left( 1 \right) }x+b_{o}^{\left( 1 \right) } \right) \right) +b_{o}^{\left( 2 \right) } \right) \end{aligned}$$
(11)
$$\begin{aligned} \hat{y}_i=\frac{\exp \left( \bar{y}_i \right) }{\sum _{j=1}^{\mid L \mid }{\exp \left( \bar{y}_j \right) }} \end{aligned}$$
(12)

where \(W_o\) and \(b_o\) are the weight matrices and bias terms of the MLP, respectively. In this paper, we use two layers for the MLP. The output \(\hat{y}\) is the probability distribution computed by the softmax function. To train InfAM, we employ cross-entropy as our cost function, which is represented as follows:

$$\begin{aligned} loss=-\sum _i{y_i\log \left( \hat{y}_i \right) } \end{aligned}$$
(13)

where y is the true distribution represented by a one-hot vector with digit labels. For every training phase, the parameters of our proposed InfAM are updated by minimizing the cost computed by the loss function.
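A hedged sketch of the output layer and loss in Eqs. (10)-(13) follows; the hidden width of the MLP, the embedding sizes, and the reading of \(\sigma\) as the logistic sigmoid are assumptions.

```python
# Hedged sketch of Eqs. (10)-(13): concatenate the sequence encoding q_c with the
# user embedding q_u, pass through a two-layer MLP, apply softmax, train with cross-entropy.
import tensorflow as tf

num_users, num_pois, d_c, d_u = 78233, 13187, 300, 300         # illustrative sizes
user_embedding = tf.keras.layers.Embedding(num_users, d_u)
mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="sigmoid"),           # sigma(W_o^(1) x + b_o^(1)); 512 assumed
    tf.keras.layers.Dense(num_pois, activation="sigmoid"),      # sigma(W_o^(2) (.) + b_o^(2))
])

def output_layer(q_c, user_ids):
    x = tf.concat([q_c, user_embedding(user_ids)], axis=-1)     # Eq. (10)
    y_bar = mlp(x)                                              # Eq. (11)
    return tf.nn.softmax(y_bar, axis=-1)                        # Eq. (12)

def cross_entropy(y_true_onehot, y_hat):
    return -tf.reduce_sum(y_true_onehot * tf.math.log(y_hat + 1e-9), axis=-1)  # Eq. (13)
```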

Table 1 Statistics of the datasets

Algorithm 1 illustrates our training procedure for learning the InfAM model.

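The following is a hedged sketch of one possible training step consistent with the loss in Eq. (13) and the optimizer settings in Section 3.3.4; the model interface and data shapes are assumptions and do not reproduce Algorithm 1 exactly.

```python
# Hedged sketch of a single training step (Adam, learning rate 0.01, cross-entropy loss).
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

def train_step(model, poi_seq, user_ids, target_onehot):
    """model: a hypothetical InfAM-style Keras model returning softmax scores over POIs."""
    with tf.GradientTape() as tape:
        y_hat = model(poi_seq, user_ids, training=True)
        loss = tf.reduce_mean(
            -tf.reduce_sum(target_onehot * tf.math.log(y_hat + 1e-9), axis=-1))  # Eq. (13)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```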

3.3 Settings

3.3.1 Dataset preparation

We evaluate our InfAM model on two real-world datasets: the Instagram dataset [1, 10] and the Foursquare dataset. The statistics of the datasets are summarized in Table 1. The Instagram dataset contains 2,216,631 check-ins by 78,233 users at 13,187 POIs. The Foursquare dataset contains 194,108 check-ins at 5,596 POIs by 2,321 users. For each dataset, the first 70% of the check-ins are used as the training set, the last 10% of the check-ins are used as the validation set, and the remaining check-ins are used as the test set.

Table 2 Evaluation results on the Instagram and Foursquare datasets

To produce check-in sequences from the datasets, we abide by the following rules: (1) all POIs visited within one day belong to the same sequence; (2) for Instagram and Foursquare, we filter out sequences whose length is less than 4. After preprocessing, Table 1 shows the statistics of the prepared datasets.
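A hedged sketch of these preprocessing rules is shown below; the check-in record layout and field names are assumptions.

```python
# Hedged sketch of the sequence construction rules: group each user's check-ins by
# calendar day (rule 1) and drop sequences shorter than 4 (rule 2).
from collections import defaultdict

def build_daily_sequences(checkins, min_len=4):
    """checkins: iterable of (user_id, poi_id, timestamp) with datetime timestamps."""
    sequences = defaultdict(list)
    for user_id, poi_id, ts in sorted(checkins, key=lambda c: c[2]):
        sequences[(user_id, ts.date())].append(poi_id)   # same user, same day -> same sequence
    return [seq for seq in sequences.values() if len(seq) >= min_len]
```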

3.3.2 Baselines

We compare InfAM with the following state-of-the-art successive POI recommendation models.

  1. FPMC [5]: A personalized Markov chain that relies on a personalized transition matrix, allowing sequential effects and long-term user preferences to be captured.

  2. ST-RNN [22]: Based on an RNN, this method considers the time interval and distance between consecutive check-ins when modeling user behavior patterns.

  3. DRCF [20]: Based on an RNN, DRCF uses the dot product of latent factors to model user-place interactions and extends the Neuf framework to model the static and dynamic preferences of users.

  4. Caser [11]: A CNN is used to fit the user check-in sequence. Using horizontal and vertical convolution filters, Caser captures static and dynamic preferences for POIs.

  5. InfAM: A multihead attention mechanism is used to fit the user’s check-in sequence, mainly to capture the influence of points of interest in the sequence and to model the user’s behavior patterns.

3.3.3 Metrics

In this paper, we use Recall@k and the mean reciprocal rank (MRR@k) as our evaluation metrics. Recall@k is one of the most commonly used metrics in point-of-interest recommendation tasks: the Recall@k score is high when the correct point of interest is among the top-k recommended points of interest. We report Recall@k for k = 1, 5, and 10. MRR@k evaluates the quality of the list ranked by the recommendation model: the MRR@k score is high when the correct point of interest ranks high in the recommended list. We report MRR@k for k = 50.
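The following is a minimal sketch of the two metrics as described above; the variable names are assumptions.

```python
# Minimal sketch of Recall@k and MRR@k. `ranked_lists` holds, per test case, the POIs
# sorted by predicted score; `targets` holds the ground-truth next POI.
def recall_at_k(ranked_lists, targets, k):
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def mrr_at_k(ranked_lists, targets, k=50):
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        if t in ranked[:k]:
            total += 1.0 / (ranked.index(t) + 1)   # reciprocal rank of the correct POI
    return total / len(targets)
```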

3.3.4 Details

For a fair comparison, our proposed InfAM and all baseline models use time information as their input. All successive POI recommendation models are trained with a learning rate of 0.01 and the Adam optimizer. After using the validation set and a grid search to find the optimal parameters, we fix the number of embedding dimensions to 300, the number of heads to 4, and the dropout rate to 0.05 for InfAM on both datasets. To prevent overfitting, we apply normalization and dropout to the output layer. InfAM and the baseline models are implemented using the TensorFlow v2.1.0 library. We train and evaluate the models on a GTX 965M GPU.

4 Results

4.1 Comparison with baselines

The last row of Table 2 reports the performance improvement of our proposed model relative to the best-performing baseline. The deep learning-based DRCF and Caser models outperform FPMC. In addition, the evaluation results show that InfAM outperforms the successive POI recommendation baselines in terms of Recall@1, Recall@5, Recall@10 and MRR on both datasets. More specifically, compared with the baseline models on the Instagram dataset, InfAM achieves improvements of 10.5%-12.3% in MRR@50 and 1.9%-9.7% in Recall@10. These comparisons show that our model yields a particularly large improvement in Recall@1 and MRR, indicating that it ranks the correct point of interest at or near the top of the list. In other words, InfAM produces better recommendations when only a limited number of POIs can be recommended, which makes it more suitable for real application scenarios.

4.2 Effectiveness of each component

To observe the influence of model depth on our experimental results, we carried out the following experiments and observed the changes in performance. Scenarios A, B, and C set N to 1, 2, and 3, respectively. From the results shown in Table 3, we draw some conclusions. Comparing these scenarios with N=4, we find that as InfAM becomes deeper, the experimental results gradually improve. This observation suggests that, given a sufficient number of layers, the residual network can learn more effective information, which helps to improve the recommendation performance.

Table 3 Ablation test on the Instagram and Foursquare datasets

4.3 POI influence

We use the multihead attention mechanism to examine the check-in sequence generated by a user in a day and capture the influence of points of interest in the sequence from different perspectives. We visualize the multihead attention weight of each point of interest in the sequence in the encoder. Figure 3 shows the visualization results. Different attention heads assign different attention weights to the same successive POI sequence. As expected, we verify that each attention head captures the influence of POIs in the sequence from a different perspective, which a single-head attention mechanism cannot do. We sampled a user's sequence, and the result is shown in Figure 3. This case shows a typical user behavior pattern. In this sequence, the gym (New York Pilates SoHo), which requires a lot of the user's energy, has a much higher weight than the restaurant (Little Italy) and the hotel (The Dominick). This result shows that our proposed InfAM model improves the performance of POI recommendation by capturing the influence of POIs in the sequence to perceive user behavior patterns.

Figure 3: Example of POI influence.

4.4 Short Sequences

InfAM makes recommendations for a user by analyzing the user’s check-in sequence within a day, so we often encounter the problem of short user check-in sequences. Fewer check-in records lead to a lack of information and more serious data sparsity. To verify that our model is more advantageous in the case of short sequences, we extracted sequences of length L = 2, 3, 4, 5, 6 from Instagram to form the short sequence dataset Instagram_short. As shown in Figure 4, our proposed InfAM model outperforms all baseline models on both datasets. InfAM does not use any additional steps to overcome the short sequence problem. The results show that the InfAM point-of-interest influence-aware model can effectively capture short-sequence check-in information.

Figure 4: Example of a short sequence.

5 Conclusion

In this paper, we propose InfAM, a model for determining the influence of points of interest from multiple perspectives. InfAM captures behavior patterns from the periodic sequences generated by user check-ins: it encodes users’ daily sequences with a multihead attention mechanism and a residual network and, on this basis, obtains the influence of each POI in the sequence. By fully considering three aspects (POI-POI, POI-user, and POI-time), the performance of our recommendation system is improved. In evaluations on real-world datasets, InfAM achieves state-of-the-art performance in successive POI recommendation. A comparative experiment shows that our model performs better on short sequences. Finally, the visualization of attention weights in a case study shows that analyzing the influence of POIs in a sequence is effective for understanding user sequence information.