The objective of a user-centered recommendation model is to rank positive items higher than negative items, i.e., to give higher prediction scores to positive items. The prediction score for an item may not represent the user's absolute preference for that item, since it can be influenced by other items in the catalog and by the item distribution in the dataset, e.g., popularity. Moreover, user-centered recommendation models usually only take a user's historical interactions as input and learn the general interest of the user, but may not track the user's interest w.r.t. a specific given item as time goes by. To achieve accurate item-centered recommendations, two things should be taken into consideration: (i) the prediction of the model should be appropriate and meaningful for ranking users for a given item; and (ii) apart from a user's historical interactions, the recommendation model should also take the given item as input and be able to track users' interests or habits w.r.t. the given item as time goes by.
5.1.1 Model.
Recall from Section 3.1 that the Expl-RNPR task is to find the top-k explore users who are most likely to purchase a given item i. To address the Expl-RNPR task, we propose a habit-interest fusion (HIF) model, which leverages frequency information to model category-level repetition and exploration habits, and pre-trained item representations to model users' interests. Figure 1 illustrates the architecture of the HIF model.
Pre-trained embedding. In the context of NLP, the skip-gram framework [29, 30] has been proposed to learn word representations by predicting the surrounding words within a context window. Several recent publications [5, 12, 42] leverage skip-gram techniques to learn item/product representations in e-commerce scenarios. In this article, we assume that items within the same basket share similar semantics, and we use basket-level skip-grams to derive item embeddings. We regard a particular item as a target item \(i \in I\) and the other items in the same basket as context items \(i^{\prime } \in I_b^i\). Then, the learning objective is to maximize the following function:
\[
\sum _{i \in I} \sum _{i^{\prime } \in I_b^i} \log p(i^{\prime } \mid i),
\]
where \(p(i^{\prime } \mid i)\) denotes the probability of observing a context item \(i^{\prime } \in I_b^i\) given the current/target item i. It is defined by a softmax function:
\[
p(i^{\prime } \mid i) = \frac{\exp \left(\mathit {Emb}_{i^{\prime }}^{\top } \mathit {Emb}_i\right)}{\sum _{m=1}^{M} \exp \left(\mathit {Emb}_m^{\top } \mathit {Emb}_i\right)},
\]
where \(\mathit {Emb}_i\) and \(\mathit {Emb}_{i^{\prime }}\) are the vector representations of the current item i and the context item \(i^{\prime }\), respectively, and M is the number of items in the item catalog. After pre-training on historical data, we obtain a vector representation (a.k.a. embedding) for each item.
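To make the basket-level skip-gram concrete, the following minimal sketch pre-trains item embeddings with gensim's Word2Vec, treating each basket as a "sentence" of item IDs. The toy baskets, embedding dimension, and window size are illustrative assumptions, not values from our experiments.

```python
# A minimal sketch of basket-level skip-gram pre-training using gensim.
# Baskets stand in for "sentences"; item IDs stand in for "words".
from gensim.models import Word2Vec

baskets = [
    ["milk_a", "bread", "eggs"],   # toy data for illustration only
    ["milk_a", "cereal"],
    ["bread", "butter", "eggs"],
]

model = Word2Vec(
    sentences=baskets,
    vector_size=64,   # embedding dimension (hypothetical choice)
    window=10,        # large window so the whole basket acts as context
    sg=1,             # use the skip-gram objective
    min_count=1,
    epochs=20,
)

emb_milk = model.wv["milk_a"]  # pre-trained item embedding Emb_i
```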
Interest module. Suppose that a user u has a sequence of historical baskets \(S^h = \lbrace B^1, B^2, \ldots , B^t\rbrace\). We first look up the pre-trained embedding \(\mathit {Emb}_i\) for each item i within each basket \(B^t\). Since baskets may have different sizes, we aggregate the item embeddings within the same basket by a pooling strategy (max pooling or average pooling) to generate the basket representation \(\mathit {Emb}_b^t\) at each timestamp t. Given the target item i we want to recommend, we compute the cosine similarity \(\mathit {Sim}_{u, i}^t\) between its embedding \(\mathit {Emb}_i\) and the basket embedding \(\mathit {Emb}_b^t\) at each timestamp, and collect these scores into the similarity vector \(\mathit {Sim}_{u, i}\), which reflects the user's interest in the given item i across timestamps. That is:
\[
\mathit {Sim}_{u, i}^t = \cos \left(\mathit {Emb}_i, \mathit {Emb}_b^t\right), \qquad \mathit {Sim}_{u, i} = \left[\mathit {Sim}_{u, i}^1, \mathit {Sim}_{u, i}^2, \ldots , \mathit {Sim}_{u, i}^t\right].
\]
To model users' dynamic interests, we introduce two types of time-aware weight embeddings: (i) a category-specific time-aware weight embedding \(\mathit {TW}_e^c\), which is trained only on the samples of the corresponding category c, and (ii) a global time-aware weight embedding \(\mathit {TW}_e^g\), which is shared across categories and is trained on all training samples. For a given item i and user \(u\in U_i^\mathit {expl}\), we compute the dot products of the similarity vector \(\mathit {Sim}_{u, i}\) with the two time-aware weight embeddings \(\mathit {TW}_e^c\) and \(\mathit {TW}_e^g\) to obtain the time-aware interest features \(\mathit {SimF}^c_{u, i}\) and \(\mathit {SimF}^g_{u, i}\). Finally, we concatenate \(\mathit {SimF}^c_{u, i}\) and \(\mathit {SimF}^g_{u, i}\) with a trainable category embedding \(\mathit {Emb}_c^\mathit {inte}\) to get a hybrid representation, which is fed into a two-layer fully connected network to get the final interest score \(\mathit {Score}_{u, i}^\mathit {inte}\), that is:
\[
\mathit {SimF}^c_{u, i} = \mathit {Sim}_{u, i} \cdot \mathit {TW}_e^c, \qquad \mathit {SimF}^g_{u, i} = \mathit {Sim}_{u, i} \cdot \mathit {TW}_e^g,
\]
\[
\mathit {Score}_{u, i}^\mathit {inte} = \mathit {FFN}^\mathit {inte}\left(\left[\mathit {SimF}^c_{u, i}; \mathit {SimF}^g_{u, i}; \mathit {Emb}_c^\mathit {inte}\right]\right).
\]
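The following PyTorch sketch shows one way the interest module could be wired up. It assumes the basket embeddings have already been pooled upstream, that the dot-product features are scalars, and a hypothetical hidden size; it is a sketch of the computation above, not our exact implementation.

```python
# A minimal PyTorch sketch of the interest module (shapes/sizes assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterestModule(nn.Module):
    def __init__(self, num_timestamps, num_categories, cat_dim=16, hidden=32):
        super().__init__()
        # category-specific and global time-aware weight embeddings
        self.tw_c = nn.Embedding(num_categories, num_timestamps)  # TW_e^c
        self.tw_g = nn.Parameter(torch.randn(num_timestamps))     # TW_e^g
        self.cat_emb = nn.Embedding(num_categories, cat_dim)      # Emb_c^inte
        # two-layer fully connected network producing the interest score
        self.ffn = nn.Sequential(
            nn.Linear(2 + cat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, item_emb, basket_embs, category):
        # item_emb: (d,); basket_embs: (T, d), pooled per basket upstream;
        # category: scalar LongTensor with the target item's category id
        sim = F.cosine_similarity(item_emb.unsqueeze(0), basket_embs, dim=-1)  # Sim_{u,i}
        simf_c = (sim * self.tw_c(category)).sum()   # SimF^c_{u,i}
        simf_g = (sim * self.tw_g).sum()             # SimF^g_{u,i}
        feats = torch.cat([simf_c.view(1), simf_g.view(1), self.cat_emb(category)])
        return self.ffn(feats)                       # Score^inte_{u,i}
```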
Habit module. In the Expl-RNPR task, we aim to find possible explore users for a given item. However, there are no direct interactions between the given item i and the explore users \(U_i^\mathit {expl}\), so we cannot directly model explore users' habits w.r.t. item i. In the grocery-shopping scenario, every item belongs to a category, and a category can contain many items. If an item i is going to be purchased by a user \(u\in U_i^\mathit {expl}\), this indicates that user u will purchase and explore items in category \(c^i\) in the next period. Therefore, we model users' repetition and exploration habits w.r.t. the target category \(c^i\) of the given item i.
A user's repetition habits within a category can be dynamic across time. Besides, the purchase frequency within a category can also indicate the user's demand. Specifically, to capture the user's repetition habits, we create a category-level repetition frequency vector \(\mathit {RepVec}_{u, c^i}\) for category \(c^i\in C\) and user u by considering both temporal information and frequency information. That is,
\[
\mathit {RepVec}_{u, c^i} = \left[\sqrt {|B^1 \cap I^{c^i}|}, \sqrt {|B^2 \cap I^{c^i}|}, \ldots , \sqrt {|B^t \cap I^{c^i}|}\right],
\]
where \(I^{c^i}\) is the item set within category \(c^i\) and \(B^t\) is the set of items (basket) that user u purchased at timestamp t. The square root is applied to address the problem of varying basket sizes: it dampens the impact of very large baskets, leading to more balanced frequency information. Then, we derive the time-aware category repetition feature \(\mathit {RepF}^c_{u, c^i}\) and the global repetition feature \(\mathit {RepF}^g_{u, c^i}\) as follows:
\[
\mathit {RepF}^c_{u, c^i} = \mathit {RepVec}_{u, c^i} \cdot \mathit {TW}_\mathit {rep}^c, \qquad \mathit {RepF}^g_{u, c^i} = \mathit {RepVec}_{u, c^i} \cdot \mathit {TW}_\mathit {rep}^g,
\]
where \(\mathit {TW}_\mathit {rep}^c\) and \(\mathit {TW}_\mathit {rep}^g\) are a category-specific time-aware weight embedding and a global time-aware weight embedding, respectively, for modeling repetition behavior.
Note that a user might be loyal to a specific item [42] and uninterested in exploring new items within the same category; e.g., someone might only purchase a specific brand of milk. To model a user's exploration habits within a category, we also create an exploration frequency vector \(\mathit {ExplVec}_{u, c^i}\) that preserves the temporal order, that is:
\[
\mathit {ExplVec}_{u, c^i} = \left[\sqrt {|B_\mathit {expl}^1 \cap I^{c^i}|}, \sqrt {|B_\mathit {expl}^2 \cap I^{c^i}|}, \ldots , \sqrt {|B_\mathit {expl}^t \cap I^{c^i}|}\right],
\]
where \(B_\mathit {expl}^t\) is the set of explore items (new items) that user u purchased at timestamp t.
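As a concrete illustration, here is a small Python sketch of how the two frequency vectors could be computed from a user's basket sequence; the data layout (baskets as lists of item IDs, an item_to_cat lookup) is an assumption made for illustration.

```python
# A sketch of the category-level frequency vectors RepVec and ExplVec.
from math import sqrt

def rep_vec(baskets, category, item_to_cat):
    # RepVec_{u,c}: square-rooted per-timestamp counts of category purchases
    return [sqrt(sum(1 for i in b if item_to_cat[i] == category)) for b in baskets]

def expl_vec(baskets, category, item_to_cat):
    # ExplVec_{u,c}: the same counts, restricted to explore (first-time) items
    seen, vec = set(), []
    for b in baskets:
        new_items = [i for i in b if i not in seen and item_to_cat[i] == category]
        vec.append(sqrt(len(new_items)))
        seen.update(b)
    return vec
```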
Similarly, we compute the category exploration feature \(\mathit {ExplF}^c_{u, c^i}\) and the global exploration feature \(\mathit {ExplF}^g_{u, c^i}\) as follows:
\[
\mathit {ExplF}^c_{u, c^i} = \mathit {ExplVec}_{u, c^i} \cdot \mathit {TW}_\mathit {expl}^c, \qquad \mathit {ExplF}^g_{u, c^i} = \mathit {ExplVec}_{u, c^i} \cdot \mathit {TW}_\mathit {expl}^g,
\]
where \(\mathit {TW}_\mathit {expl}^c\) and \(\mathit {TW}_\mathit {expl}^g\) are the category-specific time-aware weight embedding and the global time-aware weight embedding, respectively, for modeling exploration behavior. Finally, we concatenate the repetition features \(\mathit {RepF}^c_{u, c^i}\) and \(\mathit {RepF}^g_{u, c^i}\), the exploration features \(\mathit {ExplF}^c_{u, c^i}\) and \(\mathit {ExplF}^g_{u, c^i}\), and a trainable category-specific embedding \(\mathit {Emb}_c^\mathit {hab}\) into a feature vector, which is fed into a two-layer fully connected network to get the habit score \(\mathit {Score}_{u, i}^\mathit {hab}\). That is,
\[
\mathit {Score}_{u, i}^\mathit {hab} = \mathit {FFN}^\mathit {hab}\left(\left[\mathit {RepF}^c_{u, c^i}; \mathit {RepF}^g_{u, c^i}; \mathit {ExplF}^c_{u, c^i}; \mathit {ExplF}^g_{u, c^i}; \mathit {Emb}_c^\mathit {hab}\right]\right).
\]
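A companion PyTorch sketch of the habit scoring head, mirroring the interest module above; the shapes and hidden size are again hypothetical assumptions rather than our exact configuration.

```python
# A sketch of the habit module scoring head (shapes/sizes assumed).
import torch
import torch.nn as nn

class HabitModule(nn.Module):
    def __init__(self, num_timestamps, num_categories, cat_dim=16, hidden=32):
        super().__init__()
        self.tw_rep_c = nn.Embedding(num_categories, num_timestamps)   # TW_rep^c
        self.tw_rep_g = nn.Parameter(torch.randn(num_timestamps))      # TW_rep^g
        self.tw_expl_c = nn.Embedding(num_categories, num_timestamps)  # TW_expl^c
        self.tw_expl_g = nn.Parameter(torch.randn(num_timestamps))     # TW_expl^g
        self.cat_emb = nn.Embedding(num_categories, cat_dim)           # Emb_c^hab
        self.ffn = nn.Sequential(
            nn.Linear(4 + cat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, rep_vec, expl_vec, category):
        # rep_vec, expl_vec: (T,) frequency vectors; category: scalar LongTensor
        feats = torch.stack([
            (rep_vec * self.tw_rep_c(category)).sum(),    # RepF^c
            (rep_vec * self.tw_rep_g).sum(),              # RepF^g
            (expl_vec * self.tw_expl_c(category)).sum(),  # ExplF^c
            (expl_vec * self.tw_expl_g).sum(),            # ExplF^g
        ])
        return self.ffn(torch.cat([feats, self.cat_emb(category)]))  # Score^hab
```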
Fusion. We compute the fusion score by combining the interest score and the habit score:
\[
\mathit {Score}_{u, i} = \mathit {Score}_{u, i}^\mathit {inte} + \mathit {Score}_{u, i}^\mathit {hab}.
\]
5.1.2 Training.
In a conventional user-centered scenario, a recommendation model is optimized with a user-wise loss, which is computed over all items for each user. Since we focus on item-centered recommendation, i.e., ranking users for a given item, we propose an item-wise ranking loss to train our model. Specifically, positive users and negative users are sampled for each item, and the training objective is to minimize the following loss function:
\[
\mathcal {L} = -\sum _{i} \sum _{(\mathit {pos}, \mathit {neg})} \log \sigma \left(\mathit {Score}_{\mathit {pos}, i} - \mathit {Score}_{\mathit {neg}, i}\right),
\]
where \(\mathit {Score}_{\mathit {pos}, i}\) and \(\mathit {Score}_{\mathit {neg}, i}\) represent the predicted fusion scores for positive users and negative users, respectively, and \(\sigma\) is the sigmoid function. By minimizing this item-wise ranking loss, the model maximizes the difference in predicted preference (fusion) scores between positive and negative users, such that positive users are ranked higher in the predicted user ranking list. Even though the definition of the item-wise ranking loss is straightforward, we identify two major issues w.r.t. training the Expl-RNPR model with it.
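For reference, a BPR-style log-sigmoid implementation of such an item-wise ranking loss might look as follows; the index-wise pairing of positive and negative users and the mean reduction are assumptions made for illustration.

```python
# A sketch of the item-wise ranking loss in BPR-style log-sigmoid form.
import torch
import torch.nn.functional as F

def item_wise_ranking_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # pos_scores, neg_scores: (N,) fusion scores for users sampled for the
    # same item; minimizing this pushes pos_scores above neg_scores
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```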
First, as illustrated in Figure 2(a), a typical positive sample for Expl-RNPR is an explore user who purchased the given item i only in the last period of the historical sequence. However, as shown in Table 3, items have only a small number of such positive samples (i.e., new users) if we consider just the last period of the historical sequence. Therefore, we need to augment the positive samples, i.e., include more users who explore the target item for the first time. According to a strict reading of the Expl-RNPR task, we should not select a repeat user, who has already purchased the given item, as a positive training sample, since Expl-RNPR targets explore users. However, a repeat user of the target item has a sub-sequence of interactions, i.e., the basket sequence before their first purchase, that can be regarded as a positive sample for Expl-RNPR training (shown in Figure 2(a)); a sketch of this augmentation follows below.
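The augmentation can be sketched as follows: for a repeat user, everything before the first purchase of item i is kept as an explore-style input sequence. The data layout (a history as a list of baskets of item IDs) is an assumption for illustration.

```python
# A sketch of positive-sample augmentation from repeat users.
def augment_positive(baskets, item):
    """Return the basket sub-sequence before the user's first purchase of
    `item`, usable as an explore-user positive sample; None if unusable."""
    for t, basket in enumerate(baskets):
        if item in basket:
            # the user becomes a repeat user from timestamp t onward; the
            # prefix acts as an explore-user history whose next period
            # contains the first purchase of `item`
            return baskets[:t] if t > 0 else None
    return None  # user never purchased `item`; not a positive sample
```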
Second, a typical negative sample for Expl-RNPR is an explore user who did not purchase the given item i in the last period of the sequence. However, as illustrated in Figure 2(b), if a user u is a new user of a given item i, i.e., \(i\in I_u^n\), then the user has not purchased this item in earlier interactions, i.e., \(i\notin I_u^h\), which means the user would also be treated as a negative sample for item i during training. In this case, when we use a leave-one/few-out splitting strategy to construct a historical (training) dataset and a future (test) dataset, a positive sample (i.e., a ground-truth user) in the test set will appear as a negative sample in the training set, even though the two input sequences largely overlap. To avoid the negative impact of this case, we propose a negative sample adjustment strategy, which eliminates the potential overlap between positive and negative sequences by truncating a sub-sequence from each original negative sample. Note that we perform the truncation on all negative samples, since we do not know which users will be the positive samples in the future (test) dataset.