Expected Transaction Value Optimization for Precise Marketing in FinTech Platforms
Abstract.
FinTech platforms built on digital payments are growing rapidly, enabling the distribution of mutual funds personalized to individual investors via mobile apps. As important intermediaries for investment in financial products, these platforms distribute thousands of mutual funds, which obtain impressions under the guaranteed delivery (GD) strategy required by fund companies. Driven by the profit from users’ fund purchases, the platform aims to maximize the transaction amount of each customer by promoting mutual funds to the investors who will be interested in them. Unlike conversions in traditional advertising or e-commerce recommendation, the investment amount in each purchase varies greatly even for the same financial product, which poses a significant challenge for the promotional recommendation of mutual funds. In addition to predicting the click-through rate (CTR) or the conversion rate (CVR) as in traditional recommendation, it is essential for FinTech platforms to estimate customers’ purchase amount for each delivered fund and to allocate impressions based on the predicted results so as to optimize the total expected transaction value (ETV). In this paper, we propose an ETV-optimized customer allocation framework (EOCA) that aims to maximize the total ETV of recommended funds under the GD constraints agreed with fund companies. EOCA consists of two phases: a prediction phase for the customer purchase amount, followed by a constrained allocation phase. Specifically, we propose an entire space deep probabilistic model with a newly designed loss function to predict the purchase amount when a promotional fund is exposed to a user, which involves not only conversion rate prediction but also post-conversion purchase amount estimation. Based on the predicted ETV, we design a heuristic algorithm to solve the large-scale constrained combinatorial optimization problem, suggesting which fund each user should be exposed to in order to maximize the total purchase amount. To the best of our knowledge, this is the first attempt to solve the GD problem for financial product promotions based on customer purchase amount prediction. We conduct extensive experiments on large-scale real-world datasets and online tests on LiCaiTong, Tencent’s wealth management platform, to demonstrate the effectiveness of the proposed EOCA framework.
1. Introduction
More than 600 million people in China invest in funds (https://www.amac.org.cn/researchstatistics/report/tzzbg). With the emergence of digital payments, Chinese individual investors prefer purchasing mutual funds on Internet platforms rather than through traditional financial institutions. Therefore, FinTech platforms that cooperate with fund companies to distribute mutual funds via mobile apps, such as TianTian, Ant Group, and Tencent LiCaiTong, are developing rapidly (Hong et al., 2019).
Data-driven customer allocation is widely used on FinTech platforms for user growth and customer acquisition operations. Generally, financial marketing operations need to allocate customers to different fund-type preference groups according to their profiles and historical behavior. FinTech platforms continuously deliver promotional operations about a certain type of fund to the allocated users for precise marketing, aiming to boost the platform’s assets under management (AUM). Figure 1 provides an example of a promotion on a financial distribution platform. Due to commercial considerations and the requirements of fund companies, each fund type should obtain a certain number of users whose risk tolerance is higher than the risk level of the corresponding financial product type. In other words, for types of financial products such as Bond Funds and Money Funds, users need to be divided into k groups, and the number of users in each group is predetermined. This leads to a large-scale allocation problem that requires the allocation framework to maximize the total transaction value under multiple constraints. To achieve the maximum transaction value, i.e., the total purchase amount of customers, it is essential for the allocation framework to accurately estimate each customer’s expected transaction value (ETV) for each type of fund. On a FinTech platform, since users can freely determine the amount of investment, the transaction value of each successful conversion varies greatly even for the same fund. By contrast, in conventional e-commerce or online advertising recommendation, the expected revenue of a single item display can be calculated solely from an accurate prediction of the click or conversion rate, since the price of the product or the bid of the advertiser is predetermined. Meanwhile, little literature has studied the problem of predicting customer purchase amount for industrial applications.
Although there have been some efforts to use deep learning models for financial product recommendation based on CTR/CVR prediction (Huang et al., 2020; Zheng et al., 2021), we argue that CVR-based allocation strategies may result in sub-optimal total ETV since they do not optimize the ETV directly. Predicting the transaction amount in financial commerce is non-trivial due to the significant zero-value inflation and the high variance of the transaction amount. Although Mean Squared Error (MSE) has become the de facto standard loss function for continuous value prediction problems, it relies on the homoskedasticity assumption, i.e., that different samples have the same variance. However, the samples on FinTech platforms are heteroscedastic because investment intention and available capital vary greatly across user-fund pairs. Although previous literature proposed a deep learning method with a zero-inflated lognormal (ZILN) loss that accommodates zero-inflated distributions for customer lifetime value (LTV) prediction (Wang et al., 2019), it is not sufficient to meet the needs of growth marketing in actual business. As a specific example, given a negative sample, suppose that the estimated conversion rate is quite accurate, while the estimated post-conversion purchase amount is wrongly far too high, e.g., 1 million CNY, because ZILN performs no supervised learning on the non-converted sample space in the regression task. The expected transaction value is then overestimated, which leads to poor allocation and recommendation results. Besides, in practical industrial applications, allocation strategies mainly depend on manual operations (Lei et al., 2020). For example, operation specialists determine the priority of all fund types and allocate customers to high-priority fund type segments in order. However, the manual allocation strategy relies heavily on human experience and easily leads to a sub-optimal total sales amount.
To address the above challenges, we design a two-stage ETV-optimized customer allocation framework (EOCA). The first stage predicts the customer purchase amount as the expected transaction value for each fund type, and the second stage allocates customers to different fund types for marketing operations. To alleviate the problems of MSE and ZILN, we propose a newly designed loss function, the entire space multi-task joint (ESJ) loss, which unifies the entire sample probability space for modeling users’ purchase intention and expected transaction value. Furthermore, we propose the entire space deep probabilistic model (ESDPM), based on the ESJ loss, for the multi-task learning problem of financial recommendation. Our framework then takes all potential customers’ expected transaction values for each fund type as constants and uses an efficient heuristic algorithm to search for a near-optimal allocation plan.
The main contributions of our work are summarized as follows:
• We design an ETV-optimized customer allocation framework that aims to maximize the total customer purchase amount for promotional recommendations on FinTech platforms under the allocated customer size and risk tolerance constraints.
• We propose a novel entire space multi-task joint loss function and an entire space deep probabilistic model to predict the expected transaction value of each delivered impression of a financial product. To the best of our knowledge, this is the first attempt to estimate customer purchase amount for financial product recommendation.
• We design an efficient heuristic algorithm to solve the customer allocation problem based on the predicted ETV, which aims to maximize the total transaction value.
• Comprehensive offline and online experiments are conducted to demonstrate the effectiveness of our proposed EOCA framework.
2. Related Work
User preference modeling is essential for recommendation or delivery systems to show users the products they are willing to purchase. For performance-driven online advertising and e-commerce recommendation, the most widely used methods focus on predicting users’ CTR or CVR to reflect their estimated response (Lu et al., 2017; Li et al., 2020a; Zheng et al., 2021; Chen et al., 2016). In recent years, many deep learning-based CTR/CVR prediction models have been proposed. Ma et al. (Ma et al., 2018a) pointed out that CVR models trained on samples of clicked impressions may encounter performance issues in practice due to sample selection bias. They proposed a multi-task learning method, the Entire Space Multi-task Model (ESMM), to estimate CTR and post-click CVR in the entire sample space. Furthermore, Wen et al. (Wen et al., 2020) proposed the Elaborated Entire Space Supervised Multi-task Model (ESM$^2$), derived from ESMM. ESM$^2$ considers several purchase-related actions besides clicks, such as adding to a wish list, to alleviate the data sparsity problem. Chapelle (Chapelle, 2014) suggested taking delayed feedback into consideration for online advertising and proposed the delayed feedback model (DFM), a deep probabilistic model that captures the conversion delay. To better estimate users’ responses, many studies utilize sequential models to capture users’ historical behavior patterns (Zhou et al., 2018; Xiao et al., 2020; Wu et al., 2019). Su et al. (Su et al., 2020) proposed an attention-based model that captures users’ purchase interest from historical clicks and calibrates the CVR by modeling users’ delayed feedback.
Most user response estimation models are proposed for online advertising, e-commerce, or information feed recommendation. Only a few studies have delved into conversion rate prediction for financial products, let alone expected financial transaction value prediction. Researchers from Ping An, an insurance company in China, proposed HConvoNet, which combines users’ static profiles with the conversations between insurance agents and customers to predict users’ purchase intention (Kang et al., 2019). Huang et al. (Huang et al., 2020) utilized a graph neural network to capture users’ dynamic interests for next-purchase financial product prediction. Zheng et al. (Zheng et al., 2021) proposed the graph-convolved factorization machine (GCFM) to model multiple feature interactions for financial recommender systems and conducted offline experiments on real-world financial datasets.
However, the existing literature on user response estimation mainly focuses on CTR/CVR prediction. To the best of our knowledge, little work has discussed the problem of predicting customers’ purchase amount for financial product recommendation. For e-commerce or online advertising, there is no need to estimate the transaction amount in order to make recommendations that maximize total revenue, because the price of the product or the advertiser’s bid is known in advance. For instance, in cost-per-conversion (CPA) advertising systems, the expected transaction value per thousand impressions is calculated as $\mathrm{eCPM} = 1000 \times \mathrm{pCTR} \times \mathrm{pCVR} \times \mathrm{CPA}$, where CPA is the price the advertiser pays for each conversion. In this case, recommendation models only need to predict the click and post-click conversion rates accurately to optimize the total revenue. Pei et al. (Pei et al., 2019) suggested that it is essential for recommendation systems to estimate the total revenue. They model the connections between arbitrary user behaviors (i.e., clicking, adding to the cart, adding to wish lists) and the conversion rate, and combine them with the price of the candidate item to calculate the expected value of an exposure for ranking. Nevertheless, their method essentially estimates the probabilities of various user behaviors and the conversion rate under the corresponding conditions, since the price of the product does not need to be estimated. On FinTech platforms, by contrast, the user’s investment amount in each conversion is arbitrary. Therefore, we need to estimate not only the conversion rate but also the transaction amount after the conversion occurs. The most similar works study the problem of customer lifetime value (CLTV) prediction, which does not involve item recommendation (Chamberlain et al., 2017; Bauer and Jannach, 2021; Chen et al., 2018). Most previous works predicted CLTV with regression models trained with a vanilla MSE loss. Wang et al. (Wang et al., 2019) pointed out that the MSE loss is sensitive to extremely large LTV samples and that the large fraction of zero-LTV samples limits its performance. Therefore, they proposed the ZILN loss, which consists of a cross-entropy loss for classification and a regression loss for non-zero samples.
Regarding guaranteed delivery (GD) strategies, a variety of studies have examined the impression allocation problem for guaranteed display advertising (Chen et al., 2012; Bharadwaj et al., 2012; Fang et al., 2019). Most recent related work likewise targets performance advertising based on CTR/CVR prediction. Lei et al. (Lei et al., 2020) proposed the pv-click-ctr model to predict CTR trends with respect to each video’s impression number. They considered an optimization problem whose objectives are maximizing the overall video view volume and fairness, and suggested using a genetic algorithm to search for a near-optimal solution. They also deployed their model on Youku, one of the largest online video service platforms in China, and achieved remarkable performance compared to the manual strategy. Zhang et al. (Zhang et al., 2020) designed RAP, a request-level guaranteed delivery advertising planning system, to improve the delivery rate and play rate. RAP includes impression forecasting and allocation optimization, which involves CTR prediction.
3. Problem Formulation
The financial customer allocation problem can be formulated as a combinatorial optimization problem with constraints. Given the user set $U$ with $N$ selected potential users and $K$ types of funds, the allocation system is required to divide the $N$ users into $K$ disjoint groups. Each type of fund will be displayed to the users in the corresponding segment. The size of segment $j$ should exactly equal the predetermined demand $d_j$. With the expected transaction value $\mathrm{ETV}_{ij}$ of delivering fund type $j$ to user $i$, which can be estimated by models, our aim is to maximize the total expected transaction value by optimizing the binary allocation variables $x_{ij}$:
$$\max_{x} \sum_{i=1}^{N} \sum_{j=1}^{K} \mathrm{ETV}_{ij}\, x_{ij} \tag{1}$$
$$\text{s.t.} \quad x_{ij} \in \{0, 1\}, \quad \forall i, j \tag{2}$$
$$\sum_{j=1}^{K} x_{ij} = 1, \quad \forall i \tag{3}$$
$$\sum_{i=1}^{N} x_{ij} = d_j, \quad \forall j \tag{4}$$
$$x_{ij}\,(r_j - t_i) \le 0, \quad \forall i, j \tag{5}$$
where constraint (3) requires that one and only one type of fund is delivered to each user and constraint (4) ensures that each fund type $j$ is delivered to exactly $d_j$ users. In the first stage, the allocation framework uses trained models to predict $\mathrm{ETV}_{ij}$, which is then treated as a constant when solving the above optimization problem. In particular, constraint (5) guarantees that customers can only be allocated to the segment of a fund type whose risk level $r_j$ does not exceed their risk tolerance $t_i$.
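To make the constraints concrete, the following minimal sketch enumerates every assignment of a toy instance and keeps the feasible one with the largest total ETV. The names `etv`, `demand`, `fund_risk`, and `user_tolerance` and all numbers are hypothetical and chosen only for illustration; exhaustive search is not the method proposed in this paper and is only viable for a handful of users.

```python
from itertools import product

def total_etv(assignment, etv):
    """Sum of ETV over the (user, fund) pairs chosen by the assignment."""
    return sum(etv[i][j] for i, j in enumerate(assignment))

def is_feasible(assignment, demand, fund_risk, user_tolerance):
    """Check the risk constraint and the exact segment-size constraint."""
    counts = [0] * len(demand)
    for i, j in enumerate(assignment):
        if fund_risk[j] > user_tolerance[i]:  # fund is too risky for this user
            return False
        counts[j] += 1
    return counts == list(demand)

def brute_force_allocate(etv, demand, fund_risk, user_tolerance):
    """Enumerate all K^N assignments (one fund per user); toy instances only."""
    n_users, n_funds = len(etv), len(demand)
    best, best_value = None, float("-inf")
    for assignment in product(range(n_funds), repeat=n_users):
        if not is_feasible(assignment, demand, fund_risk, user_tolerance):
            continue
        value = total_etv(assignment, etv)
        if value > best_value:
            best, best_value = assignment, value
    return best, best_value

# Hypothetical toy instance: 3 users, 2 fund types.
etv = [[5.0, 4.0], [3.0, 1.0], [2.0, 6.0]]  # etv[i][j]: user i, fund j
demand = [2, 1]                              # required segment sizes
fund_risk = [1, 2]                           # risk level of each fund type
user_tolerance = [2, 1, 2]                   # risk tolerance of each user
best, value = brute_force_allocate(etv, demand, fund_risk, user_tolerance)
print(best, value)
```

Because each user is assigned exactly one fund by construction, the one-fund-per-user constraint holds automatically and only the segment sizes and risk levels need explicit checks.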
4. Methodology
Figure 2 presents an illustrative example of customer allocation on the financial distribution platform. In this instance, customers are divided into two segments for the promotional operations of two different types of funds. According to the predetermined contract between the FinTech platform and the fund company, one fund type demands two allocated users and the other is planned to be delivered to one user. First, the framework needs to estimate the ETV for each user and each fund type. Second, the framework solves the customer allocation problem, in which the predicted ETV is used as a constant.
4.1. Entire Space Deep Probabilistic Model
To estimate the ETV accurately, the model must address two main tasks: i) predict whether the user will purchase the delivered item, and ii) if the conversion occurs, predict how much money the user is willing to invest. Thus, we formulate the ETV as the product of the click and conversion rate (CTCVR) and the purchase amount after conversion:
$$\mathrm{ETV} = \mathrm{pCTCVR} \times \mathrm{PA} \tag{6}$$
where PA denotes the purchase amount after conversion. Following previous work (Wang et al., 2019), we adopt a logarithmic transformation and denote by $y$ the result of the following transformation applied to the purchase amount:
$$y = \log(\mathrm{PA} + 1) \tag{7}$$
Note that, to avoid confusion between positive samples with an amount of 1 and samples without conversion, we add 1 to the purchase amount of all samples before taking the logarithm. The processed revenue label therefore satisfies $y \ge 0$, with $y = 0$ only for samples without a purchase.
The first task is a typical classification problem which quite a lot of works have discussed. Nevertheless, there are several significant challenges for predicting customer purchase amount and we doubt that the existing methods based on MSE loss or ZILN loss can effectively solve this problem in practice. In addition to the zero inflation phenomenon caused by a large number of non-converted samples, the difference in the purchase amount of converted samples is also very large.
Figure 3 shows the logarithmic transaction value of converted samples in real-world financial transaction data. It suggests that the log transaction values of converted users follow a normal distribution. In addition to the overall distribution of all converted samples, we present the historical purchase amount distributions of three different users in Figure 3b. User 1 and user 3 have similar profiles and behavior patterns, while both differ considerably from user 2. Correspondingly, the logarithmic transaction value distributions of user 1 and user 3 are similar, while the distribution of user 2 is significantly different from theirs. Therefore, inspired by the ZILN loss, we assume that the observed log purchase amount of each positive sample for a financial product is drawn from an implicit normal distribution. The parameters of the distribution, i.e., the mean $\mu$ and the standard deviation $\sigma$, are determined by the sample’s features.
4.1.1. Entire Sample Space Modeling
For long-term marketing operations, models need to infer both the CTCVR and the post-conversion purchase amount on the entire sample space, including newly arriving users. This requires the models to be trained with supervision for both tasks on the entire sample space. Otherwise, the models will suffer from sample selection bias in the actual industrial environment.
To tackle this problem, we provide a novel perspective on sample space division that unifies the classification and regression tasks for financial recommendation scenarios. For ease of understanding, we introduce a hidden variable $C$: $C = 1$ means that the user is willing to convert for the delivered mutual fund, and $C = 0$ indicates that the user has no intention to purchase. Note that an observed conversion implies C = 1, while users might have the intention to convert even for observed negative samples. We assume that the probability of the event $C = 1$ is approximately equal to the predicted CTCVR, and
$$P(C = 1 \mid x_u, x_f) \approx \mathrm{pCTCVR}, \qquad P(C = 0 \mid x_u, x_f) \approx 1 - \mathrm{pCTCVR} \tag{8}$$
For a given data set $D$ with each point described as $(x_u, x_f, y)$, where $x_u$ denotes the features of user $u$, $x_f$ denotes the features of fund type $f$ such as the average annual rate of return, and $y$ is the observed logarithmic transaction value, we summarize the sample space as:
i) The user purchases, and thus $C = 1$ and the observed transaction amount is positive, $y > 0$ (sample subspace $S_1$). ii) The user has no intention to purchase ($C = 0$), and thus the observed transaction amount is zero, $y = 0$ (sample subspace $S_2$). iii) The user has the intention to purchase ($C = 1$) but has not purchased yet, and thus the observed transaction amount is zero, $y = 0$ (sample subspace $S_3$). Samples in subspace $S_1$ are observed positive samples, and samples in subspaces $S_2$ and $S_3$ are all observed negative samples. We utilize a multi-task probabilistic model to fit the data set. The first main task is to model the probability that the purchase event occurs. The second main task is to learn the mean $\mu$ and standard deviation $\sigma$ of each sample’s implicit normal distribution, and thereby to estimate the probability that the sample’s observed logarithmic transaction value is $y$ under the condition that C = 1.
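In this way, writing $p$ for the predicted CTCVR and $\mathcal{N}(y; \mu, \sigma^2)$ for the normal density with mean $\mu$ and standard deviation $\sigma$, the probability of an observed positive sample for a user-fund pair that involves user $u$ and fund $f$ can be formulated as:
$$P(y \mid x_u, x_f) = P(C = 1 \mid x_u, x_f)\, P(y \mid C = 1, x_u, x_f) = p \cdot \mathcal{N}(y; \mu, \sigma^2), \quad y > 0 \tag{9}$$
The probability of an observed negative sample can be formulated as:
$$P(y = 0 \mid x_u, x_f) = P(C = 0 \mid x_u, x_f) + P(C = 1 \mid x_u, x_f)\, P(y = 0 \mid C = 1, x_u, x_f) = (1 - p) + p \cdot \mathcal{N}(0; \mu, \sigma^2) \tag{10}$$
Therefore, the likelihood of ESDPM with trainable parameter set $\Theta$ and data set $D$, split into observed positive samples $D^{+}$ ($y > 0$) and observed negative samples $D^{-}$ ($y = 0$), can be written as:
$$L(\Theta; D) = \prod_{D^{+}} p \cdot \mathcal{N}(y; \mu, \sigma^2) \prod_{D^{-}} \left[ (1 - p) + p \cdot \mathcal{N}(0; \mu, \sigma^2) \right] \tag{11}$$
According to Equations (9)-(11), we can finally derive the following form of the entire space multi-task joint (ESJ) loss function from the negative log-likelihood:
$$\mathcal{L}_{\mathrm{ESJ}} = -\sum_{D^{+}} \left[ \log p + \log \mathcal{N}(y; \mu, \sigma^2) \right] - \sum_{D^{-}} \log \left[ (1 - p) + p \cdot \mathcal{N}(0; \mu, \sigma^2) \right] \tag{12}$$
where we denote pCTCVR as $p$. Through the proposed ESJ loss, the model jointly models the probabilities of the classification task and the regression task on the entire sample space. With the learned $p$, $\mu$, and $\sigma$, and recalling that 1 was added to every purchase amount before the logarithm, the predicted expected transaction value when fund $f$ is delivered to user $u$ can be written as:
$$\widehat{\mathrm{ETV}} = p \cdot \left( \exp\!\left( \mu + \tfrac{\sigma^2}{2} \right) - 1 \right) \tag{13}$$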
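The per-sample loss implied by the sample space division can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a positive sample contributes p times a normal density at its log amount, a negative sample contributes (1 - p) plus p times the density at zero, and inference multiplies p by the lognormal mean minus the added 1. The function names and the `eps` stabilizer are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def esj_nll(y, p, mu, sigma, eps=1e-12):
    """Negative log-likelihood of one sample under the assumed ESJ form.

    y         : log(purchase amount + 1); 0.0 for an observed negative sample
    p         : predicted pCTCVR, read as P(C = 1)
    mu, sigma : parameters of the sample's implicit normal distribution
    """
    if y > 0:  # observed conversion: C = 1 and the log amount equals y
        return -(math.log(p + eps) + math.log(normal_pdf(y, mu, sigma) + eps))
    # observed negative: either C = 0, or C = 1 with a zero amount
    return -math.log((1.0 - p) + p * normal_pdf(0.0, mu, sigma) + eps)

def expected_transaction_value(p, mu, sigma):
    """Assumed inference rule: p times the lognormal mean, minus the +1 shift."""
    return p * (math.exp(mu + 0.5 * sigma ** 2) - 1.0)

# For a negative sample, a large predicted mean is penalized more than a small one.
print(esj_nll(0.0, 0.5, 10.0, 1.0), esj_nll(0.0, 0.5, 0.0, 1.0))
```

Under these assumptions the regression parameters receive a gradient from negative samples as well, which is the property that distinguishes this loss from one that fits the amount on converted samples only.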
4.1.2. Model Architecture
How much money a user plans to invest is related not only to the user’s investment intention and investment habits but also directly to the user’s available capital. To tackle these challenges, on the one hand, we use the widely adopted self-attention mechanism (Vaswani et al., 2017) to capture users’ investment habits from their sequential historical behavior data, i.e., clicking funds, purchasing, selling, and adding to favorites. On the other hand, we construct a consumption graph to better estimate users’ post-conversion purchase amount. The consumption graph is a bipartite graph whose nodes consist of user nodes $u$ and merchant nodes $m$. There is an edge between a user $u$ and a merchant $m$ if $u$ has paid $m$. Graph neural networks (GNNs) have proven effective for extracting information from graph data and have been widely used (Kipf and Welling, 2016; Hamilton et al., 2017; Hong et al., 2020). A user’s payment behavior toward different merchants contains valuable information and, to some extent, indicates the user’s available capital. In addition, applying a two-layer GNN to the user-merchant-user graph helps the model capture information about a target user through the attributes of users with similar consumption behavior patterns. Without loss of generality, we utilize graph attention layers (Velickovic et al., 2017) to help ESDPM focus on the important information in the consumption graph.
The left part of Figure 4 presents the architecture of ESDPM. It first generates the user embedding $e_u$ and the fund embedding $e_f$ by concatenating the feature embeddings of the user and of the fund, respectively. The user behavior embedding $e_b$ and the consumption embedding $e_c$ are then generated by the self-attention layer and the graph attention layer. Finally, we concatenate them into a single input vector:
$$x = \mathrm{concat}(e_u, e_f, e_b, e_c) \tag{14}$$
After that, ESDPM feeds the generated input vector to the subsequent multi-task learning network. Here we adapt the structure of MMoE (Ma et al., 2018b) to capture the relationships between tasks. The post-click conversion rate is learned as a hidden variable, as in ESMM (Ma et al., 2018a). We can then obtain the ETV from the output of ESDPM and Equation (13).
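The gating idea behind an MMoE-style network can be illustrated with a small forward pass: every task mixes the outputs of shared experts with its own softmax gate. This is a simplified sketch, not the ESDPM architecture: the experts here are single linear maps rather than deep networks, and `mmoe_forward`, `expert_weights`, and `gate_weights` are hypothetical names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Apply a weight matrix (given as a list of rows) to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed representation per task.

    expert_weights : one weight matrix per expert (linear experts for brevity)
    gate_weights   : one weight matrix per task, with one row per expert
    """
    expert_outputs = [linear(w, x) for w in expert_weights]
    dim = len(expert_outputs[0])
    task_inputs = []
    for gate in gate_weights:
        mix = softmax(linear(gate, x))  # one mixing weight per expert
        task_inputs.append([
            sum(mix[e] * expert_outputs[e][d] for e in range(len(expert_outputs)))
            for d in range(dim)
        ])
    return task_inputs

x = [1.0, 2.0]
experts = [[[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
           [[0.0, 1.0], [1.0, 0.0]]]   # expert 1: swaps the two coordinates
gates = [[[0.0, 0.0], [0.0, 0.0]],     # task 0: uniform mix of both experts
         [[10.0, 0.0], [0.0, 0.0]]]    # task 1: strongly prefers expert 0
print(mmoe_forward(x, experts, gates))
```

In a full model each task representation would feed its own tower, for example one tower for the conversion probability and others for the distribution parameters.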
4.1.3. Discussion
For plain multi-task learning models that include binary classification tasks and continuous value regression tasks, cross-entropy loss and MSE loss remain dominant (Liu et al., 2018; Gutelman and Levin, 2020), and the final loss is obtained by simply adding the losses of the two tasks. This form of multi-task loss function does not consider the classification and regression tasks in a unified sample probability space but essentially models the two tasks separately. Besides, it ignores the differences in variance among the samples’ regression label distributions and is sensitive to outliers during training. Therefore, the performance of MTL models with such a loss function might be limited.
ESJ loss divides the unconverted samples into two situations: i) no intention to convert ($C = 0$), and ii) willing to convert but with an investment amount of 0 ($C = 1$ with a zero observed amount). Through this distinction, the classification task and the regression task are unified in the entire sample space, and the model performs supervised regression training on the observed negative samples as well. In addition, ESJ formulates the probability of each sample under the assumption that the transaction value labels are heteroscedastic across samples. By contrast, the formulation of the ZILN loss shows that it uses the cross-entropy loss to train the classification task on the entire sample space, while the regression loss is used only for positive samples. Since the ZILN loss provides no regression supervision for the observed negative samples and its regression term is fit only under the implicit condition that a conversion has occurred, the model can easily overestimate the mean $\mu$ of the logarithmic purchase amount distribution for samples in the inference space.
4.2. Large Scale Customer Allocation Solution
With the estimated ETV taken as constants, the remaining task of the EOCA framework is to solve the allocation problem described in Section 3. Although commercial tools exist for solving constrained integer programming problems, they fail to solve large-scale allocation problems in a reasonable time (Ji et al., 2021). Moreover, the strict risk constraint and the strict constraint that the number of users allocated to each segment must exactly equal the demand require a specially designed allocation algorithm for large-scale customer allocation on FinTech platforms.
4.2.1. Description of our proposed allocation algorithm
We summarize the proposed heuristic optimization algorithm (HA) in Algorithm 1. In Line 1, we set the ETV of a user-fund pair to 0 if the risk tolerance level of the user is lower than the risk level of the fund. We introduce a factor for each fund to estimate how quickly its segment is being filled during the allocation process. If this factor for fund j is larger than that of other funds, the segment of fund j may reach its allocated-user limit sooner, after which the remaining users can no longer be allocated to it. Based on this factor and the ETV, the algorithm calculates a heuristic score for each user in Lines 5 to 9. First, for each user, it sorts all funds in descending order of ETV to get the top 3 fund types. Second, it calculates two score differences among these three funds and combines them into the user’s heuristic score. The heuristic score measures the decrease of the objective function that may result if the user is not allocated to the currently highest-scored fund in time. In Line 10, the algorithm sorts all users in descending order of the calculated heuristic score. Lines 11 to 19 execute the allocation process: the user with the highest heuristic score is allocated to the fund with the highest ETV for that user and is removed from the unallocated user set. If the demand of a fund is satisfied, the algorithm sets the ETV of that fund to 0 for all remaining users and recalculates the factor and the heuristic scores. The loop repeats until all users are allocated.
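The allocation loop described above can be sketched as a regret-style greedy procedure. Several details are assumptions made only for illustration and are not taken from Algorithm 1: the heuristic score is taken to be the sum of the gaps between the best usable fund and the next two, the segment-speed factor is omitted, scores are recomputed at every step, and the sketch assumes the demands sum to the number of users. The name `heuristic_allocate` and the toy data are hypothetical.

```python
def heuristic_allocate(etv, demand, fund_risk, user_tolerance):
    """Greedy regret-style allocation; returns {user index: fund index}."""
    n_users, n_funds = len(etv), len(demand)
    remaining = list(demand)          # open slots per fund segment
    unallocated = set(range(n_users))
    allocation = {}

    def usable_scores(i):
        # ETV of funds that are still open and within user i's risk tolerance.
        return {j: etv[i][j] for j in range(n_funds)
                if remaining[j] > 0 and fund_risk[j] <= user_tolerance[i]}

    def regret(i):
        # Assumed score: gap to the 2nd best plus gap to the 3rd best fund.
        top = sorted(usable_scores(i).values(), reverse=True)[:3]
        top += [0.0] * (3 - len(top))  # closed or infeasible funds count as 0
        return (top[0] - top[1]) + (top[0] - top[2])

    while unallocated:
        # Serve the user who would lose the most by missing their best fund.
        user = max(unallocated, key=lambda i: (regret(i), -i))
        scores = usable_scores(user)
        if not scores:
            raise ValueError(f"no open, risk-feasible fund left for user {user}")
        fund = max(scores, key=lambda j: (scores[j], -j))
        allocation[user] = fund
        remaining[fund] -= 1
        unallocated.remove(user)
    return allocation

# Hypothetical toy instance: 3 users, 2 fund types.
etv = [[5.0, 4.0], [3.0, 1.0], [2.0, 6.0]]
demand = [2, 1]
fund_risk = [1, 2]
user_tolerance = [2, 1, 2]
allocation = heuristic_allocate(etv, demand, fund_risk, user_tolerance)
print(allocation, sum(etv[u][f] for u, f in allocation.items()))
```

Unlike zeroing out infeasible scores, this sketch filters infeasible funds explicitly and raises an error when a user has no usable fund left, so it never returns an allocation that violates the risk constraint.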
4.2.2. Discussion
The higher a user’s heuristic score, the more pronounced the user’s preference for a certain promotional fund. If the allocation strategy fails to allocate such a user to the fund with the highest ETV for that user, the objective value is likely to suffer a larger loss. In other words, the HA strategy tends to delay the allocation of users who have similar preferences for all promotional funds. Besides, the algorithm ensures that exactly the required number of users is allocated to each promotional fund and that no user is allocated to a fund with a higher risk level than the user’s risk tolerance. Although recent studies used solvers based on Lagrangian methods to achieve near-optimal solutions for constrained optimization problems in practical applications (Zhang et al., 2020; Li et al., 2020b), these existing approaches can hardly ensure that no constraint is violated after allocation.
5. Experiments
In this section, we present the offline experiments on two real-world financial datasets and online experiments in actual delivery after customer allocation for promotional recommendations.
5.1. Offline Experiments
For the offline experiments, we use two real-world datasets collected from LiCaiTong platform, which are summarized as follows:
LCT-D: We collected samples from historical delivery between 2021/04/24 and 2021/06/01 as training (90%) and validation (10%) samples, involving 8 different fund types and 4,751,065 users. We evaluate the models on samples from 2021/06/02 to 2021/06/08. The test set contains 3,459,177 users.
LCT-W: This dataset is sampled from the whole log of LiCaiTong between 2021/08/14 and 2021/09/20 as training (90%) and validation (10%) samples, involving 4 different types of funds and 4,287,959 users. We evaluate the models on samples from 2021/09/21. The test set contains 688,171 users. Unlike LCT-D, this dataset contains samples from the entire platform rather than being limited to samples from historical delivery.
To demonstrate the effectiveness of our proposed ESDPM for practical industrial application, we conduct extensive experiments on the aforementioned datasets. AUC and logloss are used as the metrics for the conversion prediction task. MAE and MSE are used as the metrics for the ETV prediction task, as in most other works that involve continuous value prediction (Wang et al., 2021; Abdellah et al., 2019; Chiang and Dey, 2018). We compare ESDPM with several widely used methods and with a variant of ESDPM without the consumption graph, which are described as follows:
• DNN: Vanilla DNNs cannot handle multi-task learning data. Therefore, we trained two DNN models, one for the click and conversion rate prediction task and one for the logarithmic purchase value estimation task, denoted as DNN (classification) and DNN (regression) respectively.
• Shared-Bottom: A commonly used structure for multi-task learning. We use 3 shared bottom layers and two tower networks with 2 layers each for the classification and regression tasks respectively.
• MMoE (Ma et al., 2018b): MMoE has been widely accepted as an effective multi-task learning model and deployed in actual industrial applications (Zhao et al., 2019). It leverages multiple expert networks and a gating mechanism to capture the relationship between different tasks. Note that MMoE does not learn the hidden variable post-click CVR, but directly predicts the probability of a user clicking and converting.
• MTL-ZILN: It uses exactly the same model structure as ESDPM for multi-task learning, while the ZILN loss (Wang et al., 2019) is used to train the model.
• MTL-MSE: The structure of this model is similar to ESDPM, but it does not have the tower network for $\sigma$, and the regression task directly fits the logarithmic purchase amount with the MSE loss function, as in MMoE. pCTCVR is obtained by multiplying the learned pCTR and pCVR.
• ESMM: It regards the post-click conversion rate as a hidden variable and estimates pCTCVR as the product of pCTR and pCVR. The plain ESMM network only uses the shared embedding layer as the bottom structure, and we replace the network structure with the MMoE structure. ESMM is specially designed for conversion rate prediction, and thus it has no towers for the regression task.
• ESDPM-NCG: A variant of ESDPM that removes the consumption graph from the input information, together with the corresponding graph attention layers.
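For reference, the four evaluation metrics named above can be computed as in the following minimal sketch. It uses only the standard library, the pairwise AUC is quadratic in the number of samples and is meant for small illustrations, and the function names and toy inputs are hypothetical rather than the evaluation code used in the experiments.

```python
import math

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def logloss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for l, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += l * math.log(p) + (1 - l) * math.log(1.0 - p)
    return -total / len(labels)

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.5]
print(auc(labels, scores), logloss(labels, scores))
print(mae([1.0, 2.0], [2.0, 4.0]), mse([1.0, 2.0], [2.0, 4.0]))
```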
Table 1: Offline results on the first real-world dataset.

Model | AUC | logloss | MSE | MAE |
---|---|---|---|---|
ESMM | 0.8303 | 0.0243 | - | - |
DNN (classification) | 0.8237 | 0.0233 | - | - |
DNN (regression) | - | - | 0.0386 | 0.0413 |
Shared-Bottom | 0.8120 | 0.0239 | 0.0367 | 0.0323 |
MMoE | 0.8205 | 0.0258 | 0.0381 | 0.0392 |
MTL-ZILN | 0.8287 | 0.0226 | 0.0386 | 0.0426 |
MTL-MSE | 0.8276 | 0.0236 | 0.0382 | 0.0353 |
ESDPM-NCG | 0.8342 | 0.0193 | 0.0365 | 0.0265 |
ESDPM | 0.8327 | 0.0185 | 0.0362 | 0.0227 |

Table 2: Offline results on the second real-world dataset.

Model | AUC | logloss | MSE | MAE |
---|---|---|---|---|
ESMM | 0.9270 | 0.0887 | - | - |
DNN (classification) | 0.9040 | 0.1013 | - | - |
DNN (regression) | - | - | 0.4248 | 0.1369 |
Shared-Bottom | 0.9208 | 0.0920 | 0.4265 | 0.1404 |
MMoE | 0.9192 | 0.0917 | 0.4236 | 0.1436 |
MTL-ZILN | 0.9228 | 0.0927 | 1.7885 | 0.5909 |
MTL-MSE | 0.9213 | 0.0907 | 0.4262 | 0.1446 |
ESDPM-NCG | 0.9283 | 0.0900 | 0.4206 | 0.1251 |
ESDPM | 0.9313 | 0.0880 | 0.4203 | 0.1326 |
For a fair comparison, all models use the same sequence of user historical behavior and generate the user behavior embedding with the self-attention mechanism for each dataset. The behavior embedding size is set to 128 for all models. All models are optimized with Adam (Kingma and Ba, 2014). We adopt early stopping on the validation loss and fix the batch size to 512 in all cases. The learning rate, dropout probability, and other hyper-parameters are tuned for each model by grid search according to the performance on the validation set.
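For readers who want to reproduce the kind of numbers reported in the offline tables, the four metrics can be computed as in the following minimal sketch. It uses plain Python for clarity, the function names are hypothetical, and it is not the evaluation code used in the experiments; on large datasets a library implementation would be preferable to the quadratic AUC loop shown here.

```python
import math
from typing import Sequence


def logloss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. The pairwise loop is O(P * N) and is meant for
    illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy labels and scores (hypothetical values).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```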
5.1.1. Effectiveness of ESDPM
The experimental results on the two real-world datasets are reported in Table 1 and Table 2, respectively. ESDPM and its variant ESDPM-NCG achieve the best performance on every metric, for both the conversion rate prediction task and the expected transaction value prediction task: the full ESDPM is best on all metrics except AUC in Table 1 and MAE in Table 2, where ESDPM-NCG is slightly better. In addition, the evaluation on the entire sample space shows that the model trained with the ZILN loss has a larger ETV estimation error than the models trained with the MSE loss. This indicates that the sample selection bias caused by ignoring negative samples in the ETV estimation task leads to unsatisfactory performance on the entire inference sample space.
It is worth noting that MTL-MSE and MTL-ZILN have the same bottom structure and the same paradigm for conversion rate prediction as ESDPM-NCG. To further illustrate that the proposed ESJ loss function is effective for customer allocation in financial scenarios, we conduct a simulation experiment that compares models trained with different loss functions. Specifically, we use the models trained with the three loss functions mentioned above (ESJ, MSE, and ZILN) to predict the preference scores of the testing samples, and each user is assigned to the group of the fund type for which the user’s predicted ETV is the highest among the four fund types. Two metrics are used in this simulation experiment:
Total Hit Conversions. The total hit conversions (THC) measures how many users invest in the fund with the highest predicted ETV. It is formulated as:
(15) |
Total Hit Amount. The total hit amount (THA) measures how much money users invest in the fund with the highest predicted ETV. It is formulated as:
(16) |
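As a minimal sketch of how the two metrics could be computed from the verbal definitions above: for each user, take the fund type with the highest predicted ETV, then count a hit (THC) and add the invested amount (THA) if the user actually invested in that fund type. The data layout (nested dictionaries keyed by user and fund type) and the function name are assumptions made for illustration.

```python
from typing import Dict, Tuple


def hit_metrics(
    predicted_etv: Dict[str, Dict[str, float]],
    actual_amount: Dict[str, Dict[str, float]],
) -> Tuple[int, float]:
    """Return (THC, THA) for a set of users.

    predicted_etv[user][fund_type] is the model's predicted ETV.
    actual_amount[user][fund_type] is the observed investment amount
    (missing entries mean the user did not invest in that fund type).
    """
    thc = 0
    tha = 0.0
    for user, scores in predicted_etv.items():
        best_fund = max(scores, key=scores.get)  # group the user is assigned to
        amount = actual_amount.get(user, {}).get(best_fund, 0.0)
        if amount > 0:
            thc += 1
            tha += amount
    return thc, tha


# Toy data (hypothetical values).
predicted = {
    "u1": {"A": 5.0, "B": 1.0},
    "u2": {"A": 0.5, "B": 2.0},
    "u3": {"A": 3.0, "B": 2.5},
}
actual = {
    "u1": {"A": 100.0},
    "u2": {"A": 50.0},
    "u3": {"A": 20.0, "B": 10.0},
}
thc, tha = hit_metrics(predicted, actual)
```

In the toy data, u1 and u3 invest in the fund type ranked first for them, while u2 invests in a different one, so only two of the three users count as hits.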
The experimental results are shown in Figure 5. The model trained with the proposed ESJ loss achieves the best performance in both THC and THA. The results show that, compared with MSE and ZILN, ESJ helps the model better estimate users’ investment preferences. They also indicate that a model trained with the ESJ loss can help deliver suitable financial products and thereby increase the total transaction value in actual applications.
Table 3: Results of the online A/B tests.

Delivery Period | Model | Allocation Strategy | CPMD | TAPMD | %CPMD Lift | %TAPMD Lift |
---|---|---|---|---|---|---|
P1 | ESMM | Manual | 2.69 | 14,901 | - | - |
P1 | ESMM | HA | 2.88 | 13,492 | 7.06% | -9.46% |
P2 | ESMM | HA | 2.05 | 7,359 | - | - |
P2 | ESDPM-NCG | HA | 2.55 | 10,048 | 24.39% | 36.54% |
P3 | ESMM | Manual | 3.39 | 16,608 | - | - |
P3 | ESDPM-NCG | HA | 4.29 | 25,558 | 26.54% | 53.89% |
P4 | ESDPM-NCG | HA | 5.22 | 24,636 | - | - |
P4 | ESDPM | HA | 5.39 | 28,568 | 3.26% | 15.96% |
5.1.2. Effectiveness of HA
The offline experiments on two real-world datasets verify the effectiveness of the proposed ESDPM model for expected transaction value prediction in financial product recommendation. The next step is to allocate customers into groups. Although a traditional integer programming solver, such as the branch-and-cut algorithm, can achieve a remarkable objective value for such a combinatorial optimization problem, its computation cost is unacceptable for practical industrial applications that involve millions of users. A simple but applicable resource allocation strategy is to manually define the priority of the funds according to their average historical conversion rates and the experience of operation specialists. For the fund type with the highest priority, all users are sorted in descending order of the preference scores predicted by the model for that fund type, and the top-ranked users are allocated to it; the same procedure is then applied to the fund type with the next priority, and so on until the allocation is completed. Before the proposed heuristic algorithm (HA) was deployed, this manual allocation strategy was used for constrained customer allocation on the LiCaiTong platform. However, the manual allocation strategy yields limited performance and relies heavily on the operation specialists’ experience.
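The manual strategy described above can be sketched as follows. This is an illustrative reading, not the platform’s production code: it assumes that the number of users taken for each fund type equals that fund type’s delivery demand, and that users already allocated are excluded when the next fund type is processed. The names `manual_allocation`, `scores`, `priority`, and `demand` are hypothetical.

```python
from typing import Dict, List


def manual_allocation(
    scores: Dict[str, Dict[str, float]],
    priority: List[str],
    demand: Dict[str, int],
) -> Dict[str, str]:
    """Allocate users to fund types in a fixed, manually chosen priority order.

    scores[user][fund] is the predicted preference score.
    priority lists fund types from highest to lowest priority.
    demand[fund] is the assumed number of users the fund type must receive.
    """
    remaining = set(scores)
    assignment: Dict[str, str] = {}
    for fund in priority:
        # Rank the still-unallocated users by their score for this fund type;
        # the user id breaks ties so that the result is deterministic.
        ranked = sorted(remaining, key=lambda u: (-scores[u][fund], u))
        for user in ranked[: demand[fund]]:
            assignment[user] = fund
            remaining.discard(user)
    return assignment


# Toy example (hypothetical scores): fund type A has priority over B.
toy_scores = {
    "u1": {"A": 0.9, "B": 0.8},
    "u2": {"A": 0.7, "B": 0.1},
    "u3": {"A": 0.6, "B": 0.5},
    "u4": {"A": 0.2, "B": 0.4},
}
toy_assignment = manual_allocation(toy_scores, priority=["A", "B"], demand={"A": 2, "B": 1})
```

Because each fund type only sees the users left over by higher-priority fund types, the outcome depends strongly on the chosen priority order, which is one way to see why the strategy relies on specialist experience.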
In this experiment, we randomly sample datasets of different scales from the actual historical deliveries. With the predicted ETV and the delivery demand of different funds, we simulate the allocation and compare the performance of the proposed heuristic algorithm with a traditional integer programming (IP) solver and the manual allocation strategy. As shown in Figure 6a, IP achieves the best objective value, and the proposed HA obtains a similar performance. For example, on the dataset with 200,000 users, HA achieves 98.75% of the objective value of IP. Figure 6b shows the time cost of the different strategies. HA is consistently and significantly faster than IP across data scales; on the large dataset with 200,000 users, the speed-up ratio reaches 416. In practical applications, we need to solve allocation problems that involve over 10 million users, for which the solution time of traditional IP methods is unacceptable.
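To make the comparison between a heuristic and an exact solver concrete, the sketch below contrasts a generic greedy heuristic with exhaustive search on a toy instance and reports the ratio of their objective values. The greedy procedure is a stand-in chosen for illustration and is not the HA proposed in this paper; the constraint set (each fund type receives exactly its demand, each user receives at most one fund type) is a simplifying assumption, and all names and numbers are hypothetical.

```python
from itertools import product
from typing import Dict, Optional


def total_value(etv: Dict[str, Dict[str, int]], assignment: Dict[str, str]) -> int:
    """Objective: sum of predicted ETV over the allocated (user, fund) pairs."""
    return sum(etv[user][fund] for user, fund in assignment.items())


def greedy_allocate(etv: Dict[str, Dict[str, int]], demand: Dict[str, int]) -> Dict[str, str]:
    """Generic greedy stand-in: take (user, fund) pairs in descending ETV order."""
    left = dict(demand)
    assignment: Dict[str, str] = {}
    pairs = sorted(
        ((value, user, fund) for user, row in etv.items() for fund, value in row.items()),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    for _, user, fund in pairs:
        if user not in assignment and left[fund] > 0:
            assignment[user] = fund
            left[fund] -= 1
    return assignment


def exact_allocate(etv: Dict[str, Dict[str, int]], demand: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Exhaustive search over all allocations; only feasible for tiny instances."""
    users = sorted(etv)
    funds = sorted(demand)
    best, best_value = None, float("-inf")
    for choice in product(funds + [None], repeat=len(users)):
        if any(choice.count(fund) != demand[fund] for fund in funds):
            continue  # violates the assumed demand constraints
        assignment = {u: f for u, f in zip(users, choice) if f is not None}
        value = total_value(etv, assignment)
        if value > best_value:
            best, best_value = assignment, value
    return best


toy_etv = {
    "u1": {"A": 9, "B": 8},
    "u2": {"A": 7, "B": 1},
    "u3": {"A": 6, "B": 5},
}
toy_demand = {"A": 1, "B": 1}
greedy_value = total_value(toy_etv, greedy_allocate(toy_etv, toy_demand))
exact_value = total_value(toy_etv, exact_allocate(toy_etv, toy_demand))
ratio = greedy_value / exact_value
```

The exhaustive search grows exponentially with the number of users, which mirrors, in miniature, why exact methods become impractical at the scale of millions of users while a heuristic remains cheap.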
5.2. Results from online A/B testing
We have conducted multiple online A/B testing experiments over several delivery periods to demonstrate that the proposed methods are effective in practical industrial applications. The number of delivered users is 10 million in each period. For online A/B testing, we focus on two metrics: i) Conversion Per Mille Deliveries (CPMD) and ii) Transaction Amount Per Mille Deliveries (TAPMD), which are formulated as follows:
(17) |
(18) |
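Reading “per mille” as “per thousand deliveries”, the two online metrics and the relative lift between two frameworks can be sketched as below. This reading of the metric names, the function names, and the delivery and conversion counts are assumptions made for illustration; only the CPMD and TAPMD values passed to the lift function are taken from the P1 rows of Table 3.

```python
def per_mille(total: float, deliveries: int) -> float:
    """Scale a total to a rate per 1,000 deliveries."""
    if deliveries <= 0:
        raise ValueError("deliveries must be positive")
    return 1000.0 * total / deliveries


def cpmd(conversions: int, deliveries: int) -> float:
    """Conversions per thousand deliveries (assumed reading of CPMD)."""
    return per_mille(conversions, deliveries)


def tapmd(transaction_amount: float, deliveries: int) -> float:
    """Transaction amount per thousand deliveries (assumed reading of TAPMD)."""
    return per_mille(transaction_amount, deliveries)


def relative_lift_pct(treatment: float, control: float) -> float:
    """Relative change of the treatment group over the control group, in percent."""
    return 100.0 * (treatment - control) / control


# Hypothetical counts: 2,690 conversions out of 1,000,000 deliveries.
example_cpmd = cpmd(conversions=2_690, deliveries=1_000_000)

# Lifts recomputed from the P1 values in Table 3 (HA versus manual allocation).
cpmd_lift_p1 = round(relative_lift_pct(2.88, 2.69), 2)
tapmd_lift_p1 = round(relative_lift_pct(13_492, 14_901), 2)
```

Recomputing a lift from the rounded table values can differ from the reported figure in the last digit, since the reported lifts were presumably computed from unrounded metrics.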
In the online experiments, we compare multiple frameworks, i.e., combinations of models and allocation strategies, to illustrate the superiority of the proposed EOCA framework. The results are summarized in Table 3. Note that because the market conditions and the fund products delivered for recommendation differ across delivery periods, the performance of frameworks should not be compared across different delivery periods. The major observations from the online experimental results are summarized as follows:
- In P1, both frameworks used the pCTCVR predicted by ESMM as the ETV for allocation optimization, and the framework that uses the proposed heuristic algorithm as the allocation strategy improved the actual online conversion rate compared with the manual allocation strategy (CPMD lift of 7.06%). However, TAPMD decreased by 9.46% because the expected purchase amount was ignored in the ETV estimation phase. The results from P1 indicate that, in financial scenarios, an increase in the conversion rate does not imply an increase in the total transaction amount. For example, some users may have a strong intention to invest in a certain financial product, but their investment amount is limited due to concerns about risk. This shows that optimizing the sales amount requires estimating the expected subscription amount for each user-fund pair.
- The experimental results from P2 and P3 demonstrate that the proposed EOCA framework is effective for online guaranteed delivery. In P2, both frameworks use HA as the allocation strategy, while EOCA takes the purchase amount estimation into consideration and thus significantly improves the transaction amount, by 36.54%. EOCA also improves CPMD, which implies that it can better estimate customers’ purchase intention with the proposed entire sample space loss function. In P3, we compare the EOCA framework with the guaranteed delivery framework originally applied in LiCaiTong. The results show that the customer allocation framework proposed in this paper is superior to the existing practical solution.
- In the delivery period P4, we conducted an online ablation experiment to illustrate the importance of the consumption graph (CG). The experimental results imply that the CG contains valuable information that helps ESDPM better estimate customers’ purchase intention and ETV.
6. Conclusion
In this paper, we study the problem of constrained customer allocation for FinTech platforms that distribute mutual funds, with the aim of maximizing the total transaction value of the promotional funds. To tackle this challenging problem, we present the ETV-optimized customer allocation (EOCA) framework, which first estimates the expected transaction value of each delivery and then allocates users to different fund types so that the corresponding promotional fund is delivered to them. We propose an entire space multi-task joint loss function (ESJ) that unifies the classification task and the regression task in the entire sample probability space, and we present the entire space deep probabilistic model (ESDPM), based on ESJ, for ETV estimation. To the best of our knowledge, this is the first attempt to predict customers’ purchase amounts on FinTech platforms. In addition, we propose an effective heuristic algorithm for the allocation stage that takes the predicted ETV as constants and achieves near-optimal solutions efficiently under multiple constraints. The framework has been successfully deployed on the Tencent LiCaiTong platform. Offline experiments and online A/B tests demonstrate the effectiveness of EOCA.
References
- Abdellah et al. (2019) Ali R Abdellah, Omar Abdul Kareem Mahmood, Alexander Paramonov, and Andrey Koucheryavy. 2019. IoT traffic prediction using multi-step ahead prediction with neural network. In 2019 11th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT). IEEE, 1–4.
- Bauer and Jannach (2021) Josef Bauer and Dietmar Jannach. 2021. Improved Customer Lifetime Value Prediction With Sequence-To-Sequence Learning and Feature-Based Models. ACM Transactions on Knowledge Discovery from Data (TKDD) 15, 5 (2021), 1–37.
- Bharadwaj et al. (2012) Vijay Bharadwaj, Peiji Chen, Wenjing Ma, Chandrashekhar Nagarajan, John Tomlin, Sergei Vassilvitskii, Erik Vee, and Jian Yang. 2012. Shale: an efficient algorithm for allocation of guaranteed display advertising. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 1195–1203.
- Chamberlain et al. (2017) Benjamin Paul Chamberlain, Angelo Cardoso, CH Bryan Liu, Roberto Pagliari, and Marc Peter Deisenroth. 2017. Customer lifetime value prediction using embeddings. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1753–1762.
- Chapelle (2014) Olivier Chapelle. 2014. Modeling delayed feedback in display advertising. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 1097–1105.
- Chen et al. (2016) Junxuan Chen, Baigui Sun, Hao Li, Hongtao Lu, and Xian-Sheng Hua. 2016. Deep ctr prediction in display advertising. In Proceedings of the 24th ACM international conference on Multimedia. 811–820.
- Chen et al. (2012) Peiji Chen, Wenjing Ma, Srinath Mandalapu, Chandrashekhar Nagarjan, Jayavel Shanmugasundaram, Sergei Vassilvitskii, Erik Vee, Manfai Yu, and Jason Zien. 2012. Ad serving using a compact allocation plan. In Proceedings of the 13th ACM Conference on Electronic Commerce. 319–336.
- Chen et al. (2018) Pei Pei Chen, Anna Guitart, Ana Fernández del Río, and Africa Periánez. 2018. Customer lifetime value in video games using deep learning and parametric models. In 2018 IEEE international conference on big data (big data). IEEE, 2134–2140.
- Chiang and Dey (2018) Po-Han Chiang and Sujit Dey. 2018. Personalized effect of health behavior on blood pressure: Machine learning based prediction and recommendation. In 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom). IEEE, 1–6.
- Fang et al. (2019) Zhen Fang, Yang Li, Chuanren Liu, Wenxiang Zhu, Yu Zheng, and Wenjun Zhou. 2019. Large-Scale Personalized Delivery for Guaranteed Display Advertising with Real-Time Pacing. In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 190–199.
- Gutelman and Levin (2020) Benjamin Gutelman and Pavel Levin. 2020. Efficient Image Gallery Representations at Scale Through Multi-Task Learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2281–2287.
- Hamilton et al. (2017) William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025–1035.
- Hong et al. (2019) Claire Yurong Hong, Xiaomeng Lu, and Jun Pan. 2019. FinTech Platforms and Mutual Fund Distribution. Technical Report. National Bureau of Economic Research.
- Hong et al. (2020) Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, and Jieping Ye. 2020. An attention-based graph neural network for heterogeneous structural learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 4132–4139.
- Huang et al. (2020) Bo Huang, Ye Bi, Zhenyu Wu, Jianming Wang, and Jing Xiao. 2020. UBER-GNN: A User-Based Embeddings Recommendation based on Graph Neural Networks. CoRR abs/2008.02546 (2020). arXiv:2008.02546 https://arxiv.org/abs/2008.02546
- Ji et al. (2021) Houye Ji, Junxiong Zhu, Chuan Shi, Xiao Wang, Bai Wang, Chaoyu Zhang, Zixuan Zhu, Feng Zhang, and Yanghua Li. 2021. Large-scale Comb-K Recommendation. In Proceedings of the Web Conference 2021. 2512–2523.
- Kang et al. (2019) Mao Kang, Ye Bi, Zhenyu Wu, Jianming Wang, and Jing Xiao. 2019. A Heterogeneous Conversational Recommender System for Financial Products. In KaRS@CIKM. 26–30.
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
- Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
- Lei et al. (2020) Hang Lei, Yin Zhao, and Longjun Cai. 2020. Multi-objective Optimization for Guaranteed Delivery in Video Service Platform. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3017–3025.
- Li et al. (2020b) Liangwei Li, Liucheng Sun, Chenwei Weng, Chengfu Huo, and Weijun Ren. 2020b. Spending Money Wisely: Online Electronic Coupon Allocation based on Real-Time User Intent Detection. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2597–2604.
- Li et al. (2020a) Zhao Li, Xin Shen, Yuhang Jiao, Xuming Pan, Pengcheng Zou, Xianling Meng, Chengwei Yao, and Jiajun Bu. 2020a. Hierarchical bipartite graph neural networks: Towards large-scale e-commerce applications. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1677–1688.
- Liu et al. (2018) Mingxia Liu, Jun Zhang, Ehsan Adeli, and Dinggang Shen. 2018. Joint classification and regression via deep multi-task multi-channel learning for Alzheimer’s disease diagnosis. IEEE Transactions on Biomedical Engineering 66, 5 (2018), 1195–1206.
- Lu et al. (2017) Quan Lu, Shengjun Pan, Liang Wang, Junwei Pan, Fengdan Wan, and Hongxia Yang. 2017. A practical framework of conversion rate prediction for online display advertising. In Proceedings of the ADKDD’17. 1–9.
- Ma et al. (2018b) Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H Chi. 2018b. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1930–1939.
- Ma et al. (2018a) Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, and Kun Gai. 2018a. Entire space multi-task model: An effective approach for estimating post-click conversion rate. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 1137–1140.
- Pei et al. (2019) Changhua Pei, Xinru Yang, Qing Cui, Xiao Lin, Fei Sun, Peng Jiang, Wenwu Ou, and Yongfeng Zhang. 2019. Value-aware recommendation based on reinforced profit maximization in e-commerce systems. arXiv preprint arXiv:1902.00851 (2019).
- Su et al. (2020) Yumin Su, Liang Zhang, Quanyu Dai, Bo Zhang, Jinyao Yan, Dan Wang, Yongjun Bao, Sulong Xu, Yang He, and Weipeng Yan. 2020. An Attention-based Model for Conversion Rate Prediction with Delayed Feedback via Post-click Calibration. In IJCAI. 3522–3528.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008.
- Velickovic et al. (2017) Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph Attention Networks. CoRR abs/1710.10903 (2017). arXiv:1710.10903 http://arxiv.org/abs/1710.10903
- Wang et al. (2021) Wei Wang, Junyang Chen, Yushu Zhang, Zhiguo Gong, Neeraj Kumar, and Wei Wei. 2021. A multi-graph convolutional network framework for tourist flow prediction. ACM Transactions on Internet Technology (TOIT) 21, 4 (2021), 1–13.
- Wang et al. (2019) Xiaojing Wang, Tianqi Liu, and Jingang Miao. 2019. A deep probabilistic model for customer lifetime value prediction. arXiv preprint arXiv:1912.07753 (2019).
- Wen et al. (2020) Hong Wen, Jing Zhang, Yuan Wang, Fuyu Lv, Wentian Bao, Quan Lin, and Keping Yang. 2020. Entire space multi-task modeling via post-click behavior decomposition for conversion rate prediction. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 2377–2386.
- Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
- Xiao et al. (2020) Zhibo Xiao, Luwei Yang, Wen Jiang, Yi Wei, Yi Hu, and Hao Wang. 2020. Deep Multi-Interest Network for Click-through Rate Prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2265–2268.
- Zhang et al. (2020) Hong Zhang, Lan Zhang, Lan Xu, Xiaoyang Ma, Zhengtao Wu, Cong Tang, Wei Xu, and Yiguo Yang. 2020. A Request-level Guaranteed Delivery Advertising Planning: Forecasting and Allocation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2980–2988.
- Zhao et al. (2019) Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, and Ed Chi. 2019. Recommending what video to watch next: a multitask ranking system. In Proceedings of the 13th ACM Conference on Recommender Systems. 43–51.
- Zheng et al. (2021) Yongsen Zheng, Pengxu Wei, Ziliang Chen, Yang Cao, and Liang Lin. 2021. Graph-Convolved Factorization Machines for Personalized Recommendation. IEEE Transactions on Knowledge and Data Engineering (2021).
- Zhou et al. (2018) Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1059–1068.