-
Effects of Using Synthetic Data on Deep Recommender Models' Performance
Authors:
Fatih Cihan Taskin,
Ilknur Akcay,
Muhammed Pesen,
Said Aldemir,
Ipek Iraz Esin,
Furkan Durmus
Abstract:
Recommender systems are essential for enhancing user experiences by suggesting items based on individual preferences. However, these systems frequently face the challenge of data imbalance, characterized by a predominance of negative interactions over positive ones. This imbalance can result in biased recommendations favoring popular items. This study investigates the effectiveness of synthetic da…
▽ More
Recommender systems are essential for enhancing user experiences by suggesting items based on individual preferences. However, these systems frequently face the challenge of data imbalance, characterized by a predominance of negative interactions over positive ones. This imbalance can result in biased recommendations favoring popular items. This study investigates the effectiveness of synthetic data generation in addressing data imbalances within recommender systems. Six different methods were used to generate synthetic data. Our experimental approach involved generating synthetic data using these methods and integrating the generated samples into the original dataset. Our results show that the inclusion of generated negative samples consistently improves the Area Under the Curve (AUC) scores. The significant impact of synthetic negative samples highlights the potential of data augmentation strategies to address issues of data sparsity and imbalance, ultimately leading to improved performance of recommender systems.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Mutual Learning for Finetuning Click-Through Rate Prediction Models
Authors:
Ibrahim Can Yilmaz,
Said Aldemir
Abstract:
Click-Through Rate (CTR) prediction has become an essential task in digital industries, such as digital advertising or online shopping. Many deep learning-based methods have been implemented and have become state-of-the-art models in the domain. To further improve the performance of CTR models, Knowledge Distillation based approaches have been widely used. However, most of the current CTR predicti…
▽ More
Click-Through Rate (CTR) prediction has become an essential task in digital industries, such as digital advertising or online shopping. Many deep learning-based methods have been implemented and have become state-of-the-art models in the domain. To further improve the performance of CTR models, Knowledge Distillation based approaches have been widely used. However, most of the current CTR prediction models do not have much complex architectures, so it's hard to call one of them 'cumbersome' and the other one 'tiny'. On the other hand, the performance gap is also not very large between complex and simple models. So, distilling knowledge from one model to the other could not be worth the effort. Under these considerations, Mutual Learning could be a better approach, since all the models could be improved mutually. In this paper, we showed how useful the mutual learning algorithm could be when it is between equals. In our experiments on the Criteo and Avazu datasets, the mutual learning algorithm improved the performance of the model by up to 0.66% relative improvement.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Pairwise Ranking Loss for Multi-Task Learning in Recommender Systems
Authors:
Furkan Durmus,
Hasan Saribas,
Said Aldemir,
Junyan Yang,
Hakan Cevikalp
Abstract:
Multi-Task Learning (MTL) plays a crucial role in real-world advertising applications such as recommender systems, aiming to achieve robust representations while minimizing resource consumption. MTL endeavors to simultaneously optimize multiple tasks to construct a unified model serving diverse objectives. In online advertising systems, tasks like Click-Through Rate (CTR) and Conversion Rate (CVR)…
▽ More
Multi-Task Learning (MTL) plays a crucial role in real-world advertising applications such as recommender systems, aiming to achieve robust representations while minimizing resource consumption. MTL endeavors to simultaneously optimize multiple tasks to construct a unified model serving diverse objectives. In online advertising systems, tasks like Click-Through Rate (CTR) and Conversion Rate (CVR) are often treated as MTL problems concurrently. However, it has been overlooked that a conversion ($y_{cvr}=1$) necessitates a preceding click ($y_{ctr}=1$). In other words, while certain CTR tasks are associated with corresponding conversions, others lack such associations. Moreover, the likelihood of noise is significantly higher in CTR tasks where conversions do not occur compared to those where they do, and existing methods lack the ability to differentiate between these two scenarios. In this study, exposure labels corresponding to conversions are regarded as definitive indicators, and a novel task-specific loss is introduced by calculating a \textbf{p}air\textbf{wise} \textbf{r}anking (PWiseR) loss between model predictions, manifesting as pairwise ranking loss, to encourage the model to rely more on them. To demonstrate the effect of the proposed loss function, experiments were conducted on different MTL and Single-Task Learning (STL) models using four distinct public MTL datasets, namely Alibaba FR, NL, US, and CCP, along with a proprietary industrial dataset. The results indicate that our proposed loss function outperforms the BCE loss function in most cases in terms of the AUC metric.
△ Less
Submitted 5 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.