1 Introduction
Recommender systems are a prominent example of information-seeking technology where machines and humans collaborate to support decision making in daily life. One important challenge for these systems is balancing business interests for producers and providers with fairness, privacy, and utility for consumers. Such concerns motivate the research direction of trustworthy recommender systems. Various countries and institutions around the world have recently identified and categorized challenges in the scope of digital technology trustworthiness, and have developed policies and regulations to govern them. For instance, regulations in the European Union include the
Artificial Intelligence Act, Digital Services Act, Digital Markets Act, and the General Data Protection Regulation. In the USA, the White House’s Executive Order on Artificial Intelligence provides the foundation for responsible AI development and use. In China, regulations include the Provisions on the Administration of Algorithm-generated Recommendations for Internet Information Services, the Cybersecurity Law of the People’s Republic of China, and the Personal Information Protection Law [2]. These regulations demonstrate that recommender systems and their potential risks are also taken seriously from a political and legal perspective [13].
Trustworthiness is a fundamental topic in recommender systems research from a technical, ethical, moral, legal, and utilitarian perspective, and is embedded into the larger area of responsible AI technology. Respective systems should, therefore, be researched from a multidisciplinary viewpoint. The primary concerns that trustworthiness raises in a recommender system context span aspects of robustness, security, privacy, controllability, interpretability, transparency, and fairness. Robustness and security refer to the ranking model’s capability to withstand unintended errors (e.g., noise) or intended attacks (e.g., injection of adversarial data), and more generally, adversarial cyberattacks by third parties. Privacy refers to whether we can trust models to have access to sensitive data, or whether (some part of the) sensitive data could be released during training. Controllability means that users have the ability to change the system behavior by controlling aspects of the ranking model or recommendations. Interpretability refers to the ability to understand how the model makes decisions. Transparency, more generally, means that the data, system, and business models linked to the recommender system should be transparent. The system’s capabilities, limitations, and decisions should be explained in a manner tailored to its stakeholders. Fairness means that the model does not systematically discriminate against (groups of) consumers or producers. Most of these dimensions are addressed in the papers constituting this special issue.
2 Overview of the Publications Included in this Special Issue
The special issue on Trustworthy Recommender Systems features nine articles, selected through a thorough peer-review process. The constituting contributions explore various aspects of trustworthy recommender systems, covering key dimensions such as security, privacy, transparency, explainability, bias, and fairness. Table 1 provides an overview of the dimensions addressed in each of the articles. Concrete topics include privacy-preserving mechanisms, such as federated learning for movie recommendations, security considerations for user data protection, methods for providing explainable recommendations, and strategies for ensuring fairness in ranking and interactions. The research spans diverse recommendation domains, including news, movies, jobs, and books, thereby offering practical insights into real-world challenges in recommender systems.
As an overarching contribution that provides a comprehensive overview of the existing literature, the survey by Ge et al. [6] reviews over 400 papers on the topic, categorized according to five trustworthiness dimensions: explainability, fairness, privacy, robustness, and controllability. The authors discuss the state of the art in recommender systems that address these dimensions, not only along a single category but also in a cross-dimension manner, such as privacy-aware explainability or controllable fairness.
In the following, we introduce the research articles, categorized into the topics of privacy and security (Section 2.1), transparency and explainability (Section 2.2), and bias and fairness (Section 2.3).
2.1 Privacy and Security
Security and privacy are critical in recommender systems due to the large volumes of sensitive user data these systems process. Preserving user privacy while still providing personalized recommendations is an ongoing challenge. The papers in this part address these issues by developing privacy-preserving methods using federated learning and by investigating the vulnerabilities of recommender systems to data poisoning attacks.
Privacy via Federated Learning
Neumann et al. [10] in their work “A Privacy Preserving System for Movie Recommendations using Federated Learning” address the issue of maintaining privacy in a recommender system by employing federated learning (FL). The authors introduce FedQ, a method designed to overcome key limitations in FL, such as handling non-independent and non-identically distributed data and small local datasets, which often degrade performance. The FedQ approach also incorporates early aggregation of client updates to mitigate the risk of input data reconstruction attacks, while neural network compression is applied to reduce communication overhead and further enhance privacy through parameter obfuscation.
For validating their method, the study focuses on movie recommendation, demonstrating the scalability and effectiveness of their system on a large dataset with over 162,000 clients. One of the key insights from the results is that the early aggregation of updates and the use of neural network compression help counterbalance the effects of data sparsity and imbalance across clients. Additionally, by chaining client training in a privacy-preserving manner, the system achieves more stable and consistent global model updates. Overall, these findings underscore the potential of FL as a robust solution for developing privacy-by-design recommendation models.
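To make the chaining and early-aggregation ideas more concrete, the following is a minimal sketch in Python on toy data, assuming small non-IID client datasets and a simple linear model; the client setup, chain length, and blending rule are illustrative assumptions and not the authors’ FedQ implementation.

```python
# Minimal, hypothetical sketch of chained client updates in federated learning
# (inspired by the FedQ idea described above; all details are illustrative).
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim, chain_length, rounds, lr = 20, 8, 5, 30, 0.1

# Each client holds a small, non-IID local dataset (X_i, y_i).
true_w = rng.normal(size=dim)
clients = []
for _ in range(num_clients):
    X = rng.normal(loc=rng.normal(), size=(rng.integers(5, 15), dim))  # skewed sizes/shifts
    y = X @ true_w + 0.1 * rng.normal(size=len(X))
    clients.append((X, y))

def local_update(w, X, y, steps=3):
    """A few steps of local gradient descent on squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

global_w = np.zeros(dim)
for _ in range(rounds):
    chain_ids = rng.choice(num_clients, size=chain_length, replace=False)
    w = global_w.copy()
    # Chaining: each client continues training from the previous client's weights,
    # so the server only sees the combined end-of-chain update.
    for cid in chain_ids:
        X, y = clients[cid]
        w = local_update(w, X, y)
    # Early aggregation (illustrative): blend the chained update into the global model.
    global_w = 0.5 * global_w + 0.5 * w

print("distance to true weights:", np.linalg.norm(global_w - true_w))
```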
Security and Attacks
The study “Evaluating Impact of User-Cluster Targeted Attacks in Matrix Factorisation Recommenders” by Shams and Leith [14] investigates the vulnerability of matrix factorization-based recommender systems to data poisoning attacks targeting specific user clusters. In these attacks, an adversary injects fake users with manipulated ratings to promote certain items within targeted user groups. The research examines how user and item (latent) feature matrices change following such attacks and identifies factors that affect the success of attacks.
The authors validate their findings through experiments using two real-world datasets, MovieLens and Goodreads. Their results show that even low-knowledge attacks, where attackers have limited information about the system, can lead to significant changes in recommendations. A particularly notable insight is that items with fewer ratings are more vulnerable because their latent feature vectors can be easily manipulated, potentially making them the main targets for adversaries. Furthermore, these attacks can propagate through the system, influencing not only the targeted user clusters but also other parts of the network. Overall, studies of this nature clearly point to the need for more robust defense mechanisms in recommender systems, particularly in the face of automatically learned (adversarial) attacks [4].
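The following toy sketch illustrates the general mechanism of such a user-cluster targeted poisoning attack on a plain matrix factorization model; the synthetic ratings, attack profile, and training loop are hypothetical and not the authors’ experimental setup.

```python
# Illustrative sketch of a user-cluster targeted data-poisoning attack on a
# matrix-factorisation recommender (toy data only).
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 30, 12, 4

def train_mf(R, k=4, steps=200, lr=0.05, reg=0.01):
    """Plain SGD matrix factorisation on the observed (non-zero) entries."""
    U = 0.1 * rng.normal(size=(R.shape[0], k))
    V = 0.1 * rng.normal(size=(R.shape[1], k))
    for _ in range(steps):
        for u, i in np.argwhere(R > 0):
            err = R[u, i] - U[u] @ V[i]
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

# Clean ratings: the first 10 users form a "cluster" that likes items 0-5.
R = np.zeros((n_users, n_items))
for u in range(n_users):
    liked = range(6) if u < 10 else range(6, 12)
    for i in rng.choice(list(liked), size=4, replace=False):
        R[u, i] = rng.integers(4, 6)  # ratings of 4 or 5

target_item = 11  # sparsely rated by the cluster -> easier to manipulate
U, V = train_mf(R)
before = np.mean(U[:10] @ V[target_item])

# Attack: inject fake users whose profiles mimic the cluster but also push the target item.
fake = np.zeros((5, n_items))
fake[:, :6] = 5           # look like cluster members
fake[:, target_item] = 5  # promote the target item
U_p, V_p = train_mf(np.vstack([R, fake]))
after = np.mean(U_p[:10] @ V_p[target_item])

print(f"predicted cluster score for target item: {before:.2f} -> {after:.2f}")
```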
2.2 Transparency and Explainability
Transparency and explainability are crucial for building user trust in recommender systems. They provide users with insights into how recommendations are generated, promoting user acceptance and enhancing system reliability. The papers in this category explore different models and techniques to achieve higher transparency in recommendations.
Explainable Meta-Paths
The article “Explainable Meta-Path Based Recommender Systems” by Markchom et al. [9] studies the issue of making complex recommendations more interpretable using meta-paths in a heterogeneous information network, which represent structured sequences of node types and relations. While long meta-paths are highly informative, their complexity often reduces their explainability. For example, a long meta-path such as User → Movie → Director → Movie → Genre → User may be difficult for users to understand despite its richness in capturing diverse relationships. To address this, the authors introduce meta-path translation that transforms long and complex meta-paths into shorter, more interpretable alternatives without sacrificing recommendation performance. Specifically, they propose a new meta-path translation dataset and a sequence-to-sequence model that achieves good performance in balancing translation accuracy with explainability.
The study utilizes two real-world datasets, generating novel meta-path translation data for testing. The results demonstrate that the proposed model outperforms existing sequence-to-sequence baselines in terms of translation accuracy, and achieves a better balance between accuracy and readability. An example would be shortening the aforementioned path to User → Movie → Genre → User, making it easier to comprehend while still retaining essential relational information.
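As a small illustration of how meta-path translation can be framed as a sequence-to-sequence task, the sketch below tokenizes a (long, short) meta-path pair, mirroring the example above, into the id sequences an encoder-decoder model would consume; the vocabulary and special tokens are hypothetical, and no learned model is included here.

```python
# Toy illustration of meta-path translation as a sequence-to-sequence task.
# The (long, short) pair mirrors the example in the text; the tokenisation
# scheme below is a hypothetical stand-in for the paper's data preparation.
pairs = [
    (["User", "Movie", "Director", "Movie", "Genre", "User"],
     ["User", "Movie", "Genre", "User"]),
]

# Build a shared vocabulary over node types, plus special tokens.
specials = ["<pad>", "<bos>", "<eos>"]
node_types = sorted({t for src, tgt in pairs for t in src + tgt})
vocab = {tok: i for i, tok in enumerate(specials + node_types)}

def encode(path):
    """Map a meta-path to the id sequence an encoder-decoder would consume."""
    return [vocab["<bos>"]] + [vocab[t] for t in path] + [vocab["<eos>"]]

for src, tgt in pairs:
    print("source:", encode(src))
    print("target:", encode(tgt))
```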
Topic-based Explanations
The paper “Topic-Centric Explanations for News Recommendation” by Liu et al. [8] addresses the challenge of explainability by introducing a topic-aware model for news recommendations. This model is shown to suggest relevant articles and also to explain why they are recommended by identifying associated topics.
The authors evaluate their model using the MIND dataset, a large-scale news recommendation dataset. The results show that the proposed model outperforms baseline systems in recommendation accuracy and explainability. One interesting finding from the experiments is the ability of the proposed model to generate coherent and interpretable topics that align with user interests, as measured by coherence metrics. These results highlight the potential of topic-based explanations to increase user trust in recommender systems.
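To illustrate the general idea of topic-centric explanations, the following sketch uses off-the-shelf LDA on a handful of made-up headlines to surface the topic words that link a user’s reading history to a recommended article; it is a stand-in for illustration only, not the authors’ topic-aware model.

```python
# A minimal sketch of topic-centric explanations for news recommendation,
# using plain LDA on toy headlines (illustrative only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "stock markets rally as central bank cuts interest rates",
    "champions league final ends in dramatic penalty shootout",
    "tech giants report record quarterly earnings and revenue",
    "injury worries for the national football team before the cup",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(articles)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()

def top_words(topic_idx, n=4):
    """Most heavily weighted words of a topic, used as the explanation."""
    weights = lda.components_[topic_idx]
    return [terms[i] for i in np.argsort(weights)[::-1][:n]]

doc_topics = lda.transform(X)
user_history = [0]   # the user read the finance article
candidate = 2        # a business-related article the user has not read yet

user_profile = doc_topics[user_history].mean(axis=0)
shared_topic = int(np.argmax(user_profile * doc_topics[candidate]))
print("recommended article:", articles[candidate])
print("explained by topic words:", top_words(shared_topic))
```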
Together, these works contribute novel approaches to improving transparency in diverse recommendation settings, offering promising solutions to enhance user trust and satisfaction through more understandable recommendations.
2.3 Bias and Fairness
Bias and fairness are critical concerns in recommender systems, as biased recommendations can result in unequal representation, user dissatisfaction, and negative societal impacts. The papers in this category propose various methods to identify, measure, and mitigate biases, aiming for a more balanced and fair recommendation process. These papers can be subcategorized based on their approach and the specific problems they address, such as exposure bias, fairness in interaction, fairness in ranking (multi-agent fairness), and privacy-aware fairness strategies.
Exposure Bias
In “Evaluation Measures of Individual Item Fairness for Recommender Systems: A Critical Study”, Rampisela et al. [12] focus on fairness in individual item exposure. The authors review and critically analyze existing fairness evaluation measures and propose modifications to address the limitations they identify, which include interpretability and computational challenges. Their work contributes new evaluation metrics to improve the fairness of item exposure in recommendation systems.
Through empirical analysis using both real-world datasets, such as LastFM and MovieLens 1M, and synthesized datasets designed to test specific fairness conditions, they compare the original and corrected fairness measures. The results highlight that relying solely on exposure-based fairness can lead to biased outcomes, particularly when evaluating underrepresented items. This could mean that, for instance, if a recommender system heavily promotes popular items that have already received significant exposure, less popular or new items may rarely appear in recommendation lists, even if they are relevant. The study provides practical guidelines on when and how to use these evaluation measures, especially for items from underrepresented providers.
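As a simple illustration of the exposure-based family of measures discussed above, the sketch below computes how unequally exposure is distributed over items across users’ top-k lists using a Gini coefficient; the measure and toy data are illustrative and not one of the specific measures analyzed in the paper.

```python
# Small sketch of an exposure-based item-fairness measure: how unequally
# exposure is spread over items in the users' top-k lists (illustrative).
import numpy as np

def exposure_counts(rec_lists, n_items):
    """Count how often each item appears across all users' top-k lists."""
    counts = np.zeros(n_items)
    for items in rec_lists:
        for i in items:
            counts[i] += 1
    return counts

def gini(x):
    """Gini coefficient: 0 = perfectly equal exposure, close to 1 = concentrated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

# Toy top-3 recommendations for 5 users over 6 items.
rec_lists = [[0, 1, 2], [0, 1, 3], [0, 1, 2], [0, 2, 1], [0, 1, 4]]
counts = exposure_counts(rec_lists, n_items=6)
print("exposure per item:", counts)          # item 5 is never exposed
print("Gini of exposure:", round(gini(counts), 3))
```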
In another work, “Mitigating Exposure Bias in Recommender Systems – A Comparative Analysis of Discrete Choice Models”, Krause et al. [7] explore the impact of exposure bias in recommender systems and propose discrete choice models as a mitigation strategy. Exposure bias occurs when items are unequally presented to users, leading to skewed recommendations, echo chambers, and underrepresentation of certain items. Traditionally, models such as the multinomial logit have been employed to address this issue, but the authors argue that other discrete choice models offer a better solution, particularly in mitigating bias through choice set compositions.
The study includes two experiments using partially biased human choice data collected in a controlled environment. The authors evaluate how discrete choice models and baseline recommender systems respond to exposure bias, both through over-exposure of certain items and through varying the competitiveness of choice sets. The results indicate that discrete choice models, including exponomial and generalized nested logit models, reduce exposure bias, outperforming the traditional multinomial logit model. Their findings suggest that choice set composition plays a critical role in exacerbating exposure bias, and mitigating it requires incorporating not only the frequency of item exposure but also the competitiveness of items within the choice sets.
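The following short sketch shows the basic multinomial logit choice model and how choice-set composition changes an item’s choice probability; the utilities are made up, and the exponomial and generalized nested logit variants studied in the paper are not implemented here.

```python
# Sketch of the multinomial logit (MNL) discrete choice model and the effect
# of choice-set composition on an item's choice probability (toy utilities).
import numpy as np

def mnl_probs(utilities):
    """MNL: P(choose i | set) = exp(u_i) / sum_j exp(u_j)."""
    u = np.asarray(utilities, dtype=float)
    e = np.exp(u - u.max())          # numerically stable softmax
    return e / e.sum()

target_utility = 1.0
weak_set = [target_utility, 0.2, 0.1]      # target faces weak competitors
strong_set = [target_utility, 1.5, 1.4]    # same target, more competitive set

print("P(target | weak set):  ", round(mnl_probs(weak_set)[0], 3))
print("P(target | strong set):", round(mnl_probs(strong_set)[0], 3))
```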
Fairness in Interaction
In their work “Fairness of Interaction in Ranking under Position, Selection, and Trust Bias”, Ovaisi et al. [11] study fairness issues in user-item interactions under various biases, including position, selection, and trust bias. The paper emphasizes that interactions are impacted not only by the ranking position of an item (position bias) and whether it is presented in the truncated list (selection bias), but also by a user’s perception of relevance, which can be overly influenced by trust in the rankings provided by the system (trust bias).
The paper’s solution is a flexible fairness metric that quantifies disparities in interactions by accounting for these three types of bias. The proposed approach is a post-processing algorithm, Fairness Optimization for Ranking via Greedy Exploration (FORGE), which aims to balance fairness and utility. The approach optimizes fairness through a greedy exploration strategy, enabling trade-offs between fairness and user utility, ultimately outperforming state-of-the-art fair ranking algorithms.
Their experimental results cover domains such as job applicant rankings and demonstrate that neglecting biases leads to an unfair interaction distribution, often resulting in rich-get-richer dynamics. FORGE achieves fairer rankings by adjusting exposure at different ranking positions and addressing interaction imbalances. Their results highlight that managing these biases provides a notable improvement in fairness, with minimal reduction in utility, thereby contributing to more equitable ecosystems that benefit both consumers and providers.
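To give a flavor of post-processing approaches of this kind, the sketch below greedily fills ranking positions while trading off relevance against the exposure each provider group has already accumulated under a position-bias model; the scoring rule and parameters are hypothetical and do not reproduce FORGE.

```python
# Illustrative greedy re-ranking that balances relevance against group exposure
# under a position-bias model (hypothetical scoring; not the FORGE algorithm).
import numpy as np

relevance = np.array([0.9, 0.85, 0.8, 0.5, 0.4, 0.3])   # item relevance scores
group = np.array([0, 0, 0, 1, 1, 1])                    # provider group of each item
k = 4
position_exposure = 1.0 / np.log2(np.arange(2, k + 2))  # DCG-style position bias
lam = 0.5                                                # fairness-utility trade-off

ranking, remaining = [], set(range(len(relevance)))
group_exposure = np.zeros(2)
for pos in range(k):
    best, best_score = None, -np.inf
    for i in remaining:
        # Utility gain of placing i here, minus a penalty proportional to how
        # much exposure i's group has already received relative to the other.
        gain = position_exposure[pos] * relevance[i]
        penalty = lam * (group_exposure[group[i]] - group_exposure[1 - group[i]])
        score = gain - penalty
        if score > best_score:
            best, best_score = i, score
    ranking.append(best)
    remaining.remove(best)
    group_exposure[group[best]] += position_exposure[pos]

print("ranking:", ranking)
print("exposure per group:", np.round(group_exposure, 3))
```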
Fairness in Ranking
In “Dynamic Fairness-aware Recommendation through Multi-agent Social Choice”, Aird et al. [1] address the issue of achieving fairness in personalized recommendations, particularly in multi-stakeholder settings where fairness is complex and multifaceted. Unlike traditional classification tasks that often simplify fairness to the equality of outcomes between protected and unprotected groups, fairness in recommendation systems must consider diverse and sometimes conflicting fairness concerns, including user and provider perspectives. The authors argue that current fairness interventions, which are often static, do not adapt to the dynamic nature of real-world recommendations.
To tackle this, they propose the SCRUF-D architecture (Social Choice for Recommendation Under Fairness – Dynamic). This model treats fairness as a two-stage social choice problem. In the allocation stage, various fairness concerns are represented as agents advocating for different aspects of fairness (e.g., geographical diversity or gender representation). These agents are dynamically weighted based on historical fairness outcomes, user preferences, and context-specific compatibility. In the aggregation stage, these weighted agents interact with the user’s preferences to form a final recommendation list, balancing fairness and personalization.
The framework is tested in the context of Kiva Microloans, a platform that focuses on global financial inclusion. Using Kiva’s real-world dataset, which includes loan characteristics such as region, gender, and loan amount, the authors demonstrate that SCRUF-D can adaptively incorporate multiple fairness concerns. Their results show that the model promotes fairness over time, outperforming static fairness interventions. Notably, the system dynamically adjusts to shifts in user compatibility and historical fairness states, which is (arguably) important in meeting diverse fairness goals.
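A highly simplified sketch of the two-stage idea is shown below: fairness agents are first weighted by how far their concern lags a target (allocation), and their scores are then blended with the user’s preference scores (aggregation); the agents, targets, and scoring rule are hypothetical and not the SCRUF-D mechanisms themselves.

```python
# Simplified sketch of a two-stage allocation/aggregation scheme for
# multi-agent fairness (illustrative; not the SCRUF-D implementation).
import numpy as np

items = ["loan_A", "loan_B", "loan_C", "loan_D"]
user_scores = np.array([0.9, 0.7, 0.6, 0.4])            # personalisation scores

# Each fairness agent scores items by how much they help its concern.
agent_scores = {
    "geo_diversity":  np.array([0.1, 0.8, 0.9, 0.2]),   # favours under-served regions
    "gender_balance": np.array([0.2, 0.3, 0.7, 0.9]),   # favours under-served borrowers
}

# Allocation stage: weight each agent by its historical fairness deficit
# (target share minus the share achieved so far).
targets = {"geo_diversity": 0.5, "gender_balance": 0.5}
achieved = {"geo_diversity": 0.45, "gender_balance": 0.2}
deficit = {a: max(targets[a] - achieved[a], 0.0) for a in agent_scores}
total = sum(deficit.values()) or 1.0
weights = {a: d / total for a, d in deficit.items()}

# Aggregation stage: blend the user's scores with the weighted agent scores.
alpha = 0.7  # personalisation vs. fairness pressure
fairness_term = sum(weights[a] * s for a, s in agent_scores.items())
final = alpha * user_scores + (1 - alpha) * fairness_term
order = np.argsort(final)[::-1]
print([items[i] for i in order])
```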
Fairness in Federated Learning
The paper by Neumann et al. [10], “A Privacy Preserving System for Movie Recommendations using Federated Learning”, discusses fairness indirectly in an FL setting, mainly through the lens of non-independent and non-identically distributed data and small, imbalanced datasets. In FL, fairness can be understood as the ability of the model to perform well across all clients, regardless of how much or what kind of data each contributes. In traditional models, data-rich clients can dominate, leading to biased outcomes that underrepresent data-poor clients. This imbalance might hinder convergence and reduce overall model performance, especially in diverse client settings.
To tackle this, the authors introduce a technique called FedQ, which chains client model updates together to produce more stable and aggregated updates. This approach helps mitigate issues resulting from small and imbalanced datasets by allowing multiple client datasets, even small ones, to contribute more meaningfully to the global model. The method addresses the fairness issue by ensuring that the model does not overfit to the data of larger or more frequent clients, thus promoting more equitable model performance across all users.
3 Conclusion
The challenge of developing trustworthy recommender systems has become increasingly pressing as these systems are now central to online decision-making. Despite significant advancements, ensuring recommender systems’ (and their providers’) compliance with emerging regulatory frameworks, such as the EU’s AI Act or the Digital Markets Act, remains a substantial obstacle.
From a multi-stakeholder perspective, recommender systems are not just tools for end users. They are also pivotal for producers, employers, sellers, and content creators since they control how resources like exposure opportunities, products, and services are allocated, directly impacting different parties. Recommender systems can determine which small retailers or grassroots content creators receive exposure and which are left out, thus affecting the sustainability of smaller businesses or creators. From a social and ethical standpoint, untrustworthy systems can create echo chambers or filter bubbles, limiting the exposure of users to diverse viewpoints, reinforcing biases, or even promoting harmful content.
Key challenges that arise when addressing these issues include balancing transparency, privacy, robustness, and fairness, without compromising the utility of recommendations for the end user, and the recommender’s business performance. The articles in this special issue contribute critical insights into these and other trustworthiness dimensions. The selected works improve our understanding of current methodologies, and will inspire future research to address the complex ethical, technical, and regulatory challenges associated with trustworthy recommender systems. We hope the reader will find the articles engaging and thought-provoking.
In the coming years, the integration of large language models (LLMs), or more broadly, generative models in recommender systems will introduce new challenges that must be understood, evaluated, and mitigated; see [3, 5]. For example, LLM-driven recommenders, due to their training on vast amounts of unregulated online data, can exacerbate issues such as biases, privacy concerns, and the reinforcement of filter bubbles if not carefully managed. While these systems offer numerous benefits, they also present unknown risks that may require a new generation of trustworthy recommender systems focused on harm mitigation and ensuring ethical outcomes.