
Measuring and Mitigating Biases
in Motor Insurance Pricing

Mulah Moriah$^{1,2}$, Franck Vermet$^{1,2}$ & Arthur Charpentier$^{3}$
$^{1}$ Euro-Institut d’Actuariat Jean Dieudonné (EURIA)
$^{2}$ Univ Brest, CNRS, UMR 6205, Laboratoire de Mathématiques de Bretagne Atlantique, France
$^{3}$ Université du Québec à Montréal
Abstract

The non-life insurance sector operates within a highly competitive and tightly regulated framework, confronting a pivotal juncture in the formulation of pricing strategies. Insurers are compelled to harness a range of statistical methodologies and available data to construct optimal pricing structures that align with the overarching corporate strategy while accommodating the dynamics of market competition. Given the fundamental societal role played by insurance, premium rates are subject to rigorous scrutiny by regulatory authorities. Consequently, the act of pricing transcends mere statistical calculations and carries the weight of strategic and societal factors. These multifaceted concerns may drive insurers to establish equitable premiums with respect to various variables. For instance, regulations may mandate premiums that are fair with respect to policyholder gender, while mutualist groups may, in accordance with their corporate strategies, implement age-based premium fairness. In certain insurance domains, the presence of serious illnesses or disabilities is emerging as a new dimension for evaluating fairness. Regardless of the motivating factor prompting an insurer to adopt fairer pricing strategies for a specific variable, the insurer must possess the capability to define, measure, and ultimately mitigate any fairness biases inherent in its pricing practices while upholding standards of consistency and performance. This study seeks to provide a comprehensive set of tools for these endeavors and assess their effectiveness through practical application in the context of automobile insurance. Results show that fairness bias can be found in historical data and models, and that fairer outcomes can be obtained by more fairness-aware approaches.


Keywords: Machine learning, Fairness, Pricing, Non-life insurance, Discrimination

1 Introduction

1.1 Motivation

The insurance industry is characterized by an inherent reversal in its production cycle, where insurers request a fixed premium at the time of policy subscription in exchange for coverage against uncertain risks in terms of both occurrence and magnitude. This inversion underscores the statistical nature that envelops the pricing of insurance, necessitating adherence to statistical theory for the estimation and coverage of random events. Beyond these statistical considerations, insurance premiums represent the equitable price for insurance services, encompassing a multitude of strategic challenges. The insurance market is increasingly competitive, comprising established incumbents and more agile newcomers striving to gain market share. This competitive landscape is further catalyzed by continuously evolving regulations aimed at fostering competition among industry players.

Hence, participants in the insurance market must offer competitive pricing strategies that align with their corporate strategies and communication, while also employing tools tailored to diverse distribution processes.

Recent years have witnessed the widespread adoption of machine learning algorithms, neural networks, and the utilization of vast datasets. This adoption is attributable to scientific advancements, increased computational power, enhanced accessibility to technology, and the proliferation of data. These emerging technologies have made significant inroads into various business sectors, including the insurance industry.

These data and algorithms serve as decision support tools, aiding in policyholder segmentation, risk comprehension, and the consideration of various factors associated with it. Therefore, industry stakeholders must incorporate these new elements to maintain their competitiveness. Nevertheless, the extensive use of massive data and intricate algorithms has brought the issue of transparency to the forefront. It is imperative that premiums and decisions are both explainable and fair, given the high stakes involved. This is not simply about algorithms assisting in trivial choices like movie selection; rather, it involves algorithms determining the cost of access to insurance services for various population segments. Concerns regarding fairness and ethics have been integral to our societies and philosophies for centuries. Although subject to interpretation, fairness can be defined as the ability to place individuals on an equal footing while acknowledging the differences that exist among them.

In response to these considerations, regulatory measures such as the gender directive have been implemented to promote fairness. Reforms related to access to borrower insurance also represent a form of fairness enforcement. Consequently, insurance industry participants may need to construct fairer premiums with regard to so-called sensitive or protected variables. These fairness constraints can emanate from regulatory requirements, as seen in the case of gender, or be driven by commercial and strategic objectives, such as those related to age in specific companies.

In the context of this study, bias is understood as a form of discrimination, signifying the undesirable impact of a sensitive variable on a variable of interest. For instance, this could include the effect of gender on insurance premiums within the framework established by the gender directive (European Union (2008)). Since 2016, numerous research studies (Angwin et al. (2016); Chouldechova (2017)) have identified instances of discrimination in decision-making tools across various domains, such as the risk assessment tool for recidivism in the United States, Google Images algorithms, Amazon’s application processing algorithms, and lending and financing algorithms, among others. Historically, the solution has often been to circumvent the issue rather than address it from an ethical perspective.

One simplified scenario that can be examined involves the use of a gender-correlated variable to determine premiums. In this case, gender is not explicitly factored into the estimates, resulting in fairness by omission. However, the presence of a correlation between gender and the non-sensitive explanatory variable leads to an indirect effect of gender on the estimated premiums. Even though gender is seemingly reprocessed or removed from the model, its association with other variables allows its influence on the target variable to persist. This is because adjustments only account for the direct impact of sensitive variables and not their indirect effects, despite these variables exerting a significant influence on the distributions of other explanatory variables. Factors such as age, gender, and disability influence choices related to activities, risk tolerance, product preferences, and so on. Consequently, it is imperative to first establish the means to define and measure the fairness of the constructed models and subsequently mitigate this bias through fairness-aware approaches.

1.2 Agenda

The purpose of this article is to provide actuaries with tools to understand, measure and mitigate the unwanted effect of a sensitive variable in a pricing problem. In Section 2, we discuss fairness notions and statistical measures of fairness, giving the bigger picture while focusing on the elements that matter for pricing. Then, in Section 3, we present methods that can be used to reduce fairness bias in pricing models. Finally, in Sections 4 and 5, we present the results of a fairness implementation on a real car insurance pricing problem, with measures of bias in Section 4 and a description of the impact of mitigation in Section 5. Note that all implementations are made in Python 3.8.

1.3 Notations

Throughout this document, the variable denoted as $Y$ is the target variable we aim to predict. This variable can take on either categorical or quantitative values (discrete or continuous, but positive). Given the primary focus on pricing, $Y$ predominantly assumes a quantitative nature. However, for the sake of illustration, it might be temporarily treated as a binary variable. In the context of addressing this supervised machine learning problem, we introduce an algorithm denoted as $m$, a set of non-sensitive features represented by the vector $\boldsymbol{X}=(X_{1},X_{2},\dots,X_{p})$, and sensitive attributes denoted as $\boldsymbol{S}$. The vector $\boldsymbol{S}$ comprises variables that are intended to have no influence on the models, either intentionally or inadvertently. These variables are typically discrete, particularly in the context of examining fairness in pricing, and may explicitly pertain to attributes such as gender. If a single sensitive attribute is considered, as it will mostly be the case in this document, notation $S$ will be used (see Hu et al. (2023) for a detailed discussion about multiple sensitive attributes). Furthermore, we assume that $S$ is a binary sensitive variable, with possible values of $0$ and $1$, which may correspond to the gender of the policyholder.

Each data point in our dataset, which can represent individuals, contracts, or claims, is identified as $(y_{i},\boldsymbol{x}_{i},s_{i})$, with $i$ ranging from $1$ to $n$, and $n_{0}$ and $n_{1}$ signifying the numbers of observations where $s_{i}=0$ and $s_{i}=1$, respectively. Note that lowercase letter variables denote observations, whereas uppercase letter variables denote random variables.

We introduce $\widehat{y}=\widehat{m}(\boldsymbol{x})$ as the predictions generated by an "unfair model" (or "unaware model", as defined by Dwork et al. (2011, 2012)), and $\widetilde{y}=\widetilde{m}(\boldsymbol{x})$ as the predictions generated by a fair model. Here, $\widetilde{m}$ is a model that takes into account defined fairness constraints to minimize the effects of $s$ on $y$. As studied in Section 3, these constraints can take the form of criteria for variable selection (pre-processing approach), penalized learning (in-processing approach) and output correction (post-processing approach). We can then define individual fairness bias, denoted as $\varepsilon$, as the difference between these two predictions, i.e., $\varepsilon=\widehat{y}-\widetilde{y}$.

We also introduce the function $\mathcal{V}^{s}(\boldsymbol{x}_{i})$, which represents the $k$-nearest neighbors in the group $S=s$ associated with the $i$-th individual.
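As an illustration of the neighborhood function, the sketch below returns the indices of the $k$ observations in group $S=s$ that are closest to a given profile. The function name `nearest_in_group`, the toy data, and the choice of the Euclidean distance on numeric features are assumptions made for this example; the text does not specify the metric.

```python
import math

def nearest_in_group(x_i, X, S, s, k):
    """Indices of the k observations with S == s that are closest to x_i.

    Assumption: Euclidean distance on numeric, already-scaled features.
    If x_i itself belongs to group s, it is returned as its own neighbor.
    """
    candidates = [j for j, s_j in enumerate(S) if s_j == s]
    candidates.sort(key=lambda j: math.dist(x_i, X[j]))
    return candidates[:k]

# Toy data: two features per policyholder and a binary sensitive attribute.
X = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.1), (2.0, 2.0), (0.2, 0.0)]
S = [0, 1, 1, 1, 0]

# Neighbors of individual 0 (who has S = 0) inside the group S = 1.
neighbors = nearest_in_group(X[0], X, S, s=1, k=2)
print(neighbors)
```

In practice the features would need to be put on comparable scales first, since the Euclidean distance is sensitive to units.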

2 Measuring fairness and biases

While economists have engaged in discussions on discrimination for several decades, as evidenced in works such as Edgeworth (1922), Becker (1957), and Phelps (1972), recent contributions from the field of computer science have sought to formalize key fairness concepts and the notion of non-discrimination with respect to specific sensitive attributes denoted as "S". These discussions primarily pertain to various classifiers denoted as "m", and are exemplified in works such as Dwork et al. (2012), Hardt et al. (2016), Berk et al. (2017), and Corbett-Davies et al. (2017). For an overview of state-of-the-art developments in insurance models, please refer to Charpentier (2024).

As the literature in this area is relatively recent, references and consensus have not yet fully coalesced, as noted by Angwin et al. (2016): "The rapid growth of this emerging field has led to highly inconsistent motivations, terminologies, and notations, posing a significant challenge in cataloging and comparing definitions." Furthermore, Castelnovo et al. (2022) describe the multiplicity of fairness definitions as a "zoo of definitions," remarking, "The researcher or practitioner approaching this facet of machine learning for the first time can easily feel confused and somewhat lost in this maze of definitions. These various definitions capture different facets of the fairness concept, but to the best of our knowledge, a comprehensive understanding of the broader landscape where these measures reside remains elusive."

Fairness is intuitively understood as the absence of any association between the sensitive variable and the variable of interest. Fair predictions $\widetilde{y}$ will then be independent of $S$. Indeed, independence between these variables implies the absence of any direct or indirect relationships, thereby precluding the existence of fairness bias. However, this is just one facet of the fairness concept.

In this section, we will introduce the two primary types of fairness, namely group fairness and individual fairness, with group fairness being the more prevalent of the two.

2.1 Group or Statistical Fairness

Group fairness necessitates equality of treatment among groups based on the sensitive variables. For that, statistical properties are specified using the model $m$ and $\widehat{Y}$ to compare each group. The overarching objective is to ensure that individuals from both privileged and unprivileged groups are treated equitably in terms of the specified statistical properties.

Fairness was initially introduced using conditional probabilities in the context of binary classification (see EEOC (1979)). This concept can be extended to scenarios where the variable $Y$ is non-binary, incorporating moment properties, a weak version, or distribution properties, a strong version. These can be linked to correlation properties and independence conditions, respectively.

Independence (demographic parity):

States that predictions ($\widehat{Y}$) should not rely on the sensitive variable $S$. It emphasizes that predictions must be independent of $S$:

\[
\widehat{Y} \perp\!\!\!\perp S.
\]

This principle highlights the need for predictions to be unrelated to $S$, without mentioning the target variable ($Y$). In practical scenarios like insurance pricing, if $S$ is age and $Y$ is premium, it suggests premiums should be consistent across age groups. However, enforcing this fairness principle may seem counterintuitive if certain age groups pose higher risks, implying less restrictive models to ensure consistent premiums regardless of age. This approach may contradict traditional fairness perceptions, which prioritize equal treatment regardless of sensitive attributes. This fairness principle becomes crucial when there’s concern about unfair information in the target variable ($Y$), especially due to historical bias in the data, reflecting past unfair behaviors or decisions. In such cases, emphasizing fairness regarding $S$ may be necessary for corrective measures.
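To make the independence criterion concrete for a quantitative premium, the sketch below compares the predictions of the two groups in two ways: through their means (a moment-based check) and through the largest gap between their empirical distribution functions (a distribution-based check). The function names, the toy premiums, and the use of the Kolmogorov-Smirnov distance are assumptions chosen for illustration, not the measures used later in the paper.

```python
def group_values(y_hat, S, s):
    """Predictions of the individuals whose sensitive attribute equals s."""
    return [y for y, s_j in zip(y_hat, S) if s_j == s]

def mean_gap(y_hat, S):
    """Moment-based check: |mean(y_hat | S=1) - mean(y_hat | S=0)|."""
    g0, g1 = group_values(y_hat, S, 0), group_values(y_hat, S, 1)
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def ks_distance(y_hat, S):
    """Distribution-based check: sup gap between the two empirical CDFs."""
    g0, g1 = group_values(y_hat, S, 0), group_values(y_hat, S, 1)

    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)

    return max(abs(ecdf(g0, t) - ecdf(g1, t)) for t in set(y_hat))

# Toy premiums: both groups have the same mean but different spreads.
y_hat = [100.0, 300.0, 150.0, 250.0]
S = [0, 0, 1, 1]

print(mean_gap(y_hat, S))     # equal means across groups
print(ks_distance(y_hat, S))  # yet the distributions differ
```

On this toy sample the mean gap is zero while the distributional gap is not, which shows why equality of averages alone does not establish independence.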

Separation (equalized odds):

states that predictions ($\widehat{Y}$) should be independent of the sensitive variable ($S$) when the true value of the target variable ($Y$) is known. This means that once the actual outcome ($Y$) is revealed, the predictions should not be influenced by the sensitive attribute ($S$):

\[
\widehat{Y} \perp\!\!\!\perp S \mid Y.
\]

In the context of separation, any disparities in treatment between groups based on the sensitive attribute ($S$) must be justifiable by the actual value of the target variable ($Y$). For example, in a scenario involving premium and age, premiums may vary for each age group, based on risk factors independent of age. This approach reduces the influence of the sensitive attribute ($S$) while preserving valuable information in the target variable ($Y$). However, it’s only feasible when $Y$ isn’t affected by historical bias. If unfair information is present in $Y$, it will easily affect the predictions $\widehat{Y}$.

Sufficiency (predictive parity):

examines fairness regarding the target variable ($Y$). It aims for independence between $Y$ and the sensitive attribute ($S$), given the predictions ($\widehat{Y}$):

\[
Y \perp\!\!\!\perp S \mid \widehat{Y}.
\]

This approach doesn’t require knowledge of the true $Y$ for unseen individuals, addressing fairness directly. In feature selection and modeling, $Y$ often faces selection issues due to observed data limitations. By focusing on $\widehat{Y}$ as a starting point, this approach helps mitigate such issues associated with $Y$.

Different scopes of group fairness are inherently incompatible. Studies show simultaneous fulfillment of multiple fairness criteria is challenging, except in special cases. References like Chouldechova (2017), Kleinberg et al. (2016), Berk et al. (2018), and Charpentier (2024) explore this. This understanding leads to the introduction of the following definition of statistical fairness.
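One standard way to see such an incompatibility, given here as an illustration rather than as a result of this paper, is to combine independence and sufficiency using the contraction property of conditional independence:

```latex
\begin{align*}
& Y \perp\!\!\!\perp S \mid \widehat{Y}
  \quad\text{(sufficiency)}
  \qquad\text{and}\qquad
  \widehat{Y} \perp\!\!\!\perp S
  \quad\text{(independence)} \\
& \Longrightarrow\quad S \perp\!\!\!\perp (Y, \widehat{Y})
  \quad\text{(contraction property)} \\
& \Longrightarrow\quad S \perp\!\!\!\perp Y .
\end{align*}
```

Hence, whenever the target $Y$ itself depends on $S$ (for instance when claim experience differs between groups), a model cannot satisfy independence and sufficiency at the same time.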

Definition 1 (Statistical (Group) Fairness).

Let $\widehat{y}=m(\boldsymbol{x})$. A model $m$ is classified as strongly fair if it satisfies the following conditions:

\[
\begin{cases}
\text{demographic parity}: & (\widehat{Y}\mid S=s)\overset{\mathcal{L}}{=}\widehat{Y},\ \forall s\\
\text{equalized odds}: & (\widehat{Y}\mid S=s,Y)\overset{\mathcal{L}}{=}(\widehat{Y}\mid Y),\ \forall s\\
\text{predictive parity}: & (Y\mid S=s,\widehat{Y})\overset{\mathcal{L}}{=}(Y\mid\widehat{Y}),\ \forall s
\end{cases}
\]

while $m$ is considered weakly fair if it fulfills the following conditions:

\[
\begin{cases}
\text{demographic parity}: & \mathbb{E}[\widehat{Y}\mid S=s]=\mathbb{E}[\widehat{Y}],\ \forall s\\
\text{equalized odds}: & \mathbb{E}[\widehat{Y}\mid S=s,Y]=\mathbb{E}[\widehat{Y}\mid Y],\ \forall s\\
\text{predictive parity}: & \mathbb{E}[Y\mid S=s,\widehat{Y}]=\mathbb{E}[Y\mid\widehat{Y}],\ \forall s
\end{cases}
\]

These definitions provide a framework for categorizing models as either strongly fair or weakly fair based on their compliance with different fairness criteria, namely demographic parity, equalized odds, and predictive parity.

2.2 Individual fairness

Individual fairness, also known as similarity-based fairness, asserts that similar individuals should receive similar predictions. Unlike group fairness, which aggregates outcomes at the group level, individual fairness compares individuals directly. This concept is formalized as "disparate treatment":

Definition 2 (Disparate Treatment).

A model $m$ exhibits disparate treatment if, given the explanatory variables $\boldsymbol{X}$, the predictions $\widehat{Y}$ and the sensitive attribute $S$ are dependent. To achieve fairness:

\[
\widehat{Y} \perp\!\!\!\perp S \mid \boldsymbol{X}.
\]

Individual fairness and disparate treatment embody an intuitive conception of fairness, seen in scenarios like car insurance pricing, where identical individuals with different sensitive attribute values receive the same predictions. However, this approach overlooks relationships between explanatory variables ($\boldsymbol{X}$) and $S$, impacting fairness enforcement.

The challenge of individual fairness arises from sensitive variables significantly affecting the distribution of other variables. Attributes like gender, age, disabilities, ethnicity, and illnesses shape habits, preferences, and behaviors, which complicates the definition of "similar individuals."

To address these questions, Dwork et al. (2011) introduced the Lipschitz constraint concept:

\[
d_{Y}(\widehat{y}_{i},\widehat{y}_{j})<\lambda\, d_{\boldsymbol{X}}(\boldsymbol{x}_{i},\boldsymbol{x}_{j}),
\]

Here, $d_{Y}$ and $d_{\boldsymbol{X}}$ measure distances in the target and explanatory variable spaces, respectively. Similarity in the $Y$ space, like between premiums, can be defined using metrics such as $|\widehat{y}_{i}-\widehat{y}_{j}|$ or $(\widehat{y}_{i}-\widehat{y}_{j})^{2}$. The focus is on defining when two individuals with different values of the sensitive attribute ($S$) are similar. The absence of disparate treatment then means that, regardless of the value of $S$, two individuals who are similar on $\boldsymbol{X}$ must receive similar predictions, and thus similar premiums.
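The constraint can be checked empirically by scanning pairs of individuals from different groups. The sketch below is one possible reading: it uses the absolute difference for the distance between premiums and the Euclidean distance for the distance between profiles, and it flags a pair only when the premium gap strictly exceeds $\lambda$ times the feature distance, so that identical profiles with identical premiums are not flagged. The function name, the value of `lam`, and the toy data are assumptions for illustration.

```python
import math
from itertools import combinations

def lipschitz_violations(X, y_hat, S, lam):
    """Cross-group pairs (i, j) with |y_hat_i - y_hat_j| > lam * ||x_i - x_j||.

    Assumptions: absolute difference on premiums, Euclidean distance on
    features, and only pairs with different sensitive values are compared.
    """
    violations = []
    for i, j in combinations(range(len(X)), 2):
        if S[i] == S[j]:
            continue  # only compare individuals across the two groups
        if abs(y_hat[i] - y_hat[j]) > lam * math.dist(X[i], X[j]):
            violations.append((i, j))
    return violations

# Toy data: one scaled feature, predicted premiums, binary sensitive attribute.
X = [(0.0,), (0.1,), (1.0,)]
y_hat = [400.0, 480.0, 500.0]
S = [0, 1, 1]

print(lipschitz_violations(X, y_hat, S, lam=200.0))
```

Individuals 0 and 1 are close in feature space but receive premiums that differ by 80, which exceeds the allowed gap for that distance, so the pair is reported. The choice of `lam` and of the two distances drives the result and has to be justified case by case.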

Despite its intuitive appeal, individual fairness is less frequently used than group fairness, largely because of the complexity of handling the interactions between variables. Causality-based criteria may offer a promising avenue, but practical implementation remains challenging. These causal models are not discussed here (for more information, refer to Kusner et al. (2017), Alycia and Wu (2022), and Galles and Pearl (1998)).

2.3 Quantifying unfairness

In light of the definitions provided earlier, it is imperative that fairness metrics possess the ability to quantify full/conditioned independence in the context of group fairness and also discern discrepancies in individual predictions, contingent upon the proximity of individuals in the case of individual fairness.

2.3.1 From the binary to the general case

In the case where both the target variable ($y$) and the sensitive attribute ($s$) are binary variables, metrics can be defined using the confusion matrix of $\widehat{y}$ and $s$. For instance, the disparate impact metric can be derived as:

\[
\frac{\mathbb{P}(\widehat{Y}=1\mid S=1)}{\mathbb{P}(\widehat{Y}=1\mid S=0)}.
\]

In the context of demographic parity, when this metric is closer to one, it indicates a fairer model. Similarly, another fairness metric, denoted as $M$, is defined as:

\[
\begin{aligned}
M_{1} &= |\mathbb{P}(\widehat{Y}=1\mid S=1)-\mathbb{P}(\widehat{Y}=1\mid S=0)|\\
&= |\mathbb{P}(\widehat{Y}=0\mid S=1)-\mathbb{P}(\widehat{Y}=0\mid S=0)| = M_{0}.
\end{aligned}
\]

In the case of demographic parity, the closer the value of $M_{1}$ (or $M_{0}$) is to zero, the fairer the model.
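Both quantities are straightforward to estimate from binary predictions by replacing the probabilities with empirical frequencies. The sketch below does so on a small invented sample; the function names and the data are illustrative assumptions.

```python
def positive_rate(y_hat, S, s):
    """Empirical P(Y_hat = 1 | S = s) for binary predictions."""
    group = [y for y, s_j in zip(y_hat, S) if s_j == s]
    return sum(group) / len(group)

def disparate_impact(y_hat, S):
    """Ratio of positive rates, group S = 1 over group S = 0."""
    return positive_rate(y_hat, S, 1) / positive_rate(y_hat, S, 0)

def m1(y_hat, S):
    """Absolute difference of positive rates between the two groups."""
    return abs(positive_rate(y_hat, S, 1) - positive_rate(y_hat, S, 0))

# Toy sample: 3 of 4 positives in group 0, 1 of 4 positives in group 1.
y_hat = [1, 0, 1, 1, 0, 0, 1, 0]
S = [0, 0, 0, 0, 1, 1, 1, 1]

print(disparate_impact(y_hat, S))  # far from 1: the groups are treated differently
print(m1(y_hat, S))                # far from 0 for the same reason

# M_0 is obtained by flipping the predicted label and coincides with M_1.
flipped = [1 - y for y in y_hat]
print(m1(flipped, S))
```

The ratio is undefined when no individual in group $S=0$ receives a positive prediction, a case that a production implementation would have to handle explicitly.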

To address separation and sufficiency, a confusion matrix involving $\widehat{Y}$, $Y$, and $S$ can be constructed, enabling the definition of metrics such as true positive and false negative rates. To measure fairness in accordance with the equalized odds definition, the disparate mistreatment can be calculated as follows:

\[
\begin{cases}
M_{1|0}=|\mathbb{P}(\widehat{Y}=1\mid Y=0,S=1)-\mathbb{P}(\widehat{Y}=1\mid Y=0,S=0)|,\\
M_{0|1}=|\mathbb{P}(\widehat{Y}=0\mid Y=1,S=1)-\mathbb{P}(\widehat{Y}=0\mid Y=1,S=0)|.
\end{cases}
\]

In the context of equalized odds, the closer these values are to zero, the fairer the model is considered to be.
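The two disparate mistreatment gaps compare false positive rates and false negative rates across groups. A minimal sketch with empirical frequencies follows; the function names and the toy labels are assumptions for illustration.

```python
def cond_rate(y_hat, y, S, pred, truth, s):
    """Empirical P(Y_hat = pred | Y = truth, S = s)."""
    cell = [p for p, t, s_j in zip(y_hat, y, S) if t == truth and s_j == s]
    return sum(p == pred for p in cell) / len(cell)

def disparate_mistreatment(y_hat, y, S):
    """Return (M_{1|0}, M_{0|1}): gaps in false positive and false negative rates."""
    fpr_gap = abs(cond_rate(y_hat, y, S, 1, 0, 1) - cond_rate(y_hat, y, S, 1, 0, 0))
    fnr_gap = abs(cond_rate(y_hat, y, S, 0, 1, 1) - cond_rate(y_hat, y, S, 0, 1, 0))
    return fpr_gap, fnr_gap

# Toy sample: group 0 suffers false positives, group 1 suffers false negatives.
y     = [0, 0, 1, 1, 0, 0, 1, 1]
y_hat = [0, 1, 1, 1, 0, 0, 0, 1]
S     = [0, 0, 0, 0, 1, 1, 1, 1]

print(disparate_mistreatment(y_hat, y, S))
```

Each conditional frequency requires at least one observation in the corresponding cell $(Y, S)$; with sparse cells the estimates become unstable.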

In a more general context involving non-binary variables, various correlation-based metrics, such as Kendall’s tau, Pearson’s correlation, and Spearman’s rho, can be employed to assess dependencies between variables. However, in the pursuit of capturing all forms of dependence in the relationship between $\widehat{Y}$ and $S$, the Hirschfeld-Gebelein-Rényi ($\mathrm{HGR}$) maximal correlation measure appears to outperform other methods.
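A small example illustrates why linear and rank correlations can miss a dependence. Below, the second variable is a deterministic, U-shaped function of the first, yet both Pearson's correlation and Spearman's rho are zero. The implementations are written from the textbook definitions with the standard library only, and the data are invented for illustration.

```python
import math

def pearson(u, v):
    """Pearson's linear correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(u, v):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(u), average_ranks(v))

# v is fully determined by u, but the relationship is symmetric around zero.
u = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [a ** 2 for a in u]

print(pearson(u, v))
print(spearman(u, v))
```

A dependence measure suited to fairness assessment should not return zero in such a situation, which motivates measures that account for nonlinear relationships.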

The $\mathrm{HGR}$ maximal correlation, originally introduced by Hirschfeld (1935) and further developed by Gebelein (1941) and Rényi (1959), offers the ability to quantify both linear and nonlinear relationships while adhering to essential properties for a reliable measure of dependency, as established by Rényi (1959). It enjoys broad acceptance within the field of statistics. For two continuous or discrete variables, the $\mathrm{HGR}$ maximal correlation is precisely equal to zero if and only if the variables are independent. This makes it a valuable tool for detecting and quantifying dependencies between variables.

Definition 3 ($\mathrm{HGR}$).

For two random variables $U$ and $V$ with values in $\mathcal{U}$ and $\mathcal{V}$, respectively,

$$\mathrm{HGR}(U,V)=\max_{f\in\mathcal{F}_{\mathcal{U}},\ g\in\mathcal{G}_{\mathcal{V}}}\mathbb{E}[f(U)g(V)],$$

where

$$\begin{cases}\mathcal{F}_{\mathcal{U}}=\{f:\mathcal{U}\to\mathbb{R}:\mathbb{E}[f(U)]=0\text{ and }\mathbb{E}[f^{2}(U)]=1\}\\ \mathcal{G}_{\mathcal{V}}=\{g:\mathcal{V}\to\mathbb{R}:\mathbb{E}[g(V)]=0\text{ and }\mathbb{E}[g^{2}(V)]=1\}\end{cases}$$

In our case, $U$ will be the sensitive attribute, and $V$ the premium (or one of its components).

In the realm of fairness quantification, Mary et al. (2019) proposed the use of the $\mathrm{HGR}$ maximal correlation measure, and Grary et al. (2022) explored its application in the insurance domain. However, obtaining the exact value of this metric is not straightforward, as it is defined over infinite-dimensional function spaces. Several authors have used Witsenhausen’s linear algebra characterization (see Witsenhausen (1975)) to compute the $\mathrm{HGR}$. This measure can also be related to Kernel Based Nonlinear Canonical Analysis (Darolles et al., 2004). Mary et al. (2019) combine kernel density estimation (KDE), specifically with a Gaussian kernel, with these characterizations to obtain an approximation of the $\mathrm{HGR}$, called HGR_KDE. Importantly, this estimator has been demonstrated to preserve the fundamental properties of the original $\mathrm{HGR}$ measure and delivers strong performance. Therefore, in this paper, one of the metrics utilized is the $\mathrm{HGR}$ maximal correlation, and we employ the implementation through HGR_KDE, as recommended by its authors.
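
To make the linear algebra characterization concrete, the sketch below treats the simplest setting only: two discrete variables, for which the maximal correlation equals the second largest singular value of the matrix with entries $P(u,v)/\sqrt{P(u)P(v)}$. It is not the HGR_KDE estimator used in this paper, which handles continuous variables through kernel density estimation; the function name `hgr_discrete` and the toy samples are hypothetical.

```python
import numpy as np

def hgr_discrete(u, v):
    """Empirical maximal correlation of two discrete samples (discrete case only)."""
    u_codes = np.unique(np.asarray(u).ravel(), return_inverse=True)[1]
    v_codes = np.unique(np.asarray(v).ravel(), return_inverse=True)[1]

    # Empirical joint probability table.
    joint = np.zeros((u_codes.max() + 1, v_codes.max() + 1))
    np.add.at(joint, (u_codes, v_codes), 1.0)
    joint /= joint.sum()

    # Normalise by the square roots of the marginals.
    p_u = joint.sum(axis=1)
    p_v = joint.sum(axis=0)
    q = joint / np.sqrt(np.outer(p_u, p_v))

    # The largest singular value is 1 (constant functions); the next one
    # is the maximal correlation.
    singular_values = np.linalg.svd(q, compute_uv=False)
    return float(singular_values[1]) if singular_values.size > 1 else 0.0

# V = U^2 is fully determined by U although their Pearson correlation is zero.
u = [-1, 0, 1, -1, 0, 1]
v = [x * x for x in u]
hgr_nonlinear = hgr_discrete(u, v)

# Two independent binary variables.
hgr_indep = hgr_discrete([0, 0, 1, 1], [0, 1, 0, 1])
```

On these toy samples the first value is 1 (up to rounding) and the second is 0, which illustrates why the measure captures dependence that linear correlation misses.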

2.3.2 Focus on the case of pricing

In pricing, where the target variable $Y$ is continuous, confusion matrix metrics are irrelevant. Various correlation and distribution-based metrics related to the sensitive attribute $S$ become relevant. Here, we define $S$ as a binary variable representing gender.

One commonly used metric is Kendall’s tau. Additionally, $\mathrm{HGR}$ estimation is valuable and applicable to all variables.

Probabilistic distances and divergences can be computed on conditional distributions $Y|S$. For instance, Kullback–Leibler’s divergence (KL) assesses disparities between premiums for men and women.

The $p$-value from the Kolmogorov-Smirnov (KS) test is employed for fairness evaluation. This test complements divergences, being particularly sensitive to extreme values. These metrics are implemented using the Python package SciPy. The KL divergence is calculated using relative entropy, following the approach proposed in Boyd and Vandenberghe (2004).
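
A minimal sketch of how these three quantities might be obtained with SciPy is given below. The helper name `group_fairness_metrics`, the histogram binning used to discretize the premiums before computing the KL divergence, the small smoothing constant and the simulated premiums are all assumptions made for illustration; they are not the settings of the study.

```python
import numpy as np
from scipy.special import rel_entr
from scipy.stats import kendalltau, ks_2samp

def group_fairness_metrics(y_hat, s, n_bins=20, eps=1e-9):
    """Kendall's tau, KS p-value and binned KL divergence between Y|S=1 and Y|S=0."""
    y_hat = np.asarray(y_hat, dtype=float)
    s = np.asarray(s)
    y0, y1 = y_hat[s == 0], y_hat[s == 1]

    tau, _ = kendalltau(y_hat, s)        # rank correlation between premium and S
    _, ks_pvalue = ks_2samp(y0, y1)      # two-sample Kolmogorov-Smirnov test

    # KL divergence on a common histogram grid (assumption: equal-width bins).
    edges = np.histogram_bin_edges(y_hat, bins=n_bins)
    p = np.histogram(y1, bins=edges)[0] + eps
    q = np.histogram(y0, bins=edges)[0] + eps
    p, q = p / p.sum(), q / q.sum()
    kl = float(rel_entr(p, q).sum())     # sum of p * log(p / q)

    return {"kendall_tau": float(tau), "ks_pvalue": float(ks_pvalue), "kl": kl}

# Simulated premiums (not real data): group 1 pays 20% more than group 0.
rng = np.random.default_rng(0)
premiums = rng.gamma(shape=2.0, scale=200.0, size=500)
s = np.repeat([0, 1], 500)
metrics = group_fairness_metrics(np.concatenate([premiums, 1.2 * premiums]), s)
same = group_fairness_metrics(np.concatenate([premiums, premiums]), s)
```

When both groups share exactly the same premiums (`same`), the KL divergence is zero and the KS test does not reject equality of the distributions. The binned KL value depends on the number of bins, so it should be compared across models only with a fixed grid.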

The concept of individual fairness with continuous variables has received less attention. Noteworthy contributions include Dwork et al. (2011), who proposed using $k$-nearest-neighbors for proximity assessment.

Adaptation flip-test

Inspired by Black et al. (2020) and Dwork et al. (2011), we adapt a metric for continuous variables to define individual fairness. We assume a distance metric or algorithm effectively measures proximity between individuals.

  • Select individuals within group $s_i=1$.

  • Find their $k$ nearest neighbors of the opposite gender.

  • Calculate the differences between their predictions $\widehat{y}_i$ and the average predictions of their neighbors $\mathcal{V}^{0}(\boldsymbol{x}_i)$:

    $$\Delta_{\mathcal{V}}^{i}=\widehat{y}_{i}-\frac{1}{k}\sum_{x_{j}\in\mathcal{V}^{0}(\boldsymbol{x}_{i})}\widehat{y}_{j}.$$
  • Average these differences:

    $$\tilde{FT}_{1}=\frac{1}{n_{1}}\sum^{n_{1}}_{i=1}\Delta_{\mathcal{V}}^{i}.$$

This process is repeated starting with $s_i=0$ to obtain $\tilde{FT}_{0}$. A value closer to zero indicates a fairer model.
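
The steps above can be sketched as follows. This is an illustrative implementation under simplifying assumptions: neighbors are found by brute-force Euclidean distance on features assumed to be already scaled, whereas the study tunes the distance and the variables. The function name `flip_test` and the toy portfolio are hypothetical.

```python
import numpy as np

def flip_test(x, y_hat, s, group, k=5):
    """Return (FT_group, individual gaps) for the adapted flip-test."""
    x = np.asarray(x, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    s = np.asarray(s)
    own, other = x[s == group], x[s != group]

    # Pairwise Euclidean distances between the group and the opposite group.
    dist = np.linalg.norm(own[:, None, :] - other[None, :, :], axis=2)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Gap between each prediction and the mean prediction of its neighbours.
    neighbour_mean = y_hat[s != group][neighbours].mean(axis=1)
    delta = y_hat[s == group] - neighbour_mean
    return float(delta.mean()), delta

# Toy portfolio: identical profiles, but group 1 is charged 10 more.
x = [[0.0], [1.0], [2.0], [0.0], [1.0], [2.0]]
s = [0, 0, 0, 1, 1, 1]
y_hat = [100.0, 110.0, 120.0, 110.0, 120.0, 130.0]
ft_1, _ = flip_test(x, y_hat, s, group=1, k=1)
ft_0, _ = flip_test(x, y_hat, s, group=0, k=1)
```

Here `ft_1` is 10 and `ft_0` is -10, reflecting the constant surcharge. The dense distance matrix grows with the product of the two group sizes, so a tree-based neighbor search would be preferable on a full portfolio.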

However, it’s crucial to note that the quality of the proximity metric provided by the $k$-nearest-neighbors model is the primary limitation. To address this, we conduct hyperparameter tuning and variable selection using a grid search approach on our model. This optimization involves selecting an appropriate distance metric and its parameter settings, along with determining the optimal number of neighbors. Additionally, we optimize the number and specific variables used in model construction. The objective is to identify the $k$-nearest-neighbors model that minimizes individual distance while minimizing differences in premiums. We perform a grid search on these parameters and validate the results using a test dataset. This approach offers a satisfactory solution with significantly lower operational costs compared to a causal study. Minimizing bias and distance helps ensure that the presented bias level isn’t artificially inflated by the model, aligning with the concept of counterfactual fairness discussed in Kusner et al. (2017), De Lara et al. (2021), and more recently Charpentier et al. (2023a).

Once bias is detected, it’s crucial to attempt mitigation while preserving the performance and consistency of the models constructed.

3 Bias mitigation

Fairness bias mitigation involves reducing the unwanted influence of the sensitive variable on the estimated variable of interest in models, data, or results. There is no consensus on bias mitigation approaches, as they are closely tied to specific use cases. Binary $y$ is often preferred in the literature, but in this study, we present methods applicable to regression and adapted for insurance contexts. Mitigation can occur before, during, or after modeling.

Incorporating fairness into the modeling process typically negatively impacts performance. Many papers have observed performance drops, with attempts to quantify these reductions (del Barrio et al., 2020). Therefore, maintaining acceptable performance while enforcing fairness is crucial.

We evaluate model performance using two mean metrics: Root Mean Square Error (RMSE) and losses over premiums ratio (LR):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}w_{i}(\widehat{y}_{i}-y_{i})^{2}},\quad \mathrm{LR}=\frac{\sum_{i=1}^{n}y_{i}}{\sum_{i=1}^{n}\widehat{y}_{i}}, \tag{1}$$

where $w_i$ may represent a weight based on the variable $y$ considered. To balance performance and fairness metrics, we use them to guide optimization and comparison. A scenario is non-dominated when no other model achieves better performance and fairness simultaneously. Dominated scenarios can be surpassed by others in terms of fairness and performance.
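
Both metrics of Equation (1) are straightforward to compute. The sketch below uses unit weights when none are supplied, which is an assumption; the function names and the toy values are hypothetical.

```python
import numpy as np

def weighted_rmse(y, y_hat, w=None):
    """Square root of the mean of w_i * (y_hat_i - y_i)^2, as in Equation (1)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    return float(np.sqrt(np.mean(w * (y_hat - y) ** 2)))

def loss_ratio(y, y_hat):
    """Total observed losses divided by total predicted premiums."""
    return float(np.sum(y) / np.sum(y_hat))

# Toy example: a flat premium of 100 against losses concentrated on two policies.
y = [0.0, 0.0, 300.0, 100.0]
y_hat = [100.0, 100.0, 100.0, 100.0]
rmse = weighted_rmse(y, y_hat)
lr = loss_ratio(y, y_hat)
```

In this example the loss ratio equals 1 although the RMSE is large, which shows why the two metrics are read together: LR checks the global balance of the premiums, while RMSE measures individual accuracy.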

Non-dominated scenarios warrant further investigation, and choosing the best non-dominated scenario depends on decision-maker constraints. Some prioritize fairness attainment, while others focus on maintaining performance levels.
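
The selection of non-dominated scenarios can be sketched as a simple Pareto filter. The version below assumes that each scenario is summarized by one performance metric and one fairness metric, both to be minimized, and uses the usual Pareto convention (a scenario is dominated if another one is at least as good on both criteria and strictly better on one). The scenario names and values are made up for illustration and are not results of the study.

```python
def non_dominated(scenarios):
    """Return the names of scenarios that no other scenario dominates.

    Each scenario is a tuple (name, rmse, bias); lower is better for both.
    """
    front = []
    for name, rmse, bias in scenarios:
        dominated = any(
            (r <= rmse and b <= bias) and (r < rmse or b < bias)
            for _, r, b in scenarios
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical scenarios: (name, RMSE, fairness bias).
scenarios = [
    ("baseline", 100.0, 0.30),
    ("drop_s", 101.0, 0.20),
    ("drop_all", 105.0, 0.25),   # worse than "drop_s" on both criteria
    ("remover", 103.0, 0.10),
]
front = non_dominated(scenarios)
```

With these values `front` contains "baseline", "drop_s" and "remover"; the final choice among them is left to the decision-maker.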

We will present various methods in the following sections and summarize their advantages and disadvantages in Table 1.

3.1 Pre-processing mitigation

These mitigation methods involve data transformations aimed at reducing bias while retaining relevant information. The methods we have implemented include: total removal of variables correlated with the sensitive variable (Section 3.1.1), removal of linear correlations (Section 3.1.2) and an adaptation of the fair-SMOTE method (Section 3.1.3) proposed by Chakraborty et al. (2021). Since fair-SMOTE was originally designed for binary cases, we have modified it to suit the needs of bias mitigation in non-life insurance pricing.

3.1.1 Total deletion

This straightforward approach aims to mitigate bias by removing not only the sensitive variable but also variables that are correlated with it to reduce its indirect effects. Several studies have highlighted the potential for non-sensitive variables to perpetuate the effects of sensitive variables in models, as discussed in Lindholm et al. (2022a) and Lindholm et al. (2022b). Starting with explanatory variables $(X_1,X_2,\dots,X_p)$, the process involves the following steps:

  • Measure the dependency between the sensitive variable, denoted as $S$, and each of the explanatory variables $X_j$.

  • Based on the identified dependencies, create deletion scenarios, where each scenario represents a modeling instance with specific variables removed. Domain knowledge can also guide the selection of variables to delete.

  • Build models for each scenario and assess their fairness and performance on predictions to identify the best models.

Additionally, scenarios can be automatically generated by setting a maximal dependency threshold and removing all variables with dependencies exceeding that threshold.
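
A threshold-based scenario could be generated along the following lines. The sketch uses the absolute Pearson correlation as the dependency measure purely for simplicity; this is an assumption, and any other measure (for instance an HGR estimate) could be substituted. The function name, the variable names and the data are hypothetical.

```python
import numpy as np

def deletion_scenario(x, s, names, threshold):
    """Return the variables kept under a maximal dependency threshold."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    # Dependency of each explanatory variable with S (assumption: |Pearson|).
    dependency = {
        name: abs(float(np.corrcoef(x[:, j], s)[0, 1]))
        for j, name in enumerate(names)
    }
    kept = [name for name in names if dependency[name] <= threshold]
    return kept, dependency

# Toy data: the first column is strongly related to S, the second is not.
s = [0, 0, 0, 0, 1, 1, 1, 1]
x = [[1, 1], [2, 2], [1, 3], [2, 4], [5, 1], [6, 2], [5, 3], [6, 4]]
kept, dependency = deletion_scenario(
    x, s, names=["vehicle_power", "driver_age"], threshold=0.5
)
```

Only "driver_age" survives here. Sweeping the threshold over a grid yields a family of scenarios whose fairness and performance can then be compared.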

3.1.2 Correlation remover

Rather than deleting variables correlated with the sensitive variable, an alternative approach is to transform these variables to reduce bias while retaining some information. This method, suggested in studies like Komiyama and Shimao (2017) and Frees and Huang (2023), involves removing information contained in $\boldsymbol{S}$ from $\boldsymbol{X}$ using regression models. The resulting residuals, denoted as $\boldsymbol{x}^{\perp}$, are used in place of $\boldsymbol{x}$, ensuring that $\boldsymbol{x}^{\perp}$ is orthogonal to the sensitive attributes $\boldsymbol{s}$. Formally, for a matrix of vectors $\boldsymbol{s}$, each variable $x_j$ is transformed as: $x^{\perp}_{j}=x_{j}-\alpha\boldsymbol{s}(\boldsymbol{s}^{\top}\boldsymbol{s})^{-1}\boldsymbol{s}^{\top}x_{j}$. Here $\alpha\in[0,1]$ is a hyperparameter that controls the level of correction applied on $x_j$; full orthogonality is obtained for $\alpha=1$.

It is important to note that the absence of correlation does not guarantee the absence of statistical dependence, especially in cases involving non-linear transformations of legitimate features. The bias mitigation achieved through this method is limited. However, for linear models like Generalized Linear Models (GLMs), this approach is consistent in breaking all the connections that models can establish.
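
The projection can be sketched with a least-squares fit. In the version below the sensitive attributes are centered before the projection so that orthogonality translates into zero empirical correlation; this centering is our assumption and is not stated in the formula above. The function name and the data are hypothetical.

```python
import numpy as np

def remove_linear_correlation(x, s, alpha=1.0):
    """Subtract from each column of x the part linearly explained by s."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float).reshape(len(x), -1)
    s_centered = s - s.mean(axis=0)  # assumption: centre the sensitive attributes
    # Least-squares coefficients, i.e. (s^T s)^{-1} s^T x for each column of x.
    coef, *_ = np.linalg.lstsq(s_centered, x, rcond=None)
    return x - alpha * s_centered @ coef

s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x = np.array([[1, 1], [2, 2], [1, 3], [2, 4], [5, 1], [6, 2], [5, 3], [6, 4]])
x_perp = remove_linear_correlation(x, s, alpha=1.0)
corr_after = np.corrcoef(x_perp[:, 0], s)[0, 1]
```

After the transformation the first column no longer differs in mean between the two groups, while the second column, which was already uncorrelated with `s`, is left unchanged. Intermediate values of `alpha` remove only part of the linear link.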

3.1.3 Fair-SMOTE adaptation

The fair-SMOTE approach differs from previous techniques by introducing synthetic individuals into the dataset instead of altering the existing data. Its primary goal is to ensure equal gender representation, regardless of premium levels, potentially addressing under-representation issues of specific classes. It operates exclusively on the training dataset, leaving the test dataset untouched.

In a study by Chakraborty et al. (2021), sampling methods directly on $Y$ were found to exacerbate bias as they do not consider the sensitive variable. They opted to sample based on $S|Y$ and $Y$. However, sampling on $Y$ in insurance pricing contexts may compromise unique target variable characteristics and decrease performance. Therefore, we avoid sampling on $Y$ and assess the ramifications in our use case.

To apply this method to continuous target variables, we discretize them, allowing delineation of resampling bins. The number of bins influences proximity to the continuous distribution and reduction of statistical bias from discretization. However, selecting too many bins may result in small sub-populations unfit for consistent simulations, requiring tuning based on target variable distributions. Once bins are designated, distributions of each $S$ modality are harmonized within each bin.
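
The binning and harmonization step might look as follows. The sketch assumes quantile-based bins and interprets harmonization as bringing every group up to the size of the largest group within each bin; both choices are assumptions made for illustration, and the generation of the synthetic individuals themselves is not covered. The function name `synthetic_counts` is hypothetical.

```python
import numpy as np

def synthetic_counts(y, s, n_bins=4):
    """Number of synthetic individuals to add per (bin of y, group of s)."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    # Quantile-based bin edges (assumption), then bin index of each individual.
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)

    plan = {}
    for b in range(n_bins):
        counts = {int(g): int(np.sum((bins == b) & (s == g))) for g in np.unique(s)}
        target = max(counts.values())  # size of the largest group in the bin
        plan[b] = {g: target - c for g, c in counts.items()}
    return plan

# Toy data: group 0 dominates low premiums, group 1 dominates high premiums.
y = [1, 2, 3, 4, 5, 6, 7, 8]
s = [0, 0, 0, 1, 0, 1, 1, 1]
plan = synthetic_counts(y, s, n_bins=2)
```

On this toy sample two individuals of group 1 would be added in the lower bin and two of group 0 in the upper bin. Increasing `n_bins` quickly produces cells with very few observations, which is the tuning issue described above.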

Instead of randomly selecting from the initial set, we introduce refinements that allow distinct individuals to be generated in specific scenarios. Two hyperparameters, the threshold $st$ and the transformation factor $ft$, are defined and selected within the interval $[0,1]$. Using a $k$-nearest neighbor model, the two closest individuals, $v_1$ and $v_2$, are identified from a randomly chosen individual $p$. New individuals’ attributes are reconstructed column by column, preserving observed subgroup distributions. The pseudo code for this process is outlined in Algorithm 1 (Appendix A). Continuous $Y$ values for newly generated individuals are calculated based on the algorithm’s quantitative value rule.

Generative adversarial networks (GANs) could have been employed to simulate new individuals in a SMOTE approach or reconstruct $\boldsymbol{X}$ variables by generating similar individuals while minimizing dependence on $S$.

3.2 In-processing

These approaches involve the inclusion of fairness constraints during the model calibration phase. The exponentiated gradient method, named after the game theory technique upon which it is founded, and its grid search version are both utilized.

3.2.1 Exponentiated gradient

Agarwal et al. (2018) present a bias mitigation approach that focuses on reducing bias within machine learning models. They begin by demonstrating that fairness definitions can be expressed as linear inequality sets of the following form:

$$M\mu(m)\leq c.$$

In this representation, $M$ is a matrix, $c$ is a vector, and $\mu$ is a conditional moment vector. The vector $c$ provides the means to control the level at which each constraint is enforced by adjusting the values of $c_k$.

This formulation leads to an optimization problem within the context of statistical learning, defined as follows:

$$\min_{m\in\mathcal{M}}L(m)\text{ under the constraint that }M\mu(m)\leq c,$$

where $L$ represents a loss function employed for the evaluation of the models. We can observe that applying constraints to deterministic prediction functions can have a detrimental impact on performance. To mitigate this, they introduce the concept of "random functions," involving the use of a randomized predictor denoted as $Q$, which is drawn from $\Delta$, the set encompassing all distributions over $\mathcal{M}$, for the purpose of making predictions. In this approach, a predictor $m$, selected from $\mathcal{M}$, is sampled from $Q$ and subsequently used for prediction. Consequently, the prediction error is defined as $L(Q)=\sum_{m\in\mathcal{M}}Q(m)L(m)$ and the conditional moments are expressed as $\mu(Q)=\sum_{m\in\mathcal{M}}Q(m)\mu(m)$. This adaptation transforms the optimization problem into the following form:

$$\min_{Q\in\Delta}L(Q)\text{ under the constraint that }M\mu(Q)\leq c.$$

To address this problem, we employ the "exponentiated gradient" algorithm, as recommended in Freund and Schapire (1996). This game-theory-based approach pits the prediction function against the level of compliance with the constraint. The optimum is reached when any alteration of these elements results in a minimal loss of performance and an increase in constraint enforcement.
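
The game described above can be written as a saddle-point problem. The following is a sketch in the notation of this section, based on our reading of Agarwal et al. (2018); the multiplier vector $\boldsymbol{\lambda}$ and the bound $B$ on its norm do not appear in the text above and are introduced here for illustration.

```latex
% Lagrangian of the constrained problem, with non-negative multipliers \lambda
\mathcal{L}(Q,\boldsymbol{\lambda})
  = L(Q) + \boldsymbol{\lambda}^{\top}\bigl(M\mu(Q) - c\bigr),
\qquad
\min_{Q\in\Delta}\;
\max_{\boldsymbol{\lambda}\geq 0,\ \|\boldsymbol{\lambda}\|_{1}\leq B}\;
\mathcal{L}(Q,\boldsymbol{\lambda}).
```

The learner chooses $Q$ to minimize the Lagrangian, while the opponent chooses $\boldsymbol{\lambda}$ to maximize it, which puts more weight on the constraints that are most violated. As we understand it, the multipliers are updated with exponentiated-gradient steps, which is where the method takes its name.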

In theory, this approach has the potential to accommodate various fairness constraints in both binary and continuous cases. However, in practice, it has primarily been applied to the binary case. The implementation of equalized odds or demographic parity in the continuous case remains an unresolved challenge, and it is inconclusive whether this method is suitable for such cases.

A less stringent constraint known as "equality of the expected error" has been incorporated, specifically when $S$ is a discrete variable:

$$\text{equality of }\mathbb{E}[\ell(Y,\widehat{Y})|S=s]\text{ across all }s.$$

This constraint aims to ensure that the model makes errors of similar magnitude, on average, regardless of the value of $S$, ultimately resulting in predictions of equal quality for different $s$ values. This concept can be reformulated as an inequality:

$$\mathbb{E}[\ell(Y,\widehat{Y})|S=s]<\zeta,\ \forall s.$$

In this formulation, the hyperparameter $\zeta$ controls the acceptable error margin beyond which the constraint may be violated. Additionally, the error from a prediction can be further constrained by introducing the hyperparameter $M$, leading to the inequality:

$$\mathbb{E}[\min(\{\ell(Y,\widehat{Y})|S=s\},M)]<\zeta,\ \forall s.$$

This approach allows the incorporation of a threshold to limit the extent of deviations considered, particularly when the algorithm encounters convergence challenges.
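
Checking whether a fitted model satisfies this constraint is simple. The sketch below takes the squared error as the loss $\ell$, which is an assumption since the loss is left generic above; the function names, the cap and the threshold values are hypothetical.

```python
import numpy as np

def group_losses(y, y_hat, s, cap=None):
    """Empirical E[min(loss, cap) | S = s] per group, with squared error as loss."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    s = np.asarray(s)
    loss = (y - y_hat) ** 2
    if cap is not None:
        loss = np.minimum(loss, cap)  # cap individual errors at M
    return {int(g): float(loss[s == g].mean()) for g in np.unique(s)}

def satisfies_constraint(losses, zeta):
    """True when every group's expected loss is strictly below zeta."""
    return all(value < zeta for value in losses.values())

y = [100.0, 200.0, 100.0, 200.0]
y_hat = [110.0, 190.0, 150.0, 100.0]
s = [0, 0, 1, 1]
uncapped = group_losses(y, y_hat, s)
capped = group_losses(y, y_hat, s, cap=3000.0)
ok_uncapped = satisfies_constraint(uncapped, zeta=3000.0)
ok_capped = satisfies_constraint(capped, zeta=3000.0)
```

In this toy case one large error in group 1 pushes its mean loss above the threshold; capping individual errors brings it back below, which illustrates how the cap $M$ can ease convergence at the cost of ignoring the size of extreme deviations.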

3.2.2 Grid search approach

In their exploration of bias mitigation, Agarwal et al. (2018) highlight an intriguing possibility when the sensitive variable is binary. In this context, selecting the most suitable hyperparameters becomes crucial, transforming the pursuit of the optimal solution into a challenge of identifying two interconnected parameters through an equation.

However, in the continuous case, a grid search methodology becomes vital for uncovering suboptimal solutions to the inequality mentioned earlier. When the exponentiated gradient method fails to converge for a given value of $\zeta$ or computational time becomes prohibitive, grid search offers an alternative. It systematically explores specific or random regions within the solution space, identifying a solution within a predefined time frame.

3.3 Post-processing

This approach involves transforming model predictions to enhance fairness. For instance, logistic regression allows influencing the model’s behavior with respect to different classes through computed probabilities and decision thresholds without recalibrating predictions.

Historically, post-modeling techniques mainly adjust decision boundaries while considering fairness definitions, but they’re less applicable to the continuous case due to the absence of distinct decision boundaries. Recent suggestions, like using Wasserstein barycenter on scores and the associated transport procedure by Charpentier et al. (2023b), aim to address this gap. However, defining advantageous and disadvantageous premiums is challenging, as it depends on policyholder characteristics and inherent risk. The "individual fair redistribution" approach was introduced within this context.

3.3.1 Fair redistribution

The methodology based on optimal transport, particularly the Wasserstein barycenter concept as utilized by Charpentier et al. (2023b), involves implementing monotone transformations on premiums within distinct sensitive groups while maintaining the relative orderings of premiums within each group. Here, we’ll focus on a uniform premium adjustment within each sensitive group. This approach adapts the flip-test to define bias, dividing premiums into fair shares and biases. The redistributed bias is then allocated to individual data points iteratively with the aim of minimizing bias given the adjusted premiums. In this subsection, we will refer to $Y$ as premium to grasp the intuition behind our approach. The same reasoning can be made with any other continuous pricing outcome.

We introduce $\varepsilon$ to represent a measure of "individual fairness bias", which is defined as the difference between the estimated outcome $\widehat{y}$ and the fair premium $\widetilde{y}$. This estimation is initially determined by means of the quantity $\Delta_{\mathcal{V}}$, presented earlier. More specifically, we define $\varepsilon_i$ as $\Delta_{\mathcal{V}}^{i}$, as per the adaptation of the flip-test, which is elaborated upon in Section 2.3.2. But here, instead of completely rectifying the disparity in the premium due to the initially quantified bias with the equation:

$$\widetilde{y}=\widehat{y}-\varepsilon,$$

we adopt a strategy of partial fairness correction. This is accomplished by employing the following expression:

$$\widetilde{y}=\widehat{y}-\frac{\varepsilon}{\eta},$$

where $\eta$ is a hyperparameter constrained within the interval $[1,\infty)$, identical within a given group. When $\eta$ takes on a very large value, it corresponds to minimal correction, while as $\eta$ approaches 1, the level of correction is greater. We introduced $\eta$ because we noticed that correcting $\widehat{y}$ directly with $\varepsilon$ did not increase the fairness level: individuals who had greater premiums than their neighborhood (as defined by the adapted flip-test) now had smaller premiums, and vice versa. Bringing an individual’s premium closer to the average premium of their opposite-gender neighbors potentially moves it away from the premiums of other individuals whose neighborhood that individual belongs to. Our first experiments led us to conclude that an overly abrupt correction was not effective and that both subgroups need to be corrected smoothly and simultaneously. Thus, we decided to introduce an alternative algorithm where, for a given level of $\eta$, we slowly and iteratively correct the two subgroups’ premiums while recalculating the new bias level $\varepsilon$ at each iteration. After the final iteration, we obtain

$$\widetilde{y}_{i}=\widehat{y}_{i}-\varepsilon_{i}^{\text{final bias}},$$

where $\varepsilon_{i}^{\text{final bias}}$ can be seen as the summation of all $\varepsilon_i$ applied to an individual during the iterative process. To control the level of correction and stop the algorithm when needed, we introduce a second hyperparameter $\zeta\in[0,\infty)$, a threshold that defines the maximal level of acceptable correction. Given a gender $s$ ($s$ binary in our case), we define:

$$\Sigma_{s}=\sum_{i=1}^{n_{s}}\varepsilon_{i}=\sum_{i=1}^{n_{s}}\Delta_{\mathcal{V}}^{i},$$

the sum of the fairness biases of all $n_s$ individuals in a gender subgroup. When $\Sigma_s<\zeta$, the algorithm will be stopped. This way, $\zeta$ can be used to control the tradeoff between fairness and performance, with larger corrections leading to better fairness but a degradation of the quality of the premiums. The quality of the premiums after redistribution is measured using two metrics:

  • Redistribution integrity, which measures the extent to which the redistribution has altered the range of the premium distribution:

    $$\frac{\max\widetilde{y}-\min\widetilde{y}}{\max\widehat{y}-\min\widehat{y}}.$$
  • Global variation, which measures the loss in premium induced by the redistribution. Within the framework of the optimal transport approach, the fair premium is characterized by a balance property, corresponding to a null global variation:

    $$\sum_{i=1}^{n}\widetilde{y}_{i}-\sum_{i=1}^{n}\widehat{y}_{i}.$$
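
As an illustration, the sketch below applies a single partial correction $\widetilde{y}=\widehat{y}-\varepsilon/\eta$ and evaluates the two quality metrics. It covers one correction step only, not the full iterative procedure; the function names, the bias values and the choice $\eta=2$ are hypothetical.

```python
def redistribution_integrity(y_hat, y_fair):
    """Ratio of the premium range after redistribution to the range before."""
    return (max(y_fair) - min(y_fair)) / (max(y_hat) - min(y_hat))

def global_variation(y_hat, y_fair):
    """Total premium after redistribution minus total premium before."""
    return sum(y_fair) - sum(y_hat)

# Hypothetical premiums and individual biases from the adapted flip-test.
y_hat = [100.0, 200.0, 300.0, 400.0]
eps = [-20.0, 10.0, -10.0, 40.0]
eta = 2.0

# One partial correction step: remove a fraction 1/eta of each bias.
y_fair = [y - e / eta for y, e in zip(y_hat, eps)]

integrity = redistribution_integrity(y_hat, y_fair)
variation = global_variation(y_hat, y_fair)
```

With these values the premium range shrinks to 90% of its original width and the portfolio collects 10 less in total, so this step alone does not satisfy the balance property.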

With these elements, we detail the alternative approach below. Initialize by choosing $\eta$ and $\zeta$ and selecting the gender subgroup $s$ to start with ($s=0$). At each iteration, the treated subgroup is switched and the bias is recalculated. While $\Sigma_{s}\geq\zeta$, repeat the following steps:

  1. Correct the premiums for the gender subgroup $s$ (update $\widehat{y}_{i}$):

    $$\widehat{y}_{i}=\widehat{y}_{i}-\frac{\varepsilon_{i}}{\eta},\quad\forall i\text{ with }s_{i}=s.$$
  2. Switch gender subgroup, taking the opposite gender:

    $$\begin{cases}\text{if }s=0,\text{ then }s=1,\\ \text{if }s=1,\text{ then }s=0.\end{cases}$$
  3. Measure the bias $\varepsilon_{i}$ between individuals of gender $s$ and their neighbors of opposite gender using the flip-test method (update $\varepsilon_{i}$):

    $$\varepsilon_{i}=\widehat{y}_{i}-\frac{1}{k}\sum_{x_{j}\in\mathcal{V}^{s}(\boldsymbol{x}_{i})}\widehat{y}_{j}=\Delta_{\mathcal{V}}^{i},\quad\forall i\text{ with }s_{i}=s.$$
  4. Calculate the sum of the differences:

    $$\Sigma_{s}=\sum_{i=1}^{n_{s}}\varepsilon_{i}.$$

    If this sum is greater than or equal to $\zeta$, repeat these four steps; otherwise, stop the iterative process.

The genders are switched before steps 3 and 4 because, after correcting a given gender subgroup, we want to recalculate the biases on the opposite gender subgroup and perform the check on the new $\Sigma_{s}$. If $\Sigma_{s}$ is still above the threshold $\zeta$, we repeat step 1 before switching for the next verification. Each correction of $\widehat{y}$ leads to a new fairness state, which must be reevaluated with $\varepsilon_{i}$ to ensure a proper correction in the next iteration. Performing the check on the opposite gender ensures that the algorithm does not stop immediately after one gender subgroup has obtained sufficient corrections. As hyperparameters, $\zeta$ and $\eta$ have to be optimized to ensure the best possible results with regard to the redistribution integrity and global variation metrics.
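The iterative procedure can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names, the brute-force Euclidean neighbor search, and the `max_iter` safeguard are assumptions. In addition, the stopping check uses $|\Sigma_{s}|$ instead of the signed sum, which is an assumption made here so that a subgroup with negative biases does not stop the loop prematurely.

```python
import numpy as np

def flip_test_bias(X, s, y_hat, group, k):
    """epsilon_i for each i in `group`: own premium minus the mean premium
    of its k nearest neighbors of the opposite gender (brute-force search)."""
    idx = np.flatnonzero(s == group)
    other = np.flatnonzero(s != group)
    # Squared Euclidean distances between the group and the opposite group
    dist = ((X[idx, None, :] - X[None, other, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]      # positions within `other`
    eps = y_hat[idx] - y_hat[other][nearest].mean(axis=1)
    return idx, eps

def fair_redistribution(X, s, y_hat, eta=2.0, zeta=1.0, k=5, max_iter=1000):
    """Alternate corrections between the two gender subgroups (coded 0 and 1)."""
    X = np.asarray(X, dtype=float)
    s = np.asarray(s)
    y = np.asarray(y_hat, dtype=float).copy()
    group = 0                                       # start with s = 0
    idx, eps = flip_test_bias(X, s, y, group, k)
    n_iter = 0
    # Assumption: absolute value of the summed bias as the stopping check
    while abs(eps.sum()) >= zeta and n_iter < max_iter:
        y[idx] -= eps / eta                         # step 1: correct premiums
        group = 1 - group                           # step 2: switch subgroup
        idx, eps = flip_test_bias(X, s, y, group, k)  # step 3: flip-test bias
        n_iter += 1                                 # step 4: checked by `while`
    return y, n_iter
```

A larger `eta` makes each correction smaller, and a smaller `zeta` lets the loop run longer; both should be tuned while monitoring the redistribution integrity and the global variation.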

| Approach | Pros | Cons |
|---|---|---|
| Pre-processing | Preserves the overall modeling process. Straightforward to implement and consumes less computational time. | Can lead to substantial loss of information. Variables may be closely interrelated and might not allow for both bias reduction and information retention. Additionally, it is crucial to prevent the modeling process from introducing new biases related to fairness. |
| Total deletion | Allows simple variable selection. HGR coefficients facilitate the identification of dependencies that classical measures might overlook. | Depends on the ability of the remaining variables to compensate for the loss of information resulting from variable deletion. |
| Correlation remover | Achieves an intriguing balance between fairness and performance. It is relatively simple to implement, relying on linear regression. | Requires quantitative explanatory variables and ideally a quantitative sensitive variable. After correction, discretization of the transformed variable’s distribution is necessary, along with the construction of a correspondence function between the original and transformed variables for predictions. |
| Fair-SMOTE adaptation | Various scenarios and hyperparameters allow customization to address specific problem requirements while sampling. | Seems to have a limited effect on historical fairness bias and interdependencies. |
| In-processing | Leverages sensitive variable information to identify the optimal trade-off between performance and fairness. In theory, these methods are more likely to yield the best possible compromise. | Can be challenging to implement and generalize. Even after successful implementation, there is no assurance of convergence, and computation times can become exponential. |
| Exponentiated gradient | Offers the advantage of providing a comprehensive framework for the direct integration of fairness constraints into machine learning models. | May require exponential computation times, even when dealing with relatively simple constraints. Certain cost functions and the imposition of various constraints may necessitate fundamental restructuring of the system to accommodate the application of the exponentiated gradient method. |
| Post-processing | Avoids model recalibration, leading to shorter computation times, and produces outcomes less susceptible to bias contamination. | Relies on the quality of the built models (premium and bias models) to be effective. Applying mitigation using inaccurately estimated components can result in inconsistencies. |
| Fair redistribution | Addresses individual fairness within regression scenarios and can be tailored to suit the specific problem under consideration. | Needs monitoring of the quality of the premium and KNN models. Considering the current distribution constraints, an extra step may be necessary: building a grid that encompasses all premium corrections, or a distributable model that predicts the corrected premiums. |

Table 1: Summary of pros and cons of the different mitigation approaches

4 Measuring biases in car insurance pricing

For this glass breakage guarantee pricing application, various business and operational constraints, such as distribution constraints, must be taken into account alongside statistical considerations throughout the pricing process. Table 2 displays the variables extracted from a car insurance database, augmented with vehicle information. It includes variable names, descriptions, values, and statistics. Quantitative variables show mean and median values, while qualitative ones indicate the shares of the two most common categories.

| Variable name | Description | Value taken | Statistics |
|---|---|---|---|
| claim_amount | individual claims expenses (€) | $[0,1825]$ | $\bar{x}$: 580 €, Me: 596 € |
| claim_nb | number of claims | $[0,5]$ | 0: 97.3%, 1: 2.7% |
| expo | exposure by contract | $[0.34\%,100\%]$ | $\bar{x}$: 79%, Me: 95% |
| year_pol | year the policy was purchased | $[2015,2020]$ | 2018: 18%, 2020: 17% |
| driv_age | age of primary driver | $[18,77]$ | $\bar{x}$: 47, Me: 45 |
| driv_yp | number of years in portfolio | $[0,12]$ | $\bar{x}$: 1, Me: 1.98 |
| area | area code | 17 zones | D: 29%, F: 23% |
| driv_gender | gender of primary driver | F, M | M: 58.4% |
| driv_ly | driver licence seniority | $[0,44]$ | $\bar{x}$: 15, Me: 17 |
| driv_2 | presence of secondary driver | 0, 1 | 0: 69% |
| veh_age | age of the vehicle in years | $[0,44]$ | $\bar{x}$: 5, Me: 4 |
| energy | type of energy | 5 types | D: 57%, E: 43% |
| weight | weight in kilograms | $[830,3200]$ | $\bar{x}$: 1240, Me: 1280 |
| veh_power | vehicle power in kW | $[13,220]$ | $\bar{x}$: 94, Me: 91 |
| veh_price | vehicle price | $[6.8k,65k]$ | $\bar{x}$: 21400 €, Me: 15425 € |
| box_type | type of gearbox | 2 types | A: 91.1% |
| claim_hist | claim occurrence in previous observed years | 0, 1 | 0: 91.7% |

Table 2: Description of the dataset’s variables.

The data underwent processing and analysis, including variable reprocessing and discretization, to construct a modeling database. Output variables frequency, average cost, and pure premium were constructed using claim_amount, claim_nb, and expo. Two additional variables were created: zoning, describing risk zones with ten categories, and weight_kw, representing the weight of the vehicle divided by its power. Models were built using Generalized Linear Models (GLM), Random Forest, and Gradient Boosting, evaluated using Root Mean Square Error (RMSE) and Loss Ratios (LR), and optimized and validated using interpretability methods.
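As a rough illustration of how the three output variables could be derived from claim_amount, claim_nb, and expo, the sketch below assumes that claim_amount is the total claims expense of a contract over its exposure period and that expo is expressed as a fraction of a year. Both points are assumptions, as is the function name; the exact construction used in the study is not detailed here.

```python
import numpy as np

def build_targets(claim_amount, claim_nb, expo):
    """Return (frequency, average_cost, pure_premium) per contract.

    Assumptions: claim_amount is the total claims expense of the contract,
    expo is the exposure as a fraction of a year (strictly positive).
    """
    claim_amount = np.asarray(claim_amount, dtype=float)
    claim_nb = np.asarray(claim_nb, dtype=float)
    expo = np.asarray(expo, dtype=float)

    frequency = claim_nb / expo
    # Average cost is only defined when at least one claim occurred
    average_cost = np.divide(
        claim_amount, claim_nb,
        out=np.zeros_like(claim_amount), where=claim_nb > 0,
    )
    pure_premium = claim_amount / expo      # equals frequency * average_cost
    return frequency, average_cost, pure_premium

# Toy contracts (hypothetical values)
freq, avg_cost, premium = build_targets(
    claim_amount=[0.0, 600.0, 900.0],
    claim_nb=[0, 1, 2],
    expo=[1.0, 0.5, 0.75],
)
```

Under these assumptions, the pure premium coincides with the product of frequency and average cost, which is how the historical premium is described later in this section.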

Following the pricing phase, GLM models were selected for retention due to their interpretability and ease of integration into production pricing tools. Despite the potential for black-box models to achieve better performance, the marginal gain did not justify their operational costs. However, calculations were performed on all models at each stage, and there were generally no significant deviations.

Furthermore, the pure premium model was favored over a combined cost and frequency model for efficiency according to defined metrics. Reference models were established by excluding gender. The study then delves into measuring and mitigating gender bias to refine the pricing process using developed tools.

To assess fairness bias, six different dependence measures are employed:

  • 1-2. Kendall’s tau and mean ratio: these provide an initial understanding of dependence, offering a comprehensive first impression;

  • 3-4. Kolmogorov-Smirnov test $p$-value and the Jensen-Shannon (JS) divergence: these quantify dependence between distributions;

  • 5. HGR (or HGR_KDE): a potent metric offering a nuanced perspective;

  • 6. Flip-test adaptation: an individual-based fairness metric.

The first five measures focus on group fairness, specifically independence, disregarding variable $Y$. The sixth measure, the flip-test adaptation, examines bias at the individual level, providing an alternative perspective.
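Four of the group-level measures can be sketched with SciPy, as shown below. The coding of the sensitive variable (0 for men, 1 for women), the definition of the mean ratio as the male mean divided by the female mean, and the histogram-based estimate of the JS divergence are assumptions made for this illustration; HGR and the flip-test adaptation are not covered.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import kendalltau, ks_2samp

def group_dependence(y, s, bins=20):
    """Dependence measures between an outcome y and a binary variable s.

    Assumption: s is coded 0 for men and 1 for women.
    """
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    y_m, y_f = y[s == 0], y[s == 1]

    tau, _ = kendalltau(y, s)                 # rank correlation (tau-b)
    _, ks_pvalue = ks_2samp(y_m, y_f)         # two-sample KS test

    # JS divergence between the two histograms built on common bin edges;
    # jensenshannon returns the JS distance, i.e. the square root
    edges = np.histogram_bin_edges(y, bins=bins)
    p, _ = np.histogram(y_m, bins=edges)
    q, _ = np.histogram(y_f, bins=edges)
    js_div = jensenshannon(p, q, base=2) ** 2

    return {
        "kendall": tau,
        "ks_pvalue": ks_pvalue,
        "div_js": js_div,
        "mean_ratio": y_m.mean() / y_f.mean(),
    }

# Toy outcome (hypothetical values): men first, then women
result = group_dependence(
    y=[10, 12, 14, 16, 8, 9, 10, 11],
    s=[0, 0, 0, 0, 1, 1, 1, 1],
)
```

With this coding, a negative Kendall’s tau together with a mean ratio above 1 both point to lower values for women.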

4.1 Bias on historical data

Table 3 displays the results of the dependence measures between each of the variables of interest and the sensitive variable.

| $Y$ | Kendall | HGR_KDE | KS ($p$-value) | Div_JS | mean_ratio | Flip-test |
|---|---|---|---|---|---|---|
| Average cost | -0.0796 | 0.0903 | 1.9426e-07 | 0.3217 | 1.1024 | -10.23 |
| Frequency | -0.0197 | 0.3031 | 3.3333e-02 | 0.8413 | 1.2489 | -2.57% |
| Premium | -0.0212 | 0.3106 | 3.8555e-02 | 0.8401 | 1.3450 | -7.88 |

Table 3: Dependence between $Y$ and $S$ before modeling

Kendall’s tau indicates a weak dependence between the variables of interest ($Y$ and $S$), with a negative sign suggesting slightly weaker values for women. HGR also detects slightly stronger relationships, consistent with Kendall’s tau. The KS test highlights significant differences between male and female distributions.

According to the flip-test, compared to similar male policyholders, women have an average cost €10 lower. This observation aligns with the negative sign of Kendall’s tau and the mean ratio values, indicating lower averages for women compared to men.

Figures 1 and 2 analyze the distributions of these variables with respect to gender. For the frequency variable, distributions between $[0,1)$ for males and females are similar, representing 97.3% of the population. At $1$, there’s a 15% higher representation of women compared to men, accounting for 1.2% of the population. Above $1$, men are more strongly represented, with an 18% higher presence compared to women, representing 1.5% of the population.

Figure 1: Historical average cost and premium distribution by gender.
Figure 2: This histogram displays the historical premium distribution by gender. The premium is derived from the average cost times frequency distribution, with zeros excluded to facilitate the visualization of the remaining distribution.

The observed distributions suggest that, on average, men generally exhibit higher values for claim cost, frequency, and premium compared to women in historical data. However, interpreting these figures in absolute terms poses challenges due to the difficulty in establishing critical thresholds, given the lack of precedents in this type of problem. Nonetheless, these initial findings imply that gender has an impact on the variables of interest. Besides the class imbalance (more men than women), bias appears evident in the historical data. Such differences are well-documented in automobile pricing and are typically addressed by either excluding gender as a factor or rebalancing the model outputs. It’s worth noting that risk exposure may differ between gender classes due to various factors (see Ayuso et al. (2016)). To mitigate this effect, our dataset comprises individuals who subscribed to the same mileage package, providing the most accurate information available on vehicle usage.

4.2 Bias after modeling using gender

By definition, the best-performing model would include gender as a feature because it leverages all the available information for risk modeling. However, it is also the most unfair model concerning gender since a clear distinction between men and women is directly visible in the resulting premiums. To create this model, gender is reintroduced into the modeling process. Once constructed, predictions $\widehat{Y}$ are obtained, and the dependence between $S$ and $\widehat{Y}$ is measured. Table 4 presents the results of these different measures.

| $\widehat{Y}$ | Kendall | HGR_KDE | KS ($p$-value) | Div_JS | mean_ratio | Flip-test |
|---|---|---|---|---|---|---|
| Average cost | -0.1824 | 0.2241 | 0.0000e+00 | 0.3825 | 1.0960 | -3.81 |
| Frequency | -0.1949 | 0.3144 | 0.0000e+00 | 0.7333 | 1.2219 | -1.09% |
| Premium | -0.2101 | 0.3268 | 0.0000e+00 | 0.7116 | 1.3524 | -1.44 |

Table 4: Dependence between $\widehat{Y}$ and $S$ after modeling with gender

The dependencies between the variables of interest and the sensitive variable were amplified according to almost all measures. Measures leveraging distributions, like Div_JS and the flip-test, suggest weaker dependencies because $\widehat{Y}$ has less dispersion than $Y$. Kendall’s tau detects more than double the dependence for the cost variable (from -0.0796 to -0.1824), implying a somewhat simpler detection of the dependency structure. Thus, the constructed models not only replicated the historical bias present in the data but also exacerbated it. Figures 3, 4, and 5 analyze the distributions of these predicted variables of interest with respect to gender.

Figure 3: Predicted average cost by gender.
Figure 4: Predicted pure premium by gender.
Figure 5: Predicted frequency by gender.

The mismatch between the distributions is more pronounced post-modeling, with larger portions of the distributions not overlapping. The differences are more evident compared to the historical data case.

How can we explain this bias amplification? The interpretation of the constructed models reveals that gender plays a significant role in the prediction processes. By examining coefficients and importance measures, it becomes evident that gender is among the most important variables in these models.

The greater measured bias in the constructed models compared to the historical data used for modeling can be explained by the interdependence of the variables in the dataset. There is a link between the explanatory variables selected and the variables of interest because they help understand the associated risk. However, these variables, besides their predictive abilities, may also be related to each other. For example, a vehicle’s power may be linked to its price. These interdependencies pose challenges for fairness implementation because the so-called sensitive variables often have a notable influence on the distributions of the other observed variables. Thus, even if the observable interdependencies seem weak at first glance, their accumulation can magnify the role of the sensitive variable in the models.

In the data used, gender is significantly linked with variables such as horsepower, weight, gearbox type, and vehicle price, in addition to its minor relationships with other explanatory variables. Figure 6 visually represents how gender directly influences the variables of interest and has a significant indirect effect through its relationships with other variables.

Figure 6: Direct and indirect effects of gender on the predicted average cost. The darker the line, the stronger the relationship.

For instance, in the cost model, gender is ranked as the fourth most influential variable. Nevertheless, it’s noteworthy that the most substantial variables within the model are those closely related to gender. Generally, sensitive variables exert a significant impact on individuals’ behavior. For instance, age can influence risk-taking propensity, the level of responsibility, personal interests, and more. Similarly, ethnicity may correlate with factors such as geographical location, purchased services, and cultural preferences.

These analyses inevitably lead to a central question: What are the implications when gender is omitted or excluded from the modeling process? This raises the possibility that the observed interdependencies might have been overemphasized, and that the models have unduly relied on gender as a driving factor.

4.3 Bias after modeling without gender

By definition, a model that excludes gender as a variable aims to eliminate direct discrimination, ensuring that final outcomes are not discerned based on gender. However, it’s acknowledged that this strategy doesn’t address indirect discrimination as it overlooks the interrelationships between other explanatory variables and gender. Table 5 presents findings on the association between $\widehat{Y}$ and gender in models where gender isn’t included as an explanatory variable.

| $\widehat{Y}$ | Kendall | HGR_KDE | KS ($p$-value) | Div_JS | mean_ratio | Flip-test |
|---|---|---|---|---|---|---|
| Average cost | -0.1681 | 0.2111 | 9.7460e-39 | 0.3288 | 1.0118 | -2.71 |
| Frequency | -0.1309 | 0.2741 | 0.0000e+00 | 0.6493 | 1.1573 | -1.07% |
| Premium | -0.1733 | 0.2897 | 0.0000e+00 | 0.7226 | 1.2694 | -0.88 |

Table 5: Dependence between $\widehat{Y}$ and $S$ after modeling without gender

The results show a minimal reduction in bias compared to models where gender is included, indicating that gender’s influence remains pronounced even when it’s absent from the models. It may seem counterintuitive, but although premiums no longer differentiate explicitly by gender, gender-based distinctions are still observed, primarily due to indirect interdependencies within the data. For example, varying premiums based on the presence of secondary drivers indirectly leads to varied premiums for each gender because secondary drivers have different distributions by gender. In complex multivariate models, such distinctions based on gender can arise from the combination of multiple variables. For instance, a combination of factors such as vehicle power, gearbox type, and zoning may result in significant gender imbalances within the dataset. When examining the intersection of zoning and vehicle prices, notable gender imbalances become evident in the data: 82% of individuals residing in the 10th zone and owning a vehicle costing over €30,000 are male. Consequently, any substantial premium difference within this segment compared to others significantly impacts men, leading to gender-based premium disparities, despite the absence of explicit gender classification.
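The mechanism of indirect dependence can be reproduced on synthetic data. In the sketch below, every quantity is simulated and hypothetical: a single feature `proxy` is correlated with a binary sensitive attribute `s`, and a linear model is fitted without `s`. It is not based on the study's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
s = rng.integers(0, 2, size=n)                      # synthetic sensitive attribute
proxy = 1.0 * s + rng.normal(0.0, 1.0, size=n)      # feature correlated with s
y = 100.0 + 50.0 * proxy + rng.normal(0.0, 10.0, size=n)

# Ordinary least squares on the proxy only: s is NOT an input of the model
design = np.column_stack([np.ones(n), proxy])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

# Average predicted outcome still differs between the two groups
gap = y_hat[s == 1].mean() - y_hat[s == 0].mean()
```

Even though `s` never enters the model, the gap between the group averages of `y_hat` remains large, because the model recovers the group difference through `proxy`.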

4.4 A different modeling approach

These findings highlight that implementing fairness through omission does not offer a comprehensive solution to discrimination. An alternative modeling approach is employed to assess dependency levels. Instead of excluding the gender variable, it is included in the model, and then the outputs are post-processed to ensure gender-insensitive pricing, aligning with strategies used by insurers to comply with regulations like the gender directive. The specific approach calculates a weighted average of male and female model outputs for all segments, but the results show similar dependencies between the new values of $\widehat{Y}$ and $S$ as previous modeling strategies. This method fails to resolve interdependencies, as reweighting the outputs perpetuates existing imbalances.
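A small numerical sketch shows why such an averaging leaves a dependence on gender. It assumes that the weights are the gender shares observed in each segment, which is an assumption about the weighting scheme; the two segments and all figures are hypothetical.

```python
import numpy as np

def unisex_premium(pred_m, pred_f, share_m):
    """Per-segment weighted average of the male and female model outputs."""
    return share_m * pred_m + (1.0 - share_m) * pred_f

# Two hypothetical segments: A is 80% male, B is 20% male
share_m = np.array([0.8, 0.2])
pred_m = np.array([120.0, 70.0])     # model output for men, per segment
pred_f = np.array([100.0, 60.0])     # model output for women, per segment
n_seg = np.array([10, 10])           # policyholders per segment

premium = unisex_premium(pred_m, pred_f, share_m)   # one premium per segment

# Average premium paid by each gender, given the gender mix of the segments
n_m = share_m * n_seg
n_f = n_seg - n_m
avg_m = np.sum(n_m * premium) / n_m.sum()
avg_f = np.sum(n_f * premium) / n_f.sum()
```

Within each segment, men and women pay the same premium, yet the average premium remains higher for men because they are concentrated in the more expensive segment.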

Consequently, gender’s influence persists within the modeling process, even after its omission or conventional adjustment. Some scholars refer to this as the "reconstruction" of the sensitive variable, where bias continues to manifest due to interdependencies between explanatory variables and the sensitive variable. This study exemplifies such a scenario.

In summary, bias exists in historical data, and regardless of the modeling approach applied, it is perpetuated, primarily due to interdependencies. The subsequent section explores bias mitigation while maintaining acceptable performance, using the model constructed without gender as reference.

5 Bias mitigation in car insurance pricing

The models have been constructed (using standard machine learning approaches), performance and biases have been assessed, and the next step involves mitigating bias while upholding the performance level. This section provides the implementation specifics and outcomes of the various bias mitigation methods introduced in Section 3.

5.1 Total deletion

To establish the deletion scenarios, we analyze the relationship between sensitive and non-sensitive variables. The results are depicted in Figure 7.

Figure 7: Interdependence between gender and other explanatory variables

By examining these dependency levels, $\alpha$ can assume values within the range $(8\%,41\%]$. Scenarios are developed by iteratively adjusting the threshold, considering the relationships between the variables. The chosen scenarios are as follows:

  • Scenario 1: driv_2;

  • Scenario 2: driv_2 and box_type;

  • Scenario 3: driv_2, box_type and energy;

  • Scenario 4: box_type, driv_2, energy and weight;

  • Scenario 5: box_type, driv_2, energy and anc_cp;

  • Scenario 6: box_type, driv_2, energy, and zoning;

  • Scenario 7: box_type, driv_2, energy and veh_price;

  • Scenario 8: box_type, driv_2, energy, veh_price and weight_kw;

  • Scenario 9: box_type, driv_2, energy, veh_price and zoning;

  • Scenario 10: box_type, driv_2, energy, weight and zoning;

  • Scenario 11: box_type, driv_2, energy, veh_price, weight_kw and zoning.

The gender variable is eliminated from all models. Figure 8 illustrates the fairness and performance levels of each scenario. The non-dominated scenarios are highlighted in red. The abbreviation dbe refers to the variables driv_2, box_type and energy. For example, scenario 6 is represented on the graph as dbe+zoning.

Figure 8: Models’ fairness according to their performance

Among the non-dominated scenarios, a choice must be made based on the desired trade-off as determined by decision-makers. For instance, the objective might be to attain the fairest model while tolerating a maximum performance loss of 5%. Scenarios 8, 9, 10, and 11, although among the fairest, would no longer meet this criterion. Upon analyzing the graph, the 7th scenario, labeled as dbe+veh_price, emerges as an interesting trade-off option, as it reduces bias by more than a third with an acceptable performance loss: compared to the reference model, a bias reduction of 37% is achieved in exchange for a 3.9% drop in performance. The 7th scenario model is further examined.
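The selection logic described above (keep the non-dominated scenarios, then pick the fairest one within a performance budget) can be sketched as follows. The scenario names and their (bias, error) pairs are hypothetical placeholders, not the values shown in Figure 8, and both quantities are assumed to be minimized.

```python
def non_dominated(scenarios):
    """Names of scenarios that no other scenario beats on both criteria.

    `scenarios` maps a name to a (bias, error) pair, both to be minimized.
    """
    front = []
    for name, (bias, error) in scenarios.items():
        dominated = any(
            b <= bias and e <= error and (b < bias or e < error)
            for other, (b, e) in scenarios.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

def fairest_within_budget(scenarios, reference, max_loss=0.05):
    """Least-biased non-dominated scenario whose error stays within budget."""
    ref_error = scenarios[reference][1]
    eligible = [
        name for name in non_dominated(scenarios)
        if scenarios[name][1] <= ref_error * (1.0 + max_loss)
    ]
    return min(eligible, key=lambda name: scenarios[name][0])

# Hypothetical (bias, error) pairs, for illustration only
scenarios = {
    "reference": (0.30, 100.0),
    "A": (0.25, 102.0),
    "B": (0.27, 103.0),   # dominated by A: more bias and more error
    "C": (0.20, 110.0),   # fairest, but exceeds a 5% error budget
}
front = non_dominated(scenarios)
choice = fairest_within_budget(scenarios, "reference", max_loss=0.05)
```

The budget is expressed relative to the error of the reference model, mirroring the example of a maximum performance loss of 5%.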

It appears that the model has learned to rely more on other variables to compensate for the absence of the veh_price variable. For instance, some variables like zoning, driv_ly, and veh_age now have larger coefficients and play a more significant role in the prediction process. However, the performance loss still indicates that these omitted variables cannot be entirely replaced optimally. Furthermore, this new model leads to a slightly degraded equilibrium in pricing compared to the reference equilibrium. The LR metric obtained is 99.3% compared to the reference LR of 99.7%. It is essential to note that once the input data are modified, a reevaluation and validation of the model are necessary.

Hence, this initial approach can yield satisfactory results depending on the quality of the explanatory variables and the acceptable level of performance loss. It is a straightforward method to implement and underscores the importance of addressing fairness concerns from the initial stages of data processing and variable selection.

5.2 Correlation remover

To apply the methodology introduced in Section 3.1.2, we have chosen to retain qualitative variables while focusing solely on transforming quantitative variables. The quantitative variables subject to transformation include: veh_price, weight_kw, veh_age, driv_yp, and driv_ly. We uniformly select $100$ values of $\alpha$ within the interval $[0,1]$, including both endpoints. For each $\alpha$ value, we correct our quantitative variables, build our model, and measure performance and fairness.
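A linear correlation remover can be sketched in a few lines of NumPy. The version below follows the convention used by fairlearn's `CorrelationRemover`, in which $\alpha=1$ removes the full linear projection on the centered sensitive variable and $\alpha=0$ leaves the data unchanged. Whether this matches the parametrization of Section 3.1.2 exactly is an assumption, and the synthetic data are hypothetical.

```python
import numpy as np

def remove_correlation(X, s, alpha=1.0):
    """Blend X with its residual after linear regression on the centered s.

    alpha = 1: full removal of the linear dependence on s.
    alpha = 0: X is returned unchanged.
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float).reshape(len(X), -1)
    s_centered = s - s.mean(axis=0)
    # Least-squares coefficients of each column of X on the centered s
    beta, *_ = np.linalg.lstsq(s_centered, X, rcond=None)
    X_filtered = X - s_centered @ beta
    return alpha * X_filtered + (1.0 - alpha) * X

# Synthetic example (hypothetical data): first column depends on s
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=1000)
X = np.column_stack([2.0 * s + rng.normal(size=1000), rng.normal(size=1000)])

# 100 evenly spaced values of alpha in [0, 1], endpoints included
alphas = np.linspace(0.0, 1.0, 100)
X_corrected = remove_correlation(X, s, alpha=alphas[-1])
```

Only linear dependence is removed by this transformation, so any nonlinear relationship between the features and the sensitive variable is left in place.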

From these results, two observations arise. First, there is no clear trend between the fairness level and the hyperparameter $\alpha$. Typically, one might expect fairness and performance to decrease as $\alpha$ increases, but this trend is not evident. The relationship between fairness, performance levels, and $\alpha$ appears random, lacking a discernible pattern. Second, using this method, some models outperform the reference model. For example, the model with $\alpha=85\%$ shows lower bias and slightly better performance. While the performance gain is only around 0.5%, it comes with an 11% improvement in fairness. However, the efficacy of this method relies on the linear model’s ability to detect relationships, ensuring that residuals are unbiased. Nonetheless, examining linear correlations reveals that the highest dependence between $S$ and other explanatory variables is 8%, significantly limiting suppression capacity in this specific case.

While this method seems promising, it presents several limitations within the insurance pricing context. First, in pricing models, it is more common to use discrete or discretized variables as inputs. Ideally, continuous variables should be discretized after suppression to ensure interpretability. However, as the values represent residuals, they may lose interpretability. Additionally, in practical production scenarios, transforming customer attributes into predicted premiums may be complicated by this method. Lastly, the non-intuitive model behavior in response to changes in $\alpha$ lacks a clear explanation. Considering these factors, while a model outperforming the reference model was achieved, adopting it is challenging due to associated limitations. This method may find greater success when applied to a continuous sensitive variable.

5.3 Fair-SMOTE adaptation

Balancing the number of bins and the size of each subpopulation, we have opted for a segmentation into seven bins, with the following breakdown:

  • bin 1 if $y=0$;

  • bin 2 if $y\in(0,250]$;

  • bin 3 if $y\in(250,500]$;

  • bin 4 if $y\in(500,750]$;

  • bin 5 if $y\in(750,1000]$;

  • bin 6 if $y\in(1000,1500]$;

  • bin 7 if $y\in(1500,+\infty)$.


The number of bins serves as a hyperparameter, resulting in various binning scenarios. The selection of these bins depends on their alignment with the observed trends in the premium distribution.
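The seven-bin segmentation listed above can be written compactly with `numpy.digitize`. The function and constant names are illustrative, and premiums are assumed to be non-negative.

```python
import numpy as np

# Upper bounds of bins 1 to 6; anything above 1500 falls in bin 7
EDGES = np.array([0.0, 250.0, 500.0, 750.0, 1000.0, 1500.0])

def premium_bin(y):
    """Map non-negative premiums to bins 1..7 (intervals closed on the right)."""
    y = np.asarray(y, dtype=float)
    # right=True gives EDGES[i-1] < y <= EDGES[i]; y = 0 maps to index 0
    return np.digitize(y, EDGES, right=True) + 1

bins = premium_bin([0, 100, 250, 251, 750, 1000, 1500, 2000])
```

Changing the content of `EDGES` is enough to test alternative binning scenarios, since the number of bins is treated as a hyperparameter.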

Following the bin selection, the next step is to determine the scope of resampling. As previously discussed, resampling is only performed on $S \mid Y$, since resampling on $Y$ would alter the specific distribution needed for the GLM. Resampling on $Y$ was nevertheless attempted for algorithms such as random forests, which make no distributional assumptions, but it led to a notably lower LR because of the high premium levels predicted by the models.

It is worth noting that the creators of the traditional fair-SMOTE method recommend setting $st$ and $ft$ to 0.8 each. This configuration is retained for consistency. The distribution after resampling is illustrated in Figure 9.

Figure 9: Distribution of $Y$ after resampling.

In total, 9588 individuals were generated, expanding the training dataset by 16.1%. The distribution of this enlarged dataset remains consistent with the initial observations, and the differences between genders are notably less pronounced. The box plots overlap substantially, and the shift observed in the historical data is significantly reduced. This resampling procedure therefore appears to be effective. Once the augmented training data are prepared, the models are refitted, and the outcomes are presented in Table 6.

Models                   RMSE     HGR_KDE   LR
Reference model          164.21   28.97%    99.71%
Model after fair-SMOTE   164.88   28.05%    99.69%
Table 6: Comparative table of results after fair-SMOTE.

The resampled training distribution shows reduced skewness, but the outcomes on the test dataset remain skewed. The improvement in fairness is minimal: HGR_KDE falls from 28.97% to 28.05%, a relative improvement of less than 4%. Performance metrics and the LR remain similar to the baseline. Therefore, rebalancing the distributions solely on $S \mid Y$ seems inadequate for achieving fairness. This could be because the bias needing mitigation in car pricing does not stem from a representation issue concerning $Y$, but rather from historical differences between male and female premiums. Thus, while resampling ensures that both genders are sufficiently represented during training, it does not address the interdependencies between variables, which are crucial for ensuring fairness.

Sensitivity analyses on various modeling choices were conducted to explore whether these choices contributed to the ineffectiveness of our approach. Changes to the hyperparameters $ft$ and $st$ did not improve the outcomes. Adjusting the resampling bins did not lead to significant enhancements either. Increasing the number of bins introduced more irregularities in the distribution of $Y$ and reduced model performance.

A significant departure from conventional practices in the literature was limiting the resampling process to $S \mid Y$ rather than conducting full resampling on $Y$. Full resampling of $Y$ altered its distribution drastically, generating an excessive number of artificial instances and failing to produce satisfactory results. To explore the potential benefits of resampling on $Y$ while preserving the overall distribution shape, partial rebalancing of $Y$ values was considered. For instance, in a scenario where $Y$ has only two distinct values, such as $\{25, 35\}$, Tables 7 to 10 illustrate the various forms of resampling undertaken.

Y / S   F     M     Total
25      250   150   400
35      120   170   290
Total   370   320   690
Table 7: Initial distribution.

Y / S   F     M     Total
25      250   250   500
35      170   170   340
Total   420   420   840
Table 8: Balance on $S \mid Y$.

Y / S   F     M     Total
25      250   250   500
35      250   250   500
Total   500   500   1000
Table 9: Balance on $S \mid Y$ and $Y$.

Y / S   F     M     Total
25      250   250   500
35      200   200   400
Total   450   450   900
Table 10: Balance on $S \mid Y$ and partially on $Y$.
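The target counts in these toy tables can be reproduced with a short sketch. The function `balanced_targets` and its parameter `y_share` are hypothetical: the text does not say how the degree of partial rebalancing on $Y$ was parameterised, so a linear interpolation between "balance on $S \mid Y$ only" (`y_share = 0`) and "balance on $S \mid Y$ and $Y$" (`y_share = 1`) is assumed here.

```python
def balanced_targets(counts, y_share=0.0):
    """Target cell counts after oversampling.

    counts  : {y_level: {gender: n}} observed counts.
    y_share : assumed interpolation weight in [0, 1];
              0 balances genders within each Y level only,
              1 additionally raises every Y level to the largest cell.
    """
    global_max = max(n for row in counts.values() for n in row.values())
    targets = {}
    for y_level, row in counts.items():
        row_max = max(row.values())  # balance on S | Y: match the larger gender
        cell = round(row_max + y_share * (global_max - row_max))
        targets[y_level] = {gender: cell for gender in row}
    return targets


def n_generated(counts, targets):
    """Number of artificial individuals needed to reach the targets."""
    return sum(
        targets[y][g] - counts[y][g] for y in counts for g in counts[y]
    )


# Observed counts of the toy example (women, men) for Y = 25 and Y = 35.
initial = {25: {"female": 250, "male": 150}, 35: {"female": 120, "male": 170}}
```

With these inputs, `y_share = 0` gives the cells of the $S \mid Y$ balance, `y_share = 1` gives the full balance, and `y_share = 0.375` gives 200 individuals per gender for $Y = 35$, as in the partial scheme. The larger `y_share` is, the more artificial individuals are generated.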

Multiple scenarios involving partial resampling of $Y$ were explored. The results indicate that as the number of artificially generated individuals increases, model performance worsens without any accompanying improvement in fairness. For example, expanding the population by 20% through resampling increased the RMSE to 177, while the HGR fairness metric remained at 28.05%. Ultimately, these approaches failed to produce better results.

An unexplored area for potential improvement involves considering the explanatory variables during the resampling process. Instead of resampling solely to achieve equity based on $S \mid Y$, an alternative approach is resampling based on $S \mid X_1, \dots, X_p, Y$. The selection of variables for conditioning is crucial here, as they can potentially amplify existing biases. Implementing this approach is complex, as it requires overseeing the quality of resampling on individual variables and their interactions while maintaining the overall consistency of premium distributions. Although this approach has not been extensively explored in this study, integrating more efficient methods for addressing interdependencies between variables may be considered in future research.

In summary, the various pre-modeling bias mitigation strategies have provided insights into the data’s structure and identified elements with significant influence on bias levels. They offer valuable tools for addressing ethical concerns during the data preprocessing phase.

5.4 Exponentiated gradient

To achieve the model with the lowest permissible error level, bounded by $\zeta$, a specific approach is adopted. The method starts with the lowest feasible value of $\zeta$, initially set at $10^{-5}$, and progressively multiplies it by 10 until the algorithm converges. The first convergence occurred at $\zeta = 10^{-1}$. Once the method converges to the optimal solution, the grid search and the $M$ parameter become unnecessary.
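The tolerance search described above can be sketched as a simple loop. The callable `fit` is a hypothetical stand-in for one run of the exponentiated gradient with error bound $\zeta$; it is assumed to return the fitted model together with a convergence flag. The toy `toy_fit` below only mimics the reported behaviour (first convergence at $10^{-1}$) and does not train anything.

```python
def first_converging_tolerance(fit, exponents=range(-5, 1)):
    """Try zeta = 10^-5, 10^-4, ... and stop at the first value that converges.

    fit : hypothetical callable zeta -> (model, converged).
    Returns (zeta, model), or (None, None) if no tolerance converges.
    """
    for k in exponents:
        zeta = 10.0 ** k
        model, converged = fit(zeta)
        if converged:
            return zeta, model
    return None, None


def toy_fit(zeta):
    """Placeholder for a real training run: 'converges' once zeta >= 0.05."""
    return f"model(zeta={zeta:g})", zeta >= 0.05


best_zeta, best_model = first_converging_tolerance(toy_fit)
```

Starting from the tightest bound and relaxing it by a factor of 10 at each step keeps the retained constraint as strict as the algorithm allows.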

However, for $\zeta < 10^{-1}$, these tools can help identify suboptimal solutions that may outperform the solution found with larger tolerances. During testing, values of $M$ ranging from 0.5 to 500 were employed with grid sizes of 3000. The results obtained were not as favorable as those achieved with convergence. It is essential to note that the grids explored were relatively small compared to the dimensionality of the dataset. Given more computational power and time, improved results might be attainable.

The outcomes for the best model are presented in Table 11.

Models                   RMSE     HGR_KDE   LR
Reference model          164.21   28.97%    99.71%
Model after mitigation   164.30   31.74%    99.70%
Table 11: Evaluation of the results obtained after application of the exponentiated gradient.

The model obtained after mitigation exhibits reduced efficiency and increased unfairness compared to the reference model. While the loss of performance is negligible, the HGR rises from 28.97% to 31.74% after the application of the mitigation method, a relative increase of more than 9%. These outcomes can be attributed to the fact that the error constraint implemented does not enforce any form of independence between $\widehat{Y}$ and $S$. In other words, the error rate per gender can be the same without ensuring fair treatment of the genders.
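A toy example, with made-up numbers that are not taken from the study, shows why equal error per gender does not imply independence between predictions and gender. Both groups below have exactly the same RMSE, yet the predicted premiums differ widely by group.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mean(values):
    return sum(values) / len(values)


# Hypothetical observed costs and predictions for two groups.
women = {"y": [100, 200], "pred": [110, 190]}
men = {"y": [300, 400], "pred": [310, 390]}

rmse_women = rmse(women["y"], women["pred"])   # same error level ...
rmse_men = rmse(men["y"], men["pred"])
gap_in_mean_prediction = mean(men["pred"]) - mean(women["pred"])  # ... but a large gap
```

A bounded-error constraint is satisfied equally well for both groups here, while a dependence measure between predictions and gender would still be high.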

In addition to the inability to accommodate alternative fairness definitions, this method presents exponential computation times. The grid search approach and the parameter $M$ offer an alternative but lead to suboptimal results. In the rapidly evolving domain of bias mitigation, mitigation during modeling stands as one of the most challenging and least developed aspects. The limited availability of methods, which focus primarily on binary classification, combined with the high customization required for their adaptation, compounds the challenge. Nonetheless, the method explored here, implemented under a relatively straightforward fairness constraint, serves to illustrate the limitations associated with implementing fairness as an optimization constraint. Furthermore, the quest for accessible, generalizable, and stable methods remains a significant mathematical challenge that must consider the constraints inherent to the field of pricing.

5.5 Fair redistribution

This method employs the flip test’s adaptation to measure individual fairness. We optimized the $k$-nearest neighbors used in the flip test through hyperparameter tuning. The objective was to select the most suitable variables that would minimize the differences between individuals in a test database. As discussed earlier, the goal is to develop a model that reduces the distance and bias between individuals.

The selected variables include: driv_yp, driv_2, veh_age, veh_price, box_type, and zoning, with the number of neighbors set to 5 and a Manhattan distance metric ($\ell_1$). In Python, the $k$-nearest neighbor algorithm offers an option that automatically selects the neighbor search method based on factors such as dimensionality, the number of individuals, and the data structure (e.g., whether the matrix is sparse or not). This automated option yielded the most favorable results. Figure 10 illustrates the distribution of disparities between women and their corresponding male neighbors.

Figure 10: Histogram of $\varepsilon_i$ (gaps) between women and their male neighbors.
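One way to compute such gaps is sketched below with a brute-force Manhattan nearest-neighbor search in pure Python. The function name `opposite_gender_gaps` is hypothetical, and the sign convention (own premium minus the mean premium of the $k$ nearest neighbors of the opposite gender) is an assumption chosen so that a negative gap means a lower premium than comparable individuals of the other gender. In practice, a library routine such as scikit-learn's `NearestNeighbors` with `metric="manhattan"` and `algorithm="auto"` could replace the inner search.

```python
def manhattan(a, b):
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def opposite_gender_gaps(features, gender, premium, k=5):
    """Gap between each individual's premium and the mean premium of
    their k nearest neighbors of the opposite gender.

    features : list of numeric feature vectors (already encoded and scaled).
    gender   : list of group labels, one per individual.
    premium  : list of predicted premiums, one per individual.
    Assumes every individual has at least one opposite-gender candidate.
    """
    gaps = []
    for i, (x_i, s_i) in enumerate(zip(features, gender)):
        candidates = [j for j, s_j in enumerate(gender) if s_j != s_i]
        nearest = sorted(candidates, key=lambda j: manhattan(x_i, features[j]))[:k]
        neighbor_mean = sum(premium[j] for j in nearest) / len(nearest)
        gaps.append(premium[i] - neighbor_mean)
    return gaps


# Tiny made-up example: two women and two men described by one feature.
toy_gaps = opposite_gender_gaps(
    features=[[0], [1], [0], [1]],
    gender=["F", "F", "M", "M"],
    premium=[100, 200, 110, 220],
    k=1,
)
```

The brute-force search is quadratic in the number of individuals, which is fine for a sketch but would call for a tree-based or library search on a portfolio of realistic size.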

Subsequently, for each individual in the database, the average disparity with their closest neighbors of the opposite gender is computed. These average discrepancies are incorporated into the original modeling dataset. An experiment is then conducted on a test dataset, encompassing a total cost of 359584 and predicted premiums of 360884 for 14000 observations. The Loss Ratio (LR) of 99.64% aligns well with the overall LR of 99.7%. On average, women’s premiums are 0.8 lower than those of the nearest men. Table 12 provides a summary of the interesting aggregates calculated on the test dataset.

Aggregates                   Female    Male      Sum
Total expenses               154880    204704    359584
Predicted premiums           154953    206931    360884
Exposure                     4776.6    7114.6    11891.2
Number of individuals        6069      7931      14000
Average $\varepsilon$        -0.8      1.2       0.4
Sum of bias ($\Sigma_s$)     -17448    22029     4581
Table 12: Some relevant aggregates on the test perimeter.

The sum of individual fairness biases is -17448 for women and +22029 for men. These disparities can be interpreted as follows: in aggregate, women pay 17448 less than men with similar characteristics. The signs of the differences align with expectations, reflecting that women pose lower risks than men. However, the differences do not fully offset each other, leaving a residual difference of 4581 (-17448 + 22029). This remaining difference can be attributed, in part, to the larger representation of men in the database and to the imperfections of the models employed.
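The two headline figures of this paragraph follow directly from the reported aggregates, as the short check below shows. The variable names are illustrative; the numbers are the totals quoted in the text, and the loss ratio is taken as total cost divided by total predicted premiums.

```python
# Totals reported for the test dataset.
total_cost = 359584
total_predicted_premiums = 360884

# Loss ratio, taken here as cost over predicted premiums.
loss_ratio = total_cost / total_predicted_premiums  # about 0.9964

# Sums of individual fairness biases by gender.
sum_bias = {"female": -17448, "male": 22029}
residual_gap = sum(sum_bias.values())  # the part that does not offset
```

If the two groups offset each other exactly, `residual_gap` would be zero; the positive value reflects the imbalance discussed above.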

To assess the influence of different values for $\zeta$ and $\eta$, and to identify the optimal combination, the redistribution process is repeated across a grid of values. This grid covers all possible combinations of $(\eta, \zeta)$, where $\eta$ takes values in the set $\{2, 3, 4, 5, 6, 7, 8, 9, 10\}$ and $\zeta$ in the set $\{2500, 2000, 1500, 1000, 500, 100, 10, 1, 0.1\}$. Three factors are examined: computational time, global variation, and redistribution integrity.

When $\zeta \leq 100$, the redistribution process results in a substantial reduction in the range of the $\widehat{Y}$ distribution. This occurs because the correction needed to achieve such low levels of differences between subgroups is extensive, causing premiums to become increasingly clustered until they eventually converge towards the sample’s mean premium. Consequently, pushing the redistribution method towards maximum convergence leads to an equilibrium where all individuals are assigned the same premium. While this equilibrium represents trivial fairness, it severely deteriorates the model’s performance. Therefore, it becomes imperative to guide the method towards optima where global variation is minimized while preserving the distribution of $\widehat{Y}$. Among the 81 tested combinations, only 10 emerged as non-dominated solutions based on the criteria of redistribution integrity and global variation. Of these 10, 5 had a redistribution integrity of less than 25%, making them unacceptable despite having the smallest global variations. The two most promising scenarios are as follows:

  1. $\eta = 6$ and $\zeta = 2000$, for a global variation of 873 and an integrity of 88%;

  2. $\eta = 5$ and $\zeta = 2500$, for a global variation of 771 and an integrity of 86%.
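The selection of non-dominated combinations can be sketched as follows. The grid is the one stated in the text; the helper `non_dominated` is hypothetical and treats a lower global variation and a higher integrity as better. Only the two scenarios listed above are taken from the study; the third entry in `toy_results` is invented to show a dominated point.

```python
from itertools import product

etas = range(2, 11)
zetas = [2500, 2000, 1500, 1000, 500, 100, 10, 1, 0.1]
grid = list(product(etas, zetas))  # 9 x 9 = 81 combinations


def non_dominated(results):
    """Keep the results that no other result beats on both criteria.

    Each result is a dict with 'variation' (lower is better)
    and 'integrity' (higher is better).
    """
    front = []
    for r in results:
        dominated = any(
            o["variation"] <= r["variation"]
            and o["integrity"] >= r["integrity"]
            and (o["variation"] < r["variation"] or o["integrity"] > r["integrity"])
            for o in results
        )
        if not dominated:
            front.append(r)
    return front


toy_results = [
    {"eta": 6, "zeta": 2000, "variation": 873, "integrity": 0.88},  # reported
    {"eta": 5, "zeta": 2500, "variation": 771, "integrity": 0.86},  # reported
    {"eta": 4, "zeta": 2500, "variation": 900, "integrity": 0.80},  # invented
]
front = non_dominated(toy_results)
```

The two reported scenarios do not dominate each other, since one has the lower variation and the other the higher integrity, which is why a further preference is needed to choose between them.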


The scenario with 88% integrity is preferred, resulting in a sum of gaps of -603 for women and +1682 for men. Consequently, the total gap is reduced to 1079, representing a 76% decrease compared to the initial gap of 4581€. The average biases are now 0.071 for women and 0.14 for men, all while preserving a distribution of $\widehat{Y}$ that remains faithful to the one prior to redistribution. Figure 11 illustrates the overlap of histograms before and after the redistribution.

Regarding performance, the premiums post-redistribution exhibit a Root Mean Square Error (RMSE) of 164.72, which closely aligns with the baseline value of 164.21. Despite the increase of 873€ in total premiums (global variation), the Loss Ratio (LR) only experiences a minor reduction from 99.64% to 99.39%. Thus, premiums maintain their consistency and calibration while effectively reducing the gaps between men and women. For values of $\zeta$ less than or equal to 100, most computation times are under five minutes, rendering this criterion less relevant for defining the best redistribution strategies.
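The effect of the retained scenario can be checked with a few lines of arithmetic. The sketch assumes that the total cost is unchanged by redistribution and that the global variation is simply added to the total predicted premiums; under these assumptions the loss ratio comes out at roughly 99.4%, in line with the reported 99.39%.

```python
gap_before = 4581                 # residual gap before redistribution
gap_after = -603 + 1682           # sums of gaps for women and men afterwards
gap_reduction = 1 - gap_after / gap_before  # about 0.76

total_cost = 359584               # assumed unchanged by redistribution
premiums_before = 360884
global_variation = 873            # increase in total premiums
lr_after = total_cost / (premiums_before + global_variation)
```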

Figure 11: Distribution of $\widehat{Y}$ before and after redistribution.

The introduction of bias mitigation methods has prompted the examination of fairness considerations throughout various phases of the pricing process. The successful integration of these mitigations necessitates alignment with the diverse requirements and limitations inherent to pricing. Consequently, they should be operationally feasible while striving to maintain the performance of the models. Among the implemented methods, variable suppression, due to its simplicity, and redistribution, owing to its personalized approach, appear to yield the most promising outcomes. The other methods contribute to a more comprehensive understanding of the analyzed bias and may yield superior results in alternative datasets. An overview of the results stemming from the various mitigation strategies is presented in Table 13.

Metric       Reference model   Total deletion   Correlation deletion   Fair-SMOTE adaptation   Exponentiated gradient   Fair redistribution
HGR KDE      28.97%            18.25%           27.66%                 28.05%                  31.74%                   29.05%
RMSE         164.21            170.61           163.03                 164.88                  164.30                   164.72
Loss ratio   99.71%            99.30%           99.79%                 99.69%                  99.70%                   99.39%
Table 13: Summary of bias mitigation results: the HGR KDE metric evaluates the dependence between $\widehat{Y}$ and $S$. RMSE and loss ratio, defined in (1), evaluate the performance of the model after the various mitigation strategies.
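To compare the methods on a common scale, the relative change of each metric with respect to the reference model can be computed from the values in Table 13. The helper `relative_change` is illustrative; a negative change in HGR KDE means less dependence between predictions and gender, and a negative change in RMSE means better accuracy.

```python
reference = {"hgr": 28.97, "rmse": 164.21}

# HGR KDE (in %) and RMSE per mitigation method, as reported in Table 13.
methods = {
    "total deletion": {"hgr": 18.25, "rmse": 170.61},
    "correlation deletion": {"hgr": 27.66, "rmse": 163.03},
    "fair-SMOTE adaptation": {"hgr": 28.05, "rmse": 164.88},
    "exponentiated gradient": {"hgr": 31.74, "rmse": 164.30},
    "fair redistribution": {"hgr": 29.05, "rmse": 164.72},
}


def relative_change(new, ref):
    """Relative change of a metric with respect to its reference value."""
    return (new - ref) / ref


changes = {
    name: {metric: relative_change(vals[metric], reference[metric]) for metric in vals}
    for name, vals in methods.items()
}

for name, ch in changes.items():
    print(f"{name:24s} HGR {100 * ch['hgr']:+6.1f}%   RMSE {100 * ch['rmse']:+5.1f}%")
```

On these figures, total deletion reduces HGR KDE by about 37% at the cost of a higher RMSE, while fair-SMOTE changes it by about 3% and the exponentiated gradient increases it by almost 10%.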

6 Conclusion

Fairness serves as a crucial constraint in the realm of pricing. Whether it arises from regulatory requirements or a company’s strategic objectives, it is imperative to establish, assess, and alleviate biases to attain equitable models. Consequently, there was an initial examination of fairness from a mathematical standpoint, wherein various metrics were introduced to identify biases both before and after the modeling process, irrespective of the conventional gender-related treatment. Subsequently, a study of mitigation methods was conducted, which encompassed pre-, mid-, and post-modeling interventions. These interventions aimed to address bias comprehensively by reprocessing the data and imposing fairness constraints, as well as to address bias individually by reprocessing distinct premiums. While some methods proved more effective than others, each approach offered valuable insights.

In this application, taking into account the constraint of pricing model performance, unfairness as defined by our metrics was not fully mitigated because doing so would have led to a significant degradation of our models. For example, more variables could have been deleted in total deletion, or more redistribution iterations could have been run in the fair redistribution method. Apart from this constraint, full fairness seems difficult to achieve, if only because corrections are made on a training dataset: with statistical randomness, changes in the portfolio structure, and so on, fairness would remain partial.

In continuation of this research, further datasets and scenarios can be explored to obtain more robust results and establish a reference benchmark. It will also be essential to extend the analysis to encompass other pricing granularities such as portfolio segments or groups of coverages.

This study, nevertheless, lays the groundwork for quantifying and mitigating biases within the insurance domain, providing a framework for ongoing exploration. For instance, there is potential to enhance mitigation techniques during the modeling process and investigate the simultaneous treatment of multiple sensitive variables.

Acknowledgements

Data:  Datasets used in this manuscript are based on confidential data obtained by a consulting company, from a (real) insurance company.

Conflicts of interest:  The authors declare no conflicts of interest.

Declaration of funding:  AC acknowledges Canada’s National Sciences and Engineering Research Council (NSERC) for funding (RGPIN-2019-07077) and the SCOR Foundation.

Acknowledgements:  We are grateful to Laurence Barry for stimulating discussions and André Grondin for sharing his thoughts on earlier versions of this work.
FV thanks the France 2030 framework programme Centre Henri Lebesgue ANR-11-LABX-0020-01 for its stimulating mathematical research programs.
We are very grateful to the anonymous referees for valuable comments.

Appendix A Fair-SMOTE adaptation pseudo code


Algorithm 1 Fair-SMOTE adaptation
for each subgroup do
    for each simulated individual of the given subgroup do
        randomly select an individual $p$ in the subgroup
        find, via $k$-nearest neighbors, the two closest neighbors $v_1$ and $v_2$ of $p$
        sample $u$ from a uniform distribution
        for each of the $N$ columns do
            if the column has binary values then
                if $st > u$ then
                    $\tilde{x} = \text{random\_choice}(x_{v_1}, x_{v_2}, x_p)$
                else
                    $\tilde{x} = x_p$
                end if
            else if the column has qualitative values then
                $\tilde{x} = \text{random\_choice}(x_{v_1}, x_{v_2}, x_p)$
            else if the column has quantitative values then
                if $st > u$ then
                    $\tilde{x} = x_p + ft \times (x_{v_1} - x_{v_2})$
                else
                    $\tilde{x} = x_p$
                end if
            else
                return an alert to trigger the transformation of the column
            end if
        end for
    end for
end for

Here $u$ is a realization of a uniform distribution on $[0,1]$; $\tilde{x}$ is the value of the new individual’s coordinate on the corresponding column $X_j$; $(x_i)_{i \in \{v_1, v_2, p\}}$ are the values of that coordinate for the individuals $v_1$, $v_2$ and $p$; and random_choice is a random choice among its arguments, each argument having the same chance of being chosen.
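A minimal Python sketch of the inner part of Algorithm 1, generating one synthetic individual, is given below. It is an illustration under stated assumptions rather than the authors' code: records are plain dictionaries, `col_types` is a hypothetical mapping from column name to one of "binary", "qualitative" or "quantitative", and the two nearest neighbors are found by a brute-force Manhattan distance on the quantitative columns only, a choice the pseudocode does not specify.

```python
import random


def synthesize(subgroup, col_types, st=0.8, ft=0.8, rng=None):
    """Generate one synthetic individual from a subgroup (list of dicts).

    The subgroup must contain at least three individuals.
    st and ft default to 0.8, the values retained in the study.
    """
    rng = rng or random.Random()
    p = rng.choice(subgroup)

    # Assumption: neighbors are ranked by L1 distance on quantitative columns.
    quant = [c for c, kind in col_types.items() if kind == "quantitative"]
    others = [r for r in subgroup if r is not p]
    others.sort(key=lambda r: sum(abs(p[c] - r[c]) for c in quant))
    v1, v2 = others[0], others[1]

    u = rng.random()  # drawn once per synthetic individual, as in Algorithm 1
    new = {}
    for col, kind in col_types.items():
        if kind == "binary":
            new[col] = rng.choice([v1[col], v2[col], p[col]]) if st > u else p[col]
        elif kind == "qualitative":
            new[col] = rng.choice([v1[col], v2[col], p[col]])
        elif kind == "quantitative":
            new[col] = p[col] + ft * (v1[col] - v2[col]) if st > u else p[col]
        else:
            # Unknown type: the column must be transformed before resampling.
            raise ValueError(f"column {col!r} has unsupported type {kind!r}")
    return new


# Made-up subgroup using three of the variable names mentioned in the paper.
col_types = {"driv_2": "binary", "box_type": "qualitative", "veh_age": "quantitative"}
subgroup = [
    {"driv_2": 0, "box_type": "manual", "veh_age": 3.0},
    {"driv_2": 1, "box_type": "auto", "veh_age": 5.0},
    {"driv_2": 0, "box_type": "manual", "veh_age": 8.0},
    {"driv_2": 1, "box_type": "auto", "veh_age": 1.0},
]
example = synthesize(subgroup, col_types, rng=random.Random(0))
```

Setting `st` to 0 disables the perturbation of binary and quantitative columns, so the synthetic individual copies those coordinates from the selected individual, which is a convenient sanity check.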

References

  • Agarwal et al. (2018) Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna Wallach. A reductions approach to fair classification. ICML’18, pages 60–69, 2018.
  • Alycia and Wu (2022) Carey Alycia and Xintao Wu. The causal fairness field guide: Perspectives from social and formal sciences. Frontiers in Big Data, 5, 2022.
  • Angwin et al. (2016) Julia Angwin, Jeff Larson, Lauren Kirchner, and Surya Mattu. Machine bias. ProPublica, 2016.
  • Ayuso et al. (2016) Mercedes Ayuso, Montserrat Guillen, and Ana Maria Pérez-Marín. Telematics and gender discrimination: Some usage-based evidence on whether men’s risk of accidents differs from women’s. Risks, 4(2), 10, 2016.
  • Becker (1957) Gary S Becker. The economics of discrimination. University of Chicago press, 1957.
  • Berk et al. (2017) Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. arXiv, 1706.02409, 2017.
  • Berk et al. (2018) Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1), 2018.
  • Black et al. (2020) Emily Black, Samuel Yeom, and Matt Fredrikson. Fliptest: fairness testing via optimal transport. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 111–121, 2020.
  • Boyd and Vandenberghe (2004) Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. doi: 10.1017/CBO9780511804441.
  • Castelnovo et al. (2022) Alessandro Castelnovo, Riccardo Crupi, Greta Greco, Daniele Regoli, Ilaria Giuseppina Penco, and Andrea Claudio Cosentini. A clarification of the nuances in the fairness metrics landscape. Scientific Reports, 12, 2022. doi: 10.1038/s41598-022-07939-1.
  • Chakraborty et al. (2021) Joymallya Chakraborty, Suvodeep Majumder, and Tim Menzies. Bias in machine learning software: Why? How? What to do? arXiv, 2105.12195v3, 2021.
  • Charpentier (2024) Arthur Charpentier. Insurance, biases, discrimination and fairness. Springer, 2024. doi: 10.1007/978-3-031-49783-4.
  • Charpentier et al. (2023a) Arthur Charpentier, Emmanuel Flachaire, and Ewen Gallic. Causal inference with optimal transport. In Nguyen Ngoc Thach, Vladik Kreinovich, Doan Thanh Ha, and Nguyen Duc Trung, editors, Optimal Transport Statistics for Economics and Related Topics. Springer Verlag, 2023a.
  • Charpentier et al. (2023b) Arthur Charpentier, François Hu, and Philipp Ratz. Mitigating discrimination in insurance with wasserstein barycenters. In Proceedings of BIAS 2023, 3rd Workshop on Bias and Fairness in AI, International Workshop of ECML PKDD, 2023b.
  • Chouldechova (2017) Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.
  • Corbett-Davies et al. (2017) Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. arXiv, 1701.08230, 2017.
  • Darolles et al. (2004) Serge Darolles, Jean-Pierre Florens, and Christian Gourieroux. Kernel-based nonlinear canonical analysis and time reversibility. Journal of Econometrics, 119(2), 323-353, 2004.
  • De Lara et al. (2021) Lucas De Lara, Alberto González-Sanz, Nicholas Asher, and Jean-Michel Loubes. Transport-based counterfactual models. arXiv, 2108.13025, 2021.
  • del Barrio et al. (2020) Eustasio del Barrio, Paula Gordaliza, and Jean-Michel Loubes. Review of mathematical frameworks for fairness in machine learning. arXiv, 2005.13755, 2020.
  • Dwork et al. (2011) Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Rich Zemel. Fairness through awareness. arXiv, 1104.3913, 2011.
  • Dwork et al. (2012) Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226, 2012.
  • Edgeworth (1922) Francis Y Edgeworth. Equal pay to men and women for equal work. The Economic Journal, 32(128):431–457, 1922.
  • EEOC (1979) The U.S. EEOC. Uniform guidelines on employee selection procedures. Equal Employment Opportunity Commission EEOC Technical Report, 1979.
  • Frees and Huang (2023) Edward W Frees and Fei Huang. The discriminating (pricing) actuary. North American Actuarial Journal, 27(1):2–24, 2023.
  • Freund and Schapire (1996) Yoav Freund and Robert Schapire. Game theory, on-line prediction and boosting. Proceedings of the Ninth Annual Conference on Computational Learning Theory, 1996.
  • Galles and Pearl (1998) David Galles and Judea Pearl. An axiomatic characterization of causal counterfactuals. Foundations of Science, 3:151–182, 1998.
  • Gebelein (1941) Hans Gebelein. Das statistische problem der korrelation als variations- und eigenwertproblem und sein zusammenhang mit der ausgleichsrechnung. ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik, 21(6):364–379, 1941.
  • Grary et al. (2022) Vincent Grary, Arthur Charpentier, and Marcin Detyniecki. A fair pricing model via adversarial learning. arXiv, 2202.12008, 2022.
  • Hardt et al. (2016) Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. Advances in neural information processing systems, 29:3315–3323, 2016.
  • Hirschfeld (1935) Hermann Otto Hirschfeld. A connection between correlation and contingency. Mathematical Proceedings of the Cambridge Philosophical Society, 31(4):520–524, 1935. doi: 10.1017/S0305004100013517.
  • Hu et al. (2023) François Hu, Philipp Ratz, and Arthur Charpentier. A sequentially fair mechanism for multiple sensitive attributes. arXiv, 2309.06627, 2023.
  • Kleinberg et al. (2016) Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv, 1609.05807, 2016.
  • Komiyama and Shimao (2017) Junpei Komiyama and Hajime Shimao. Two-stage algorithm for fairness-aware machine learning. arXiv, 1710.04924, 2017.
  • Kusner et al. (2017) Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. Advances in neural information processing systems, 30, 2017.
  • Lindholm et al. (2022a) Mathias Lindholm, Ronald Richman, Andreas Tsanakas, and Mario V Wüthrich. A discussion of discrimination and fairness in insurance pricing. arXiv preprint arXiv:2209.00858, 2022a.
  • Lindholm et al. (2022b) Mathias Lindholm, Ronald Richman, Andreas Tsanakas, and Mario V Wüthrich. Discrimination-free insurance pricing. ASTIN Bulletin: The Journal of the IAA, 52(1):55–89, 2022b.
  • Mary et al. (2019) Jérémie Mary, Clément Calauzenes, and Noureddine El Karoui. Fairness-aware learning for continuous attributes and treatments. In International Conference on Machine Learning, pages 4382–4391. PMLR, 2019.
  • of the European Union (2008) Council of the European Union. Proposition de directive du conseil relative à la mise en œuvre du principe de l’égalité de traitement entre les personnes sans distinction de religion ou de convictions, de handicap, d’âge ou d’orientation sexuelle. Presses Universitaires de France, 2008.
  • Phelps (1972) Edmund S Phelps. The statistical theory of racism and sexism. The american economic review, 62(4):659–661, 1972.
  • Rényi (1959) Alfréd Rényi. On measures of dependence. Acta Mathematica Hungarica, 10(3-4):441–451, 1959.
  • Witsenhausen (1975) Hans Witsenhausen. On sequences of pairs of dependent random variables. SIAM J. Appl. Math., 28, 1975.