

Reducing the Filtering Effect in Public School Admissions: A Bias-aware Analysis for Targeted Interventions

Yuri Faenza, Columbia University, New York, NY, yf2414@columbia.edu
Swati Gupta, Georgia Institute of Technology, Atlanta, GA, swatig@gatech.edu
Aapeli Vuorinen, Columbia University, New York, NY, aapeli.vuorinen@columbia.edu
Xuan Zhang, Columbia University, New York, NY, xz2569@columbia.edu

Abstract

Problem definition: Traditionally, New York City’s top 8 public schools have selected candidates solely based on their scores on the Specialized High School Admissions Test (SHSAT). These scores are known to be affected by the socioeconomic status of students and the test preparation they receive in middle school, leading to a massive filtering effect in the education pipeline. The classical mechanisms for assigning students to schools do not naturally address problems like school segregation and class diversity, which have worsened over the years. The scientific community, including policymakers, has reacted by incorporating group-specific quotas and proportionality constraints, with mixed results. The problem of finding effective and fair methods for broadening access to top-notch education is still unsolved.

Methodology/results: We take an operations approach to the problem, different from most of the established literature, with the goal of increasing opportunities for students with high economic needs. Using data from the Department of Education (DOE) in New York City, we show that there is a shift in the distribution of scores obtained by students that the DOE classifies as “disadvantaged” (following criteria mostly based on economic factors). We model this shift as a “bias” that results from an underestimation of the true potential of disadvantaged students. We analyze the impact this bias has on an assortative matching market. We show that centrally planned interventions, such as scholarships or training, can significantly reduce the impact of bias when they target the segment of disadvantaged students with average performance.

Managerial implications: To make these interventions incentive compatible and individually fair, we propose a randomization-based policy for allocating training resources to students, which is heavily targeted towards average performers. Our results challenge existing notions of scholarships in the current education system. We believe that these insights can guide policymakers in answering a critical question: how should one allocate limited funding across schools and students to maximally help disadvantaged students?

Keywords

bias, admissions, interventions, assortative matching, randomized policies

1 Introduction

Bias and disparities in opportunity are believed to play a major role in access to education at different levels (Quinn Capers et al. 2017). It is known that outcomes of middle school admissions dictate high school admissions, which in turn impact pathways to higher studies (Corcoran and Baker-Smith 2018). Selection, however, starts much earlier, with gifted and talented programs screening students as young as 4 years old; these tests often see few students from ethnic minorities succeeding (Shapiro 2019b). In this work, we are motivated by high school admissions in large cities such as New York City (NYC), which has an extensive public school system, with a current enrollment of over one million students. In particular, every year roughly 80,000 students wish to join one of its 700 high school programs. By far, the most sought-after public schools are the so-called Specialized High Schools (SHSs). By law, these schools select candidates solely based on their score on the Specialized High School Admissions Test (SHSAT) (NYC DOE 2019). However, such scores are known to be impacted by the socioeconomic status of students (Lovaglia et al. 1998) and by the test preparation they receive in middle school (Corcoran and Baker-Smith 2018, Shapiro 2019a). Since ethnic minorities tend to cluster in middle schools of lower quality (Boschma and Brownstein 2016), they are already at a disadvantage in high school admissions, which is then reflected in their under-representation in higher education programs (Ashkenas et al. 2017). The result is a massive filtering effect in high school admissions: 50% (resp. 80%) of the students admitted to the SHSs come from only the top 5% (resp. 15%) of the middle schools (Corcoran and Baker-Smith 2018).

The goal of this work is to investigate data-driven interventions at the middle school level that reduce the filtering effect. An extensive literature has approached this problem by proposing changes to the admissions policies themselves (see Section 1.2). However, there are substantial political and legislative hurdles to implementing admissions criteria that take the disparate backgrounds of students into account. For example, in 2003, an attempt by the University of Michigan to add 12 points for “diversity” on a 150-point scale to promote the admission of underrepresented ethnic minorities was met with a lawsuit, which was ultimately decided against the university (Gratz v. Bollinger 2003). Moreover, a 2019 plan supported by the then-mayor of New York City to eliminate the entrance exam to the SHSs failed to gain enough support and was not approved by the New York State Senate (Shapiro and Wang 2019). Hence, finding a way to incorporate mechanisms that help disadvantaged students access high-quality education while keeping the procedure fair and legal is still a fundamental open problem.

In this work, we take a completely different, operations-oriented perspective. We focus on centralized pre-admission interventions, such as targeted preparatory courses for selected students, that do not involve a change in the admissions criteria. We introduce a matching model of schools and students in which some students (whom we call disadvantaged) are evaluated not at their true potential but at a strictly lower level. We then investigate, both theoretically and empirically, the impact of this difference in treatment, as well as interventions to counter it in the form of vouchers to access additional training. Our main contribution is a randomized policy for voucher allocation that is individually fair, incentive compatible and, by targeting average disadvantaged students, can substantially reduce the mistreatment they experience, as measured by various metrics. We next present the setup, intermediate results, and experiments leading to our main contribution.

1.1 Contributions

In order to present our mathematical model, let us first look into the characteristics of, and the mechanism for, SHS admissions in NYC. SHSs admit students solely based on their score on the highly competitive SHSAT. The NYC Department of Education (DOE) acknowledges that there is a disparity in students’ abilities to prepare for the test, and so classifies some students as disadvantaged based on criteria such as their household income and the middle school they attended (NYC DOE 2018). Following the NYC DOE’s definition, we divide the students who took the SHSAT into two groups: non-disadvantaged ($G_{1}$) and disadvantaged ($G_{2}$). We find that the distributions of the SHSAT scores¹ of the two groups (Figure 1(a)) are significantly different, but match closely (as measured by Wasserstein distance) if we scale the scores of disadvantaged students by a factor $\frac{1}{\beta}\approx\frac{1}{0.88}\approx 1.13$ (Figure 1(b)). Motivated by this observation, in this paper we consider a model where the true potential $Z$ of a student is sampled from a Pareto distribution², while the perceived potential $\widehat{Z}$ is equal to $Z$ for $G_{1}$ students and to $\beta Z$ for some $\beta\in(0,1)$ for $G_{2}$ students. We are not the first to use a multiplicative model³ to scale the potential of agents: such a model was pioneered by Kleinberg and Raghavan (2018), where it was also motivated by (different) experimental evidence (Wenneras and Wold 2010). Following Kleinberg et al. (2017), we call $\beta$ the bias factor⁴.

¹ We obtained this data from the NYC DOE under a non-disclosure agreement.
² The choice of the Pareto distribution to model potentials is inspired by a body of empirical work (see, e.g., Clauset et al. (2009)) on the achievements of individuals in many professions. As we observe later, it also gives a good approximation to the SHSAT score distribution beyond a certain threshold (see Figure 2).
³ See Appendix 9 for a discussion of how the multiplicative model fits our data better than alternative models of bias.
⁴ Broadly speaking, this bias factor captures cases in which candidates with the same level of knowledge, skills, or abilities have dissimilar algorithmic or perceived qualifications due to their group membership (measurement bias), or in which prediction relationships (i.e., between predictor and outcome) are not equivalent for members of different groups (predictive bias) (Drasgow 1984).
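The scaling comparison above can be illustrated with a one-dimensional grid search over $\beta$. This is only a sketch under stated assumptions, not the procedure applied to the DOE data: since the SHSAT scores are not public, it uses synthetic Pareto$(1,3)$ samples with a planted bias factor of 0.88; it assumes equal group sizes, for which the 1-Wasserstein distance between two empirical distributions reduces to the mean absolute gap between sorted samples; and the helper names `wasserstein_1d` and `estimate_beta` are hypothetical.

```python
import random

def wasserstein_1d(u, v):
    """1-Wasserstein distance between two equal-size empirical samples.

    With equal sizes and uniform weights, the optimal coupling pairs the
    i-th smallest value of u with the i-th smallest value of v.
    """
    if len(u) != len(v):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

def estimate_beta(scores_g1, scores_g2, grid):
    """Hypothetical helper: pick the beta in `grid` for which the G2 scores,
    rescaled by 1/beta, are closest to the G1 scores in 1-Wasserstein distance."""
    return min(grid, key=lambda b: wasserstein_1d(scores_g1, [s / b for s in scores_g2]))

# Synthetic stand-in for the (non-public) SHSAT data: potentials ~ Pareto(1, 3),
# and G2 scores are observed at 0.88 times the potential (assumed values).
rng = random.Random(0)
n, alpha, planted_beta = 20_000, 3.0, 0.88
g1 = [rng.paretovariate(alpha) for _ in range(n)]
g2 = [planted_beta * rng.paretovariate(alpha) for _ in range(n)]

grid = [round(0.70 + 0.005 * k, 3) for k in range(61)]  # 0.700, 0.705, ..., 1.000
beta_hat = estimate_beta(g1, g2, grid)
print(f"estimated beta = {beta_hat:.3f}, scaling factor 1/beta = {1 / beta_hat:.3f}")
```

With groups of different sizes, as in the real data, the sorted-pairing shortcut no longer applies, and a general routine such as SciPy's `scipy.stats.wasserstein_distance` would be needed instead.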

Thus, in the following, we assume that schools rank students based on perceived potentials $\widehat{Z}$ (e.g., using SHSAT scores). To obtain tractable policies, we assume in our theoretical analysis that all students share the same ranking of schools (e.g., based on US News Rankings). Although this assumption abstracts away considerations that may be important for students (such as the proximity of a school (Burgess et al. 2015), limits on the length of preference lists (Calsamiglia et al. 2010), or strong preferences of students for certain high schools⁵), we later argue experimentally, by dropping this assumption, that our qualitative results are robust to relaxations of our stylized model.

⁵ For instance, in the 2016-17 SHSAT cohort, 56% of students indicated Stuyvesant or Brooklyn Tech as their first preference, with 76% naming at least one of the two in their top two preferences.

After investigating the impact that bias has on both disadvantaged and non-disadvantaged students, we propose interventions, such as additional training and scholarships for disadvantaged students, quantify the effect of these mechanisms, and identify the population they should target. We discuss qualitative results for the New York City Specialized High Schools by first estimating the multiplicative bias and fitting the Pareto distribution to the data, and then evaluating the effect of the interventions devised in the theoretical model. Our key findings are as follows:

[Figure: (a) Original SHSAT scores]

1. Asymmetric impact and minority effect: We observe that, under reasonable assumptions on the parameters, such as disadvantaged students being a minority, the impact of bias on $G_{2}$ (disadvantaged) students is much bigger than the slight advantage that $G_{1}$ students obtain (see Figure 3 and Figure 7). Moreover, at a societal level, the presence of bias excludes most disadvantaged students from top schools (see Appendix 11 of the e-Companion and Figure 9 therein), explaining a phenomenon often seen in the real world (Shapiro 2021).
2. Deterministic Centralized Interventions: Since numerically rescaling student scores has been ruled unconstitutional in past cases (e.g., Gratz v. Bollinger 2003), we next consider centralized interventions such as extra training, problem-solving groups, and team-building exercises, which we refer to collectively as vouchers in Section 4. These are provided to disadvantaged students, who we assume will benefit from the vouchers and thereby be able to reveal their true potential to all schools (e.g., by retaking the SHSAT after additional training). To measure the impact of the bias on disadvantaged students, we first define the mistreatment of a student as the difference between the rank of the school the student is matched to under bias and the rank of the school they would be matched to in the unbiased setting. We then study mechanisms to allocate vouchers that reduce aggregate measures of mistreatment, which we interpret as measures of (group) fairness. We show that, under two very different such measures, the maximum benefit is achieved by providing vouchers to average-performing (rather than top) disadvantaged students, assuming, as stated before, that their abilities are Pareto-distributed. These findings challenge existing scholarship and aid allocation mechanisms, and address one of the key questions facing policymakers: how to distribute resources.
3. Incentive Compatible Voucher Distribution: We next observe that the deterministic allocation of vouchers to average performers creates an incentive for some top students to underperform. More generally, we show that the only deterministic policies that are incentive compatible are those that distribute vouchers to all students whose perceived potential is above a threshold. However, such policies have a small impact on reducing mistreatment. Therefore, we discuss two randomized allocations of vouchers that are incentive compatible. One of them, which we term Proportional to Mistreatment (PropM), still favors mid-performing students and guarantees that the maximum expected mistreatment is lower than under the best deterministic policy. The other is incentive compatible under general potential distributions. These policies have the additional benefit of being individually fair, in the sense of a Lipschitz condition: students with similar potentials receive a voucher with similar probabilities.
4. Experimental validation: In Section 6, we validate our theoretical results on admissions data for the SHSs for the academic year 2016-17. We define disadvantaged students following NYC DOE criteria. Although our model assumes that students have homogeneous preferences, we compute the stable matching using their SHSAT scores and their reported heterogeneous preferences. We find that our key theoretical takeaways remain valid: for instance, the shape of students’ mistreatment resembles the theoretical prediction (including the fact that average performers are the most mistreated students), and our voucher distribution program reduces mistreatment across the board. We further show that the ranges of students to give vouchers to, obtained by applying the stylized model to the real data, are qualitatively similar to the best ranges found via a naive grid search. This leads to our policy insights, which we discuss next.
5. Policy Insights: Motivated by the goal of maximizing the impact of limited resources, in this work we propose that additional training and vouchers should be offered to average performers rather than top performers. At a high level, the two assumptions that lead to this result are (1) the concentration of students who perform around the average, compared to a much smaller cohort of top performers, and (2) that, given enough opportunities and support, the performance of disadvantaged and non-disadvantaged students would be indistinguishable, which motivates a multiplicative model of bias under which the rescaled distributions match closely. The key phenomenon arising from these assumptions is that a small deviation in an average disadvantaged student’s perceived performance leads to a significant drop in the rank of their school.

Property (1) is a key characteristic of many common distributions, including the Pareto distribution investigated in detail in this paper. Property (2) is supported by the education and policy literature, which shows that additional resources can positively impact low-achieving student groups (Greenwald et al. 1996, Dee and Jacob 2011). Our work complements this line of work through a mathematical analysis of how to target limited resources effectively. The multiplicative model and the resulting debiasing effect therefore justify the design of our randomized policies for distributing the vouchers.

The current rationale behind most scholarship programs is to reward top performers, driven by a desire for meritocracy, and to improve top performance by fostering competition. Our analysis, on the other hand, suggests that if the goal is to maximize the impact on disadvantaged students, driven by a desire for more equitable outcomes, more support should be given to average performers, since this yields the largest gains in the rank of the schools they are matched to. Moreover, this can be achieved with an incentive compatible randomized policy tailored to the distribution of potentials. However, when nothing is known about the distribution of student potentials (so that, in particular, condition (1) cannot be assumed), we show that the only policies guaranteed to be incentive compatible are those that allocate vouchers with higher probability to students with higher SHSAT performance (e.g., threshold policies). It is worth noting that randomization is perfectly viable in school choice settings, and many current mechanisms explicitly rely on it⁶.

⁶ For instance, the NYC Department of Education assigns to each student a random 32-digit hexadecimal number that is used to break ties when two students have the same priority at some school. Ties are extremely frequent, e.g., in admissions to early school years. In fact, most public kindergartens in New York City prioritize students based on only three criteria: whether a sibling is currently attending the school; whether the student lives in the school’s zone; and whether the student lives in the school’s district. This means that the random number plays a key role in admission decisions at this level.

The rest of the paper is organized as follows. In Section 2, we formally introduce our mathematical models for the continuous matching market and multiplicative bias. In Section 3, we analyze the effects of bias on both disadvantaged and non-disadvantaged students, introducing the key concept of displacement. We then consider deterministic policies for reducing such bias via a centralized approach in Section 4, quantifying their impact on students and discussing various notions of (group) fairness. In particular, in Section 4 we present two theorems that identify the optimal deterministic debiasing sets under two different measures of fairness. In Section 5, we show that such deterministic policies fail to be incentive compatible and individually fair. We then introduce randomized assignments of vouchers that satisfy these fairness conditions while achieving a lower maximum mistreatment than the deterministic policies. In Section 6, we apply our policies to the real-world dataset of SHS admissions for the 2016-17 cohort. We close with a discussion in Section 7.

1.2 Related Work

Various selection problems have been investigated in models with the multiplicative bias introduced by Kleinberg and Raghavan (2018) (e.g., Celis et al. (2020, 2021), Salem and Gupta (2023), Emelianov et al. (2020)), but, to the best of our knowledge, this paper is the first to investigate it in the context of school choice. Some literature, such as Hastings et al. (2009) and Laverde (2020), studies the impact of family backgrounds on student preferences, but this is orthogonal to the questions we study here. Our work complements existing work in the education and policy literature showing that additional resources can positively impact low-achieving student groups (Dee and Jacob 2011, Greenwald et al. 1996).

The most common way to model admissions to schools is through a two-sided market, with the two sides being schools and students, respectively, and each agent having a preference order over the agents on the other side of the market that it considers acceptable. This model has been used to match doctors to hospitals by the National Residency Matching Program since the 1960s, and it gained widespread prominence when Abdulkadiroğlu et al. (2005) used it to reform the admissions process for New York City public high schools in 2003. Since then, admission decisions have been centralized and are (essentially) governed by the classical Gale-Shapley Deferred Acceptance algorithm (Gale and Shapley 1962). The simplicity of the algorithm, as well as the drastic improvement in the quality of matches it provides compared to the pre-2003 method, have led to academic and public acclaim, and spurred applications in many other systems (see, e.g., Biró (2008)). However, this mechanism does not naturally address problems like school segregation and class diversity, which have worsened and become more and more of a concern in recent years (Kamada and Kojima 2024, Kucsera and Orfield 2014, Shapiro March 26, 2019, Shapiro and Lai June 03, 2019). The scientific community, including policymakers, has reacted, e.g., by incorporating group-specific quotas and proportionality constraints into the mathematical model (Biró et al. 2010, Nguyen and Vohra 2019, Tomoeda 2018), but there is evidence that adding such constraints may even hurt the very students they were meant to help (Backes 2012, Fershtman and Pavan 2021, Hafalir et al. 2013) or create legal challenges.
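As an illustration of the mechanism referred to above, the following is a minimal sketch of student-proposing deferred acceptance in which every school ranks students by one common score (as with the SHSAT) and has a fixed number of seats. The function name, the dictionary-based inputs, and the toy instance are illustrative assumptions, not the NYC DOE's implementation.

```python
from collections import deque

def deferred_acceptance(student_prefs, scores, capacities):
    """Student-proposing deferred acceptance (Gale and Shapley 1962).

    student_prefs: dict student -> list of schools, most preferred first.
    scores: dict student -> score; every school prefers higher scores.
    capacities: dict school -> number of seats (must cover every listed school).
    Returns dict student -> school, or None if the student is unassigned.
    """
    next_choice = {s: 0 for s in student_prefs}   # index of the next school to propose to
    held = {c: [] for c in capacities}            # tentatively accepted students per school
    free = deque(student_prefs)                   # students without a tentative seat
    while free:
        s = free.popleft()
        prefs = student_prefs[s]
        if next_choice[s] >= len(prefs):
            continue                              # list exhausted: stays unassigned
        school = prefs[next_choice[s]]
        next_choice[s] += 1
        held[school].append(s)
        if len(held[school]) > capacities[school]:
            # Over capacity: keep the highest scores, reject the lowest-scoring student.
            held[school].sort(key=lambda t: scores[t], reverse=True)
            free.append(held[school].pop())
    match = {s: None for s in student_prefs}
    for school, students in held.items():
        for s in students:
            match[s] = school
    return match

# Toy instance: common preferences A > B > C and one seat per school.
prefs = {"ana": ["A", "B", "C"], "ben": ["A", "B", "C"], "cy": ["A", "B", "C"]}
scores = {"ana": 480, "ben": 520, "cy": 500}
print(deferred_acceptance(prefs, scores, {"A": 1, "B": 1, "C": 1}))
# {'ana': 'C', 'ben': 'A', 'cy': 'B'}
```

When all students share the same ranking of schools, as in the toy instance, the outcome is assortative: the highest-scoring student gets the best school, and so on down the list. This is the discrete counterpart of the continuous model of Section 2.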

There is a long line of work on affirmative action policies in theory and in practice (Abdulkadiroğlu 2005, Arcidiacono et al. 2011, Chade et al. 2014, Chan and Eyster 2003, Hafalir et al. 2013, Quinn Capers et al. 2017) and on alternatives like the top 10% admissions criterion implemented in Texas (Texas Comptroller of Public Accounts 2024). In our work, we do not propose substitute mechanisms such as the top 10% criterion, since they deviate significantly from current practice. Moreover, it is unclear whether such a mechanism would improve the status quo or worsen it (e.g., Long (2004) found a significant impact on the admission of minorities if affirmative action policies for college admissions were replaced by top $x$% rules). In this work, we take a completely different approach: we improve the performance of disadvantaged students through voucher distribution and help mitigate the impact of disparities. This will naturally help with the downstream impacts in the education pipeline towards economic opportunities (Kannan et al. 2019, Coate and Loury 1993), since the evaluation criterion (the SHSAT) and the admissions threshold remain unchanged in this work. Further, test-optional policies, typically studied in the context of college admissions, are unlikely to be adaptable to high school admissions (Liu and Garg 2021, Dessein et al. 2023) due to the state law in New York.

Further, to the best of our understanding, our work differs from theories of statistical discrimination (Phelps 1972) or taste-based discrimination (Becker 2010) in economics, since in our setting the sole admissions criterion is performance on the SHSAT, as mandated by state law, which precludes the use of additional characteristics of candidates. In fact, our approach tries to minimize the impact of unequal opportunities through pre-admission resources like after-school coaching or simply the means to retake the SHSAT. A recent study (Niu et al. 2022) shows the impact of being able to retake SAT exams, and that reporting all scores leads to more equitable outcomes as well as a more accurate signal for colleges. Our work is aligned with the latter result, in the sense of creating a more accurate signal for disadvantaged students. Further, a recent study (Garg et al. 2021) focused on the design of a fair admissions process by identifying conditions under which standardized tests should be dropped, while our paper mostly focuses on pre-admission policies. Furthermore, changing the admissions criteria for SHSs in New York would require changing a state law (the Hecht-Calandra Act), which is a significant hurdle to the implementation of such policies (see, e.g., Chin 2022).

Lastly, any admissions policy is susceptible to manipulation by applicants. Recent work by Hu et al. (2019) considers strategic behavior of students in a classification setting, where each student can expend some bounded amount of resources to improve their test-score performance and convert a “reject” decision into an “accept” decision. The school can provide subsidies to students to help them reveal their true potential, and Hu et al. (2019) show cases in which providing a subsidy can make the group receiving it worse off. Although our work considers a completely different model, we also find cases in which the voucher distribution can worsen some fairness metrics over the disadvantaged group, as well as cases in which students may behave strategically; see the discussions in Section 4 and Section 5.1.

2 A continuous matching market

We introduce a stylized matching model in which students rank schools following a single common strict order, and schools rank students following a single common order. For tractability, both schools and students are assumed to be continuous sets, following a recent trend in the literature (see Appendix 8 for a discussion). We let the student population be a set $\Theta$, and we associate to $\Theta$ a probability distribution on the potentials of students. For each student $\theta\in\Theta$, we use $Z(\theta)$ to denote their true potential. Unless otherwise stated, we assume $Z(\theta)\sim\text{Pareto}(1,\alpha)$, and thus all students have potentials of at least $1$.

Continuous Model with One Group:

Let us first consider the case in which every student’s true potential is visible to the schools, i.e., there is only one group of students. Consider the cumulative distribution function (cdf) of the distribution of potentials Pareto$(1,\alpha)$, given by $F(t)=1-t^{-\alpha}$. The domain of $F(\cdot)$ is $[1,\infty)$ and its range is $[0,1)$. Let $\mu:\Theta\rightarrow[0,1]$ be the function that ranks students based on their potentials, with rankings normalized between 0 and 1 (0 being the best and 1 the worst). Note that $\mu(\theta)=1-F(Z(\theta))$, since $F(Z(\theta))$ is the fraction of students whose potential is lower than student $\theta$’s potential $Z(\theta)$. Since $1-F(\cdot)$ (i.e., the complementary cumulative distribution function, ccdf) will appear often in our analysis, we use the notation $\bar{F}=1-F$. Schools are also parametrized by $[0,1]$, so we let a student $\theta$ be assigned to the school ranked $\mu(\theta)$. Therefore, we will at times overload the notation $\mu$ to directly denote the assignment of students to schools. We will often want to associate with each school the (set of) students assigned to it, in a way that ensures that the total probability mass is preserved under the assignment (similar to an inverse probability transform (Grimmett and Stirzaker 2020)). Consider the assignment $\mu:\Theta\rightarrow[0,1]$, where $\mu(\theta)=\bar{F}(Z(\theta))=1-F(Z(\theta))$. Then, for school $s\in[0,1]$, $\mu^{-1}(s)=Z^{-1}(F^{-1}(1-s))$ is the set of students assigned to school $s$. Note that the students matched to schools with ranks in $(s_{1},s_{2}]$ (with $0\leq s_{1}<s_{2}\leq 1$) are exactly those with potentials in $[F^{-1}(1-s_{2}),F^{-1}(1-s_{1}))$, so their fraction is $F(F^{-1}(1-s_{1}))-F(F^{-1}(1-s_{2}))=s_{2}-s_{1}$, i.e., the length of the interval of schools.

Example 2.1

A student Maya $\in\Theta$ scores $Z(\text{Maya})=1.4$, with their score sampled from $\text{Pareto}(1,3)$ (i.e., $\alpha=3$). The fraction of students who are better than Maya is equal to $\bar{F}(Z(\text{Maya}))=1-(1-1/1.4^{3})\approx 0.3644$, which is also the (rank of the) school $s$ Maya is assigned to in the continuous model.
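The one-group model can be sketched in a few lines of Python. The helper names are hypothetical; the sketch reproduces the number in Example 2.1 and checks numerically that $\mu=\bar{F}(Z)$ preserves mass, i.e., that the schools with ranks in $(s_1,s_2]$ receive a fraction $s_2-s_1$ of the students.

```python
def pareto_cdf(t, alpha):
    """F(t) = 1 - t^(-alpha) for t >= 1: cdf of Pareto(1, alpha)."""
    return 1.0 - t ** (-alpha)

def pareto_ccdf(t, alpha):
    """Fbar(t) = 1 - F(t) = t^(-alpha)."""
    return t ** (-alpha)

def school_of(z, alpha):
    """Rank of the school assigned to a student with potential z: mu = Fbar(z)."""
    return pareto_ccdf(z, alpha)

def potential_at_school(s, alpha):
    """Inverse map F^{-1}(1 - s) = s^(-1/alpha): potential of the students at school s in (0, 1]."""
    return s ** (-1.0 / alpha)

alpha = 3
print(round(school_of(1.4, alpha), 4))   # Example 2.1 (Maya): 0.3644

# Mass preservation: students matched to schools in (s1, s2] have potentials in
# [F^{-1}(1 - s2), F^{-1}(1 - s1)), and that set has probability s2 - s1.
s1, s2 = 0.25, 0.6
lo, hi = potential_at_school(s2, alpha), potential_at_school(s1, alpha)
print(round(pareto_cdf(hi, alpha) - pareto_cdf(lo, alpha), 10))  # 0.35
```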

Continuous Model with Two Groups:

We now consider two groups of students: non-disadvantaged ($G_{1}$) and disadvantaged ($G_{2}$) students. We assume that $G_{2}$ students constitute a $p$ fraction of the entire student population for some $p\in[0,1]$, and that their perceived potentials are biased by a constant multiplicative factor $\beta\in(0,1]$. Students in $G_{1}$, who account for the remaining $1-p$ fraction of the population, are perceived at exactly their true potentials. We let $\widehat{Z}(\theta)$ denote the perceived potential of a student $\theta\in\Theta$. That is, if $\theta\in G_{1}$, then $\widehat{Z}(\theta)=Z(\theta)$; otherwise, $\widehat{Z}(\theta)=\beta Z(\theta)$. The cdfs of the perceived potentials of $G_{1}$ and $G_{2}$ students are $F_{1}$ and $F_{2}$, respectively:

$$F_{1}(t)=1-t^{-\alpha};\qquad F_{2}(t)=1-\beta^{\alpha}t^{-\alpha}.$$

Note that both $F_{1}(\cdot)$ and $F_{2}(\cdot)$ take perceived potentials as input. Moreover, the domain of $F_{1}$ is $[1,\infty)$, whereas the domain of $F_{2}$ is $[\beta,\infty)$. We denote their respective ccdfs by $\bar{F}_{1}$ and $\bar{F}_{2}$. Now, for a student $\theta\in\Theta$, we let $\widehat{\mu}(\theta)$ be the fraction of students whose perceived potentials are higher than that of $\theta$ in the two-group case. We get:

$$\widehat{\mu}(\theta)=\begin{cases}(1-p)\bar{F}_{1}(\widehat{Z}(\theta))+p\bar{F}_{2}(\widehat{Z}(\theta)) & \text{if }\theta\in G_{1},\\[2pt] (1-p)\bar{F}_{1}(\widehat{Z}(\theta)\vee 1)+p\bar{F}_{2}(\widehat{Z}(\theta)) & \text{if }\theta\in G_{2},\end{cases}\tag{1}$$

where $\vee$ is the maximum operator. As before, we say that student $\theta$ is assigned to school $\widehat{\mu}(\theta)$ . Note that, when $\beta=1$ (i.e., no bias), formula (1) computes $\mu(\theta)$ : $\mu(\theta)=\bar{F}_{1}(Z(\theta))$ for $\theta\in\Theta$ . We let $\mu(\theta)$ be the school that student $\theta$ gets assigned to, without any bias, and let $\widehat{\mu}(\theta)$ be the school which student $\theta$ is actually assigned to due to bias. For assignment $\gamma\in\{\mu,\widehat{\mu}\}$ , and school $s\in[0,1]$ , we let again $\gamma^{-1}(s)$ be the “set” of students assigned to school $s$ under matching $\gamma$ .

Formally, we define a matching in this market to be a surjective measurable function $\gamma$ from $\Theta$ to $[0,1]$ (i.e., from students to schools) such that the mass of students mapped to any set of schools $S\subseteq[0,1]$ coincides with the Lebesgue measure of $S$. In formulas, a surjective measurable function $\gamma$ from $\Theta$ to $[0,1]$ is a matching if

$$\nu(\gamma^{-1}(S)):=(1-p)\int_{\theta\in\gamma^{-1}(S)\cap G_{1}}dF_{1}(\widehat{Z}(\theta))+p\int_{\theta\in\gamma^{-1}(S)\cap G_{2}}dF_{2}(\widehat{Z}(\theta))$$

is equal to the Lebesgue measure of $S$ for every measurable $S\subseteq[0,1]$. One can easily check that $\mu$ and $\widehat{\mu}$ defined above are matchings.
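Formula (1) and the matching property can be checked numerically. In the sketch below, `mu_hat` is a hypothetical helper implementing (1) for a single student, and the parameters $p=0.2$, $\beta=0.9$, $\alpha=3$ are illustrative assumptions. Since $\widehat{\mu}(\theta)$ is the fraction of the population with a higher perceived potential, $\widehat{\mu}$ should be uniformly distributed on $[0,1]$; a Monte Carlo sample of students (a finite-sample check, not a proof) should therefore place roughly a fraction $s_2-s_1$ of students in schools $(s_1,s_2]$.

```python
import random

def mu_hat(z_true, in_g2, p, beta, alpha):
    """School rank under bias, following formula (1).

    Perceived potential is beta * z_true for G2 students and z_true for G1 students;
    the rank is the fraction of all students with a higher perceived potential.
    """
    z_hat = beta * z_true if in_g2 else z_true
    g1_above = (1 - p) * max(z_hat, 1.0) ** (-alpha)   # (1 - p) * Fbar_1(z_hat v 1); G1 has z_hat >= 1
    g2_above = p * beta ** alpha * z_hat ** (-alpha)    # p * Fbar_2(z_hat), valid for z_hat >= beta
    return g1_above + g2_above

# With beta = 1 (no bias), formula (1) collapses to mu = Fbar(z) = z^(-alpha).
assert abs(mu_hat(1.7, True, 0.3, 1.0, 2.5) - 1.7 ** -2.5) < 1e-12

# Finite-sample check of the matching property: sample students, compute their
# schools, and compare the share landing in (s1, s2] with the length s2 - s1.
rng = random.Random(1)
p, beta, alpha, n = 0.2, 0.9, 3.0, 20_000          # illustrative parameters (assumed)
students = [(rng.paretovariate(alpha), rng.random() < p) for _ in range(n)]
ranks = [mu_hat(z, g2, p, beta, alpha) for z, g2 in students]
for s1, s2 in [(0.0, 0.1), (0.1, 0.5), (0.5, 1.0)]:
    share = sum(s1 < r <= s2 for r in ranks) / n
    print(f"schools in ({s1}, {s2}]: {share:.3f} of students (target {s2 - s1:.1f})")
```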

Example 2.2

Student scores are again sampled from $\text{Pareto}(1,3)$. Maya $\in G_{2}$ again scores $Z(\text{Maya})=1.4$, while Lisa $\in G_{1}$ scores $Z(\text{Lisa})=1.3$. In the unbiased setting, Maya is matched to school $\bar{F}(Z(\text{Maya}))=1-(1-1/1.4^{3})\approx 0.3644$, while Lisa is matched to school $\bar{F}(Z(\text{Lisa}))=1-(1-1/1.3^{3})\approx 0.4552$. Letting $\beta=0.9$, we have $\widehat{Z}(\text{Maya})=1.26$ and $\widehat{Z}(\text{Lisa})=1.3$. Letting $p=0.2$, formula (1) gives $\widehat{\mu}(\text{Maya})=(0.8+0.2\cdot 0.9^{3})\,1.26^{-3}$ and $\widehat{\mu}(\text{Lisa})=(0.8+0.2\cdot 0.9^{3})\,1.3^{-3}$, so in the biased setting Maya and Lisa are matched to schools

$$\widehat{\mu}(\text{Maya})\approx 0.4728\quad\text{and}\quad\widehat{\mu}(\text{Lisa})\approx 0.4305,$$

respectively; that is, Maya is matched to a significantly worse school, and Lisa to a slightly better one, than in the setting without bias. Note that Lisa has a smaller true potential than Maya but is assigned to a better school in the biased setting.
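The numbers in Example 2.2 can be recomputed directly from formula (1). This is a minimal sketch; `mu` and `mu_hat` are hypothetical helper names.

```python
def mu(z, alpha):
    """Unbiased school rank: Fbar(z) = z^(-alpha)."""
    return z ** (-alpha)

def mu_hat(z_true, in_g2, p, beta, alpha):
    """Biased school rank from formula (1)."""
    z_hat = beta * z_true if in_g2 else z_true
    return (1 - p) * max(z_hat, 1.0) ** (-alpha) + p * beta ** alpha * z_hat ** (-alpha)

alpha, beta, p = 3, 0.9, 0.2
for name, z, in_g2 in [("Maya", 1.4, True), ("Lisa", 1.3, False)]:
    print(f"{name}: mu = {mu(z, alpha):.4f}, mu_hat = {mu_hat(z, in_g2, p, beta, alpha):.4f}")
# Maya: mu = 0.3644, mu_hat = 0.4728
# Lisa: mu = 0.4552, mu_hat = 0.4305
```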

3 Impact on Students

Our first goal is to understand how much perceived bias affects agents in the market. In particular, we would like to answer the following question: what is the loss of efficiency for students⁷ when all students $\theta\in\Theta$ are assigned to school $\widehat{\mu}(\theta)$ instead of $\mu(\theta)$? Formally, we define $\widehat{\mu}(\theta)-\mu(\theta)$ to be the displacement of a student $\theta\in\Theta$. Note that if $\theta\in G_{1}$, the displacement is non-positive (the student is matched to a weakly better school), and if $\theta\in G_{2}$, it is non-negative (a weakly worse school). The displacement can be easily calculated using the formulas for $\mu$ and $\widehat{\mu}$ given in (1).

⁷ In Appendix 11 of the e-Companion, we take the schools’ perspective and show that there is effectively no loss of efficiency for schools under this model, creating little incentive for them to intervene at the individual school level. We also measure there the diversity of the admitted cohort in our model.

Proposition 3.1

For any student $\theta\in G_{2}$, the displacement $\widehat{\mu}(\theta)-\mu(\theta)$ is given by:

\widehat{\mu}(\theta)-\mu(\theta)=\begin{cases}(1-p)\,Z(\theta)^{-\alpha}\left(\beta^{-\alpha}-1\right)&\text{if }Z(\theta)\geq\frac{1}{\beta},\\ (1-p)\left(1-Z(\theta)^{-\alpha}\right)&\text{if }Z(\theta)\leq\frac{1}{\beta}.\end{cases}

For any student $\theta\in G_{1}$, we have $\widehat{\mu}(\theta)-\mu(\theta)=\left(-p+p\beta^{\alpha}\right)Z(\theta)^{-\alpha}$. Thus, the maximum displacement, $(1-p)(1-\beta^{\alpha})$, is experienced by a $G_{2}$ student with potential $1/\beta$, and the most negative displacement, $-p(1-\beta^{\alpha})$, is experienced by a $G_{1}$ student with potential $1$.

One can think of this result intuitively in the following way. Starting from the top school, $G_{1}$ students gradually take up more seats than they deserve, and thus gradually push $G_{2}$ students to worse schools than they deserve. This process stops once all $G_{1}$ students are assigned to schools, after which only $G_{2}$ students remain to be assigned. As a result, the lowest-ranked schools are filled entirely by $G_{2}$ students, and the displacement of $G_{2}$ students shrinks towards the bottom of the ranking. Figure 3 gives a pictorial illustration of Proposition 3.1, from which one can clearly see that the most mistreated students are average performers. This intuition will be fundamental in devising policies to counter the effect of bias.
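Proposition 3.1 can also be checked numerically. The following sketch (helper names are ours) evaluates both displacement formulas on a grid of potentials for the illustrative parameters $\alpha=3$, $\beta=.8$, $p=.25$, and locates the most displaced $G_{2}$ student, who should sit at potential $1/\beta$, i.e., an average performer.

```python
def displacement_g2(z: float, alpha: float, beta: float, p: float) -> float:
    """Displacement mu_hat - mu of a G_2 student with true potential z >= 1 (Proposition 3.1)."""
    if z >= 1.0 / beta:
        return (1 - p) * z ** (-alpha) * (beta ** (-alpha) - 1)
    return (1 - p) * (1 - z ** (-alpha))


def displacement_g1(z: float, alpha: float, beta: float, p: float) -> float:
    """Displacement of a G_1 student with true potential z >= 1 (always non-positive)."""
    return (-p + p * beta ** alpha) * z ** (-alpha)


alpha, beta, p = 3.0, 0.8, 0.25
grid = [1.0 + k * 1e-4 for k in range(50_001)]  # potentials in [1, 6]

g2 = [displacement_g2(z, alpha, beta, p) for z in grid]
g1 = [displacement_g1(z, alpha, beta, p) for z in grid]
z_worst = max(grid, key=lambda z: displacement_g2(z, alpha, beta, p))  # most displaced G_2

print(f"argmax {z_worst:.4f} vs 1/beta = {1 / beta:.4f}")
print(f"max G_2 displacement {max(g2):.4f} vs {(1 - p) * (1 - beta ** alpha):.4f}")
print(f"min G_1 displacement {min(g1):.4f} vs {-p * (1 - beta ** alpha):.4f}")
```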

4 Deterministic Centralized Interventions

In this section, we discuss how interventions of a central administration (such as the Department of Education) can mitigate the effects of bias. We assume that these interventions allow a set of students, chosen by the central agency, to be debiased, i.e., to reveal their true potential. In practice, this can be achieved, for instance, by giving free vouchers to a limited number of students that allow them to access preparatory classes for exams, or by spending resources to build a community for the students with vouchers that explores learning as a group. Given a certain amount of available vouchers, we want to investigate which students these vouchers should be offered to. We first formally define the negative impact on $G_{2}$ students due to the presence of bias and then derive policies that optimize certain fairness measures. These policies are deterministic: the decision of whether to assign a voucher to a student (hence, debias them) depends only on the potential of the student. In the next section, we will discuss randomized policies, where the decisions also depend on the outcome of a random coin flip.

Metric of Impact on Students:

Recall that $\mu(\theta)$ and $\widehat{\mu}(\theta)$ denote the school a student $\theta$ is assigned to in the unbiased and biased setting, respectively. Now let $\widetilde{\mu}:\Theta\rightarrow[0,1]$ be the ranking of students after bias mitigation. The mistreatment of a student $\theta\in G_{2}$ with respect to an assignment $\widetilde{\mu}$ is defined as the positive part of their displacement, that is, $m(\theta):=\max(0,\widetilde{\mu}(\theta)-\mu(\theta))$. In words, the mistreatment is the drop in the rank of the school the student is matched to, if this drop is positive. A student $\theta$ has mistreatment equal to $0$ if they are assigned to a school at least as good as $\mu(\theta)$. In the following, we evaluate a voucher distribution by its effect on the mistreatment of $G_{2}$ students, since only $G_{2}$ students may experience strictly positive mistreatment. It is easy to see that after the interventions, no student $\theta\in G_{1}$ will be matched to a school worse than $\mu(\theta)$. This is because our interventions only help (certain) $G_{2}$ students reveal their true potentials; hence, for any $G_{1}$ student $\theta$, no student with potential lower than $Z(\theta)$ can have a perceived potential higher than $\widehat{Z}(\theta)=Z(\theta)$.

Fairness Considerations:

Finding a set of students to allocate vouchers to is a resource allocation problem with natural fairness considerations that guide the choice of the measures to be optimized.\footnote{For a more detailed discussion of relevant philosophies of equality and decision-making, we refer the interested reader to the 1979 Tanner Lectures on Human Values (Sen (1979)).} For the cohort of disadvantaged students, we seek a distribution of vouchers that makes the mistreatment across $G_{2}$ students as balanced or equitable as possible. We analyze two representative fairness measures in this regard: (1) the total mistreatment across all students, and (2) the maximum mistreatment experienced in this cohort. The former is the continuous $L^{1}$ norm of the mistreatment after voucher allocation, or the positive area under the curve (PAUC). The latter is the continuous $L^{\infty}$ norm, and we refer to it with the shorthand “$mm$”.

\sigma(\widetilde{\mu}):=\int_{\theta\in\Theta}m(\theta)\,dF_{1}(Z(\theta))=\|m(\theta)\|_{1},

(2)

mm(\widetilde{\mu}):=\sup_{\theta\in\Theta}m(\theta)=\lim_{p\rightarrow\infty}\left(\int_{\theta\in\Theta}m(\theta)^{p}\,dF_{1}(Z(\theta))\right)^{1/p}=\|m(\theta)\|_{\infty}.

(3)

These notions of fairness have been axiomatically established and are well studied in the literature. For example, the min-max notion of fairness has been considered in (Kumar and Kleinberg 2000), and the positive area under the curve corresponds to the average mistreatment of group $G_{2}$, a group notion of fairness considered in many fairness-related studies (Conitzer et al. 2019, Dwork and Ilvento 2018, Marsh and Schilling 1994). Since we will show that the solutions for the two extremes $L^{1}$ and $L^{\infty}$ target qualitatively similar sets of students,\footnote{An $L^{p}$ norm with small $p$ generally measures the average of a function, whereas a large $p$ measures its “peakiness”, with $p=\infty$ giving the essential supremum (for a further discussion on the relationship between $L^{p}$ spaces, see Folland (1999)).} we expect the solution for any other $L^{p}$ norm to behave similarly, and we restrict our analysis to $L^{1}$ and $L^{\infty}$ for simplicity.
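Under the Pareto model, the integrals in (2) and (3) simplify after the change of variables $u=Z(\theta)^{-\alpha}=\bar{F}_{1}(Z(\theta))$: assuming, as in Section 2, that $F_{1}$ is the $\text{Pareto}(1,\alpha)$ distribution of true potentials, $u$ is uniform on $[0,1]$ and coincides with the unbiased school $\mu(\theta)$. The sketch below (our own helper names) uses this to evaluate both measures when no vouchers are distributed; the closed forms it is compared against, $\frac{1}{2}(1-p)(1-\beta^{\alpha})$ and $(1-p)(1-\beta^{\alpha})$, are $\sigma(\widehat{\mu})$ and $mm(\widehat{\mu})$ as stated in Theorems 4.2 and 4.1.

```python
def mistreatment_no_vouchers(u: float, b: float, p: float) -> float:
    """Mistreatment of a G_2 student with unbiased school rank u = Z**(-alpha).

    b = beta**alpha; the two branches are Proposition 3.1 rewritten in terms of u
    (u <= b corresponds to Z >= 1/beta).
    """
    if u <= b:
        return (1 - p) * u * (1 / b - 1)
    return (1 - p) * (1 - u)


def pauc_and_mm(b: float, p: float, n: int = 200_000) -> tuple[float, float]:
    """Midpoint-rule L^1 norm (PAUC) and grid maximum (L^infinity norm) of the mistreatment."""
    values = [mistreatment_no_vouchers((k + 0.5) / n, b, p) for k in range(n)]
    return sum(values) / n, max(values)


alpha, beta, p = 3.0, 0.8, 0.25
b = beta ** alpha
sigma, mm = pauc_and_mm(b, p)
print(f"PAUC {sigma:.5f} vs (1-p)(1-b)/2 = {0.5 * (1 - p) * (1 - b):.5f}")
print(f"mm   {mm:.5f} vs (1-p)(1-b)   = {(1 - p) * (1 - b):.5f}")
```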

Optimal Deterministic Strategies:

Before stating the results formally, we need to introduce some notation. A deterministic debiasing set (DDS) $T$ is a measurable subset of $[1,\infty)$. For $\widehat{c}\in[0,1]$, let $\mathcal{T}(\widehat{c})$ be the family of all DDSs $T$ such that $\int_{t\in T}dF_{1}(t)\leq\widehat{c}$. Here, $\widehat{c}$ denotes the amount of resources or vouchers, and each $T\in\mathcal{T}(\widehat{c})$ represents the potentials of the $G_{2}$ students to whom vouchers are provided, in effect revealing their true potentials. That is, for $\theta\in G_{2}$ such that $Z(\theta)\in T$, we now have $\widehat{Z}(\theta)=Z(\theta)$ after the intervention. Figure 4 shows how much the two fairness measures can be improved, at best, as a function of the amount of resources $\widehat{c}$.

Let $\mu_{T}:\Theta\rightarrow[0,1]$ be the ranking of the students after the $G_{2}$ students whose true potentials lie in $T$ have been debiased, and let $\mathcal{T}_{mm}(\widehat{c})$ be the collection of sets $T\in\mathcal{T}(\widehat{c})$ that minimize $\sup(\mu_{T}-\mu)$. The next result gives an explicit characterization of these sets, assuming\footnote{We refer to the end of Section 5.1 for a discussion of the various technical assumptions on the parameters made in Section 4 and Section 5.1.} $p<1-\beta^{\alpha}$.

Theorem 4.1

Assume $p<1-\beta^{\alpha}$. Then there exists a set $T=[Z_{1}^{*},Z_{2}^{*}]\in\mathcal{T}_{mm}(\widehat{c})$ such that all other sets in $\mathcal{T}_{mm}(\widehat{c})$ differ from $T$ on a set of measure zero. If $\widehat{c}\geq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}}$, then

Z_{1}^{*}=\bigg(\frac{(1-p)+(\frac{1}{\beta^{\alpha}}-1)\widehat{c}}{\frac{1}{\beta^{\alpha}}-p}\bigg)^{-\frac{1}{\alpha}}\quad\text{and}\quad Z_{2}^{*}=\bigg(\frac{(1-p)(1-\widehat{c})}{\frac{1}{\beta^{\alpha}}-p}\bigg)^{-\frac{1}{\alpha}},

and $mm(\mu_{[Z_{1}^{*},Z_{2}^{*}]})=(1-p)(1-\beta^{\alpha})\frac{1-\widehat{c}}{1-p\beta^{\alpha}}$, reduced from $mm(\widehat{\mu})=(1-p)(1-\beta^{\alpha})$. Conversely, if $\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}}$, then:

Z_{1}^{*}=\bigg(\frac{(1-p-\widehat{c})\beta^{\alpha}}{1-p}+\widehat{c}\bigg)^{-\frac{1}{\alpha}}\quad\text{and}\quad Z_{2}^{*}=\bigg(\frac{(1-p-\widehat{c})\beta^{\alpha}}{1-p}\bigg)^{-\frac{1}{\alpha}},

and $mm(\mu_{[Z_{1}^{*},Z_{2}^{*}]})=(1-p-\widehat{c})(1-\beta^{\alpha})+p\widehat{c}$.

We include the proof of Theorem 4.1 in Appendix 12 of the e-Companion. Interestingly, our proof also shows that vouchers distributed without care may actually increase the maximum mistreatment and, more generally, characterizes which distributions of vouchers improve upon the status quo. A pictorial representation of Theorem 4.1 is given in Figure 5, whose two sub-figures correspond to two choices of $\widehat{c}$. Moreover, Figure 4(a) shows how much $mm(\mu_{[Z_{1}^{*},Z_{2}^{*}]})$ decreases as the amount of resources $\widehat{c}$ increases.
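Theorem 4.1 lends itself to a direct numerical check. The sketch below is our own construction, not the proof in the e-Companion: working in the rank space $u=Z^{-\alpha}$ (so that $[Z_{1}^{*},Z_{2}^{*}]$ becomes $[(Z_{2}^{*})^{-\alpha},(Z_{1}^{*})^{-\alpha}]$), it computes each $G_{2}$ student's school after debiasing by counting the mass of students perceived as stronger, ignoring ties since they have measure zero. For the illustrative parameters $\alpha=3$, $\beta=.8$, $p=.25$ and one budget on each side of the threshold, the grid maximum of the mistreatment should match $mm(\mu_{[Z_{1}^{*},Z_{2}^{*}]})$. All helper names are ours.

```python
def debiased_mass(x: float, lo: float, hi: float) -> float:
    """Mass of G_2 students with rank u' <= x whose rank lies in the debiased range [lo, hi]."""
    return max(0.0, min(x, hi) - lo)


def mistreatment(u: float, b: float, p: float, lo: float, hi: float) -> float:
    """Mistreatment of a G_2 student with rank u = Z**(-alpha) when ranks in [lo, hi] are
    debiased. G_1 has mass 1 - p, G_2 has mass p, and b = beta**alpha."""
    if lo <= u <= hi:
        # Perceived at Z: outranked by G_1 with Z' >= Z (rank <= u), debiased G_2 with
        # Z' >= Z, and non-debiased G_2 with beta * Z' >= Z (rank <= b * u).
        g2_ahead = debiased_mass(u, lo, hi) + b * u - debiased_mass(b * u, lo, hi)
        school = (1 - p) * u + p * g2_ahead
    else:
        # Perceived at beta * Z, whose rank threshold is v = min(1, u / b).
        v = min(1.0, u / b)
        g2_ahead = debiased_mass(v, lo, hi) + u - debiased_mass(u, lo, hi)
        school = (1 - p) * v + p * g2_ahead
    return max(0.0, school - u)  # the unbiased school of this student is u itself


def theorem_41(c: float, b: float, p: float) -> tuple[float, float, float]:
    """Ranks ((Z2*)^-alpha, (Z1*)^-alpha) of the optimal interval and mm, from Theorem 4.1."""
    if c >= (1 - p) * (1 - b) / (1 - p + 1 - b):
        lo = (1 - p) * (1 - c) / (1 / b - p)
        hi = ((1 - p) + (1 / b - 1) * c) / (1 / b - p)
        mm = (1 - p) * (1 - b) * (1 - c) / (1 - p * b)
    else:
        lo = (1 - p - c) * b / (1 - p)
        hi = lo + c
        mm = (1 - p - c) * (1 - b) + p * c
    return lo, hi, mm


alpha, beta, p = 3.0, 0.8, 0.25
b = beta ** alpha
n = 100_000
ranks = [(k + 0.5) / n for k in range(n)]
results = {}
for c in (0.1, 0.4):  # the case threshold is about 0.296 for these parameters
    lo, hi, mm_formula = theorem_41(c, b, p)
    mm_numeric = max(mistreatment(u, b, p, lo, hi) for u in ranks)
    results[c] = (hi - lo, mm_formula, mm_numeric)
    print(f"c={c}: mass {hi - lo:.4f}, mm formula {mm_formula:.4f}, numeric {mm_numeric:.4f}")
```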

Next, consider minimizing the positive area under the curve (PAUC), i.e., the aggregate amount of mistreatment experienced by all $G_{2}$ students. In this case, we restrict our attention to debiasing $G_{2}$ students whose potentials lie in a connected set. This restriction is justifiable in practice: otherwise, a student who does not receive a voucher might feel unfairly treated given that someone with a higher potential as well as someone with a lower potential receives one. It also makes our analysis more tractable. In particular, let $\mathcal{T}^{c}(\widehat{c})\subseteq\mathcal{T}(\widehat{c})$ be the family of all connected sets in $\mathcal{T}(\widehat{c})$, that is, $\mathcal{T}^{c}(\widehat{c}):=\{[t_{1},t_{2}]:\bar{F}_{1}(t_{1})-\bar{F}_{1}(t_{2})\leq\widehat{c}\}$. In addition, let $\mathcal{T}^{c}_{auc}(\widehat{c})$ be the collection of sets $T\in\mathcal{T}^{c}(\widehat{c})$ that minimize $\sigma(\mu_{T})$. The next result gives an explicit description of the set $\mathcal{T}^{c}_{auc}(\widehat{c})$, assuming again that $p<1-\beta^{\alpha}$ and, additionally, that $p<0.5$.

Theorem 4.2

Assume $p<1-\beta^{\alpha}$ and $p<0.5$. Then $\mathcal{T}^{c}_{auc}(\widehat{c})$ consists of a unique set $T=[Z_{1}^{*},Z_{2}^{*}]$. If $\widehat{c}\geq\frac{(1-p)(1-\beta^{\alpha})}{2-p-\beta^{\alpha}-p\beta^{\alpha}+p\beta^{2\alpha}}$, then:

Z_{2}^{*}=\bigg(\frac{(1-p)(1-\widehat{c})}{p\beta^{\alpha}+\frac{1}{\beta^{\alpha}}-2p}\bigg)^{-\frac{1}{\alpha}}\quad\text{and}\quad Z_{1}^{*}=\bigg(\frac{(1-p)(1-\widehat{c})}{p\beta^{\alpha}+\frac{1}{\beta^{\alpha}}-2p}+\widehat{c}\bigg)^{-\frac{1}{\alpha}},

and $\sigma(\mu_{[Z_{1}^{*},Z_{2}^{*}]})=\frac{1}{2}(1-p)(1-\beta^{\alpha})\left(\frac{(\frac{1}{\beta^{\alpha}}-p)(1-\widehat{c})^{2}}{p\beta^{\alpha}+\frac{1}{\beta^{\alpha}}-2p}\right)$, down from $\sigma(\widehat{\mu})=\frac{1}{2}(1-p)(1-\beta^{\alpha})$. Conversely, if $\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{2-p-\beta^{\alpha}-p\beta^{\alpha}+p\beta^{2\alpha}}$, then:

Z_{2}^{*}=\bigg(\frac{(p\beta^{\alpha}-1)\widehat{c}+(1-p)}{(1-p)\frac{1}{\beta^{\alpha}}}\bigg)^{-\frac{1}{\alpha}}\quad\text{and}\quad Z_{1}^{*}=\bigg(\frac{(p\beta^{\alpha}-1)\widehat{c}+(1-p)}{(1-p)\frac{1}{\beta^{\alpha}}}+\widehat{c}\bigg)^{-\frac{1}{\alpha}},

and $\sigma(\mu_{[Z_{1}^{*},Z_{2}^{*}]})=\frac{1}{2}(1-p)(1-\widehat{c})^{2}-\frac{1}{2}\beta^{\alpha}\left(\frac{[(p\beta^{\alpha}-1)\widehat{c}+(1-p)]^{2}}{1-p}+p\widehat{c}^{2}\right)$.

The proof of Theorem 4.2 is given in Appendix 13 of the e-Companion. A pictorial representation of Theorem 4.2 is given in Figure 6; again, the two sub-figures correspond to two different choices of $\widehat{c}$. Figure 4(b) shows how much $\sigma(\mu_{[Z_{1}^{*},Z_{2}^{*}]})$ decreases as the amount of resources $\widehat{c}$ increases.
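A similar check applies to Theorem 4.2. The sketch below (helper names are ours) recomputes, in the rank space $u=Z^{-\alpha}$, the school of every $G_{2}$ student after debiasing an interval, integrates the mistreatment with a midpoint rule, and compares the result with the closed-form PAUC. As a baseline, it also evaluates debiasing only the top $G_{2}$ students with the same mass $\widehat{c}$; the parameters $\alpha=3$, $\beta=.8$, $p=.25$ are illustrative.

```python
def debiased_mass(x: float, lo: float, hi: float) -> float:
    """Mass of G_2 students with rank u' <= x whose rank lies in the debiased range [lo, hi]."""
    return max(0.0, min(x, hi) - lo)


def mistreatment(u: float, b: float, p: float, lo: float, hi: float) -> float:
    """Mistreatment of a G_2 student with rank u = Z**(-alpha); ranks in [lo, hi] are debiased."""
    if lo <= u <= hi:  # perceived at Z
        g2_ahead = debiased_mass(u, lo, hi) + b * u - debiased_mass(b * u, lo, hi)
        school = (1 - p) * u + p * g2_ahead
    else:  # perceived at beta * Z, i.e. rank threshold min(1, u / b)
        v = min(1.0, u / b)
        g2_ahead = debiased_mass(v, lo, hi) + u - debiased_mass(u, lo, hi)
        school = (1 - p) * v + p * g2_ahead
    return max(0.0, school - u)


def theorem_42(c: float, b: float, p: float) -> tuple[float, float, float]:
    """Ranks ((Z2*)^-alpha, (Z1*)^-alpha) of the optimal interval and its PAUC (Theorem 4.2)."""
    if c >= (1 - p) * (1 - b) / (2 - p - b - p * b + p * b ** 2):
        denom = p * b + 1 / b - 2 * p
        lo = (1 - p) * (1 - c) / denom
        sigma = 0.5 * (1 - p) * (1 - b) * (1 / b - p) * (1 - c) ** 2 / denom
    else:
        a = (p * b - 1) * c + (1 - p)
        lo = a * b / (1 - p)
        sigma = 0.5 * (1 - p) * (1 - c) ** 2 - 0.5 * b * (a ** 2 / (1 - p) + p * c ** 2)
    return lo, lo + c, sigma


def pauc(b: float, p: float, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule PAUC; in rank space dF_1 is the uniform measure on [0, 1]."""
    return sum(mistreatment((k + 0.5) / n, b, p, lo, hi) for k in range(n)) / n


alpha, beta, p = 3.0, 0.8, 0.25
b = beta ** alpha
results = {}
for c in (0.1, 0.4):  # the case threshold is about 0.311 for these parameters
    lo, hi, sigma_formula = theorem_42(c, b, p)
    sigma_numeric = pauc(b, p, lo, hi)
    sigma_top = pauc(b, p, 0.0, c)  # debias the top G_2 students of mass c instead
    results[c] = (sigma_formula, sigma_numeric, sigma_top)
    print(f"c={c}: formula {sigma_formula:.4f}, numeric {sigma_numeric:.4f}, top-only {sigma_top:.4f}")
```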

We compare the optimal ranges of $G_{2}$ students to debias under the two fairness measures, with parameters $\alpha=3$, $\beta=.8$, and $p=.25$, and find that the optimal intervals overlap by 95% on average. In particular, both measures suggest that vouchers should be given to average (middle-performing) students. More details are given in Table 4 in the e-Companion.

Although these results suggest an important departure from the current practice of prioritizing top-performing students for scholarships, the deterministic policies derived above suffer from two fundamental problems, which we discuss in the next section.

5 Incentive Compatible and Individually Fair Voucher Distribution

In Section 4, we characterized the optimal deterministic intervals for the distribution of vouchers under the maximum mistreatment and PAUC measures. In this section, we introduce two natural and desirable properties that the policies developed in Section 4 fail to have. Once we recognize these flaws, we show in Section 5.1 how we can shift from a deterministic voucher distribution policy to a randomized one in order to satisfy them.

Our first property is individual fairness (Dwork et al. 2012), which requires that similar individuals be treated similarly. While a formal definition of this concept is postponed to Section 5.1, we observe here that the policies developed in Section 4 fail to be individually fair, as individuals close to the boundary of the debiasing interval are treated very differently depending on whether they are inside or outside of it.

Our second property is incentive compatibility (see, e.g., Roughgarden (2010)). In general, it requires that no individual can benefit from misrepresenting their features. In our setting, we assume that a student can misrepresent themselves as having a lower potential (e.g., by intentionally achieving a lower score in a test) in order to be part of the set of students who can access vouchers. Recall that a DDS is a measurable set $T\subseteq[1,\infty)$. A DDS is incentive compatible if no student can be assigned to a better school by misreporting their performance. Formally, assume that a voucher given to a disadvantaged student with reported perceived potential $\beta Z(\theta)$ improves their performance up to $Z(\theta)$.\footnote{This assumption is justified by the fact that additional training is usually commensurate with the (perceived) level of a student.} Then a student with true potential $x\notin T$ who underperforms so as to be perceived as a student with true potential $x^{\prime}\in T$, where $x^{\prime}<x$, receives a voucher and ends up with perceived potential $x^{\prime}$ instead of $\beta x$. Hence, a DDS $T$ is incentive compatible if for each $x\in[1,\infty)\setminus T$ and $x^{\prime}\in T$ with $x>x^{\prime}$, we have $\beta x\geq x^{\prime}$.

Lemma 5.1

Assume $\beta\in[0,1)$ and let $T\neq\emptyset$ be an incentive compatible DDS. Then $T$ is of the form $\{\theta\in\Theta:Z(\theta)\geq\delta\}$ or $\{\theta\in\Theta:Z(\theta)>\delta\}$ for some value $\delta\in[1,\infty)$.

We defer the proof to Appendix 10. This lemma shows that, if we care about incentive compatibility and require that vouchers be distributed deterministically, then the only feasible mechanism is to debias all students whose potential is above some threshold $\delta$; in particular, the bounded intervals of Theorems 4.1 and 4.2 are not incentive compatible. To overcome these flaws of deterministic policies, we next turn to randomization.
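The incentive-compatibility condition for a DDS is easy to probe on a grid of potentials. The sketch below (function and variable names are ours) uses the fact that, for $x\notin T$, only the largest $x^{\prime}\in T$ below $x$ matters; consistently with Lemma 5.1, a bounded interval fails the check while a threshold set passes.

```python
from collections.abc import Callable


def is_incentive_compatible(in_T: Callable[[float], bool], beta: float, grid: list[float]) -> bool:
    """Grid check of the DDS condition: for every x outside T and every x' in T with x > x',
    require beta * x >= x'. Only the largest such x' matters, so one sorted pass suffices."""
    largest_in_T = None  # largest grid point of T seen so far
    for x in sorted(grid):
        if in_T(x):
            largest_in_T = x
        elif largest_in_T is not None and beta * x < largest_in_T:
            # x could underperform, get a voucher, and be perceived at largest_in_T > beta * x
            return False
    return True


def bounded_interval(z: float) -> bool:
    return 1.5 <= z <= 2.0  # a debiasing interval, as in Section 4


def threshold_set(z: float) -> bool:
    return z >= 1.5  # a threshold set, as in Lemma 5.1


beta = 0.8
grid = [1 + k * 0.01 for k in range(1000)]  # potentials in [1, 11)
print(is_incentive_compatible(bounded_interval, beta, grid))  # False
print(is_incentive_compatible(threshold_set, beta, grid))  # True
```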

5.1 Randomized assignment of vouchers

We now introduce and study randomized mechanisms for the allocation of vouchers. With a slight abuse of notation, in this section (and in its proofs in Appendix 14 of the e-Companion) we identify a student $\theta$ with their true potential $Z(\theta)$.

A Randomized Voucher Program (RVP) is a measurable function $\rho:\Theta\to[0,1]$ that gives, for each $\theta\in\Theta$, the probability that a $G_{2}$ student with true potential $\theta$ is assigned a voucher. Observe that if $\rho(\theta)\in\left\{0,1\right\}$ for all $\theta\in\Theta$, then $\rho^{-1}(1)$ is a deterministic debiasing set (DDS) as defined in Section 4; likewise, given a DDS ${T}$ we can construct the RVP $\rho_{{T}}(\theta):=\mathbbm{1}_{\theta\in{T}}$ that matches this DDS.

The main class of RVPs investigated in this section is that of Proportional-to-Mistreatment RVPs (PropMs). A PropM is denoted by $\rho_{m}$ and defined as

\rho_{m}(\theta):=\frac{2\widehat{c}}{(1-\beta^{\alpha})(1-p)}m(\theta),

(4)

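for some $\widehat{c}\in(0,1/2]$ (recall that $m(\theta)$ is the mistreatment of a student with true potential $\theta$ when no vouchers are distributed). Since $\int_{\Theta}m\,dF=\sigma(\widehat{\mu})=\frac{1}{2}(1-p)(1-\beta^{\alpha})$ (see Theorem 4.2), $\widehat{c}$ is the expected proportion of disadvantaged students that will get a voucher, that is, $\widehat{c}=\int_{\Theta}\rho_{m}\,dF$. Moreover, since the maximum of $m$ is $(1-p)(1-\beta^{\alpha})$ (Proposition 3.1), we have $\rho_{m}\leq 2\widehat{c}\leq 1$, so (4) is a valid probability. Intuitively, $\rho_{m}$ assigns a larger probability of being debiased to students with a higher mistreatment.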
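The two facts about (4), namely that $\rho_{m}$ integrates to $\widehat{c}$ and peaks at $2\widehat{c}$, are easy to confirm numerically. The sketch below writes (4) in the rank space $u=Z^{-\alpha}$ where, assuming $\text{Pareto}(1,\alpha)$ potentials, $dF$ becomes the uniform measure on $[0,1]$; function names and parameter values are ours.

```python
def mistreatment_no_vouchers(u: float, b: float, p: float) -> float:
    """m for a G_2 student with unbiased rank u = Z**(-alpha) (Proposition 3.1), b = beta**alpha."""
    return (1 - p) * u * (1 / b - 1) if u <= b else (1 - p) * (1 - u)


def prop_m(u: float, c: float, b: float, p: float) -> float:
    """Voucher probability rho_m from (4), written in rank space."""
    return 2 * c / ((1 - b) * (1 - p)) * mistreatment_no_vouchers(u, b, p)


alpha, beta, p, c = 3.0, 0.8, 0.25, 0.2
b = beta ** alpha
n = 200_000
probs = [prop_m((k + 0.5) / n, c, b, p) for k in range(n)]
expected_share = sum(probs) / n  # should match c
peak = max(probs)  # should match 2c, reached near Z = 1/beta (u = b)
print(f"expected share {expected_share:.4f}, peak probability {peak:.4f}")
```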

As we show next, under broadly applicable technical hypotheses on the parameters, PropMs satisfy many of the properties that deterministic voucher allocations fail to have. Moreover, they can lower the maximum expected mistreatment. To state these results formally, we first extend concepts from deterministic DDSs to randomized RVPs. We let $\mu_{\rho}(\theta)$ be the expected school a student with true potential $\theta\in\Theta$ is assigned to under $\rho$, and let $m_{\rho}(\theta):=\max(0,\mu_{\rho}(\theta)-\mu(\theta))$ be the mistreatment they experience under $\rho$. An RVP $\rho$ is incentive compatible if $\mu_{\rho}(\theta^{\prime})\geq\mu_{\rho}(\theta)$ for all $\theta^{\prime}<\theta$. That is, an RVP is incentive compatible if a student with true potential $\theta$ is not better off by manipulating themselves to appear as having a true potential $\theta^{\prime}<\theta$. We define the maximum mistreatment as $mm_{\rho}:=\sup_{x\in\Theta}\left\{m_{\rho}(x)\right\}$.

We define individual fairness as a Lipschitz continuity condition on $\rho$. We say an RVP $\rho$ is $k$-individually fair if, for each $\theta,\theta^{\prime}\in[1,\infty)$, $|\rho(\theta)-\rho(\theta^{\prime})|\leq k|\theta-\theta^{\prime}|$ (note that under this definition, no non-empty DDS $T\neq[1,\infty)$ is $k$-individually fair for any $k$). We can now state the main result of this section, which is proved in Section 14.3 of the e-Companion. We let $mm^{*}(\widehat{c})$ be the maximum mistreatment achieved by the deterministic policy that minimizes the maximum mistreatment given the available resources $\widehat{c}$, as computed in Theorem 4.1.

Theorem 5.2

Let $p\in[0,1/2]$ and let $\rho_{m}$ be a PropM defined as in (4) for some $\widehat{c}\in(0,1/2]$. Then:

1.

$\rho_{m}$ is $\frac{2\widehat{c}\alpha}{1-\beta^{\alpha}}$-individually fair.

2.

$\rho_{m}$ is incentive compatible for

\widehat{c}\leq\frac{1-p}{2\left[p(1-\beta^{\alpha})+(1-p)(\beta^{-\alpha}-1)\right]}.

(5)

3.

Suppose $p<1-\beta^{\alpha}$ and $\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}}$. Then $mm_{\rho_{m}}\leq mm^{*}(\widehat{c})$ if

\widehat{c}\geq 1-\frac{p+1-\beta^{\alpha}}{4p(1-\beta^{\alpha})}.

(6)

Conditions (5) and (6) are complementary conditions on the amount of vouchers that can be given out. On the one hand, (5) suggests that distributing too many vouchers prevents incentive compatibility of the PropM. Indeed, too large a value of $\widehat{c}$ incentivizes students performing just above the most mistreated student to artificially lower their score, as the absolute value of the derivative of the PropM becomes large around its maximum.\footnote{This observation can also be verified numerically.} On the other hand, (6) suggests that we need to distribute enough vouchers to see the maximum expected mistreatment (i.e., $mm_{\rho_{m}}$) drop below the optimal deterministic one (i.e., $mm^{*}(\widehat{c})$). This is because the optimal deterministic policy debiases the most mistreated student straight away, whereas the PropM distributes vouchers more widely, and so the maximum expected mistreatment does not immediately drop as significantly. As we discuss at the end of the section, both conditions are satisfied for a large range of parameters.

PropMs therefore represent a more robust and theoretically satisfying alternative to the deterministic voucher assignments developed in Section 4 to minimize the maximum mistreatment, while being at least as effective.
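Theorem 5.2 can be illustrated for a single parameter choice; this is an illustration, not a proof. The sketch below assumes a continuum of students with independent coin flips, so that the mass of $G_{2}$ voucher holders with rank at most $x$ in the rank space $u=Z^{-\alpha}$ is $\int_{0}^{x}\rho_{m}$, and a student's expected school $\mu_{\rho_{m}}$ mixes their school with and without a voucher. With $\alpha=3$, $\beta=.8$, $p=.25$, and $\widehat{c}=0.2$, it checks the Lipschitz constant, the monotonicity of $\mu_{\rho_{m}}$ in the rank (incentive compatibility), and $mm_{\rho_{m}}\leq mm^{*}(\widehat{c})$. All names are ours.

```python
def rho(u: float, c: float, b: float) -> float:
    """PropM (4) in rank space u = Z**(-alpha): a tent rising to 2c at u = b, back to 0 at u = 1."""
    return 2 * c * u / b if u <= b else 2 * c * (1 - u) / (1 - b)


def voucher_mass(x: float, c: float, b: float) -> float:
    """Integral of rho over [0, x]: expected mass of G_2 students of rank <= x holding a voucher."""
    if x <= b:
        return c * x * x / b
    return c * b + c * ((1 - b) ** 2 - (1 - x) ** 2) / (1 - b)


def expected_school(u: float, c: float, b: float, p: float) -> float:
    """mu_rho(u): expected school of a G_2 student with rank u, mixing both coin outcomes."""
    def D(x: float) -> float:
        return voucher_mass(x, c, b)

    with_voucher = (1 - p) * u + p * (D(u) + b * u - D(b * u))
    v = min(1.0, u / b)
    without_voucher = (1 - p) * v + p * (D(v) + u - D(u))
    r = rho(u, c, b)
    return r * with_voucher + (1 - r) * without_voucher


alpha, beta, p, c = 3.0, 0.8, 0.25, 0.2
b = beta ** alpha
ranks = [(k + 0.5) / 100_000 for k in range(100_000)]
schools = [expected_school(u, c, b, p) for u in ranks]

# Item 2: incentive compatible iff mu_rho never improves as the rank u worsens.
bound_5 = (1 - p) / (2 * (p * (1 - b) + (1 - p) * (1 / b - 1)))  # right-hand side of (5)
is_ic = all(s2 >= s1 - 1e-12 for s1, s2 in zip(schools, schools[1:]))

# Item 3: maximum expected mistreatment vs. the optimal deterministic mm*(c) of Theorem 4.1.
mm_rho = max(max(0.0, s - u) for s, u in zip(schools, ranks))
mm_star = (1 - p - c) * (1 - b) + p * c  # second case of Theorem 4.1 (c is below its threshold)

# Item 1: finite-difference Lipschitz check of rho as a function of the potential Z.
k_bound = 2 * c * alpha / (1 - b)
zs = [1 + j * 1e-3 for j in range(5_000)]
rhos = [rho(z ** (-alpha), c, b) for z in zs]
max_slope = max(abs(r2 - r1) / (z2 - z1) for r1, r2, z1, z2 in zip(rhos, rhos[1:], zs, zs[1:]))

print(f"c={c} <= (5) bound {bound_5:.3f}: monotone={is_ic}")
print(f"mm_rho={mm_rho:.4f} vs mm*={mm_star:.4f}; max slope {max_slope:.4f} <= k={k_bound:.4f}")
```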

We conclude this section by observing that, to design a non-trivial incentive compatible RVP, it is essential to have knowledge of the distribution of student potentials. We call an RVP $\rho$ Increasing-with-Potential (IwP) if $\rho(\theta)\geq\rho(\theta^{\prime})$ for all $\theta>\theta^{\prime}$. An IwP assigns a (weakly) higher probability of being debiased to students with higher potential. It can therefore be interpreted as a randomized counterpart of the DDS from Lemma 5.1 (in particular, the DDS from Lemma 5.1 is IwP).

Consider now the more general version of the model from Section 2, where the potentials are allowed to be drawn from any continuous, integrable distribution $F$. All definitions of mistreatment, incentive compatibility, etc. naturally extend to this setting. We first show that, under mild technical conditions, IwPs are incentive compatible with respect to any $F$. This fact is proved in Section 14.4 of the e-Companion.

Lemma 5.3

Suppose $\rho$ is IwP and is continuously differentiable everywhere except at a countable set of isolated points, where it has right-continuous jump discontinuities. Then, for any distribution of potentials $F$, $\rho$ is incentive compatible.

The following theorem gives a converse to the previous statement and it is also proved in Section 14.4 of the e-Companion.

Theorem 5.4

Suppose $\rho$ is an RVP. Let $\theta\in[1,\infty)$ be such that $\rho$ is continuously differentiable in some neighborhood of $\theta$ and $\rho^{\prime}(\theta)<0$. Then, for any $\beta\in(0,1)$ and $p\in(0,1)$, there exists a continuous distribution $F$ such that, if true potentials are sampled from $F$, $\rho$ is not incentive compatible.

Theorem 5.4 implies that, without any information on the distribution of student potentials, the only incentive compatible voucher distribution policies (among sufficiently smooth ones) are those in which a student with a higher potential never has a lower chance of receiving a voucher. Examples of such policies are lotteries for students whose potential is above a certain threshold. Hence, if no information on the distribution of student potentials can be assumed, it is reasonable for policymakers to stick to a more conservative distribution of vouchers that rewards top-performing students.

Discussion on technical assumptions.

We now discuss the technical assumptions on the parameters of the model in Section 4 and Section 5. In Theorem 4.1, Theorem 4.2, and Theorem 5.2 we assume $p<1-\beta^{\alpha}$. Note that the right-hand side is equal to $F_{2}(1)-F_{2}(\beta)$, that is, the proportion of disadvantaged students whose perceived potential is less than $1$, i.e., lower than the potential of every non-disadvantaged student. The condition $p<1-\beta^{\alpha}$ therefore requires that the proportion of disadvantaged students in the whole student population be smaller than the proportion of disadvantaged students that are perceived as worse than any non-disadvantaged student. In Theorem 4.2, we further assume $p<0.5$, and in Theorem 5.2 we assume both an upper and a lower bound on $\widehat{c}$. These conditions need to be checked and do not always hold, but all of them hold for many reasonable choices of $(\alpha,\beta,p,\widehat{c})$. For instance, they hold if $\beta=.8$, $\alpha=3$, $p<.4$ (as in Figure 4 and Figure 5(b)), and $\widehat{c}\leq 1/4$; or if $\beta=.9$, $\alpha=8.9$, $p\leq 1/3$, and $\widehat{c}\leq 1/4$ (as in our numerical experiments in the next section).
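Because these conditions must be checked case by case, a small helper is convenient. The sketch below (our own naming) evaluates each assumption of Theorems 4.1, 4.2, and 5.2 and confirms the two parameter families quoted above on a grid of $p$ and $\widehat{c}$; it also shows one budget that violates the threshold of Theorem 4.1.

```python
def assumptions(alpha: float, beta: float, p: float, c: float) -> dict[str, bool]:
    """Check the parameter assumptions of Theorems 4.1, 4.2, and 5.2 for (alpha, beta, p, c)."""
    b = beta ** alpha
    thm41_threshold = (1 - p) * (1 - b) / (1 - p + 1 - b)
    bound_5 = (1 - p) / (2 * (p * (1 - b) + (1 - p) * (1 / b - 1)))
    bound_6 = 1 - (p + 1 - b) / (4 * p * (1 - b)) if p > 0 else float("-inf")
    return {
        "p < 1 - beta^alpha": p < 1 - b,
        "p < 1/2": p < 0.5,
        "0 < c <= 1/2": 0 < c <= 0.5,
        "(5) incentive compatibility": c <= bound_5,
        "c <= Theorem 4.1 threshold": c <= thm41_threshold,
        "(6) beats deterministic mm": c >= bound_6,
    }


budgets = [0.05, 0.1, 0.15, 0.2, 0.25]
first = all(all(assumptions(3.0, 0.8, k / 100, c).values()) for k in range(1, 40) for c in budgets)
second = all(
    all(assumptions(8.9, 0.9, p, c).values())
    for p in [k / 100 for k in range(1, 34)] + [1 / 3]
    for c in budgets
)
print(first, second)  # the two parameter families quoted in the text
print(assumptions(3.0, 0.8, 0.25, 0.35))  # too many vouchers for the Theorem 4.1 threshold
```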

6 Experimental Case Study

Our theoretical analysis has shown that student mistreatment under various metrics can be substantially reduced by an intervention tailored to the distribution of student potentials. We now use student performance data from the NYC DOE to compute the optimal ways to reduce student mistreatment, given the students' heterogeneous school preferences and the distribution of their performance. We show that our theoretical model provides a reasonable approximation despite some deviations from the data, and that the qualitative results remain the same. Our theoretical analysis has therefore been instrumental in identifying effective debiasing policies for the real-world application, which can then be optimized empirically on the actual data.

In New York City, there are eight SHSs that are considered to be among the top public high schools in the city. Admission to these schools is highly competitive, with an intake of only about $5,000$ students every year (from a pool of $29,000$ students who take the SHSAT). We remark that our model and the experimental data differ in two features. First, while schools’ preferences over students are based strictly on the students’ SHSAT scores, so that all schools share the same preference list over students, students’ preferences over schools are not unanimous. However, as already remarked in the introduction, in the 2016-17 SHSAT cohort, 56% of students indicated Stuyvesant or Brooklyn Tech as their first preference, with 76% naming at least one of the two in their top two preferences, showing some alignment among preferences. In order to apply our analysis, we compute a stable matching using the real preferences of students. Second, although the Pareto distribution was mostly chosen for ease of theoretical analysis, we find that it adequately fits the data (see Figure 2). Overall, the fact that the empirical results on real-world data match the optimal target distribution of students predicted by our assortative model provides further evidence that our assumptions are reasonable.

We describe our analysis using the dataset from the 2016-17 academic year. For each student, the dataset includes their SHSAT score (i.e., perceived potential), their preferences over the schools, and whether they are in group $G_{1}$ or group $G_{2}$ (based on the DOE definition). From the dataset, we estimate $p=0.319$ and the Pareto shape parameter $\alpha=8.9$. The true potentials are computed by inflating the test scores of disadvantaged students by a factor of $\frac{1}{\beta}\approx 1.13$ (see Figure 1). The dataset is then restricted to all students who would receive an offer under their true potential.
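The text does not specify how $\alpha$ was estimated. One standard option, shown here purely as an assumption, is the maximum-likelihood estimator for a Pareto distribution with known scale $x_{\min}$, namely $\widehat{\alpha}=n/\sum_{i}\ln(x_{i}/x_{\min})$; how SHSAT scores are mapped to the Pareto scale is also left open here. The sketch runs on synthetic data, not on the DOE dataset, and also shows the $1/\beta$ inflation of disadvantaged students' scores. The function names are ours.

```python
import math
import random


def pareto_shape_mle(samples: list[float], x_min: float) -> float:
    """Maximum-likelihood estimate of alpha for Pareto(x_min, alpha) with known scale x_min."""
    return len(samples) / sum(math.log(x / x_min) for x in samples)


def true_potentials(scores: list[float], disadvantaged: list[bool], beta: float) -> list[float]:
    """Undo the modelled bias: scores of G_2 students are inflated by a factor 1 / beta."""
    return [s / beta if d else s for s, d in zip(scores, disadvantaged)]


# Synthetic check on hypothetical data (not the DOE dataset): Pareto(1, 3) draws.
rng = random.Random(0)
samples = [rng.paretovariate(3.0) for _ in range(100_000)]
alpha_hat = pareto_shape_mle(samples, x_min=1.0)
print(f"estimated alpha: {alpha_hat:.2f}")
print(true_potentials([500.0, 500.0], [False, True], beta=0.9))
```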

First, we show empirically that without intervention, all $G_{1}$ students (magenta dots) have non-positive displacement and all $G_{2}$ students (blue dots) have non-negative displacement, in line with our analysis in Section 3. We then consider deterministic and randomized interventions with $\widehat{c}=0.17$. In Figure 7 (top), deterministic vouchers are offered to the students between the two dashed lines. All $G_{2}$ students with vouchers (red dots) have non-positive displacement, but some $G_{2}$ students (green dots) end up worse off, particularly those scoring just above the range of students who receive vouchers, as they are overtaken by $G_{2}$ students just below them. This highlights that such deterministic policies are not incentive compatible.

Applying the randomized voucher program to this dataset requires further modifications. Since students have heterogeneous preferences, their mistreatment is also heterogeneous; in particular, two students with the same score may experience different mistreatment. To produce an empirical PropM (see Figure 8), we divide the potentials of admitted students into 20 equal-sized intervals and compute the average mistreatment within each interval. The voucher probabilities are then set proportional to this average mistreatment. Since PropM is a randomized voucher program, we ran the experiment 100 times and took the average displacement, which is plotted in Figure 7. Our experiments show that the maximum mistreatment is reduced (in particular, it is similar to the maximum mistreatment after the theoretically optimal deterministic debiasing) and, more generally, that mistreatment is reduced across the board, indicating a more equitable outcome than with deterministic vouchers. Note that, due to the heterogeneity of preferences and the binned averaging, the empirical PropM is not in fact incentive compatible, as indicated by the $G_{2}$ students with larger displacement under intervention (green diamonds). However, unlike under the deterministic debiasing procedure, students with an incentive to underperform are interspersed with students with no such incentive, making it harder for students to game the system. Moreover, it is not uncommon for mechanisms that are incentive compatible in theory to exhibit some lack of incentive compatibility in practice. For instance, the NYC School Match mechanism truncates students' preference lists to at most 12 schools (Abdulkadiroğlu et al. 2005), incentivizing students to be at least partially strategic.
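The binning step can be sketched as follows. This is a hypothetical reconstruction, not the code used for Figure 8: we assume the 20 intervals have equal width in potential and that probabilities are scaled so that the expected number of vouchers equals $\widehat{c}$ times the number of students, capped at $1$. The data below are synthetic, and the function name is ours.

```python
import random


def empirical_propm(potentials: list[float], mistreatments: list[float], c: float,
                    n_bins: int = 20) -> list[float]:
    """Binned PropM sketch: voucher probability proportional to the bin-average mistreatment.

    Assumptions (ours): bins have equal width in potential, and probabilities are scaled so
    that the expected number of vouchers is c * len(potentials), then capped at 1.
    """
    lo, hi = min(potentials), max(potentials)
    width = (hi - lo) / n_bins or 1.0  # guard against all potentials being equal
    bins = [min(int((x - lo) / width), n_bins - 1) for x in potentials]
    totals, counts = [0.0] * n_bins, [0] * n_bins
    for k, m in zip(bins, mistreatments):
        totals[k] += m
        counts[k] += 1
    averages = [t / n if n else 0.0 for t, n in zip(totals, counts)]
    weights = [averages[k] for k in bins]  # each student gets their bin's average
    total = sum(weights)
    scale = c * len(potentials) / total if total > 0 else 0.0
    return [min(1.0, scale * w) for w in weights]


# Hypothetical data: a tent-shaped mistreatment profile peaking at potential 5.
potentials = [1 + i * 0.01 for i in range(1000)]
mistreatments = [max(0.0, 1 - abs(x - 5.0) / 4.0) for x in potentials]
probs = empirical_propm(potentials, mistreatments, c=0.17)

rng = random.Random(0)
vouchers = [rng.random() < q for q in probs]  # one draw of the randomized program
print(f"expected vouchers {sum(probs):.1f}, drawn {sum(vouchers)}")
```

Repeating the draw, as in the 100 runs described above, and averaging the resulting displacements gives the expected outcome of the program.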

Last, we consider the PAUC measure in two cases: low resources ($\widehat{c}=0.1$) and abundant resources ($\widehat{c}=0.4$). For both cases, we plug the estimated parameters into the formulas of Theorem 4.2 to obtain the theoretically optimal range of students to offer vouchers to, which turns out to be close to the empirically optimal range obtained via grid search (see Table 1). Moreover, for both choices of $\widehat{c}$, the PAUC under our policies is substantially lower than the one achieved by debiasing only the top students with total mass equal to $\widehat{c}$.

	$\widehat{c}=0.1$	$\widehat{c}=0.4$
theoretically optimal range	$[529.01,547.27]$	$[506.86,583.09]$
empirically optimal range	$[527.29,543.17]$	$[504.61,561.31]$

Table 1: Comparing the optimal ranges of students to offer vouchers to, obtained empirically and “theoretically” from the formulas in Theorem 4.2, under two different amounts of resources.

7 Discussion

The qualitative takeaways from our work speak to a deeply ingrained systemic problem that limits access to opportunities: how to understand the impact of bias on societal practices and how to systematically account for biases in the real world. Indeed, resources available for meaningful interventions in an existing system are limited, and there is resistance to change: for instance, a 2019 plan supported by New York City’s then mayor to eliminate the entrance exam to top public high schools failed to gain enough support and was not approved by the New York State government (Shapiro and Wang 2019). Thus, our focus is on understanding the impact of a minimally invasive use of targeted resources, as opposed to changing the matching mechanism itself.

From our analysis, we are able to highlight the following qualitative properties using simple models of bias and of the matching mechanism:

1.

Disparate Impact: The disparity in admissions is experienced much more by the disadvantaged group of students, compared to the marginal advantage for the rest.
2.

Interventions: A carefully designed randomized voucher distribution program can counter some of the effects of bias while also being incentive compatible and individually fair. We further show empirically that our qualitative results remain unchanged when applied to our dataset (the admissions process for the 8 SHSs in New York).
3.

Resources: Additional resources centrally distributed to slightly above-average students in the system as a whole (e.g., top performers in schools with a high economic need index) would have the largest impact on the fairness measures considered in this work. Targeting resources to students based on their performance provides an important lever for policymakers to improve the fairness of the system.

These takeaways are a first step and in no way address all the systemic problems in the school admissions process, such as access to counselors, transportation to schools, or familial support for education. But they do help us understand which student groups are most impacted, and they provide policymakers with a mathematical basis for changing the allocation of city funds and scholarships. We have shared the results of this work with the Department of Education of New York City and are in discussions with them. Further, our analysis leads to open questions, such as theoretically optimal interventions under other structured student preferences, and qualitative analysis when students' potentials are not Pareto distributed.

Acknowledgments.

The authors are deeply indebted to the editors and the reviewers for the many comments and suggestions on an earlier version of the manuscript.

References

Abdulkadiroğlu A (2005) College admissions with affirmative action. International Journal of Game Theory 33:535–549.
Abdulkadiroğlu A, Pathak PA, Roth AE (2005) The New York City high school match. American Economic Review 95(2):364–367.
Arcidiacono P, Aucejo EM, Fang H, Spenner KI (2011) Does affirmative action lead to mismatch? A new test and evidence. Quantitative Economics 2(3):303–333.
Arnosti N (2019) A continuum model of stable matchings with finite capacities. Talk at Simons Institute for the Theory of Computing.
Ashkenas J, Park H, Pearce A (2017) Even with affirmative action, Blacks and Hispanics are more underrepresented at top colleges than 35 years ago. New York Times 1–18.
Azevedo EM, Leshno JD (2016) A supply and demand framework for two-sided matching markets. Journal of Political Economy 124(5):1235–1268.
Backes B (2012) Do affirmative action bans lower minority college enrollment and attainment?: Evidence from statewide bans. Journal of Human Resources 47(2):435–455.
Becker GS (2010) The economics of discrimination (University of Chicago Press).
Biró P (2008) Student admissions in Hungary as Gale and Shapley envisaged. University of Glasgow Technical Report TR-2008-291.
Biró P, Fleiner T, Irving RW, Manlove DF (2010) The college admissions problem with lower and common quotas. Theoretical Computer Science 411(34-36):3136–3153.
Boschma J, Brownstein R (2016) The concentration of poverty in American schools. The Atlantic 29.
Burgess S, Greaves E, Vignoles A, Wilson D (2015) What parents want: School preferences and school choice. The Economic Journal 125(587):1262–1289.
Calsamiglia C, Haeringer G, Klijn F (2010) Constrained school choice: An experimental study. American Economic Review 100(4):1860–74.
Celis LE, Hays C, Mehrotra A, Vishnoi NK (2021) The effect of the Rooney Rule on implicit bias in the long term. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 678–689.
Celis LE, Mehrotra A, Vishnoi NK (2020) Interventions for ranking in the presence of implicit bias. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 369–380.
Chade H, Lewis G, Smith L (2014) Student portfolios and the college admissions problem. Review of Economic Studies 81(3):971–1002.
Chan J, Eyster E (2003) Does banning affirmative action lower college student quality? American Economic Review 93(3):858–872.
Chin WW (2022) Equity and excellence, four years later. City Journal. URL https://www.city-journal.org/article/equity-and-excellence-four-years-later, accessed: 2024-07-01.
Clauset A, Shalizi CR, Newman ME (2009) Power-law distributions in empirical data. SIAM Review 51(4):661–703.
Coate S, Loury GC (1993) Will affirmative-action policies eliminate negative stereotypes? The American Economic Review 1220–1240.
Conitzer V, Freeman R, Shah N, Vaughan JW (2019) Group fairness for the allocation of indivisible goods. Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI).
Corcoran SP, Baker-Smith EC (2018) Pathways to an elite education: Application, admission, and matriculation to New York City’s specialized high schools. Education Finance and Policy 13(2):256–279.
Dee TS, Jacob B (2011) The impact of No Child Left Behind on student achievement. Journal of Policy Analysis and Management 30(3):418–446.
Dessein W, Frankel A, Kartik N (2023) Test-optional admissions. arXiv preprint arXiv:2304.07551.
Drasgow F (1984) Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are the central issues.
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226 (ACM).
Dwork C, Ilvento C (2018) Group fairness under composition.
Emelianov V, Gast N, Gummadi KP, Loiseau P (2020) On fair selection in the presence of implicit variance. Proceedings of the 21st ACM Conference on Economics and Computation, 649–675.
Fershtman D, Pavan A (2021) “Soft” affirmative action and minority recruitment. American Economic Review: Insights 3(1):1–18.
Folland GB (1999) Real analysis: modern techniques and their applications, volume 40 (John Wiley & Sons).
Gale D, Shapley LS (1962) College admissions and the stability of marriage. The American Mathematical Monthly 69(1):9–15.
Garg N, Li H, Monachou F (2021) Standardized tests and affirmative action: The role of bias and variance. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 261–261.
Gratz v Bollinger (2003) Gratz v. Bollinger, 539 U.S. 244 (2003).
Greenwald R, Hedges LV, Laine RD (1996) The effect of school resources on student achievement. Review of Educational Research 66(3):361–396.
Grimmett G, Stirzaker D (2020) Probability and random processes (Oxford University Press).
Hafalir IE, Yenmez MB, Yildirim MA (2013) Effective affirmative action in school choice. Theoretical Economics 8(2):325–363.
Hastings J, Kane TJ, Staiger DO (2009) Heterogeneous preferences and the efficacy of public school choice. NBER Working Paper 2145:1–46.
Hu L, Immorlica N, Vaughan JW (2019) The disparate effects of strategic manipulation. Proceedings of the Conference on Fairness, Accountability, and Transparency, 259–268.
Kamada Y, Kojima F (2024) Fair matching under constraints: Theory and applications. Review of Economic Studies 91(2):1162–1199.
Kannan S, Roth A, Ziani J (2019) Downstream effects of affirmative action. Proceedings of the Conference on Fairness, Accountability, and Transparency, 240–248.
Kleinberg J, Mullainathan S, Raghavan M (2017) Inherent trade-offs in the fair determination of risk scores. 8th Innovations in Theoretical Computer Science Conference (ITCS 2017) (Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik).
Kleinberg J, Raghavan M (2018) Selection problems in the presence of implicit bias. 9th Innovations in Theoretical Computer Science Conference (ITCS 2018) (Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik).
Kucsera J, Orfield G (2014) New York State’s extreme school segregation: Inequality, inaction and a damaged future.
Kumar A, Kleinberg J (2000) Fairness measures for resource allocation. Proceedings 41st Annual Symposium on Foundations of Computer Science, 75–85 (IEEE).
Laverde M (2020) Unequal assignments to public schools and the limits of school choice. Unpublished working paper.
Liu Z, Garg N (2021) Test-optional policies: Overcoming strategic behavior and informational gaps. Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 1–13.
Long MC (2004) Race and college admissions: An alternative to affirmative action? Review of Economics and Statistics 86(4):1020–1033.
Lovaglia MJ, Lucas JW, Houser JA, Thye SR, Markovsky B (1998) Status processes and mental ability test scores. American Journal of Sociology 104(1):195–228.
Marsh MT, Schilling DA (1994) Equity measurement in facility location analysis: A review and framework. European Journal of Operational Research 74(1):1–17.
Nguyen T, Vohra R (2019) Stable matching with proportionality constraints. Operations Research.
Niu M, Kannan S, Roth A, Vohra R (2022) Best vs. all: Equity and accuracy of standardized test score reporting. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 574–586.
NYC DOE (2018) Specialized high schools proposal. https://www.schools.nyc.gov/docs/default-source/default-document-library/specialized-high-schools-proposal.
NYC DOE (2019) 2019 NYC High School Directory. https://bigappleacademy.com/wp-content/uploads/2018/06/HSD_2019_ENGLISH_Web.pdf.
Phelps ES (1972) The statistical theory of racism and sexism. The American Economic Review 62(4):659–661.
Quinn Capers I, Clinchot D, McDougle L, Greenwald AG (2017) Implicit racial bias in medical school admissions. Academic Medicine 92(3):365–369.
Roth AE, Sotomayor M (1992) Two-sided matching. Handbook of Game Theory with Economic Applications 1:485–541.
Roughgarden T (2010) Algorithmic game theory. Communications of the ACM 53(7):78–86.
Salem J, Gupta S (2023) Secretary problems with biased evaluations using partial ordinal information. Management Science.
Sen A (1979) Equality of what? The Tanner Lecture on Human Values 1.
Shapiro E (2019a) Racist? Fair? Biased? Asian-American alumni debate elite high school admissions. The New York Times Magazine.
Shapiro E (2019b) Should a single test decide a 4-year-old’s educational future? New York Times.
Shapiro E (2021) Only 8 Black students are admitted to Stuyvesant High School. New York Times.
Shapiro E (March 26, 2019) Segregation has been the story of New York City’s schools for 50 years. The New York Times Magazine.
Shapiro E, Lai KKR (June 03, 2019) How New York’s elite public schools lost their Black and Hispanic students. The New York Times Magazine.
Shapiro E, Wang V (2019) Amid racial divisions, mayor’s plan to scrap elite school exam fails. New York Times.
Texas Comptroller of Public Accounts (2024) Top 10% rule. URL https://comptroller.texas.gov/programs/education/msp/funding/aid/state-programs/txttp.php, accessed: 2024-07-01.
Tomoeda K (2018) Finding a stable matching under type-specific minimum quotas. Journal of Economic Theory 176:81–117.
Wenneras C, Wold A (2010) Nepotism and sexism in peer-review. Women, science, and technology, 64–70 (Routledge).

{APPENDICES}

8 Discussion on discrete versus continuous models

Traditionally, matching markets are assumed to be discrete (Gale and Shapley 1962, Roth and Sotomayor 1992). In recent years, however, there has been interest in models where one or both sides of the market are continuous (Arnosti 2019, Azevedo and Leshno 2016). These models are justified because many applications involve large markets, so predictions for continuous markets often carry over to discrete ones with good accuracy. Moreover, continuous markets are often more analytically tractable than discrete ones (see, again, Arnosti (2019), Azevedo and Leshno (2016)). Our case is no exception: the continuous model allows us to derive precise mathematical formulae, and we show through experiments that these formulae are a good approximation of the discrete case. We also provide additional experiments evaluating the robustness of our results when assumptions are relaxed, such as the assumption of a unique bias factor for all students in $G_{2}$ . We remark that the goal of this study is not to provide a mechanism for admitting students to schools. For that purpose, assuming that all students share the same ranking of schools and all schools share the same ranking of students would be too simplistic. Rather, since we want to understand the impact of bias at a macroscopic level, we believe our approximation is meaningful and useful: in our model, any reasonable mechanism would output the same assignment. Note that in the classical discrete model, when all schools share a common ranking of students and all students share a common ranking of schools, there is only one stable assignment, which is also Pareto-optimal for students. A similar statement holds for the appropriate translations of these concepts to our model.
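To illustrate the uniqueness claim in the discrete setting, the Python sketch below runs student-proposing deferred acceptance (Gale and Shapley 1962) on a small hypothetical instance. In this instance, all students rank the schools identically and all schools rank students by the same score. The sketch then compares the result with a serial dictatorship in decreasing score order. The instance, the function names, and the random scores are illustrative assumptions, not data or code from this study.

```python
import random


def deferred_acceptance(student_prefs, scores, capacities):
    """Student-proposing deferred acceptance (Gale and Shapley 1962).

    student_prefs: dict student -> list of schools, most preferred first.
    scores: dict student -> score; every school ranks students by this score.
    capacities: dict school -> number of seats.
    Returns a dict student -> school; unmatched students are omitted.
    """
    next_choice = {s: 0 for s in student_prefs}
    held = {c: [] for c in capacities}
    free = list(student_prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(student_prefs[s]):
            continue  # s has been rejected everywhere and stays unmatched
        c = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        # The school tentatively keeps its highest-scoring applicants.
        held[c].sort(key=lambda t: scores[t], reverse=True)
        while len(held[c]) > capacities[c]:
            free.append(held[c].pop())  # reject the lowest-scoring applicant
    return {s: c for c, admitted in held.items() for s in admitted}


def serial_dictatorship(schools_best_first, scores, capacities):
    """Students, in decreasing score order, take the best school with a free seat."""
    seats = dict(capacities)
    match = {}
    for s in sorted(scores, key=scores.get, reverse=True):
        for c in schools_best_first:
            if seats[c] > 0:
                seats[c] -= 1
                match[s] = c
                break
    return match


# Hypothetical instance: 3 schools with a common ranking (A best),
# 12 students with distinct random scores, and 9 seats in total.
random.seed(0)
schools = ["A", "B", "C"]
caps = {"A": 2, "B": 3, "C": 4}
scores = {f"s{i}": random.random() for i in range(12)}
prefs = {s: list(schools) for s in scores}
da = deferred_acceptance(prefs, scores, caps)
sd = serial_dictatorship(schools, scores, caps)
print(da == sd)
```

With common rankings, no student can be displaced from a school by a lower-scoring student. Deferred acceptance therefore reproduces the score-ordered assignment, which is the discrete counterpart of the single assignment used in our model.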

9 Discussion on alternate models of bias

We fitted two complementary models of bias to the 2016-17 SHSAT score distribution that forms the basis of our experiments on real data in Section 6: a multiplicative model and an additive model. In the multiplicative model, we assume that the perceived scores of $G_{2}$ students are given by $\widehat{Z}=\beta Z$ for some $\beta<1$ , and in the additive model, we assume $\widehat{Z}=Z+\beta$ for some $\beta<0$ . We fitted each model to the real SHSAT scores of $G_{1}$ and $G_{2}$ students. To fit a model, we chose a measure of dissimilarity between distributions, then chose the value of $\beta$ minimizing the dissimilarity between the adjusted $G_{1}$ scores and the $G_{2}$ scores. We did this under two measures: the Wasserstein distance and the Kullback-Leibler divergence. Each optimization problem (for each measure and each model) had a unique minimizer, which we took as the best-fit $\beta$ . We then compared the additive and multiplicative models by the value of the measure at the best-fit $\beta$ . Under both measures, the distance between the adjusted $G_{1}$ scores and the $G_{2}$ scores at the best-fit $\beta$ was lower in the multiplicative model than in the additive model (Table 2). We therefore conclude that the multiplicative model fits the data better. For our final analysis, we chose the best-fit $\beta$ under the Wasserstein distance (as opposed to the Kullback-Leibler divergence), as we believe it is the more appropriate measure of similarity of distributions in this setting.

Model	Measure	Value at best fit	Best-fit $\beta$
Additive	KL-divergence	0.0321	-36.9
Multiplicative	KL-divergence	0.0293	0.902
Additive	Wasserstein distance	9.52	-49.0
Multiplicative	Wasserstein distance	5.36	0.882

Table 2: Comparison of Additive and Multiplicative Models using KL-divergence and Wasserstein Distance Metrics
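The Wasserstein fitting procedure described above can be sketched with SciPy's `wasserstein_distance` and a bounded scalar minimizer. The SHSAT data are not reproduced here, so the sketch uses hypothetical synthetic scores instead: Pareto potentials for $G_{1}$, and a $G_{2}$ sample shrunk by a hypothetical factor of $0.88$. The shape parameter, sample size, and search bounds are also illustrative assumptions. The Kullback-Leibler fit would additionally require a density estimate and is not shown.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import wasserstein_distance


def fit_beta(g1_scores, g2_scores, model, bounds):
    """Best-fit bias parameter under the Wasserstein (earth mover's) distance.

    model == "multiplicative": compare beta * G1 scores with G2 scores.
    model == "additive":       compare G1 scores + beta with G2 scores.
    Returns (best-fit beta, distance at the best fit).
    """
    def loss(beta):
        if model == "multiplicative":
            adjusted = beta * g1_scores
        else:
            adjusted = g1_scores + beta
        return wasserstein_distance(adjusted, g2_scores)

    res = minimize_scalar(loss, bounds=bounds, method="bounded")
    return res.x, res.fun


# Hypothetical synthetic stand-in for the real scores: potentials are
# Pareto(alpha) on [1, inf), and G2 scores are shrunk by a factor of 0.88.
rng = np.random.default_rng(0)
alpha, n = 4.0, 50_000
g1 = 1.0 + rng.pareto(alpha, n)           # numpy draws Lomax; +1 gives Pareto
g2 = 0.88 * (1.0 + rng.pareto(alpha, n))

beta_mult, dist_mult = fit_beta(g1, g2, "multiplicative", (0.5, 1.0))
beta_add, dist_add = fit_beta(g1, g2, "additive", (-1.0, 0.0))
print(f"multiplicative: beta={beta_mult:.3f}, W1={dist_mult:.4f}")
print(f"additive:       beta={beta_add:.3f}, W1={dist_add:.4f}")
```

In one dimension with equal sample sizes, both losses are convex in $\beta$, so a bounded scalar search finds the unique minimizer. Comparing `dist_mult` with `dist_add` mirrors the model comparison in Table 2.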

10 Proof of Lemma 5.1

Assume $T$ is not of the form $\{\theta\in\Theta:Z(\theta)\geq\delta\}$ or $\{\theta\in\Theta:Z(\theta)>\delta\}$ for any $\delta$ , and let $U$ be a connected, inclusion-wise maximal subset of $T$ that is bounded. Take the smallest number $\delta_{1}\in[1,\infty)$ such that $Z(\theta)\leq\delta_{1}$ for all $\theta\in U$ . Let $\overline{\theta}\in\Theta$ be such that $Z(\overline{\theta})=\delta_{1}$ .

Assume first that $\overline{\theta}\in U$ . Then, for each $\varepsilon_{1}>0$ , there exists $\varepsilon\in(0,\varepsilon_{1}]$ such that $Z^{-1}(\delta_{1}+\varepsilon)\notin T$ . Since $\beta<1$ and by continuity of $Z(\cdot)$ , there exists $\varepsilon_{2}>0$ such that $\beta(\delta_{1}+\varepsilon)<\delta_{1}$ for all $\varepsilon<\varepsilon_{2}$ . We can then take an appropriate $x\in[\delta_{1},\delta_{1}+\varepsilon_{2}]\setminus T$ and $x^{\prime}=\delta_{1}$ to show that $T$ is not incentive compatible.

Next assume $\overline{\theta}\notin U$ . In particular, we have $\overline{\theta}\notin T$ . Similarly to the case above, we can find $\varepsilon>0$ such that $x^{\prime}=\delta_{1}-\varepsilon$ satisfies $Z^{-1}(x^{\prime})\in U\subseteq T$ and $\beta\delta_{1}<x^{\prime}$ . Setting $x=\delta_{1}$ , we deduce that $T$ is not incentive compatible. \Halmos

\ECSwitch

\ECHead

Online Supplement

11 Impact on Schools

This appendix explores the schools' perspective: the impact of bias on the utility (quality of accepted students) and the diversity of schools, as well as school-driven interventions such as interviews. In the notation of the two-group model in Section 2, we define the utility $u_{\gamma}(s)$ of a school $s$ under matching $\gamma\in\left\{\mu,\widehat{\mu}\right\}$ as

u_{\gamma}(s):=\int_{\theta\in\gamma^{-1}(s)\cap G_{1}}Z(\theta)\,dF_{1}(\widehat{Z}(\theta))+\int_{\theta\in\gamma^{-1}(s)\cap G_{2}}Z(\theta)\,dF_{2}(\widehat{Z}(\theta)).\qquad(7)

That is, the utility of a school is the average true potential of the students it admits. Continuing Example 2.2, let $s_{M}$ (resp. $s_{L}$ ) be the school Maya (resp. Lisa) is assigned to in the biased setting. By Proposition 11.1, the utilities of $s_{M},s_{L}$ in the unbiased setting are

u_{\mu}(s_{M})=1.283\quad\hbox{and}\quad u_{\mu}(s_{L})=1.324,

while in the biased setting, they are

u_{\widehat{\mu}}(s_{M})=1.280\quad\hbox{and}\quad u_{\widehat{\mu}}(s_{L})=1.320.

Hence, the change in the utilities of the two schools between the two settings is negligible. In the rest of this appendix, we develop the theory behind these observations.

We first discuss the impact of bias on the average true potential of the students accepted by a school. Let $s\in[0,1]$ denote the school ranked at the $s\times 100\%$ position in the continuum of schools. As the next proposition shows, the impact of bias on the utilities of schools is negligible for all schools other than the lowest-ranked ones. This is because, for each school, the average potential of its assigned $G_{1}$ students is lower than it should be, but its assigned $G_{2}$ students have much higher true potentials. Thus, the loss in utility due to underqualified $G_{1}$ students is partially offset by the overqualified $G_{2}$ students, and the net effect is minimal. On the other hand, some lower-ranked schools that admit only $G_{2}$ students fare better in the biased setting, since they admit overqualified $G_{2}$ candidates.

Proposition 11.1

For school $s$ , its utility under the unbiased (resp. biased) model is

u_{\mu}(s)=s^{-\frac{1}{\alpha}}\quad\hbox{ and }\quad u_{\widehat{\mu}}(s)=\begin{cases}\displaystyle\frac{1-p+p\beta^{\alpha}}{1-p+p\beta^{\alpha+1}}\left(\frac{s}{1-p+p\beta^{\alpha}}\right)^{-\frac{1}{\alpha}}&\text{ if }s\leq 1-p+p\beta^{\alpha},\\[2.84526pt] \displaystyle\left(\frac{s-(1-p)}{p}\right)^{-\frac{1}{\alpha}}&\text{ if }s>1-p+p\beta^{\alpha}.\end{cases}

The key idea of the proof is to first compute the cutoff of each school for each of the two groups, that is, the minimum perceived potential a student needs in order to be matched to a given school. Once these cutoffs are known, we use Bayes' rule to deduce the minimum true potential that students of each group need in order to attend the school. From the latter, we can immediately compute the average utility of each school.

Proof 11.2

Proof of Proposition 11.1. In order for a student $\theta$ to be assigned to a school that is at least as good as $s$ , their perceived potential $\widehat{Z}(\theta)$ needs to be high enough to satisfy $(1-p)\bar{F}_{1}(1\vee\widehat{Z}(\theta))+p\bar{F}_{2}(\widehat{Z}(\theta))\leq s$ . That is, we need

\widehat{Z}(\theta)\geq d(s):=\begin{cases}\displaystyle\left(\frac{s}{1-p+p\beta^{\alpha}}\right)^{-\frac{1}{\alpha}}&\text{if }s\leq 1-p+p\beta^{\alpha},\\[2.84526pt] \displaystyle\beta\left(\frac{s-(1-p)}{p}\right)^{-\frac{1}{\alpha}}&\text{if }s>1-p+p\beta^{\alpha}.\end{cases}

We call $d(s)$ the cutoff for school $s$ . With the cutoffs, we can compute the utilities of schools. We start with the formula for $u_{\widehat{\mu}}(s)$ . First, note that by Bayes' rule, the probability that a given student with perceived potential $\widehat{Z}(\theta)\geq 1$ belongs to $G_{1}$ is $\frac{1-p}{1-p+p\beta^{\alpha+1}}$ . Using Equation (1), observe that the $G_{2}$ student whose perceived potential is $1$ (i.e., whose true potential is $\frac{1}{\beta}$ ) is matched to school $1-p+p\beta^{\alpha}$ . Thus, if $s>1-p+p\beta^{\alpha}$ , school $s$ is assigned only $G_{2}$ students. Therefore, when $s\leq 1-p+p\beta^{\alpha}$ ,

u_{\widehat{\mu}}(s)=\frac{1-p}{1-p+p\beta^{\alpha+1}}d(s)+\frac{p\beta^{\alpha+1}}{1-p+p\beta^{\alpha+1}}\frac{d(s)}{\beta}=\frac{1-p+p\beta^{\alpha}}{1-p+p\beta^{\alpha+1}}\left(\frac{s}{1-p+p\beta^{\alpha}}\right)^{-\frac{1}{\alpha}}.

And when $s>1-p+p\beta^{\alpha}$ , we have

u_{\widehat{\mu}}(s)=d(s)/\beta=\left(\frac{s-(1-p)}{p}\right)^{-\frac{1}{\alpha}}.

On the other hand, when there is no bias against $G_{2}$ students, we simply have $u_{\mu}(s)=s^{-\frac{1}{\alpha}}$ . \Halmos
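As a numerical sanity check of the cutoff $d(s)$ derived in the proof, the sketch below verifies that the measure of students whose perceived potential is at least $d(s)$ equals $s$. That measure is $(1-p)\bar{F}_{1}(1\vee d(s))+p\bar{F}_{2}(d(s))$. The sketch assumes, as in the two-group model, that true potentials are Pareto($\alpha$) on $[1,\infty)$, so that $\bar{F}_{1}(z)=z^{-\alpha}$ for $z\geq 1$, and that $G_{2}$ students are perceived as $\beta Z$. The helper names and parameter values are hypothetical.

```python
def cutoff(s, alpha, beta, p):
    """Cutoff d(s) on perceived potential, as derived in the proof."""
    threshold = 1 - p + p * beta ** alpha
    if s <= threshold:
        return (s / threshold) ** (-1 / alpha)
    return beta * ((s - (1 - p)) / p) ** (-1 / alpha)


def mass_above(z, alpha, beta, p):
    """Measure of students whose perceived potential is at least z.

    Assumes true potentials are Pareto(alpha) on [1, inf): G1 students are
    perceived correctly, and G2 students are perceived as beta * Z.
    """
    tail_g1 = max(z, 1.0) ** (-alpha)            # bar F_1(1 v z)
    tail_g2 = min(1.0, (z / beta) ** (-alpha))   # bar F_2(z)
    return (1 - p) * tail_g1 + p * tail_g2


# Hypothetical parameters.
alpha, beta, p = 3.0, 0.88, 0.25
threshold = 1 - p + p * beta ** alpha
grid = [k / 100 for k in range(1, 101)]
errors = [abs(mass_above(cutoff(s, alpha, beta, p), alpha, beta, p) - s)
          for s in grid]
print(max(errors))
```

Setting `beta = 1.0` makes the threshold equal to $1$, and the cutoff reduces to $s^{-1/\alpha}$, matching $u_{\mu}(s)$ in the unbiased setting.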

As one readily observes from Proposition 11.1, the negative impact of bias on schools' utility is negligible. Hence, from an operational perspective, it may be hard to convince schools to put in place, on their own initiative, mechanisms that alleviate the effect of bias, given the limited impact bias has on them.

Let $pr(s)$ (resp. $\widehat{pr}(s)$ ) be the proportion of $G_{2}$ students assigned to school $s$ when there is no bias (resp. there is bias) against $G_{2}$ students. Since the distribution of potentials is the same for both $G_{1}$ and $G_{2}$ students, it is immediate that $pr(s)=p$ when there is no bias.

Proposition 11.3

Without bias, we have $pr(s)=p$ . Under the biased setting, we have

\widehat{pr}(s)=\begin{cases}\displaystyle\frac{p\beta^{\alpha}}{1-p+p\beta^{\alpha}}&\text{ if }s\leq 1-p+p\beta^{\alpha},\\ 1&\text{ if }s>1-p+p\beta^{\alpha}.\end{cases}

Proof 11.4

Proof. The formula for $\widehat{pr}(s)$ follows from the analysis of utility of schools in Proposition 11.1.\Halmos

Figure 9 gives a visual comparison of $pr(s)$ and $\widehat{pr}(s)$ for different values of $\beta$ and $p$ . In particular, it shows that the proportion of $G_{2}$ students in higher-ranked schools decreases significantly in the biased setting.
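The formula in Proposition 11.3 can also be checked by simulation. The sketch below uses the same Pareto assumption and hypothetical parameters. It draws a large population, applies the multiplicative bias to $G_{2}$, and measures the share of $G_{2}$ students in two groups:

- those with perceived potential at least $1$, who fill the schools $s\leq 1-p+p\beta^{\alpha}$;
- those within a band of perceived potentials above $1$.

Both shares should be close to $p\beta^{\alpha}/(1-p+p\beta^{\alpha})$, and every student with perceived potential below $1$ should belong to $G_{2}$.

```python
import numpy as np


def pr_hat(s, alpha, beta, p):
    """Proportion of G2 students at school s under bias (Proposition 11.3)."""
    threshold = 1 - p + p * beta ** alpha
    if s <= threshold:
        return p * beta ** alpha / threshold
    return 1.0


# Hypothetical parameters; potentials are assumed Pareto(alpha) on [1, inf).
alpha, beta, p = 3.0, 0.88, 0.25
rng = np.random.default_rng(1)
n = 1_000_000
in_g2 = rng.random(n) < p
true_potential = 1.0 + rng.pareto(alpha, n)  # numpy's pareto is Lomax; +1 shifts it
perceived = np.where(in_g2, beta * true_potential, true_potential)

top = perceived >= 1.0  # these students fill the schools s <= 1 - p + p * beta**alpha
band = (perceived >= 1.5) & (perceived < 2.0)
share_top = in_g2[top].mean()
share_band = in_g2[band].mean()
frac_top = top.mean()
print(share_top, share_band, pr_hat(0.5, alpha, beta, p))
```

The share is the same in any band of perceived potentials above $1$. At every such perceived potential $z$, the densities of $G_{1}$ and $G_{2}$ are in the fixed ratio $(1-p):p\beta^{\alpha}$.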

12 Proof of Theorem 4.1 and related facts

12.1 Technical discussion

The main idea of the proof is to first assume that the set $T$ is connected (i.e., a closed interval). Then we can express $mm(\mu_{T})$ as a function of the endpoints of $T$ and find the minimizing interval. We then drop the assumption that $T$ is connected and show that the optimal set of students to debias remains the same. The analysis we give is actually more general: it presents conditions under which vouchers improve the mistreatment of students lexicographically. Interestingly, it also shows that vouchers that are not distributed carefully may actually worsen the outcome of the most mistreated students.

12.2 A more general approach

Our analysis actually leads to a more general statement than Theorem 4.1, and aims to identify conditions under which giving vouchers improves over the status quo. More formally, for bounded functions $f,g:G_{2}\rightarrow\mathbb{R}$ , we write $f\succ g$ if we can partition $G_{2}$ into two sets $S,S^{\prime}$ (with possibly $S^{\prime}=\emptyset$ ) so that $f(\theta)=g(\theta)$ for $\theta\in S^{\prime}$ and $\sup_{\theta\in S}f(\theta)>\sup_{\theta\in S}g(\theta)$ . Note that $\succ$ is transitive and antisymmetric. It can be interpreted as a continuous analogue of the classical lexicographic ordering of discrete vectors. In particular, if we let $f=\gamma-\mu$ and $g=\gamma^{\prime}-\mu$ for matchings $\gamma,\gamma^{\prime}$ , then $\sup_{\theta\in G_{2}}(\gamma-\mu)(\theta)>\sup_{\theta\in G_{2}}(\gamma^{\prime}-\mu)(\theta)$ implies $f\succ g$ (taking $S=G_{2}$ ). Now suppose we debias the students in $T=[Z_{1},Z_{2}]$ for some $T\in\mathcal{T}(\widehat{c})$ , and let $f:=\widehat{\mu}-\mu$ and $g:=\mu_{T}-\mu$ . Table 3 provides conditions under which $f\succ g$ (i.e., the intervention reduces the maximum mistreatment experienced by $G_{2}$ students). In particular, it shows that for certain combinations of the data and the choice of $Z_{1}$ and $Z_{2}$ , giving vouchers may actually lead to worse (according to $\succ$ ) matchings. One can check that under the assumption $p<1-\beta^{\alpha}$ , all conditions given in Table 3 for different cases are satisfied.

Case	Subcase	Sufficient condition for $\widehat{\mu}-\mu\succ\mu_{T}-\mu$
I. $\beta Z_{2}\geq Z_{1}$	1. $1\leq\beta Z_{1}$	$\displaystyle p<1-\left(\frac{Z_{1}}{Z_{2}}\right)^{\alpha}$
I. $\beta Z_{2}\geq Z_{1}$	2. $\beta Z_{1}\leq 1\leq\beta Z_{2}$	$\displaystyle p<1-\left(\frac{1}{\beta Z_{2}}\right)^{\alpha}$
II. $\beta Z_{2}\leq Z_{1}$	1. $1\leq\beta Z_{1}$	$\displaystyle p<1-\beta^{\alpha}$
II. $\beta Z_{2}\leq Z_{1}$	2. $\beta Z_{1}\leq 1\leq\beta Z_{2}$	$\displaystyle p\left(\left(\frac{1}{Z_{1}}\right)^{\alpha}-\left(\frac{1}{Z_{2}}\right)^{\alpha}\right)<(1-p)\left(\frac{1}{\beta^{\alpha}}-1\right)\left(\beta^{\alpha}-\left(\frac{1}{Z_{2}}\right)^{\alpha}\right)$
II. $\beta Z_{2}\leq Z_{1}$	3. $\beta Z_{2}\leq 1$	Not possible: $g\succ f$ in this case.

Table 3: Sufficient conditions for $\widehat{\mu}-\mu\succ\mu_{T}-\mu$ by cases, where $T=[Z_{1},Z_{2}]$ . Each strict inequality, when replaced with its non-strict counterpart, gives instead a necessary condition.

In this first part of the proof, we proceed as follows. First, we assume that $T\in\mathcal{T}^{c}(\widehat{c})$ , that is, $T=[Z_{1},Z_{2}]$ with endpoints $Z_{1}<Z_{2}$ . For simplicity, we let $\widetilde{\mu}$ denote $\mu_{T}$ . We then compare $f:=\widehat{\mu}-\mu$ and $g:=\widetilde{\mu}-\mu$ using the relation $\succ$ .

Note that, if we let $S$ be the set of students in $G_{2}$ with potential in $[Z_{1},Z_{2}/\beta]$ and $S^{\prime}:=G_{2}\setminus S$ , we have $f(\theta)=g(\theta)$ for $\theta\in S^{\prime}$ . That is, only $G_{2}$ students whose true potential lies in the interval $[Z_{1},Z_{2}/\beta]$ are affected by the intervention. Hence, $\sup_{\theta\in S}f>\sup_{\theta\in S}g$ if and only if $f\succ g$ . We divide the analysis into two major cases. The first is when $\beta Z_{2}\geq Z_{1}$ (i.e., when $[\beta Z_{1},\beta Z_{2}]$ and $[Z_{1},Z_{2}]$ overlap), and the second is when $\beta Z_{2}\leq Z_{1}$ . For both major cases, we consider the subcases $\beta Z_{1}\geq 1$ and $\beta Z_{1}\leq 1\leq\beta Z_{2}$ . For the second major case, we also consider the subcase $\beta Z_{2}\leq 1$ . The results for all cases are summarized in Table 3.
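The case split of Table 3 is mechanical, so it can be transcribed directly. The hypothetical helpers below do three things:

- classify an interval $T=[Z_{1},Z_{2}]$ into the cases of Table 3;
- evaluate the corresponding sufficient condition;
- compute the upper endpoint of an exact range, i.e., the $Z_{2}$ with $(\frac{1}{Z_{1}})^{\alpha}-(\frac{1}{Z_{2}})^{\alpha}=\widehat{c}$ (the notion used in the proof steps that follow).

Boundary ties between cases are resolved in favor of case I and of the lower-numbered subcase. This is an arbitrary choice of the sketch, not of the proof.

```python
def table3_case(z1, z2, beta):
    """Classify T = [z1, z2] (with 1 <= z1 < z2) into the cases of Table 3."""
    major = "I" if beta * z2 >= z1 else "II"
    if beta * z1 >= 1:
        sub = "1"
    elif beta * z2 >= 1:
        sub = "2"  # beta * z1 <= 1 <= beta * z2
    else:
        sub = "3"  # beta * z2 <= 1, which can only occur in case II
    return f"{major}.{sub}"


def sufficient_condition(z1, z2, alpha, beta, p):
    """Evaluate the Table 3 sufficient condition for hat_mu - mu > mu_T - mu."""
    case = table3_case(z1, z2, beta)
    if case == "I.1":
        return p < 1 - (z1 / z2) ** alpha
    if case == "I.2":
        return p < 1 - (1 / (beta * z2)) ** alpha
    if case == "II.1":
        return p < 1 - beta ** alpha
    if case == "II.2":
        lhs = p * ((1 / z1) ** alpha - (1 / z2) ** alpha)
        rhs = (1 - p) * (1 / beta ** alpha - 1) * (beta ** alpha - (1 / z2) ** alpha)
        return lhs < rhs
    return False  # case II.3: Table 3 states g > f, so vouchers cannot help


def exact_upper_endpoint(z1, c_hat, alpha):
    """Return z2 with (1/z1)**alpha - (1/z2)**alpha == c_hat (an exact range)."""
    rest = (1 / z1) ** alpha - c_hat
    if rest <= 0:
        return float("inf")  # even [z1, inf) holds at most c_hat of G2
    return rest ** (-1 / alpha)


# Hypothetical example with beta = 0.9.
print(table3_case(1.0, 1.2, 0.9), table3_case(1.1, 1.2, 0.9))
```
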

Observation 12.1

If there is an interval $[Z_{1},Z_{2}]$ that is of either case I.2 or case II.2 such that $\mu_{[Z_{1},Z_{2}]}-\mu\prec\widehat{\mu}-\mu$ with $S=G_{2}$ , then the optimal range must be of case I.2 or case II.2. This is because for any interval $[Z_{1}^{\prime},Z_{2}^{\prime}]$ that is not of case I.2 or case II.2, we have

\sup_{\theta\in\Theta}\{(\mu_{[Z_{1}^{\prime},Z_{2}^{\prime}]}-\mu)\}\geq\sup_{\theta\in\Theta}\{\widehat{\mu}-\mu\}>\sup_{\theta\in\Theta}\{\mu_{[Z_{1},Z_{2}]}-\mu\}.

As it turns out, the optimal range is indeed of case I.2 or case II.2. Which of the two applies depends on the amount of resources, i.e., the value of $\widehat{c}$ .

We now prove the first half of Theorem 4.1, i.e., we assume $\widehat{c}\geq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}}$ . The proof steps are outlined below. Each step follows from simple algebra, which we omit.

(1).

We first show that $[Z_{1}^{*},Z_{2}^{*}]$ is of case I.2. That is, we show $\beta Z_{2}^{*}\geq Z_{1}^{*}$ and $Z_{1}^{*}\leq\frac{1}{\beta}\leq Z_{2}^{*}$ .

By writing out the formula for $\mu_{[Z_{1},Z_{2}]}-\mu$ , one can see that for an interval $[Z_{1},Z_{2}]$ of case I.2 or case II.2, $\mu_{[Z_{1},Z_{2}]}-\mu$ increases on $[1,Z_{1}]$ , decreases on $[Z_{2},\infty]$ , and is non-positive on $[Z_{1},Z_{2}]$ . This means that $\sup_{\theta\in\Theta}\{\mu_{[Z_{1},Z_{2}]}-\mu\}$ is achieved at either $Z_{1}$ or $Z_{2}$ .

(2).

Next, we show that $[Z_{1}^{*},Z_{2}^{*}]$ is an exact range, that is, $(\frac{1}{Z_{1}^{*}})^{\alpha}-(\frac{1}{Z_{2}^{*}})^{\alpha}=\widehat{c}$ . Moreover, let $\theta_{1}^{*}$ and $\theta_{2}^{*}$ be the $G_{2}$ students whose potentials are $Z_{1}^{*}$ and $Z_{2}^{*}$ , respectively. Then $(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{1}^{*})=(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{2}^{*})$ , and thus both are equal to $\sup_{\theta\in\Theta}\{\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu\}$ .

Together with the assumption $p<1-\beta^{\alpha}$ , we have $\sup_{\theta\in\Theta}\{\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu\}\leq\sup_{\theta\in\Theta}\{\widehat{\mu}-\mu\}$ . Thus, by Observation 12.1, it is sufficient to compare $[Z_{1}^{*},Z_{2}^{*}]$ only with intervals $[Z_{1},Z_{2}]$ of case I.2 and case II.2 (i.e., when $\beta Z_{1}\leq 1\leq\beta Z_{2}$ ). Since $[Z_{1}^{*},Z_{2}^{*}]$ is exact, we must have either $Z_{1}>Z_{1}^{*}$ or $Z_{2}<Z_{2}^{*}$ .

(3).

Lastly, we show that for any other feasible range $[Z_{1},Z_{2}]$ of case I.2 or case II.2, we must have $\sup_{\theta\in\Theta}\{\mu_{[Z_{1},Z_{2}]}-\mu\}>\sup_{\theta\in\Theta}\{\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu\}$ . Let $\theta_{1}$ and $\theta_{2}$ be the $G_{2}$ students whose potentials are $Z_{1}$ and $Z_{2}$ , respectively. It suffices to show that

i) if $Z_{1}>Z_{1}^{*}$ , then $(\mu_{[Z_{1},Z_{2}]}-\mu)(\theta_{1})>(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{1}^{*})$ ;
ii) if $Z_{2}<Z_{2}^{*}$ , then $(\mu_{[Z_{1},Z_{2}]}-\mu)(\theta_{2})>(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{2}^{*})$ .

For the second half of the theorem, we will follow similar steps and reasoning, outlined below.

(1).

We first show that $[Z_{1}^{*},Z_{2}^{*}]$ is of case II.2, that is, $\beta Z_{2}^{*}\leq Z_{1}^{*}$ and $Z_{1}^{*}\leq\frac{1}{\beta}\leq Z_{2}^{*}$ .
(2).

We check that $[Z_{1}^{*},Z_{2}^{*}]$ is an exact range. Let $\theta_{1}^{*}$ and $\theta_{2}^{*}$ be the $G_{2}$ students whose potentials are $Z_{1}^{*}$ and $Z_{2}^{*}$ , respectively. We show that $(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{1}^{*})=(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{2}^{*})$ , which implies that both equal $\sup_{\theta\in\Theta}\{\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu\}$ .
(3).

We show $\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu\prec\widehat{\mu}-\mu$ , which, unlike in the previous case, is not immediate from the assumption $p<1-\beta^{\alpha}$ .

Again, by Observation 12.1, it is sufficient to compare $[Z_{1}^{*},Z_{2}^{*}]$ only with intervals $[Z_{1},Z_{2}]$ of case I.2 and case II.2 (i.e., when $\beta Z_{1}\leq 1\leq\beta Z_{2}$ ).

(4).

As before, it suffices to consider two cases, since $[Z_{1}^{*},Z_{2}^{*}]$ is exact and one of the two must occur. Again, let $\theta_{1}$ and $\theta_{2}$ be the $G_{2}$ students whose potentials are $Z_{1}$ and $Z_{2}$ , respectively. We show that

i) if $Z_{1}>Z_{1}^{*}$ , then $(\mu_{[Z_{1},Z_{2}]}-\mu)(\theta_{1})>(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{1}^{*})$ ;
ii) otherwise, we must have $Z_{2}<Z_{2}^{*}$ , and then $(\mu_{[Z_{1},Z_{2}]}-\mu)(\theta_{2})>(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)(\theta_{2}^{*})$ .

Now let $T^{*}\in\mathcal{T}(\widehat{c})$ be the optimal solution without the restriction that sets in $\mathcal{T}(\widehat{c})$ are connected. We show that $T^{*}$ differs from $[Z_{1}^{*},Z_{2}^{*}]$ by a set of measure zero. First, in order to have $\sup(\mu_{T^{*}}-\mu)\leq\sup(\mu_{[Z_{1}^{*},Z_{2}^{*}]}-\mu)=:s$ , $T^{*}$ must contain all students $\theta$ whose mistreatment $(\widehat{\mu}-\mu)(\theta)$ is greater than $s$ . That is, we must have $T_{1}^{*}:=[Z_{1}^{*},Z^{(1)}]\subseteq T^{*}$ , where $Z^{(1)}:=Z(\theta^{(1)})\geq 1/\beta$ and $(\widehat{\mu}-\mu)(\theta^{(1)})=s$ . There is a $G_{2}$ student $\theta^{(2)}$ such that $Z^{(2)}:=Z(\theta^{(2)})>Z^{(1)}$ and $(\mu_{T_{1}^{*}}-\mu)(\theta^{(2)})=s$ . Moreover, $(\mu_{T_{1}^{*}}-\mu)(\theta)\geq s$ for all $\theta\in G_{2}$ such that $Z(\theta)\in[Z^{(1)},Z^{(2)}]$ . Thus, we must also have $[Z^{(1)},Z^{(2)}]\subseteq T^{*}$ . Let $T_{2}^{*}:=[Z_{1}^{*},Z^{(2)}]$ . Repeating the argument, there is a $G_{2}$ student $\theta^{(3)}$ such that $Z^{(3)}:=Z(\theta^{(3)})>Z^{(2)}$ and $(\mu_{T_{2}^{*}}-\mu)(\theta)\geq s$ for all $\theta\in G_{2}$ such that $Z(\theta)\in[Z^{(2)},Z^{(3)}]$ . Hence $T_{3}^{*}:=[Z_{1}^{*},Z^{(3)}]$ must be contained in $T^{*}$ . Iterating the same argument, we obtain $\lim_{n\rightarrow\infty}Z(\theta^{(n)})=Z_{2}^{*}$ , and thus the claim follows.

13 Proof of Theorem 4.2 and related facts

Assume $T=[Z_{1},Z_{2}]$ is the range of true potentials of the $G_{2}$ students we want to debias. For simplicity, as in previous sections, let $\widetilde{\mu}$ denote $\mu_{T}$. To find the minimizer of $\sigma(\widetilde{\mu}-\mu)$, we first compute $\sigma(\widetilde{\mu}-\mu)$ for each of the cases in Table 3.

For $1\leq t_{1}\leq t_{2}\in\mathbb{R}\cup\{+\infty\}$, let $\sigma_{t_{1}}^{t_{2}}(f):=\int_{t_{1}}^{t_{2}}\max(f(t),0)dF_{1}(t)$ for any function $f:[1,\infty]\rightarrow[0,1]$. When $t_{1}=1$ and $t_{2}=\infty$, we simply write $\sigma(f)$, consistent with our previous notation. Taking $\sigma(\widehat{\mu}-\mu)$ as a reference, it suffices to compute only $\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)$: since $(\widehat{\mu}-\mu)(\theta)=(\widetilde{\mu}-\mu)(\theta)$ for all $\theta\in G_{2}$ with $Z(\theta)\notin[Z_{1},Z_{2}/\beta]$, minimizing $\sigma(\widetilde{\mu}-\mu)$ is equivalent to maximizing $\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)$.

For each case, we give an explicit formula for $\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)$. These formulae follow from direct integration, so the computations are omitted. In addition, we analyze whether this value increases or decreases in $Z_{1}$ and $Z_{2}$.

CASE I – Subcase 1. After integrating, we have

\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)=\frac{(1-p)\left(\frac{1}{\beta^{\alpha}}-1\right)}{2}\left(\frac{1}{Z_{1}}\right)^{2\alpha}+\frac{p-p\beta^{\alpha}-\frac{1}{\beta^{\alpha}}+1}{2}\left(\frac{1}{Z_{2}}\right)^{2\alpha}.

Now, to analyze how this quantity changes with $Z_{1}$ and $Z_{2}$, we first simplify some of the terms; the same substitution is used in the later subcases. Let $x=(\frac{1}{Z_{2}})^{\alpha}\in[0,1]$ and let $(\frac{1}{Z_{1}})^{\alpha}=c+x\in[0,1]$ for some $c\leq\widehat{c}$. Also, let $g(x,c):=\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)$. Then,

g(x,c)=\frac{(1-p)\left(\frac{1}{\beta^{\alpha}}-1\right)}{2}(c+x)^{2}+\frac{p-p\beta^{\alpha}-\frac{1}{\beta^{\alpha}}+1}{2}x^{2}.

The first-order conditions (FOC) show that $g(x,c)$ increases as $x$ increases (equivalently, as $Z_{2}$ decreases) and as $c$ increases. The latter means that the constraint $(\frac{1}{Z_{1}})^{\alpha}-(\frac{1}{Z_{2}})^{\alpha}\leq\widehat{c}$ is effectively binding, i.e., $(\frac{1}{Z_{1}})^{\alpha}-(\frac{1}{Z_{2}})^{\alpha}=\widehat{c}$.

CASE I – Subcase 2. In this case, we have

\begin{align*}
\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)={}&-\frac{1}{2}(1-p)\beta^{\alpha}+(1-p)\left(\left(\frac{1}{Z_{1}}\right)^{\alpha}-\frac{1}{2}\left(\frac{1}{Z_{1}}\right)^{2\alpha}\right)\\
&+\frac{p-p\beta^{\alpha}-\frac{1}{\beta^{\alpha}}+1}{2}\left(\frac{1}{Z_{2}}\right)^{2\alpha}.
\end{align*}

For the analysis, we similarly write

g(x,c)=\text{const}+(1-p)\left((c+x)-\frac{1}{2}(c+x)^{2}\right)+\frac{p-p\beta^{\alpha}-\frac{1}{\beta^{\alpha}}+1}{2}x^{2}.

The FOC then show that $g(x,c)$ is increasing in $c$, and that it is increasing in $x$ on $[0,h_{\text{I}}(c)]$ and decreasing in $x$ on $[h_{\text{I}}(c),1]$, where

h_{\text{I}}(c)=\frac{(1-p)(1-c)}{p\beta^{\alpha}+\frac{1}{\beta^{\alpha}}-2p}.

CASE II – Subcase 1. In this case, $\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)$ equals

\left(\frac{1}{Z_{1}}\right)^{2\alpha}\left(\frac{(1-p)\left(\frac{1}{\beta^{\alpha}}-1\right)+p\beta^{\alpha}}{2}\right)-\left(\frac{1}{Z_{2}}\right)^{2\alpha}\left(\frac{(1-p)\left(\frac{1}{\beta^{\alpha}}-1\right)+p\beta^{\alpha}}{2}\right)+p\left(\frac{1}{Z_{2}}\right)^{2\alpha}-p\left(\frac{1}{Z_{1}}\right)^{\alpha}\left(\frac{1}{Z_{2}}\right)^{\alpha}.

Now, for the analysis, let $A=[(1-p)(\frac{1}{\beta^{\alpha}}-1)+p\beta^{\alpha}]/2\geq 0$ . Then,

g(x,c)=A(c+x)^{2}-Ax^{2}+px^{2}-p(c+x)(x).

Checking the FOCs, we have that $g(x,c)$ is an increasing function w.r.t. $c$ and w.r.t. $x$ .

CASE II – Subcase 2. We have

\begin{align*}
\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)={}&-\frac{1}{2}(1-p)\beta^{\alpha}+\left(\frac{1}{Z_{1}}\right)^{2\alpha}\left(\frac{-(1-p)+p\beta^{\alpha}}{2}\right)+(1-p)\left(\frac{1}{Z_{1}}\right)^{\alpha}\\
&-\left(\frac{1}{Z_{2}}\right)^{2\alpha}\left(\frac{(1-p)\left(\frac{1}{\beta^{\alpha}}-1\right)+p\beta^{\alpha}}{2}\right)+p\left(\frac{1}{Z_{2}}\right)^{2\alpha}-p\left(\frac{1}{Z_{1}}\right)^{\alpha}\left(\frac{1}{Z_{2}}\right)^{\alpha}.
\end{align*}

For the analysis, again let $A=[(1-p)(\frac{1}{\beta^{\alpha}}-1)+p\beta^{\alpha}]/2\geq 0$ and $B=[-(1-p)+p\beta^{\alpha}]/2<0$ . Then,

g(x,c)=\text{const}+B(c+x)^{2}+(1-p)(c+x)-Ax^{2}+px^{2}-p(c+x)(x),

and $g(x,c)$ is increasing in $c$; in $x$, it is increasing on $[0,h_{\text{II}}(c)]$ and decreasing on $[h_{\text{II}}(c),1]$, where

h_{\text{II}}(c)=\frac{(p\beta^{\alpha}-1)c+(1-p)}{(1-p)\frac{1}{\beta^{\alpha}}}.
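As a numerical sanity check of the two stationary points, the following minimal Python sketch evaluates the Subcase 2 objectives $g(x,c)$ of Case I and Case II (additive constants dropped) and uses central finite differences to confirm that $\partial g/\partial x$ vanishes at $h_{\text{I}}(c)$ and $h_{\text{II}}(c)$ and changes sign from positive to negative there. The helper names and the parameters $p=.25$, $\beta=.8$, $\alpha=3$ (borrowed from Table 4) are illustrative assumptions; this is a check of the algebra, not the authors' code.

```python
# Check that h_I(c) and h_II(c) maximize, in x, the Subcase-2 objectives g(x, c)
# of Case I and Case II (additive constants dropped; they do not affect derivatives).


def g_case1(x, c, p, beta, alpha):
    """Case I, Subcase 2: (1-p)((c+x) - (c+x)^2/2) + K x^2 / 2."""
    b = beta ** alpha
    k = p - p * b - 1.0 / b + 1.0
    return (1 - p) * ((c + x) - 0.5 * (c + x) ** 2) + 0.5 * k * x ** 2


def g_case2(x, c, p, beta, alpha):
    """Case II, Subcase 2: B(c+x)^2 + (1-p)(c+x) - A x^2 + p x^2 - p(c+x)x."""
    b = beta ** alpha
    a_coef = ((1 - p) * (1.0 / b - 1) + p * b) / 2
    b_coef = (-(1 - p) + p * b) / 2
    return (b_coef * (c + x) ** 2 + (1 - p) * (c + x)
            - a_coef * x ** 2 + p * x ** 2 - p * (c + x) * x)


def h_I(c, p, beta, alpha):
    b = beta ** alpha
    return (1 - p) * (1 - c) / (p * b + 1.0 / b - 2 * p)


def h_II(c, p, beta, alpha):
    b = beta ** alpha
    return ((p * b - 1) * c + (1 - p)) / ((1 - p) / b)


def dg_dx(g, x, c, p, beta, alpha, eps=1e-6):
    """Central finite difference of g in x (exact up to rounding for quadratics)."""
    return (g(x + eps, c, p, beta, alpha) - g(x - eps, c, p, beta, alpha)) / (2 * eps)


if __name__ == "__main__":
    p, beta, alpha = 0.25, 0.8, 3  # parameters of Table 4 (illustrative)
    for c in (0.1, 0.2, 0.3):
        for g, h in ((g_case1, h_I), (g_case2, h_II)):
            x0 = h(c, p, beta, alpha)
            print(g.__name__, c, round(x0, 4), dg_dx(g, x0, c, p, beta, alpha))
```

Because both objectives are concave quadratics in $x$, a vanishing derivative at a single point already identifies the maximizer; the sign checks just make this visible.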

CASE II – Subcase 3. Lastly, $\sigma_{Z_{1}}^{Z_{2}/\beta}(\widehat{\mu}-\mu)-\sigma_{Z_{1}}^{Z_{2}/\beta}(\widetilde{\mu}-\mu)$ equals

\left(\frac{1}{Z_{1}}\right)^{2\alpha}B+(1-p)\left(\frac{1}{Z_{1}}\right)^{\alpha}-\left(\frac{1}{Z_{2}}\right)^{2\alpha}B-(1-p)\left(\frac{1}{Z_{2}}\right)^{\alpha}+p\left(\frac{1}{Z_{2}}\right)^{2\alpha}-p\left(\frac{1}{Z_{1}}\right)^{\alpha}\left(\frac{1}{Z_{2}}\right)^{\alpha}.

For the analysis, write

g(x,c)=B(c+x)^{2}+(1-p)(c+x)-Bx^{2}-(1-p)x+px^{2}-p(c+x)x.

The function $g(x,c)$ is decreasing in $x$. The sign of $\frac{\partial g(x,c)}{\partial c}$ is not clear in this subcase. However, this does not matter for finding the minimizer of $\sigma(\widetilde{\mu}-\mu)$: for a fixed value of $c$, $g(x,c)$ attains its maximum at a value of $x$ for which $[Z_{1},Z_{2}]$ falls in subcase 2 of either case I or case II.

Note that, for a fixed value of $\widehat{c}$, as $Z_{1}$ gets larger (equivalently, as $Z_{2}$ gets larger, or as $x:=(\frac{1}{Z_{2}})^{\alpha}$ gets smaller), the range $[Z_{1},Z_{2}]$ moves from case II to case I. In particular, for each value of $\widehat{c}$, this transition happens exactly when $\beta Z_{2}=Z_{1}$, that is, when

\widehat{c}=\left(\frac{1}{\beta^{\alpha}}-1\right)\left(\frac{1}{Z_{2}}\right)^{\alpha}\quad\Leftrightarrow\quad\left(\frac{1}{Z_{2}}\right)^{\alpha}=\frac{\widehat{c}\beta^{\alpha}}{1-\beta^{\alpha}}.

Now, for each fixed value of $\widehat{c}$, Figure 10 plots $\sigma(\widehat{\mu}-\mu)-\sigma(\widetilde{\mu}-\mu)$ against $Z_{1}$. It also shows how the interval $[Z_{1},Z_{2}]$ moves between cases as $Z_{1}$ increases.

A direct computation shows that

\widehat{c}=\frac{(1-p)(1-\beta^{\alpha})}{2-p-\beta^{\alpha}-p\beta^{\alpha}+p\beta^{2\alpha}}\quad\Rightarrow\quad h_{\text{I}}(\widehat{c})=h_{\text{II}}(\widehat{c})=\frac{\widehat{c}\beta^{\alpha}}{1-\beta^{\alpha}}.

Therefore, when

\widehat{c}\geq\frac{(1-p)(1-\beta^{\alpha})}{2-p-\beta^{\alpha}-p\beta^{\alpha}+p\beta^{2\alpha}},

both $h_{\text{I}}(\widehat{c})$ and $h_{\text{II}}(\widehat{c})$ are at most $\frac{\widehat{c}\beta^{\alpha}}{1-\beta^{\alpha}}$. Thus, the maximum value of $\sigma(\widehat{\mu}-\mu)-\sigma(\widetilde{\mu}-\mu)$ is achieved at $x=h_{\text{I}}(\widehat{c})$; and when

\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{2-p-\beta^{\alpha}-p\beta^{\alpha}+p\beta^{2\alpha}},

both $h_{\text{I}}(\widehat{c})$ and $h_{\text{II}}(\widehat{c})$ are at least $\frac{\widehat{c}\beta^{\alpha}}{1-\beta^{\alpha}}$. Thus, the maximum value of $\sigma(\widehat{\mu}-\mu)-\sigma(\widetilde{\mu}-\mu)$ is achieved at $x=h_{\text{II}}(\widehat{c})$.
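The resulting selection rule can be sketched as follows. Assuming the budget binds, so that $(\frac{1}{Z_{1}})^{\alpha}=\widehat{c}+x$ with $x=(\frac{1}{Z_{2}})^{\alpha}$, the hypothetical helper `auc_optimal_interval` takes $x=h_{\text{I}}(\widehat{c})$ at or above the threshold and $x=h_{\text{II}}(\widehat{c})$ below it, then maps $x$ back to $[Z_{1},Z_{2}]$. With $p=.25$, $\beta=.8$, $\alpha=3$ it should agree with the $\mathcal{T}_{auc}(\widehat{c})$ column of Table 4 up to rounding; treat it as an illustration of the case analysis rather than a definitive implementation.

```python
# Sketch: AUC-optimal debiasing interval [Z1, Z2] for a budget c_hat, following the
# case analysis above. x = (1/Z2)^alpha and, with a binding budget,
# (1/Z1)^alpha = c_hat + x.


def transition_x(c_hat, beta, alpha):
    """Value of x at which [Z1, Z2] switches between case I and case II (beta*Z2 = Z1)."""
    b = beta ** alpha
    return c_hat * b / (1 - b)


def h_I(c, p, beta, alpha):
    b = beta ** alpha
    return (1 - p) * (1 - c) / (p * b + 1 / b - 2 * p)


def h_II(c, p, beta, alpha):
    b = beta ** alpha
    return ((p * b - 1) * c + (1 - p)) / ((1 - p) / b)


def case_threshold(p, beta, alpha):
    """Budget at which h_I, h_II and the transition point coincide."""
    b = beta ** alpha
    return (1 - p) * (1 - b) / (2 - p - b - p * b + p * b * b)


def auc_optimal_interval(c_hat, p, beta, alpha):
    """Return (Z1, Z2): x = h_I(c_hat) at or above the threshold, x = h_II(c_hat) below."""
    if c_hat >= case_threshold(p, beta, alpha):
        x = h_I(c_hat, p, beta, alpha)
    else:
        x = h_II(c_hat, p, beta, alpha)
    return (c_hat + x) ** (-1 / alpha), x ** (-1 / alpha)


if __name__ == "__main__":
    for c_hat in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
        z1, z2 = auc_optimal_interval(c_hat, 0.25, 0.8, 3)
        print(f"{c_hat:.2f}  [{z1:.4f}, {z2:.4f}]")
```

With these parameters the threshold is roughly $0.311$, so the rows $\widehat{c}\leq 0.3$ of Table 4 use $h_{\text{II}}$ and the remaining rows use $h_{\text{I}}$.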

$\widehat{c}$	$\mathcal{T}_{mm}(\widehat{c})=[Z_{1},Z_{2}]$	$\mathcal{T}_{auc}(\widehat{c})=[Z_{1}^{\prime},Z_{2}^{\prime}]$	$Z_{1}-Z_{1}^{\prime}$
0.10	[1.2252, 1.3111]	[1.2187, 1.3026]	0.0065
0.20	[1.2022, 1.3861]	[1.1903, 1.3653]	0.0119
0.30	[1.1802, 1.4803]	[1.1644, 1.4421]	0.0158
0.40	[1.1461, 1.5584]	[1.1346, 1.5203]	0.0115
0.50	[1.1156, 1.6560]	[1.1070, 1.6155]	0.0086
0.60	[1.0881, 1.7839]	[1.0819, 1.7403]	0.0063
0.70	[1.0632, 1.9635]	[1.0589, 1.9154]	0.0043
0.80	[1.0404, 2.2476]	[1.0377, 2.1926]	0.0026

Table 4: Comparison of the optimal ranges of $G_{2}$ students to debias under the two measures of unfairness, with parameters $\alpha=3$, $\beta=.8$, and $p=.25$. We compute the optimal intervals under both measures of unfairness and find that, on average, they overlap by 95%.

14 Proofs from Section 5.1

14.1 Auxiliary results for Section 5.1

Recall that, in this section, we consider the generalization of the model from Section 2 in which students' true potentials follow a generic continuous, integrable cdf $F$. Moreover, we write $\left[x\right]^{+}:=\max(0,x)$ for a number or a function $x$. As in Section 5.1, we abuse notation and identify a student $\theta$ with their potential $Z(\theta)$.

Lemma 14.1

Let $\rho$ be an RVP. Under any continuous distribution of potentials $F$ , we have

\begin{align}
\mu_{\rho}(\theta)={}&\rho(\theta)\left((1-p)\int_{\theta}^{\infty}dF+p\left[\int_{\theta}^{\theta/\beta}\rho\,dF+\int_{\theta/\beta}^{\infty}dF\right]\right)\nonumber\\
&+(1-\rho(\theta))\left((1-p)\int_{\beta\theta}^{\infty}dF+p\left[\int_{\beta\theta}^{\theta}\rho\,dF+\int_{\theta}^{\infty}dF\right]\right),\tag{8}\\
m_{\rho}(\theta)={}&\left[(1-\rho(\theta))(1-p)\int_{\beta\theta}^{\theta}\,dF+p\left((1-\rho(\theta))\int_{\beta\theta}^{\theta}\rho\,dF-\rho(\theta)\int_{\theta}^{\theta/\beta}(1-\rho)\,dF\right)\right]^{+}.\tag{9}
\end{align}


Proof. Suppose a student appears to have potential $\tau$ , possibly after having been debiased. Then under $\mu_{\rho}$ , they will be matched to school $s(\tau)$ given by

s(\tau)=(1-p)\int_{\tau}^{\infty}\,dF+p\left[\int_{\tau}^{\tau/\beta}\rho\,dF+\int_{\tau/\beta}^{\infty}\,dF\right],

that is, they will appear after all non-disadvantaged students with true potential exceeding $\tau$ ; those disadvantaged students with potential exceeding $\tau/\beta$ ; and those disadvantaged students who receive a voucher and have potential in the interval $(\tau,\tau/\beta)$ .

A student with true potential $\theta$ receives a voucher with probability $\rho(\theta)$, so by the law of total expectation we have

\mu_{\rho}(\theta)=\rho(\theta)s(\theta)+(1-\rho(\theta))s(\beta\theta),

which is exactly (8). (9) follows from (8) and the definitions of displacement and $\mu(\theta)$ . \Halmos

We next give further expressions for $\mu_{\rho}$ and $\mu_{\rho}^{\prime}$, which are used in the proofs below.

Proposition 14.3

Let $\rho$ be an RVP. For all $\theta\in\Theta$ , we have

\begin{align}
\mu_{\rho}(\theta)={}&-\rho(\theta)\left((1-p)\int_{\beta\theta}^{\theta}dF+p\left[\int_{\theta}^{\theta/\beta}(1-\rho)\,dF+\int_{\beta\theta}^{\theta}\rho\,dF\right]\right)\nonumber\\
&+\left((1-p)\int_{\beta\theta}^{\infty}dF+p\left[\int_{\beta\theta}^{\theta}\rho\,dF+\int_{\theta}^{\infty}dF\right]\right).\tag{10}
\end{align}

Moreover, if $\mu_{\rho}$ is differentiable at $\theta$ , we have

\begin{align}
\mu_{\rho}^{\prime}(\theta)={}&-f(\theta)-\rho^{\prime}(\theta)\left[p\int_{\theta}^{\theta/\beta}(1-\rho)\,dF+(1-p)\int_{\beta\theta}^{\theta}\,dF+p\int_{\beta\theta}^{\theta}\rho\,dF\right]\nonumber\\
&-p\rho(\theta)\left[\frac{1}{\beta}(1-\rho(\theta/\beta))f(\theta/\beta)-(1-\rho(\theta))f(\theta)\right]\nonumber\\
&+(1-\rho(\theta))\left[(1-p)(f(\theta)-\beta f(\beta\theta))+p(f(\theta)\rho(\theta)-\beta f(\beta\theta)\rho(\beta\theta))\right].\tag{11}
\end{align}


Proof. (10) follows by rearranging (8), and (11) follows by differentiating (10), using the Leibniz rule for the integrals with variable limits. \Halmos
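Formulas (8), (10), and (11) can be cross-checked numerically. The Python sketch below assumes Pareto potentials, $F(t)=1-t^{-\alpha}$ on $[1,\infty)$ with density $f(t)=\alpha t^{-\alpha-1}$ (the form used in the proof of Lemma 14.9), together with the illustrative parameters of Table 4. Integrals of $\rho$ against $dF$ are approximated by composite Simpson's rule, and the test RVP $\rho(t)=0.6/t$ is an arbitrary smooth choice. The sketch compares (8) with (10), and (11) with a central finite difference of (10); all helper names are hypothetical.

```python
import math

# Illustrative parameters (those of Table 4); Pareto potentials are an assumption.
ALPHA, BETA, P = 3.0, 0.8, 0.25


def pdf(t):
    """Pareto density f(t) = alpha * t^(-alpha-1) on [1, inf), zero below 1."""
    return ALPHA * t ** (-ALPHA - 1) if t >= 1 else 0.0


def mass(a, b):
    """int_a^b dF for F(t) = 1 - t^(-alpha); b may be math.inf."""
    def tail(t):  # 1 - F(t)
        return 1.0 if t <= 1 else t ** (-ALPHA)
    return tail(a) - (0.0 if b == math.inf else tail(b))


def int_rho(rho, a, b, n=4000):
    """int_a^b rho dF by composite Simpson's rule (n must be even)."""
    a = max(a, 1.0)
    if b <= a:
        return 0.0
    h = (b - a) / n
    s = rho(a) * pdf(a) + rho(b) * pdf(b)
    s += sum((4 if i % 2 else 2) * rho(a + i * h) * pdf(a + i * h) for i in range(1, n))
    return s * h / 3


def school(tau, rho):
    """s(tau) from the proof of Lemma 14.1."""
    return ((1 - P) * mass(tau, math.inf)
            + P * (int_rho(rho, tau, tau / BETA) + mass(tau / BETA, math.inf)))


def mu_eq8(theta, rho):
    """(8): rho(theta) s(theta) + (1 - rho(theta)) s(beta theta)."""
    r = rho(theta)
    return r * school(theta, rho) + (1 - r) * school(BETA * theta, rho)


def bracket(theta, rho):
    """Factor multiplying rho(theta) in (10) and rho'(theta) in (11)."""
    return (P * int_rho(lambda t: 1 - rho(t), theta, theta / BETA)
            + (1 - P) * mass(BETA * theta, theta)
            + P * int_rho(rho, BETA * theta, theta))


def mu_eq10(theta, rho):
    """Right-hand side of (10)."""
    return (-rho(theta) * bracket(theta, rho)
            + (1 - P) * mass(BETA * theta, math.inf)
            + P * (int_rho(rho, BETA * theta, theta) + mass(theta, math.inf)))


def mu_prime_eq11(theta, rho, drho):
    """Right-hand side of (11), given rho and its derivative drho."""
    f, b, r = pdf, BETA, rho(theta)
    return (-f(theta) - drho(theta) * bracket(theta, rho)
            - P * r * ((1 - rho(theta / b)) * f(theta / b) / b - (1 - r) * f(theta))
            + (1 - r) * ((1 - P) * (f(theta) - b * f(b * theta))
                         + P * (f(theta) * r - b * f(b * theta) * rho(b * theta))))


if __name__ == "__main__":
    def rho(t):   # hypothetical smooth test RVP
        return 0.6 / t

    def drho(t):
        return -0.6 / t ** 2

    theta, e = 2.0, 1e-4
    fd = (mu_eq10(theta + e, rho) - mu_eq10(theta - e, rho)) / (2 * e)
    print(mu_eq8(theta, rho), mu_eq10(theta, rho), fd, mu_prime_eq11(theta, rho, drho))
```

The points $\theta$ used for the derivative check avoid $\theta=1/\beta$, where $f(\beta\theta)$ jumps and (11) need not hold.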

Definition 14.5

The RVP that assigns no vouchers, denoted $\rho_{0}$, is defined by $\rho_{0}(\theta):=0$ for all $\theta\in[1,\infty)$. Note that $\mu_{\rho_{0}}(\theta)=(1-p)\int_{\beta\theta}^{\infty}dF+p\int_{\theta}^{\infty}dF$ and $m_{\rho_{0}}(\theta)=m(\theta)=(1-p)\int_{\beta\theta}^{\theta}\,dF$.

14.2 Necessary and sufficient conditions for incentive compatibility

In this section we develop necessary and sufficient conditions for incentive compatibility through the concept of well-behavedness and prove an important technical lemma.

Definition 14.6 (Well-behaved RVP)

We call an RVP $\rho$ well-behaved if it is everywhere continuously differentiable except for a set of isolated points where it has non-negative, right-continuous jump discontinuities.

Lemma 14.7 (Necessary and sufficient conditions for incentive compatibility)

Let $\rho$ be a well-behaved RVP and $F$ be an arbitrary continuous distribution of potentials. $\rho$ is incentive compatible with respect to $F$ if and only if, for all $\theta$ such that $\rho$ is continuously differentiable at $\theta$ , we have $\rho^{\prime}(\theta)\geq 0$ or $\mu_{\rho}^{\prime}(\theta)\leq 0$ .


Proof. Recall that $\rho$ is incentive compatible if $\mu_{\rho}$ is everywhere non-increasing. Observe from (10) in Proposition 14.3 that $\mu_{\rho}$ is continuous at $\theta$ if and only if $\rho$ is continuous at $\theta$. Moreover, if $\mu_{\rho}$ is not continuous at $\theta$, then it must have a negative jump discontinuity caused by a positive jump discontinuity of $\rho$, since the factor multiplying $\rho(\theta)$ in (10) is non-negative and all other terms of (10) are continuous in $\theta$. Further note that if $\mu_{\rho}$ is not continuously differentiable at $\theta\in\Theta$, then $\rho$ is not continuously differentiable at one of $\theta$, $\beta\theta$, or $\theta/\beta$; hence the set of points where $\mu_{\rho}$ is not continuously differentiable is also isolated.

Consider any $\theta\in\Theta$ where $\mu_{\rho}$ is continuously differentiable. There, $\mu_{\rho}$ is non-increasing iff $\mu_{\rho}^{\prime}(\theta)\leq 0$. By collecting the terms in $f(\theta)$, $f(\beta\theta)$, and $f(\theta/\beta)$ in (11), one can see that $\mu_{\rho}^{\prime}(\theta)\leq 0$ whenever $\rho^{\prime}(\theta)\geq 0$.

We have established that $\mu_{\rho}$ is continuous at all but an isolated set of negative jump-discontinuities, and that $\mu_{\rho}$ is continuously differentiable and non-increasing at all but an isolated set of points. $\mu_{\rho}$ is therefore everywhere non-increasing, as required. \Halmos

Lemma 14.9

Suppose $\rho$ is a well-behaved RVP such that for all $\theta$ where $\rho$ is continuously differentiable, we have $\rho^{\prime}(\theta)\geq-\phi(\theta)$ , with

\phi(\theta):=\frac{\alpha(1-p)}{\theta\left[p(1-\beta^{\alpha})+(1-p)(\beta^{-\alpha}-1)\right]}.

Then, $\rho$ is incentive compatible.


Proof. By Lemma 14.7, it suffices to show that $\mu_{\rho}^{\prime}(\theta)\leq 0$ for $\theta$ such that $\rho^{\prime}(\theta)$ exists and is continuous, and $\rho^{\prime}(\theta)<0$ . Now define

\mathcal{L}=p\int_{\theta}^{\theta/\beta}(1-\rho)\,dF+(1-p)\int_{\beta\theta}^{\theta}\,dF+p\int_{\beta\theta}^{\theta}\rho\,dF\quad\text{and}\quad W=-\rho(\theta)(1-p)f(\theta)-(1-\rho(\theta))(1-p)\beta f(\beta\theta),

and note that $\mathcal{L}\geq 0$ and $W\leq 0$. Simple calculations based on (11) in Proposition 14.3 show that $\mu_{\rho}^{\prime}(\theta)\leq-\rho^{\prime}(\theta)\mathcal{L}+W$. It is therefore enough to prove $-\rho^{\prime}(\theta)\leq\frac{-W}{\mathcal{L}}$. We next compute

\begin{align*}
\mathcal{L}&\leq p\int_{\theta}^{\theta/\beta}\,dF+(1-p)\int_{\beta\theta}^{\theta}\,dF\leq\theta^{-\alpha}\left[p(1-\beta^{\alpha})+(1-p)(\beta^{-\alpha}-1)\right]\quad\text{and}\\
-W&\geq(1-p)\min\left\{f(\theta),\beta f(\beta\theta)\right\}=\frac{\alpha(1-p)}{\theta^{1+\alpha}}.
\end{align*}

This yields

\frac{-W}{\mathcal{L}}\geq\frac{\alpha(1-p)}{\theta\left[p(1-\beta^{\alpha})+(1-p)(\beta^{-\alpha}-1)\right]}=\phi(\theta).

We have shown $\frac{-W}{\mathcal{L}}\geq\phi(\theta)$ , which combined with the assumption that $\phi(\theta)\geq-\rho^{\prime}(\theta)$ completes the proof.\Halmos

14.3 Proof of Theorem 5.2: properties of PropMs

We next prove Lemmas 14.11, 14.13, and 14.19, which together constitute Theorem 5.2.

Lemma 14.11

The proportional-to-mistreatment RVP $\rho_{m}$ is $\frac{2\widehat{c}\alpha}{1-\beta^{\alpha}}$ -individually fair.


Proof. $\rho_{m}$ is continuous on $\Theta$ and continuously differentiable everywhere except at $\theta=1/\beta$. Therefore, $\rho_{m}$ is Lipschitz with constant equal to the supremum of $|\rho_{m}^{\prime}|$, which is attained at $\theta=1$, where $\rho_{m}^{\prime}(\theta)=\frac{2\widehat{c}\alpha}{1-\beta^{\alpha}}$.\Halmos
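A quick numerical look at $\rho_{m}$, assuming Pareto potentials ($F(t)=1-t^{-\alpha}$ on $[1,\infty)$) and reading off $\rho_{m}(\theta)=\frac{2\widehat{c}}{1-\beta^{\alpha}}F(\beta\theta,\theta)$ from the proof of Lemma 14.15. The sketch checks that $\rho_{m}$ uses exactly the budget $\widehat{c}$, that $\rho_{m}(1/\beta)=2\widehat{c}$, and that its largest finite-difference slope approaches the Lipschitz constant of Lemma 14.11. The budget integral uses the quantile substitution $u=\theta^{-\alpha}$, under which $dF$ becomes the uniform measure on $(0,1)$; the helper names and parameters are illustrative.

```python
# Sketch: the proportional-to-mistreatment RVP under assumed Pareto potentials.


def pareto_cdf(t, alpha):
    """F(t) = 1 - t^(-alpha) on [1, inf), 0 below 1 (assumed model)."""
    return 0.0 if t <= 1 else 1.0 - t ** (-alpha)


def rho_m(theta, c_hat, beta, alpha):
    """rho_m(theta) = 2 c_hat / (1 - beta^alpha) * F(beta*theta, theta)."""
    q = 2 * c_hat / (1 - beta ** alpha)
    return q * (pareto_cdf(theta, alpha) - pareto_cdf(beta * theta, alpha))


def budget(c_hat, beta, alpha, n=100_000):
    """int rho_m dF via u = theta^(-alpha), under which dF becomes du on (0, 1).

    Midpoint rule; rho_m is piecewise linear in u, so the error is tiny.
    """
    return sum(rho_m(((i + 0.5) / n) ** (-1 / alpha), c_hat, beta, alpha)
               for i in range(n)) / n


def max_slope(c_hat, beta, alpha, hi=10.0, n=90_000):
    """Largest finite-difference slope |rho_m(t + h) - rho_m(t)| / h on [1, hi]."""
    h = (hi - 1) / n
    vals = [rho_m(1 + i * h, c_hat, beta, alpha) for i in range(n + 1)]
    return max(abs(vals[i + 1] - vals[i]) / h for i in range(n))


if __name__ == "__main__":
    c_hat, beta, alpha = 0.3, 0.8, 3
    print(budget(c_hat, beta, alpha))            # should be close to c_hat = 0.3
    print(rho_m(1 / beta, c_hat, beta, alpha))   # should be close to 2 * c_hat
    print(max_slope(c_hat, beta, alpha), 2 * c_hat * alpha / (1 - beta ** alpha))
```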

Lemma 14.13

The proportional-to-mistreatment RVP $\rho_{m}$ is incentive compatible for

\widehat{c}\leq\frac{1-p}{2\left[p(1-\beta^{\alpha})+(1-p)(\beta^{-\alpha}-1)\right]}.


Proof. Applying Lemma 14.9, it suffices to show

\begin{equation}
-\rho^{\prime}_{m}(\theta)=2\alpha\widehat{c}\beta^{-\alpha}\theta^{-\alpha-1}\leq\phi(\theta)=\frac{\alpha(1-p)}{\theta\left[p(1-\beta^{\alpha})+(1-p)(\beta^{-\alpha}-1)\right]}\tag{12}
\end{equation}

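for $\theta\geq 1/\beta$ (since $\rho_{m}^{\prime}(\theta)\geq 0$ for $\theta<1/\beta$). The ratio of the left-hand side to the right-hand side of (12) is proportional to $\theta^{-\alpha}$, so the condition is tightest at $\theta=1/\beta$, which gives the stated bound on $\widehat{c}$. \Halmos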
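Inequality (12) can also be inspected on a grid. This sketch (hypothetical helper names, illustrative parameter triples) evaluates both sides of (12) for $\theta\geq 1/\beta$ and confirms that the inequality holds at the budget bound of Lemma 14.13 but fails at $\theta=1/\beta$ once $\widehat{c}$ exceeds that bound by 1%. It checks only the sufficient condition of Lemma 14.9, not incentive compatibility itself.

```python
# Sketch: checking inequality (12) on a grid of theta >= 1/beta.


def phi(theta, p, beta, alpha):
    """phi(theta) from Lemma 14.9."""
    b = beta ** alpha
    return alpha * (1 - p) / (theta * (p * (1 - b) + (1 - p) * (1 / b - 1)))


def neg_rho_m_prime(theta, c_hat, beta, alpha):
    """-rho_m'(theta) for theta >= 1/beta, the left-hand side of (12)."""
    return 2 * alpha * c_hat * beta ** (-alpha) * theta ** (-alpha - 1)


def ic_budget_bound(p, beta, alpha):
    """Largest c_hat covered by Lemma 14.13."""
    b = beta ** alpha
    return (1 - p) / (2 * (p * (1 - b) + (1 - p) * (1 / b - 1)))


def inequality_12_holds(c_hat, p, beta, alpha, hi=20.0, n=2000, rtol=1e-12):
    """True if -rho_m' <= phi (up to rounding) at every grid point of [1/beta, hi]."""
    lo = 1 / beta
    grid = (lo + (hi - lo) * i / n for i in range(n + 1))
    return all(neg_rho_m_prime(t, c_hat, beta, alpha) <= phi(t, p, beta, alpha) * (1 + rtol)
               for t in grid)


if __name__ == "__main__":
    p, beta, alpha = 0.25, 0.8, 3
    c_max = ic_budget_bound(p, beta, alpha)
    print(c_max, inequality_12_holds(c_max, p, beta, alpha),
          inequality_12_holds(1.01 * c_max, p, beta, alpha))
```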

Lemma 14.15

For the proportional-to-mistreatment RVP $\rho_{m}$ , we have

\sup_{\theta\in\Theta}\,(1-\rho_{m}(\theta))\int_{\beta\theta}^{\theta}\,dF=(1-\beta^{\alpha})\xi(\widehat{c}),\quad\text{where}\quad\xi(\widehat{c}):=\begin{cases}1-2\widehat{c},&\widehat{c}\leq 1/4,\\ \frac{1}{8\widehat{c}},&\widehat{c}>1/4.\end{cases}


Proof. With $F(a,b):=\int_{a}^{b}\,dF$ , define $q:=\frac{2\widehat{c}}{1-\beta^{\alpha}}$ and $y(\theta):=F(\beta\theta,\theta)$ . Now write

(1-\rho_{m}(\theta))\int_{\beta\theta}^{\theta}\,dF=\left(1-\frac{2\widehat{c}}{1-\beta^{\alpha}}F(\beta\theta,\theta)\right)F(\beta\theta,\theta)=(1-qy(\theta))y(\theta).

This is a quadratic in $y$ that increases from $y=0$ to its maximum at $y=1/(2q)$ . Observe that $\max_{\theta}F(\beta\theta,\theta)=1-\beta^{\alpha}$ which is attained at $\theta=1/\beta$ . This means that if $\widehat{c}\leq 1/4$ , then

y(\theta)=F(\beta\theta,\theta)\leq 1-\beta^{\alpha}\leq\frac{1-\beta^{\alpha}}{4\widehat{c}}=\frac{1}{2q},

and the maximum of the quadratic over $y$ is realized at the maximum value of $y$ . Thus,

\sup_{\theta\in\Theta,\widehat{c}\leq 1/4}\,(1-qy(\theta))y(\theta)=\left(1-\frac{2\widehat{c}}{1-\beta^{\alpha}}(1-\beta^{\alpha})\right)(1-\beta^{\alpha})=(1-\beta^{\alpha})(1-2\widehat{c}).

On the other hand if $\widehat{c}>1/4$ , then this expression reaches its maximum when the quadratic does, at $y=1/(2q)$ , giving

\sup_{\theta\in\Theta,\widehat{c}>1/4}\,(1-qy(\theta))y(\theta)=\left(1-q\frac{1}{2q}\right)\frac{1}{2q}=\left(1-\frac{1}{2}\right)\frac{1}{2q}=\frac{1-\beta^{\alpha}}{8\widehat{c}}.

Combining the two completes the proof.\Halmos
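Lemma 14.15 can be checked by brute force under the Pareto assumption $F(t)=1-t^{-\alpha}$. In the quantile variable $u=\theta^{-\alpha}\in(0,1]$, one has $F(\beta\theta,\theta)=\min\{1-u,\,u(\beta^{-\alpha}-1)\}$ (the first branch for $\theta\leq 1/\beta$, the second for $\theta\geq 1/\beta$), so the supremum can be scanned over a uniform grid in $u$ and compared with $(1-\beta^{\alpha})\xi(\widehat{c})$. The helper names are illustrative.

```python
# Sketch: grid check of Lemma 14.15 under assumed Pareto potentials.


def xi(c_hat):
    """xi(c_hat) from Lemma 14.15."""
    return 1 - 2 * c_hat if c_hat <= 0.25 else 1 / (8 * c_hat)


def sup_grid(c_hat, beta, alpha, n=100_000):
    """Max over u = theta^(-alpha) in (0, 1] of (1 - q y) y, where
    y = F(beta*theta, theta) = min(1 - u, u (beta^(-alpha) - 1)) for Pareto F."""
    b = beta ** alpha
    q = 2 * c_hat / (1 - b)
    best = 0.0
    for i in range(1, n + 1):
        u = i / n
        y = min(1 - u, u * (1 / b - 1))
        best = max(best, (1 - q * y) * y)
    return best


if __name__ == "__main__":
    beta, alpha = 0.8, 3
    for c_hat in (0.1, 0.25, 0.4):
        print(c_hat, sup_grid(c_hat, beta, alpha), (1 - beta ** alpha) * xi(c_hat))
```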

Lemma 14.17

For the proportional-to-mistreatment RVP $\rho_{m}$, the maximum mistreatment $mm_{\rho_{m}}$ satisfies

mm_{\rho_{m}}\leq(1-p(1-2\widehat{c}))(1-\beta^{\alpha})\xi(\widehat{c}),

where $\xi(\cdot)$ is defined as in Lemma 14.15.


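Proof. Abbreviate $\rho=\rho_{m}$. Since $\rho$ is proportional to $F(\beta\theta,\theta)$, which is maximized at $\theta=1/\beta$, we have $\rho(t)\leq\rho(1/\beta)$ for all $t$. Applying this bound to the integral $\int_{\beta\theta}^{\theta}\rho\,dF$ in (9) and dropping the non-positive term gives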

\begin{align*}
m_{\rho_{m}}(\theta)&\leq\left[(1-p)(1-\rho(\theta))\int_{\beta\theta}^{\theta}\,dF+p(1-\rho(\theta))\rho(1/\beta)\int_{\beta\theta}^{\theta}\,dF\right]^{+}\\
&=\left[(1+p(\rho(1/\beta)-1))(1-\rho(\theta))\int_{\beta\theta}^{\theta}\,dF\right]^{+}.
\end{align*}

The claim follows by taking the supremum over $\theta\in\Theta$, applying Lemma 14.15, and substituting $\rho(1/\beta)=2\widehat{c}$.\Halmos

Recall that $mm^{*}(\widehat{c})$ denotes the maximum mistreatment achieved by the optimal policy from Theorem 4.1 with budget $\widehat{c}$. We have

\begin{equation}
mm^{*}(\widehat{c})=\begin{cases}(1-p-\widehat{c})(1-\beta^{\alpha})+\widehat{c}p&\text{if }\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}},\\ (1-p)(1-\beta^{\alpha})\frac{1-\widehat{c}}{1-p\beta^{\alpha}}&\text{otherwise.}\end{cases}\tag{13}
\end{equation}

Lemma 14.19

Suppose $p<1-\beta^{\alpha}$ , $p\leq 1/2$ , and $\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}}$ . If $\widehat{c}\geq 1-\frac{p+1-\beta^{\alpha}}{4p(1-\beta^{\alpha})}$ , then $mm_{\rho_{m}}\leq mm^{*}(\widehat{c})$ .


Proof. Let $Q:=mm^{*}(\widehat{c})-mm_{\rho_{m}}$; we need to show that $Q\geq 0$. Using Theorem 4.1 and Lemma 14.17, we compute

Q=mm^{*}(\widehat{c})-mm_{\rho_{m}}\geq(1-p)(1-\beta^{\alpha})-\widehat{c}(1-\beta^{\alpha}-p)-(1+p(2\widehat{c}-1))(1-\beta^{\alpha})\xi(\widehat{c}).

For $\widehat{c}\leq 1/4$ , we now have

\begin{equation}
Q\geq\widehat{c}\left[(1-4p(1-\widehat{c}))(1-\beta^{\alpha})+p\right].\tag{14}
\end{equation}

If $p\leq\frac{1}{4}$, then $1-4p(1-\widehat{c})\geq 0$, so the right-hand side of (14) is nonnegative, which concludes the proof in this case. Otherwise, assume $p>\frac{1}{4}$. Since $\widehat{c}>0$, we can drop the leading factor $\widehat{c}$; hence $Q\geq 0$ holds provided that

\displaystyle p(1-4(1-\beta^{\alpha})(1-\widehat{c}))\geq-(1-\beta^{\alpha}).

Rearranging yields exactly the assumed lower bound $\widehat{c}\geq 1-\frac{p+1-\beta^{\alpha}}{4p(1-\beta^{\alpha})}$.

Consider next the case where $\widehat{c}\geq 1/4$ . In this case we want to show the inequality

(1-p)(1-\beta^{\alpha})-\widehat{c}(1-\beta^{\alpha}-p)-(1+p(2\widehat{c}-1))(1-\beta^{\alpha})\frac{1}{8\widehat{c}}\geq 0.

Again since $\widehat{c}>0$ , we can multiply by $\widehat{c}$ to get a quadratic in $\widehat{c}$ ; call the resulting expression $W(\widehat{c})$ :

W(\widehat{c})=\widehat{c}(1-p)(1-\beta^{\alpha})-\widehat{c}^{2}(1-\beta^{\alpha}-p)-(1+p(2\widehat{c}-1))(1-\beta^{\alpha})\frac{1}{8}.

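Since $p<1-\beta^{\alpha}$, we have $W^{\prime\prime}(\widehat{c})=-2(1-\beta^{\alpha}-p)<0$, hence $W$ is a concave quadratic. One can verify that $W(1/4)=\frac{1}{16}\left[(1-\beta^{\alpha})(1-3p)+p\right]$ and $W(1/2)=\frac{1}{8}\left[(1-\beta^{\alpha})(1-4p)+2p\right]$; both are affine in $1-\beta^{\alpha}\in(0,1)$ and non-negative at its endpoints when $p\leq 1/2$, so $W(1/4)\geq 0$ and $W(1/2)\geq 0$. By concavity, $W$ is non-negative on $[1/4,1/2]$. This covers all remaining values of $\widehat{c}$, since $\widehat{c}\leq\frac{(1-p)(1-\beta^{\alpha})}{1-p+1-\beta^{\alpha}}\leq\frac{1}{2}$ (note that $\frac{xy}{x+y}\leq\frac{1}{2}$ for $x,y\in(0,1]$).\Halmos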
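Finally, a brute-force sketch of Lemma 14.19: over a small, arbitrarily chosen parameter grid, it evaluates $mm^{*}(\widehat{c})$ from (13) and the upper bound of Lemma 14.17, and counts the cases satisfying the lemma's hypotheses in which the bound exceeds $mm^{*}(\widehat{c})$. The helper names and grid are hypothetical; a passing check is evidence for the algebra above, not a substitute for the proof.

```python
# Sketch: brute-force check of Lemma 14.19 on a parameter grid.


def xi(c_hat):
    """xi(c_hat) from Lemma 14.15."""
    return 1 - 2 * c_hat if c_hat <= 0.25 else 1 / (8 * c_hat)


def mm_rho_m_bound(c_hat, p, beta, alpha):
    """Upper bound on mm_{rho_m} from Lemma 14.17."""
    return (1 - p * (1 - 2 * c_hat)) * (1 - beta ** alpha) * xi(c_hat)


def mm_star(c_hat, p, beta, alpha):
    """mm*(c_hat) from (13)."""
    b = 1 - beta ** alpha
    if c_hat <= (1 - p) * b / ((1 - p) + b):
        return (1 - p - c_hat) * b + c_hat * p
    return (1 - p) * b * (1 - c_hat) / (1 - p * beta ** alpha)


def hypotheses_hold(c_hat, p, beta, alpha):
    """Assumptions of Lemma 14.19 (with p > 0 so the last bound is defined)."""
    b = 1 - beta ** alpha
    return (0 < p < b and p <= 0.5
            and 0 < c_hat <= (1 - p) * b / ((1 - p) + b)
            and c_hat >= 1 - (p + b) / (4 * p * b))


def check_grid():
    """Return (number of cases checked, number of violations)."""
    checked = violations = 0
    for p in [i / 20 for i in range(1, 11)]:
        for beta in (0.5, 0.6, 0.7, 0.8, 0.9):
            for alpha in (1, 2, 3):
                for k in range(1, 51):
                    c_hat = k / 100
                    if hypotheses_hold(c_hat, p, beta, alpha):
                        checked += 1
                        bound = mm_rho_m_bound(c_hat, p, beta, alpha)
                        if bound > mm_star(c_hat, p, beta, alpha) + 1e-12:
                            violations += 1
    return checked, violations


if __name__ == "__main__":
    print(check_grid())
```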

14.4 Increasing-with-Potential RVPs


Proof of Lemma 5.3. Directly from Lemma 14.7.\Halmos


Proof of Theorem 5.4. We claim that there exists $\delta>0$ with $\delta<\theta(1-\beta)$ such that, on $I:=(\theta-\delta,\theta+\delta)$, the following properties hold for all $t\in I$: $\rho$ is continuous and differentiable at $t$; $\rho$ is strictly decreasing at $t$; and $0<\rho(t)\leq(1+\rho(\theta))/2$. The first two properties hold on some interval around $\theta$ because $\rho$ is continuously differentiable in a neighborhood of $\theta$ with $\rho^{\prime}(\theta)<0$. For the third, since $\rho$ has a strictly negative derivative at $\theta$, $\rho(\theta)$ must lie strictly between $0$ and $1$; by continuity, one can then shrink $\delta$ to guarantee $0<\rho(t)\leq(1+\rho(\theta))/2$ for all $t\in I$. Note also that $I\subset(\beta\theta,\theta/\beta)$.

Next, fix $\varepsilon>0$. One can then construct a probability density $f$ satisfying the following conditions: $f$ is continuous and differentiable everywhere; $f(\theta)=\varepsilon$; $f(t)=0$ for $t\not\in I$; and $\int_{\theta}^{\theta+\delta}f(t)\,dt\geq\frac{1}{2}$. This can be done, for instance, by constructing a piecewise constant function that satisfies all but the first condition and then smoothing it with an appropriate bump function via standard techniques.

From (11) of Proposition 14.3 we can compute

\begin{align}
\mu_{\rho}^{\prime}(\theta)={}&-\rho^{\prime}(\theta)\left(p\int_{\theta}^{\theta/\beta}(1-\rho)\,dF+(1-p)\int_{\beta\theta}^{\theta}\,dF+p\int_{\beta\theta}^{\theta}\rho\,dF\right)\nonumber\\
&-\frac{1}{\beta}p\rho(\theta)f\left(\frac{\theta}{\beta}\right)\left(1-\rho\left(\frac{\theta}{\beta}\right)\right)-\beta(1-\rho(\theta))f(\beta\theta)(1-p(1-\rho(\beta\theta)))\nonumber\\
&-f(\theta)(p(1-\rho(\theta))(1-2\rho(\theta))+\rho(\theta))\nonumber\\
={}&(-\rho^{\prime}(\theta))\left(p\int_{\theta}^{\theta+\delta}(1-\rho)\,dF+(1-p)\int_{\theta-\delta}^{\theta}\,dF+p\int_{\theta-\delta}^{\theta}\rho\,dF\right)-\varepsilon(p(1-\rho(\theta))(1-2\rho(\theta))+\rho(\theta))\nonumber\\
\geq{}&(-\rho^{\prime}(\theta))p\int_{\theta}^{\theta+\delta}(1-\rho)\,dF-\varepsilon\nonumber\\
\geq{}&\frac{1}{2}(-\rho^{\prime}(\theta))p(1-\rho(\theta))\int_{\theta}^{\theta+\delta}\,dF-\varepsilon\nonumber\\
\geq{}&\frac{1}{4}(-\rho^{\prime}(\theta))p(1-\rho(\theta))-\varepsilon.\tag{15}
\end{align}

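Here we used that $p(1-\rho(\theta))(1-2\rho(\theta))+\rho(\theta)\in[0,1]$, that the dropped integrals are non-negative because $-\rho^{\prime}(\theta)>0$, that $1-\rho(t)\geq(1-\rho(\theta))/2$ for all $t\in I$, and that $\int_{\theta}^{\theta+\delta}dF\geq\frac{1}{2}$.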

The first term in (15) is strictly positive, so choosing $\varepsilon$ strictly smaller than it gives $\mu_{\rho}^{\prime}(\theta)>0$. Although $\rho$ might not be well-behaved everywhere, it is well-behaved on $I$, so we can apply Lemma 14.7 at this point: since $\rho^{\prime}(\theta)<0$ and $\mu_{\rho}^{\prime}(\theta)>0$, $\rho$ is not incentive compatible with respect to the constructed distribution, completing the proof. \Halmos