Proceedings of the National Academy of Sciences of the United States of America
2021 May 25;118(22):e2018340118. doi: 10.1073/pnas.2018340118

Algorithmic monoculture and social welfare

Jon Kleinberg a, Manish Raghavan a,1
PMCID: PMC8179131  PMID: 34035166

Significance

Algorithmic monoculture is a growing concern in the use of algorithms for high-stakes screening decisions in areas such as employment and lending. If many firms use the same algorithm, even if it is more accurate than the alternatives, the resulting “monoculture” may be susceptible to correlated failures, much as a monocultural system is in biological settings. To investigate this concern, we develop a model of selection under monoculture. We find that even without any assumption of shocks or correlated failures—i.e., under “normal operations”—the quality of decisions may decrease when multiple firms use the same algorithm. Thus, the introduction of a more accurate algorithm may decrease social welfare—a kind of “Braess’ paradox” for algorithmic decision-making.

Keywords: monoculture, ranking, random utility model, algorithmic decision-making

Abstract

As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. Here, we show that the dangers of algorithmic monoculture run much deeper, in that monocultural convergence on a single algorithm by a group of decision-making agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents. Unexpected shocks are therefore not needed to expose the risks of monoculture; it can hurt accuracy even under “normal” operations and even for algorithms that are more accurate when used by only a single decision-maker. Our results rely on minimal assumptions and involve the development of a probabilistic framework for analyzing systems that use multiple noisy estimates of a set of alternatives.


The rise of algorithms used to shape societal choices has been accompanied by concerns over monoculture—the notion that choices and preferences will become homogeneous in the face of algorithmic curation. One of many canonical articulations of this concern was expressed in The New York Times by Farhad Manjoo (1), who wrote: “Despite the barrage of choice, more of us are enjoying more of the same songs, movies and TV shows.” Because of algorithmic curation, trained on collective social feedback (2), our choices are converging.

When we move from the influence of algorithms on media consumption and entertainment to their influence on high-stakes screening decisions about to whom to offer a job or to whom to offer a loan, the concerns about algorithmic monoculture become even starker. Even if algorithms are more accurate on a case-by-case basis, a world in which everyone uses the same algorithm is susceptible to correlated failures when the algorithm finds itself in adverse conditions. This type of concern invokes an analogy to agriculture, where monoculture makes crops susceptible to the attack of a single pathogen (3); the analogy has become a mainstay of the computer-security literature (4), and it has recently become a source of concern about screening decisions for jobs or loans as well. Discussing the postrecession financial system, Citron and Pasquale (5) write: “Like monocultural-farming technology vulnerable to one unanticipated bug, the converging methods of credit assessment failed spectacularly when macroeconomic conditions changed.”

The narrative around algorithmic monoculture thus suggests a trade-off: In “normal” conditions, a more accurate algorithm will improve the average quality of screening decisions, but when conditions change through an unexpected shock, the results can be dramatically worse. But is this trade-off genuine? In the absence of shocks, does monocultural convergence on a single, more accurate screening algorithm necessarily lead to better average outcomes?

In this work, we show that algorithmic monoculture poses risks, even in the absence of shocks. We investigate a model involving minimal assumptions, in which two competing firms can either use their own independent heuristics to perform screening decisions, or they can use a more accurate algorithm that is accessible to both of them. (Again, we think of screening job applicants or loan applicants as a motivating scenario.) We find that even though it would be rational for each firm in isolation to adopt the algorithm, it is possible for the use of the algorithm by both firms to result in decisions that are worse on average. This, in turn, leads, in the language of game theory, to a type of “Braess’ paradox” (6) for screening algorithms: The introduction of a more accurate algorithm can drive the firms into a unique equilibrium that is worse for society than the one that was present before the algorithm existed.

Note that the harm here is to overall performance. Another common concern about algorithmic monoculture in screening decisions is the harm it can cause to specific individuals: If all employers or lenders use the same algorithm for their screening decisions, then particular applicants might find themselves locked out of the market when this shared algorithm doesn’t like their application for some reason. While this is clearly also a significant concern, our results show that it would be a mistake to view the harm to particular applicants as necessarily balanced against the gains in overall accuracy—rather, it is possible for algorithmic monoculture to cause harm not just to particular applicants, but also to the average quality of decisions as well.

Our results thus have a counterintuitive flavor to them: If an algorithm is clearly more accurate than the alternatives when one entity uses it, why does the accuracy become worse than the alternatives when multiple entities use it? The analysis relies on deriving some probabilistic properties of rankings, establishing that when we are constructing a ranking from a probability distribution representing a “noisy” version of a true ordering, we can sometimes achieve less error through an incremental construction of the ranking—building it one element at a time—than we can by constructing it in a single draw from the distribution. We now set up the basic model and then frame the probabilistic questions that underpin its analysis.

Algorithmic Hiring as a Case Study

To instantiate the ideas introduced thus far, we’ll focus on the case of algorithmic hiring, where recruiters make decisions based in part on scores or recommendations provided by data-driven algorithms. In this setting, we’ll propose and analyze a stylized model of algorithmic hiring with which we can begin to investigate the effects of algorithmic monoculture.

Informally, we can think of a simplified hiring process as follows: Rank all of the candidates, and select the first available one. We suppose that each firm has two options to form this ranking: Either develop their own, private ranking (which we will refer to as using a “human evaluator”) or use an algorithmically produced ranking. We assume that there is a single vendor of algorithmic rankings, so all firms choosing to use the algorithm receive the same ranking. The firms proceed in a random order, each hiring their favorite remaining candidate according to the ranking they’re using—human-generated or algorithmic (see Fig. 1 for an example). Thus, we can frame the effects of monoculture as follows: Are firms better off using the more accurate, common algorithm, or should they instead employ their own less accurate, but private, evaluations?

Fig. 1.


Each firm has the choice to use either their own private ranking or the common algorithmic ranking to order the n candidates. In a random order, each firm hires the highest-ranked available candidate according to the ranking they chose. For example, if firm 1 uses their private ranking and firm 2 uses the algorithmic ranking, then firm 1 hires candidate B, and firm 2 hires candidate A. If both firms use the algorithmic ranking, then the firm randomly selected to hire first hires candidate A, and the firm randomly selected to hire second hires candidate C.

In what follows, we’ll introduce a formal model of evaluation and selection, using it to analyze a setting in which firms seek to hire candidates.

Modeling Ranking.

More formally, we model the $n$ candidates as having intrinsic values $x_1, \ldots, x_n$, where any employer would derive utility $x_i$ from hiring candidate $i$. Throughout the paper, we assume without loss of generality that $x_1 > x_2 > \cdots > x_n$. These values, however, are unknown to the employer; instead, they must use some noisy procedure to rank the candidates. We model such a procedure as a randomized mechanism $R$ that takes in the true candidate values and draws a permutation $\pi$ over those candidates from some distribution. Our main results hold for families of distributions over permutations as defined below:

Definition 1 (Noisy Permutation Family): A noisy permutation family $F_\theta$ is a family of distributions over permutations that satisfies the following conditions for any $\theta > 0$ and set of candidates $x$:

  • 1. Differentiability: For any permutation $\pi$, $\Pr_{F_\theta}[\pi]$ is continuous and differentiable in $\theta$.

  • 2. Asymptotic optimality: For the true ranking $\pi^*$, $\lim_{\theta \to \infty} \Pr_{F_\theta}[\pi^*] = 1$.

  • 3. Monotonicity: For any (possibly empty) $S \subset x$, let $\pi^{(-S)}$ be the partial ranking produced by removing the items in $S$ from $\pi$. Let $\pi_1^{(-S)}$ denote the value of the top-ranked candidate according to $\pi^{(-S)}$. For any $\theta' > \theta$,

$$\mathbb{E}_{F_{\theta'}}\left[\pi_1^{(-S)}\right] \ge \mathbb{E}_{F_{\theta}}\left[\pi_1^{(-S)}\right]. \tag{1}$$

    Moreover, for $S = \emptyset$, Eq. 1 holds with strict inequality.

θ serves as an “accuracy parameter”: For large θ, the noisy ranking converges to the true ranking over candidates. The monotonicity condition states that a higher value of θ leads to a better first choice, even if some of the candidates are removed after ranking. Removal after ranking (as opposed to before) is important because some of the ranking models we will consider later do not satisfy Independence of Irrelevant Alternatives. Examples of noisy permutation families include Random Utility Models (RUMs) (7) and the Mallows Model (8), both of which we will discuss in detail later.

As an objective function to evaluate the effects of different approaches to ranking and selection, we’ll consider each individual employer’s utility, as well as the sum of employers’ utilities. We think of this latter sum as the social welfare, since it represents the total quality of the applicants who are hired by any firm. (For example, if all firms deterministically used the correct ranking, then the top applicants would be the ones hired, leading to the highest possible social welfare.)

Modeling Selection.

Each firm in our model has access to the same underlying pool of n candidates, which they rank using a randomized mechanism R to get a permutation π, as described above. Then, in a random order, each firm hires the highest-ranked remaining candidate according to their ranking. Thus, if two firms both rank candidate i first, only one of them can hire i; the other must hire the next available candidate according to their ranking. In our model, candidates automatically accept the offer they get from a firm. For the sake of simplicity, throughout this paper, we restrict ourselves to the case where there are two firms hiring one candidate each, although our model readily generalizes to more complex cases.
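The selection step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `hire_sequentially` is an assumption, and the concrete rankings are hypothetical values chosen to reproduce the outcomes described in the Fig. 1 caption (only the top choice of firm 1's private ranking comes from the caption; the rest of that ranking is made up).

```python
import random


def hire_sequentially(rankings, rng=None):
    """Simulate one round of hiring.

    rankings: one ranking per firm, each a list of candidate labels ordered
    from most to least preferred. Firms move in a uniformly random order and
    each hires its highest-ranked candidate that is still available.
    Returns a dict mapping firm index -> hired candidate.
    """
    rng = rng or random.Random()
    order = list(range(len(rankings)))
    rng.shuffle(order)  # random hiring order
    taken = set()
    hires = {}
    for firm in order:
        choice = next(c for c in rankings[firm] if c not in taken)
        taken.add(choice)
        hires[firm] = choice
    return hires


# Hypothetical rankings consistent with the Fig. 1 caption.
algorithmic = ["A", "C", "B"]
private_firm1 = ["B", "A", "C"]  # only the top choice (B) comes from the caption

# Firm 1 uses its private ranking, firm 2 uses the algorithm.
mixed = hire_sequentially([private_firm1, algorithmic], random.Random(0))
# Both firms share the same algorithmic ranking (monoculture).
monoculture = hire_sequentially([algorithmic, algorithmic], random.Random(0))
```

In the mixed case the two top choices differ, so the hiring order does not matter. Under monoculture, whichever firm moves second is pushed down to the shared ranking's second choice.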

As described earlier, each firm can choose to use either a private human evaluator or an algorithmically generated ranking as its randomized mechanism R. We assume that both candidate mechanisms come from a noisy permutation family Fθ, with differing values of the accuracy parameter θ: Human evaluators all have the same accuracy θH, and the algorithm has accuracy θA. However, while the human evaluator produces a ranking independent of any other firm, the algorithmically generated ranking is identical for all firms who choose to use it. In other words, if two firms choose to use the algorithmically generated ranking, they will both receive the same permutation π.

The choice of which ranking mechanism to use leads to a game-theoretic setting: Both firms know the accuracy parameters of the human evaluators ($\theta_H$) and the algorithm ($\theta_A$), and they must decide whether to use a human evaluator or the algorithm. This choice introduces a subtlety: For many ranking models, a firm’s rational behavior depends not only on the accuracy of the ranking mechanism, but also on the underlying candidate values $x_1, \ldots, x_n$. Thus, to fully specify a firm’s behavior, we assume that $x_1, \ldots, x_n$ are drawn from a known joint distribution $D$. Our main results will hold for any $D$, meaning that they apply even when the candidate values (but not their identities) are deterministically known.

Stating the Main Result.

Our main result is a pair of intuitive conditions under which a Braess’ Paradox-style result occurs—in other words, conditions under which there are accuracy parameters for which both firms rationally choose to use the algorithmic ranking, but social welfare (and each individual firm’s utility) would be higher if both firms used independent human evaluators. Recall that the two firms hire in a random order. For a permutation π, let πi denote the value of the ith-ranked candidate according to π.

We first state the two conditions and then the theorem based on them.

Definition 2 (Preference for the First Position): A candidate distribution $D$ and noisy permutation family $F_\theta$ exhibit a preference for the first position if for all $\theta > 0$, if $\pi, \sigma \sim F_\theta$,

$$\mathbb{E}\left[\pi_1 - \pi_2 \mid \pi_1 \ne \sigma_1\right] > 0.$$

In other words, for any $\theta > 0$, suppose we draw two permutations $\pi$ and $\sigma$ independently from $F_\theta$, and suppose that the first-ranked candidates differ in $\pi$ and $\sigma$. Then, the expected value of the first-ranked candidate in $\pi$ is strictly greater than the expected value of the second-ranked candidate in $\pi$.

Definition 3 (Preference for Weaker Competition): A candidate distribution $D$ and noisy permutation family $F_\theta$ exhibit a preference for weaker competition if the following holds: For all $\theta_1 > \theta_2$, $\sigma \sim F_{\theta_1}$ and $\pi, \tau \sim F_{\theta_2}$,

$$\mathbb{E}\left[\pi_1^{(-\{\sigma_1\})}\right] < \mathbb{E}\left[\pi_1^{(-\{\tau_1\})}\right].$$

Intuitively, suppose we have a higher accuracy parameter $\theta_1$ and a lower accuracy parameter $\theta_2 < \theta_1$; we draw a permutation $\pi$ from $F_{\theta_2}$; and we then derive two partial rankings from $\pi$: $\pi^{(-\{\sigma_1\})}$, obtained by deleting from $\pi$ the first-ranked element of a permutation $\sigma$ drawn from the more accurate distribution $F_{\theta_1}$, and $\pi^{(-\{\tau_1\})}$, obtained by deleting from $\pi$ the first-ranked element of a permutation $\tau$ drawn from the less accurate distribution $F_{\theta_2}$.

Then, the expected value of the first-ranked candidate in $\pi^{(-\{\tau_1\})}$ is strictly greater than the expected value of the first-ranked candidate in $\pi^{(-\{\sigma_1\})}$; that is, when a random candidate is removed from $\pi$, the best remaining candidate is better in expectation when the randomly removed candidate is chosen based on a noisier ranking.

Using these two conditions, we can state our theorem.

Theorem 1. Suppose that a given candidate distribution D and noisy permutation family Fθ satisfy Definition 2 (preference for the first position) and Definition 3 (preference for weaker competition).

Then, for any θH, there exists θA>θH such that using the algorithmic ranking is a strictly dominant strategy for both firms, but social welfare would be higher if both firms used human evaluators.

A Preference for Independence.

Before we prove Theorem 1, we provide some intuition for the two conditions in Definitions 2 and 3. The second condition essentially says that it is better to have a worse competitor: The firm randomly selected to hire second is better off if the firm that hires first uses a less accurate ranking (in this case, a human evaluator instead of the algorithmic ranking).

The first condition states that when two identically distributed permutations disagree on their first element, the first-ranked candidate according to either permutation is still better, in expectation, than the second-ranked candidate according to either permutation. In what follows, we’ll demonstrate that this condition implies that firms in our model rationally prefer to make decisions using independent (but equally accurate) rankings.

To do so, we need to introduce some notation. Recall that the two firms hire in a random order. Given a candidate distribution $D$, let $U_s(\theta_A, \theta_H)$ denote the expected utility of the first firm to hire a candidate when using ranking $s$, where $s \in \{A, H\}$ is either the algorithmic ranking or the ranking generated by a human evaluator, respectively. Similarly, let $U_{s_1 s_2}(\theta_A, \theta_H)$ be the expected utility of the second firm to hire, given that the first firm used strategy $s_1$ and the second firm uses strategy $s_2$, where again, $s_1, s_2 \in \{A, H\}$. Finally, let $\pi, \sigma \sim F_\theta$.

In what follows, we will show that for any θ,

$$\mathbb{E}\left[\pi_1 - \pi_2 \mid \pi_1 \ne \sigma_1\right] > 0 \iff U_{AH}(\theta,\theta) > U_{AA}(\theta,\theta). \tag{2}$$

In other words, whenever a ranking model meets Definition 2, the firm chosen to select second will prefer to use an independent ranking mechanism from its competitor, given that the ranking mechanisms are equally accurate.

First, we can write

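$$\begin{aligned}
U_{AH}(\theta_A,\theta_H) &= \mathbb{E}\left[\pi_1 \cdot \mathbf{1}_{\pi_1 \ne \sigma_1} + \pi_2 \cdot \mathbf{1}_{\pi_1 = \sigma_1}\right] \\
U_{AA}(\theta_A,\theta_H) &= \mathbb{E}\left[\sigma_2\right] = \mathbb{E}\left[\sigma_2 \cdot \mathbf{1}_{\pi_1 \ne \sigma_1} + \sigma_2 \cdot \mathbf{1}_{\pi_1 = \sigma_1}\right].
\end{aligned}$$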

Thus,

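$$U_{AH}(\theta_A,\theta_H) - U_{AA}(\theta_A,\theta_H) = \mathbb{E}\left[(\pi_1 - \sigma_2) \cdot \mathbf{1}_{\pi_1 \ne \sigma_1} + (\pi_2 - \sigma_2) \cdot \mathbf{1}_{\pi_1 = \sigma_1}\right].$$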

Conditioned on either $\pi_1 = \sigma_1$ or $\pi_1 \ne \sigma_1$, $\pi_2$ and $\sigma_2$ are identically distributed and, therefore, have equal expectations. As a result,

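$$U_{AH}(\theta_A,\theta_H) - U_{AA}(\theta_A,\theta_H) = \mathbb{E}\left[(\pi_1 - \pi_2) \cdot \mathbf{1}_{\pi_1 \ne \sigma_1}\right], \tag{3}$$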

which implies Eq. 2. Thus, whenever a ranking model meets Definition 2, firms rationally prefer independent assessments, all else equal.
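Eq. 3 can be checked numerically by exact enumeration on a small instance. The sketch below uses the Mallows Model (introduced later in the paper) only because its probabilities are easy to enumerate. The candidate values `[3.0, 2.0, 1.0]`, the parameter `phi = 2.0`, and all function names are illustrative assumptions, not values from the paper.

```python
from itertools import permutations


def kendall_tau(perm):
    """Count pairs ranked in the wrong order relative to (0, 1, ..., n-1)."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))


def mallows(n, phi):
    """Exact Mallows distribution: Pr[perm] proportional to phi**(-distance)."""
    weights = {p: phi ** (-kendall_tau(p)) for p in permutations(range(n))}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}


def both_sides_of_eq3(x, phi):
    """Return (U_AH - U_AA, E[(pi_1 - pi_2) * 1{pi_1 != sigma_1}]).

    x[i] is the value of candidate i, with x sorted in decreasing order so
    that the identity permutation is the true ranking. Both rankings have
    the same accuracy, as in Eq. 2.
    """
    dist = mallows(len(x), phi)
    # Shared ranking: the second firm is left with sigma_2.
    u_aa = sum(pr * x[s[1]] for s, pr in dist.items())
    u_ah = 0.0
    rhs = 0.0
    for s, ps in dist.items():      # sigma: ranking used by the first firm
        for p, pp in dist.items():  # pi: independent ranking of the second firm
            w = ps * pp
            hired = p[0] if p[0] != s[0] else p[1]
            u_ah += w * x[hired]
            if p[0] != s[0]:
                rhs += w * (x[p[0]] - x[p[1]])
    return u_ah - u_aa, rhs


lhs, rhs = both_sides_of_eq3(x=[3.0, 2.0, 1.0], phi=2.0)
```

On this instance the two sides agree up to floating-point error and are positive, so the second firm prefers an independent ranking of equal accuracy.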

To provide some intuition for what this preference for independence entails, consider a setting where a hiring committee seeks to hire two candidates. They meet, produce a ranking σ, and hire σ1 (the best candidate according to σ). Suppose they have the option to either hire σ2 or reconvene the next day to form an independent ranking π and hire the best remaining candidate according to π; which option should they choose? It’s not immediately clear why one option should be better than the other. However, whenever Definition 2 is met, the committee should prefer to reconvene and make their second hire according to a new ranking π. After proving Theorem 1, we will provide natural ranking models that meet Definition 2, implying that under these ranking models, independent reranking can be beneficial.

Proving Theorem 1.

With this intuition, we are ready to prove Theorem 1.

Proof of Theorem 1.

For given values of θA and θH, using the algorithmic ranking is a strictly dominant strategy as long as

$$U_A(\theta_A,\theta_H) + U_{AA}(\theta_A,\theta_H) > U_H(\theta_A,\theta_H) + U_{AH}(\theta_A,\theta_H), \tag{4}$$
$$U_A(\theta_A,\theta_H) + U_{HA}(\theta_A,\theta_H) > U_H(\theta_A,\theta_H) + U_{HH}(\theta_A,\theta_H). \tag{5}$$

Note that Eq. 5 is always true for $\theta_A > \theta_H$ by the monotonicity assumption on $F_\theta$: $U_A(\theta_A,\theta_H) \ge U_H(\theta_A,\theta_H)$ because a more accurate ranking produces a top-ranked candidate with higher expected value, and $U_{HA}(\theta_A,\theta_H) \ge U_{HH}(\theta_A,\theta_H)$ because this holds even conditioned on removing any candidate from the pool (in this case, the candidate randomly selected by the firm that hires first). Crucially, in Eq. 5, the first firm’s random selection is independent from the second firm’s selection; the same logic could not be used to argue that Eq. 4 always holds for $\theta_A \ge \theta_H$. Moreover, when $\theta_A > \theta_H$, $U_A(\theta_A,\theta_H) > U_H(\theta_A,\theta_H)$ by the monotonicity assumption, meaning Eq. 5 holds.

Let $W_{s_1 s_2}(\theta_A,\theta_H)$ denote social welfare when the two firms employ strategies $s_1, s_2 \in \{A, H\}$. Then, when both firms use the algorithmic ranking, social welfare is

$$W_{AA}(\theta_A,\theta_H) = U_A(\theta_A,\theta_H) + U_{AA}(\theta_A,\theta_H).$$

By Eq. 2, Definition 2 implies that for any θ, UAA(θ,θ)<UAH(θ,θ), implying

$$U_A(\theta_H,\theta_H) + U_{AA}(\theta_H,\theta_H) < U_H(\theta_H,\theta_H) + U_{AH}(\theta_H,\theta_H).$$

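However, by the optimality assumption on $F_\theta$ in Definition 1, for sufficiently large $\hat{\theta}_A$,

$$U_A(\hat{\theta}_A,\theta_H) + U_{AA}(\hat{\theta}_A,\theta_H) > U_H(\hat{\theta}_A,\theta_H) + U_{AH}(\hat{\theta}_A,\theta_H).$$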

Note that $U_{s_1}(\theta_A,\theta_H)$ and $U_{s_1 s_2}(\theta_A,\theta_H)$ are continuous with respect to $\theta_A$ for any $s_1, s_2 \in \{A, H\}$ since they are expectations over discrete distributions with probabilities that are, by assumption, differentiable with respect to $\theta_A$. Therefore, by the differentiability assumption on $F_\theta$ from Definition 1, there is some $\theta_A^* > \theta_H$ such that

$$U_A(\theta_A^*,\theta_H) + U_{AA}(\theta_A^*,\theta_H) = U_H(\theta_A^*,\theta_H) + U_{AH}(\theta_A^*,\theta_H), \tag{6}$$

i.e., given that its competitor uses the algorithmic ranking, a firm is indifferent between the two strategies. For such θA*, using the algorithmic ranking is still a weakly dominant strategy. By definition of WAA,

$$W_{AA}(\theta_A^*,\theta_H) = U_H(\theta_A^*,\theta_H) + U_{AH}(\theta_A^*,\theta_H).$$

If both firms had instead used human evaluators, social welfare would be

$$W_{HH}(\theta_A^*,\theta_H) = U_H(\theta_A^*,\theta_H) + U_{HH}(\theta_A^*,\theta_H).$$

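By Definition 3, for $\sigma \sim F_{\theta_A^*}$ and $\pi, \tau \sim F_{\theta_H}$,

$$\mathbb{E}\left[\pi_1^{(-\{\sigma_1\})}\right] < \mathbb{E}\left[\pi_1^{(-\{\tau_1\})}\right].$$

Note that

$$\begin{aligned}
U_{AH}(\theta_A^*,\theta_H) &= \mathbb{E}\left[\pi_1^{(-\{\sigma_1\})}\right] \\
U_{HH}(\theta_A^*,\theta_H) &= \mathbb{E}\left[\pi_1^{(-\{\tau_1\})}\right].
\end{aligned}$$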

Thus, Definition 3 implies that for θA*>θH, UHH(θA*,θH)>UAH(θA*,θH). As a result, for θA*>θH, using the algorithmic ranking is a weakly dominant strategy, but

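$$\begin{aligned}
W_{HH}(\theta_A^*,\theta_H) &= U_H(\theta_A^*,\theta_H) + U_{HH}(\theta_A^*,\theta_H) \\
&> U_H(\theta_A^*,\theta_H) + U_{AH}(\theta_A^*,\theta_H) \\
&= U_A(\theta_A^*,\theta_H) + U_{AA}(\theta_A^*,\theta_H) \\
&= W_{AA}(\theta_A^*,\theta_H),
\end{aligned}$$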

meaning that social welfare would have been higher had both firms used human evaluators.

We can show that this effect persists for a value θA, such that using the algorithmic ranking is a strictly dominant strategy. Intuitively, this is simply by slightly increasing θA* so the algorithmic ranking is strictly dominant. For fixed θH, define

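$$\begin{aligned}
f(\theta_A) &= U_A(\theta_A,\theta_H) + U_{AA}(\theta_A,\theta_H) \\
g(\theta_A) &= U_H(\theta_A,\theta_H) + U_{AH}(\theta_A,\theta_H) \\
h(\theta_A) &= U_H(\theta_A,\theta_H) + U_{HH}(\theta_A,\theta_H).
\end{aligned}$$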

Because Eq. 5 always holds for θA>θH, it suffices to show that there exists θA such that g(θA)<f(θA)<h(θA). This is because g(θA)<f(θA) is equivalent to Eq. 4, and f(θA)<h(θA) is equivalent to WAA(θA,θH)<WHH(θA,θH).

First, note that $h(\theta_A)$ is a constant, and by Definition 3, $g(\theta_A) < h(\theta_A)$ for all $\theta_A > \theta_H$. By the optimality assumption of Definition 1, there exists sufficiently large $\hat{\theta}_A$ such that $f(\hat{\theta}_A) > g(\hat{\theta}_A)$. Recall that by definition of $\theta_A^*$, $f(\theta_A^*) = g(\theta_A^*)$. Both $f$ and $g$ are continuous by the differentiability assumption in Definition 1. Thus, there must exist some $\theta_A > \theta_A^*$ such that $g(\theta_A) < f(\theta_A) < h(\theta_A)$. This means that for $\theta_A$, using the algorithmic ranking is a strictly dominant strategy, but social welfare would still be larger if both firms used human evaluators.
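The three functions f, g, and h from the proof can be computed exactly on a small instance. The sketch below does so for three candidates under the Mallows Model (defined later in the paper), where rankings are cheap to enumerate. The candidate values `[3.0, 2.0, 1.0]`, the parameters `phi_h = 2.0` and `phi_a = 2.2` (that is, θH = 1 and θA = 1.2 under θ = ϕ − 1), and all function names are assumptions made for illustration.

```python
from itertools import permutations


def kendall_tau(perm):
    """Count pairs ranked in the wrong order relative to (0, 1, ..., n-1)."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))


def mallows(n, phi):
    """Exact Mallows distribution: Pr[perm] proportional to phi**(-distance)."""
    weights = {p: phi ** (-kendall_tau(p)) for p in permutations(range(n))}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}


def top_value(dist, x):
    """Expected value of the top-ranked candidate (utility of the first firm)."""
    return sum(pr * x[p[0]] for p, pr in dist.items())


def second_mover_value(first_dist, second_dist, x):
    """Expected value hired by the second firm when the rankings are independent."""
    total = 0.0
    for s, ps in first_dist.items():
        for p, pp in second_dist.items():
            hired = p[0] if p[0] != s[0] else p[1]
            total += ps * pp * x[hired]
    return total


def theorem1_quantities(x, phi_a, phi_h):
    """Return f, g, h from the proof plus the left-hand side of Eq. 5."""
    alg, hum = mallows(len(x), phi_a), mallows(len(x), phi_h)
    u_a, u_h = top_value(alg, x), top_value(hum, x)
    # Shared algorithmic ranking: the second firm is left with sigma_2.
    u_aa = sum(pr * x[s[1]] for s, pr in alg.items())
    u_ah = second_mover_value(alg, hum, x)
    u_ha = second_mover_value(hum, alg, x)
    u_hh = second_mover_value(hum, hum, x)
    return {"f": u_a + u_aa, "g": u_h + u_ah, "h": u_h + u_hh, "eq5_lhs": u_a + u_ha}


q = theorem1_quantities(x=[3.0, 2.0, 1.0], phi_a=2.2, phi_h=2.0)
```

For these assumed parameters, `q["g"] < q["f"] < q["h"]` and `q["eq5_lhs"] > q["h"]`: adopting the algorithm is strictly dominant for each firm, yet welfare under AA (f) is below welfare under HH (h). Varying `phi_a` traces out the window of accuracies in which this occurs.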

Instantiating with Ranking Models

Thus far, we have described a general set of conditions under which algorithmic monoculture can lead to a reduction in social welfare. Under which ranking models do these conditions hold? In the remainder of this paper, we instantiate the model with two well-studied ranking models: RUMs (7) and the Mallows Model (8). While RUMs do not always satisfy Definitions 2 and 3, they do under some realistic parameterizations, regardless of the candidate distribution D. Under the Mallows Model, Definitions 2 and 3 are always met, meaning that for any candidate distribution D and human evaluator accuracy θH, there exists an accuracy parameter θA such that a common algorithmic ranking with accuracy θA decreases social welfare.

RUMs.

In RUMs, the underlying candidate values $x_i$ are perturbed by independent and identically distributed noise $\varepsilon_i \sim \mathcal{E}$, and the perturbed values are ranked to produce $\pi$. Originally conceived in the psychology literature (7), this model has been well-studied over nearly a century (9–14), including more recently in the computer science and machine-learning literature (15–19).

First, we must define a family of RUMs that satisfies the conditions of Definition 1. Assume without loss of generality that the noise distribution $\mathcal{E}$ has unit variance. Then, consider the family of RUMs parameterized by $\theta$, in which candidates are ranked according to $x_i + \varepsilon_i/\theta$. By this definition, the SD of the noise for a particular value of $\theta$ is simply $1/\theta$. Intuitively, larger values of $\theta$ reduce the effect of the noise, making the ranking more accurate. In SI Appendix, we show that as long as the noise distribution $\mathcal{E}$ has positive support on $(-\infty, \infty)$, this definition of $F_\theta$ meets the differentiability, asymptotic optimality, and monotonicity conditions in Definition 1. For distributions with finite support, many of our results can be generalized by relaxing strict inequalities in Definition 1 and Theorem 1 to weak inequalities.
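A RUM ranking with accuracy θ can be sampled directly from this definition. The sketch below uses only the Python standard library; the function names, the candidate values, and the choice of Gaussian and Laplacian noise as the two supported options are assumptions made for illustration.

```python
import math
import random


def unit_variance_noise(kind, rng):
    """Draw one noise term with mean 0 and variance 1."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "laplace":
        # The difference of two i.i.d. exponentials with rate sqrt(2) is
        # Laplace with scale 1/sqrt(2), whose variance is 2 * scale**2 = 1.
        rate = math.sqrt(2.0)
        return rng.expovariate(rate) - rng.expovariate(rate)
    raise ValueError(f"unknown noise kind: {kind}")


def sample_rum_ranking(x, theta, rng, kind="gaussian"):
    """Rank candidates by x_i + eps_i / theta, best first (returns indices)."""
    noisy = [xi + unit_variance_noise(kind, rng) / theta for xi in x]
    return sorted(range(len(x)), key=lambda i: noisy[i], reverse=True)


def top_is_best_rate(x, theta, kind, trials=5000, seed=0):
    """Fraction of sampled rankings whose top choice is the truly best candidate."""
    rng = random.Random(seed)
    best = max(range(len(x)), key=lambda i: x[i])
    hits = sum(sample_rum_ranking(x, theta, rng, kind)[0] == best for _ in range(trials))
    return hits / trials


# Hypothetical candidate values; a larger theta should find the best candidate more often.
values = [3.0, 2.0, 1.0]
low_accuracy = top_is_best_rate(values, theta=0.5, kind="gaussian")
high_accuracy = top_is_best_rate(values, theta=5.0, kind="gaussian")
```

Dividing the noise by θ, rather than multiplying the values by θ, keeps the candidate values on their original scale while shrinking the noise SD to 1/θ.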

Because RUMs are notoriously difficult to work with analytically, we restrict ourselves to the case where n=3, i.e., there are three candidates. Under this restriction, we can show that for Gaussian and Laplacian noise distributions, Definitions 2 and 3 (the two conditions of Theorem 1) are met, regardless of the candidate distribution D. We defer the proof to SI Appendix.

Theorem 2. Let Fθ be the family of RUMs with either Gaussian or Laplacian noise with SD 1/θ. Then, for any candidate distribution D over three candidates, the conditions of Theorem 1 are satisfied.

It might be tempting to generalize Theorem 2 to other distributions and more candidates; however, certain noise and candidate distributions violate the conditions of Theorem 1. Even for three-candidate RUMs, there exist distributions for which each of the conditions is violated; see SI Appendix for examples.

Moreover, while Gaussian and Laplacian distributions provably meet Definitions 2 and 3 with only three candidates, this doesn’t necessarily extend to larger candidate sets. Fig. 2 shows that Definition 2 can be violated under a particular candidate distribution D for Laplacian noise with 15 candidates. This challenges the intuition that independence is preferable: under some conditions, it can actually be better in expectation for a firm to use the same algorithmic ranking as its competitor, even if an independent human evaluator is equally accurate overall. Unlike Theorem 2, which applies for any candidate distribution D, certain noise models may violate Definition 2 only for particular D. It is an open question as to whether Theorem 2 can be extended to larger numbers of candidates under Gaussian noise.

Fig. 2.


$U_{AH}(\theta,\theta) - U_{AA}(\theta,\theta)$ for three noise models with $n$ candidates whose utilities are drawn from a uniform distribution with unit variance for $n=3$, $n=5$, and $n=15$. Note that for $n=15$, $U_{AH}(\theta,\theta) - U_{AA}(\theta,\theta) < 0$ for Laplacian noise, meaning Definition 2 is not met.

Finally, there exist noise distributions that violate Definition 2 for any candidate distribution D. In particular, the RUM family defined by the Gumbel distribution is well-known to be equivalent to the Plackett–Luce model of ranking, which is generated by sequentially selecting candidate i with probability

$$\frac{\exp(\theta x_i)}{\sum_{j \in S} \exp(\theta x_j)}, \tag{7}$$

where $S$ is the set of remaining candidates (10, 20). Under the Plackett–Luce model, for any $\theta$, $U_{AH}(\theta,\theta) = U_{AA}(\theta,\theta)$. To see this, suppose the firm that hires first selects candidate $i^*$. Then, the firm that hires second gets each candidate $i$ with probability given by Eq. 7 with $S = \{1, \ldots, n\} \setminus \{i^*\}$. As a result, by Eq. 3, if $\pi, \sigma \sim F_\theta$,

$$\mathbb{E}\left[\pi_1 - \pi_2 \mid \pi_1 \ne \sigma_1\right] = 0,$$

for any candidate distribution D, meaning the Plackett–Luce model never meets Definition 2. Thus, under the Plackett–Luce model, monoculture has no effect—the optimal strategy is always to use the best available ranking, regardless of competitors’ strategies.
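The equality of the two second-mover utilities under Plackett–Luce can be verified by exact enumeration on a small instance. In the sketch below, the candidate values, the value of θ, and the function names are hypothetical; the ranking probabilities follow the sequential selection rule of Eq. 7.

```python
from itertools import permutations
from math import exp


def plackett_luce(x, theta):
    """Exact Plackett-Luce distribution over rankings of candidates 0..n-1."""
    dist = {}
    for perm in permutations(range(len(x))):
        prob = 1.0
        for k, i in enumerate(perm):
            remaining = perm[k:]  # candidates not yet placed, including i
            prob *= exp(theta * x[i]) / sum(exp(theta * x[j]) for j in remaining)
        dist[perm] = prob
    return dist


def second_firm_utilities(x, theta):
    """Return (U_AA, U_AH) when both rankings have the same accuracy theta."""
    dist = plackett_luce(x, theta)
    # Shared ranking: the second firm is left with the second-ranked candidate.
    u_aa = sum(pr * x[s[1]] for s, pr in dist.items())
    # Independent ranking: the second firm takes its own top remaining candidate.
    u_ah = 0.0
    for s, ps in dist.items():
        for p, pp in dist.items():
            hired = p[0] if p[0] != s[0] else p[1]
            u_ah += ps * pp * x[hired]
    return u_aa, u_ah


# Hypothetical instance with four candidates.
u_aa, u_ah = second_firm_utilities(x=[2.0, 1.0, 0.5, -1.0], theta=1.3)
```

The two returned values coincide up to floating-point error, which reflects the fact that, once the first hire is removed, Plackett–Luce chooses among the remaining candidates with the same probabilities whether or not the ranking is redrawn.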

Given the analytic intractability of most RUMs, it might appear that testing the conditions of Theorem 1, especially for particular noise and candidate distributions, may not be possible; however, they can be efficiently tested via simulation: As long as the noise distribution E and the candidate distribution D can be sampled from, it is possible to test whether the conditions of Theorem 1 are satisfied. Thus, even if the conditions of Theorem 1 are not met for every candidate distribution D, it is possible to efficiently determine whether they are met for any particular D.
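A simulation-based test of this kind might look as follows. This is a sketch under stated assumptions, not the procedure used in the paper: the function names, the fixed candidate values passed as `x` (a candidate distribution D would add an outer sampling loop), and the two noise options are all illustrative. The Gumbel option uses standard Gumbel noise rather than unit-variance noise so that θ matches Eq. 7.

```python
import math
import random


def draw_noise(kind, rng):
    """Draw one noise term for the RUM."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "gumbel":
        # Minus the log of an Exp(1) draw is a standard Gumbel draw.
        return -math.log(rng.expovariate(1.0))
    raise ValueError(f"unknown noise kind: {kind}")


def rum_ranking(x, theta, kind, rng):
    """Rank candidates by x_i + eps_i / theta, best first (returns indices)."""
    noisy = [xi + draw_noise(kind, rng) / theta for xi in x]
    return sorted(range(len(x)), key=lambda i: noisy[i], reverse=True)


def first_position_gap(x, theta, kind, trials, seed=0):
    """Monte Carlo estimate of E[pi_1 - pi_2 | pi_1 != sigma_1] (Definition 2)."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        pi = rum_ranking(x, theta, kind, rng)
        sigma = rum_ranking(x, theta, kind, rng)
        if pi[0] != sigma[0]:
            total += x[pi[0]] - x[pi[1]]
            count += 1
    return total / count if count else float("nan")


def weaker_competition_gap(x, theta_hi, theta_lo, kind, trials, seed=0):
    """Estimate E[pi_1^(-{tau_1})] - E[pi_1^(-{sigma_1})]; Definition 3 requires > 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pi = rum_ranking(x, theta_lo, kind, rng)
        sigma = rum_ranking(x, theta_hi, kind, rng)  # more accurate competitor
        tau = rum_ranking(x, theta_lo, kind, rng)    # less accurate competitor
        best_without_sigma = x[pi[0]] if pi[0] != sigma[0] else x[pi[1]]
        best_without_tau = x[pi[0]] if pi[0] != tau[0] else x[pi[1]]
        total += best_without_tau - best_without_sigma
    return total / trials
```

For example, `first_position_gap([1.0, 0.0, -1.0], 1.0, "gaussian", 200000)` estimates the Definition 2 quantity for three hypothetical candidates under Gaussian noise. Estimates close to zero should be compared against their Monte Carlo standard error before drawing a conclusion about the sign.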

It is also interesting to ask about the magnitude of the negative impact produced by monoculture. Our model allows for the qualities of candidates to be either positive or negative (capturing the fact that a worker’s productivity can be either more or less than their cost to the firm in wages); using this, we can construct instances of the model in which the optimal social welfare is positive, but the welfare under the (unique) monocultural equilibrium implied by Theorem 1 is negative. This is a strong type of negative result, in which suboptimality reverses the sign of the objective function, and it means that, in general, we cannot compare the optimum and equilibrium by taking a ratio of two nonnegative quantities, as is standard in Price of Anarchy results. However, as a future direction, it would be interesting to explore such Price of Anarchy bounds in special cases of the problem where structural assumptions on the input are sufficient to guarantee that the welfare at both the social optimum and the equilibrium are nonnegative. As one simple example, if the qualities for three candidates are drawn independently from a uniform distribution centered at zero, and the noise distribution is Gaussian, then there exist parameters θA>θH such that expected social welfare at the equilibrium where both firms use the algorithmic ranking is nonnegative and approximately 4% less than it would be had both firms used human evaluators instead.

The Mallows Model.

The Mallows Model also appears frequently in the ranking literature (21, 22) and is much more analytically tractable than RUMs. Under the Mallows Model, the likelihood of a permutation is related to its distance from the true ranking π*:

$$\Pr[\pi] = \frac{1}{Z} \phi^{-d(\pi, \pi^*)}, \tag{8}$$

where $Z$ is a normalizing constant. In this model, $\phi > 1$ is the accuracy parameter: The larger $\phi$ is, the more likely the ranking procedure is to output a ranking $\pi$ that is close to the true ranking $\pi^*$. To instantiate this model, we need a notion of distance $d(\cdot, \cdot)$ over permutations. For this, we’ll use Kendall tau distance, another standard notion in the literature, which is simply the number of pairs of elements in $\pi$ that are incorrectly ordered (23). In SI Appendix, we verify that the family of distributions $F_\theta$ given by the Mallows Model satisfies Definition 1, defining $\theta = \phi - 1$ (for consistency, so $\theta$ is well-defined on $(0, \infty)$).
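For small n, the Mallows distribution in Eq. 8 can be written out exactly by enumerating all permutations. The sketch below assumes candidates are labeled 0, …, n−1 in their true order, so that the true ranking is the identity permutation; the function names are illustrative.

```python
from itertools import permutations


def kendall_tau(perm):
    """Number of candidate pairs that perm orders incorrectly.

    Candidates are labeled 0..n-1 in their true order, so a pair (i, j) with
    i < j is misordered when the candidate in position i has a larger label
    than the candidate in position j.
    """
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))


def mallows_distribution(n, phi):
    """Return {permutation: probability} with Pr proportional to phi**(-distance)."""
    weights = {p: phi ** (-kendall_tau(p)) for p in permutations(range(n))}
    z = sum(weights.values())  # normalizing constant Z
    return {p: w / z for p, w in weights.items()}


def mallows_from_theta(n, theta):
    """Same distribution, parameterized by theta = phi - 1 > 0."""
    return mallows_distribution(n, theta + 1.0)


dist = mallows_distribution(3, phi=2.0)
prob_true_ranking = dist[(0, 1, 2)]
```

With three candidates, the six permutations have Kendall tau distances 0, 1, 1, 2, 2, and 3, so for ϕ = 2 the normalizing constant is Z = 1 + 2/2 + 2/4 + 1/8 = 2.625. Enumeration costs n! and is only practical for small n.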

In contrast to RUMs, the Mallows Model always satisfies the conditions of Theorem 1 for any candidate distribution D, which we prove in SI Appendix.

Theorem 3. Let $F_\theta$ be the family of Mallows Model distributions with parameter $\theta = \phi - 1$. Then, for any candidate distribution $D$, the conditions of Theorem 1 are satisfied.

Fig. 3 characterizes firms’ rational behavior at equilibrium in the (θH,θA) plane under the Mallows Model. The decrease in social welfare found in Theorem 3 is depicted by the shaded portion of the green region labeled AA, where social welfare would be higher if both firms used human evaluators.

Fig. 3.


Regions for different equilibria. When human evaluators are more accurate than the algorithm, both firms decide to employ humans (HH). When the algorithm is significantly more accurate, both firms use the algorithm (AA). When the algorithm is slightly more accurate than human evaluators, two possible equilibria exist: (1) one firm uses the algorithm and the other employs a human (AH), or (2) both decide whether to use the algorithm with some probability p. The shaded portion of the green AA region depicts where social welfare is smaller at the AA equilibrium than it would be if both firms used human evaluators.

While the result of Theorem 3 is certainly stronger than that of Theorem 2, in that it applies to all instances of the Mallows Model without restrictions, it should be interpreted with some caution. The Mallows Model does not depend on the underlying candidate values, so, according to this model, monoculture can produce arbitrarily large negative effects. While insensitivity to candidate values may not necessarily be reasonable in practice, our results hold for any candidate distribution D. Thus, to the extent that the Mallows Model can reasonably approximate ranking in particular contexts, our results imply that monoculture can have negative welfare effects.

Conclusion

Concerns about monoculture in the use of algorithms have focused on the danger of unexpected, correlated shocks and on the harm to particular individuals who may fare poorly under the algorithm’s decision. Our work here shows that concerns about algorithmic monoculture are, in a sense, more fundamental, in that it is possible for monoculture to cause decisions of globally lower average quality, even in the absence of shocks. Beyond indicating how pervasive the phenomenon may be, this finding suggests that its negative effects might be difficult to notice even while they are occurring: they can persist at low levels, without a shock-like disruption to call our attention to them. Our results also make clear that algorithmic monoculture in decision-making doesn’t always lead to adverse outcomes; rather, we give natural conditions under which such outcomes become possible and show that these conditions hold in a wide range of standard models.

Our results suggest a number of natural directions for further work. To begin with, we have noted earlier in the paper that it would be interesting to give more comprehensive quantitative bounds on the magnitude of monoculture’s possible negative effects in decisions such as hiring—how much worse can the quality of candidates be when selected with an equilibrium strategy involving shared algorithms than with a socially optimal one? In formulating such questions, it will be important to take into account how the noise model for rankings relates to the numerical qualities of the candidates.

We have also focused here on the case of two firms and a single shared algorithm that is available to both. It would be natural to consider generalizations involving more firms and potentially more algorithms as well. With more algorithms, we might see solutions in which firms cluster around different algorithms of varying accuracies, as they balance the level of accuracy and the amount of correlation in their decisions. It would also be interesting to explore the ways in which correlations in firms’ decisions can be decomposed into constituent parts, such as the use of standardized tests that form input features for algorithms, and how quantifying these forms of correlation might help firms assess their decisions.

Finally, it will be interesting to consider how these types of results apply to further domains. While the analysis presented here illustrates the consequences of monoculture as applied to algorithmic hiring, our findings have potential implications in a broader range of settings. Algorithmic monoculture not only leads to a lack of heterogeneity in decision-making; by allowing valuable options to slip through the cracks—be they job candidates, potential hit songs, or budding entrepreneurs—it reduces total social welfare, even when the individual decisions are more accurate on a case-by-case basis. These concerns extend beyond the use of algorithms; whenever decision-makers rely on identical or highly correlated evaluations, they miss out on hidden gems and, in this way, diminish the overall quality of their decisions.

Materials and Methods

The results in Figs. 2 and 3 were obtained by computational methods. The computational results for RUMs in Fig. 2 were obtained via simulation, taking the average of 10,000,000 trials and reporting 95% CIs. Candidate utilities were drawn from a uniform distribution on [−√3, √3], which is symmetric and has unit variance. The computational results for the Mallows Model in Fig. 3 were obtained symbolically by using the sympy module in Python. Candidate utilities were drawn from a uniform distribution on [0,1].
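A simulation of this kind might be organized along the following lines. This is a hedged sketch rather than the authors' code: it assumes Gaussian evaluation noise (this section does not specify the noise distribution), sequential hiring in which the second firm takes its top-ranked remaining candidate, and far fewer trials than the 10,000,000 reported. Candidate utilities are drawn uniformly on [−√3, √3], which has unit variance. The name `simulate_welfare` and its parameters are illustrative.

```python
import math
import random


def simulate_welfare(n_candidates, sigma_1, sigma_2, shared, trials, seed=0):
    """Average total utility of the two hires over `trials` simulated rounds.

    sigma_1, sigma_2: noise standard deviations of the two evaluations
                      (assumed Gaussian); smaller means more accurate.
    shared:           if True, both firms rank by the same noisy estimates.
    """
    utility_rng = random.Random(seed)    # stream for candidate utilities
    noise_rng = random.Random(seed + 1)  # separate stream for evaluation noise
    half_width = math.sqrt(3.0)          # uniform on [-sqrt(3), sqrt(3)]
    total = 0.0
    for _ in range(trials):
        utilities = [
            utility_rng.uniform(-half_width, half_width)
            for _ in range(n_candidates)
        ]
        estimates_1 = [u + noise_rng.gauss(0.0, sigma_1) for u in utilities]
        if shared:
            estimates_2 = estimates_1
        else:
            estimates_2 = [u + noise_rng.gauss(0.0, sigma_2) for u in utilities]
        # Firm 1 hires its top-ranked candidate.
        first = max(range(n_candidates), key=lambda i: estimates_1[i])
        # Firm 2 hires its top-ranked candidate among those remaining.
        second = max(
            (i for i in range(n_candidates) if i != first),
            key=lambda i: estimates_2[i],
        )
        total += utilities[first] + utilities[second]
    return total / trials


if __name__ == "__main__":
    trials = 100_000
    print("independent:", simulate_welfare(3, 1.0, 1.0, False, trials))
    print("shared     :", simulate_welfare(3, 0.9, 0.9, True, trials))
```

Keeping utilities and noise on separate random streams means runs with the same seed see the same candidates, which makes comparisons between the shared and independent settings less noisy.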

Supplementary Material

Supplementary File
pnas.2018340118.sapp.pdf (318.1KB, pdf)

Acknowledgments

This work has been supported in part by a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, a Multidisciplinary University Research Initiative grant, Air Force Office of Scientific Research Grant FA9550-19-1-0183, an NSF Graduate Research Fellowship, and grants from the Army Research Office and the MacArthur Foundation.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2018340118/-/DCSupplemental.

Data Availability

There are no data underlying this work.

References

  • 1.Manjoo F., This summer stinks. But at least we’ve got ‘Old Town Road.’ The New York Times, 7 August 2019. https://www.nytimes.com/2019/08/07/opinion/old-town-road.html.
  • 2.Salganik M. J., Dodds P. S., Watts D. J., Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).
  • 3.Power J., Follett R., Monoculture. Sci. Am. 256, 78–87 (1987).
  • 4.Birman K. P., Schneider F. B., The monoculture risk put into context. IEEE Secur. Priv. 7, 14–17 (2009).
  • 5.Citron D. K., Pasquale F., The scored society: Due process for automated predictions. Wash. Law Rev. 89, 1 (2014).
  • 6.Braess D., Über ein paradoxon aus der verkehrsplanung. Unternehmensforschung 12, 258–268 (1968).
  • 7.Thurstone L. L., A law of comparative judgment. Psychol. Rev. 34, 273 (1927).
  • 8.Mallows C. L., Non-null ranking models. I. Biometrika 44, 114–130 (1957).
  • 9.Daniels H., Rank correlation and population models. J. Roy. Stat. Soc. B 12, 171–191 (1950).
  • 10.Block H., Marschak J., “Random orderings and stochastic theories of responses” in Contributions to Probability and Statistics, Olkin I., Ghurye S. G., Hoeffding W., Madow W. G., Mann H. B., Eds. (Stanford University Press, Stanford, CA, 1960), pp. 97–132.
  • 11.Joe H., Inequalities for random utility models, with applications to ranking and subset choice data. Methodol. Comput. Appl. Probab. 2, 359–372 (2000).
  • 12.Yellott J. I. Jr., The relationship between Luce’s choice axiom, Thurstone’s theory of comparative judgment, and the double exponential distribution. J. Math. Psychol. 15, 109–144 (1977).
  • 13.Manski C. F., The structure of random utility models. Theor. Decis. 8, 229 (1977).
  • 14.Strauss D., Some results on random utility models. J. Math. Psychol. 20, 35–52 (1979).
  • 15.Azari Soufiani H., Parkes D. C., Xia L., “Random utility theory for social choice” in NIPS’12: Proceedings of the 25th International Conference on Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger, Eds. (Curran Associates, Red Hook, NY, 2012), pp. 126–134.
  • 16.Azari Soufiani H., Diao H., Lai Z., Parkes D. C., “Generalized random utility models with multiple types” in NIPS’13: Proceedings of the 26th International Conference on Neural Information Processing Systems, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, K. Q. Weinberger, Eds. (Curran Associates, Red Hook, NY, 2013), pp. 73–81.
  • 17.Ragain S., Ugander J., “Pairwise choice Markov chains” in NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems, D. D. Lee, U. von Luxburg, R. Garnett, M. Sugiyama, I. Guyon, Eds. (Curran Associates, Red Hook, NY, 2016), pp. 3198–3206.
  • 18.Zhao Z., Villamil T., Xia L., “Learning mixtures of random utility models” in Thirty-Second AAAI Conference on Artificial Intelligence, McIlraith S. A., Weinberger K. Q., Eds. (AAAI Press, Palo Alto, CA, 2018), pp. 4530–4538.
  • 19.Makhijani R., Ugander J., “Parametric models for intransitivity in pairwise rankings” in WWW’19: The World Wide Web Conference, L. Liu, R. White, Eds. (Association for Computing Machinery, New York, NY, 2019), pp. 3056–3062.
  • 20.Luce R. D., Individual Choice Behavior: A Theoretical Analysis (Wiley, New York, NY, 1959).
  • 21.Das S., Li Z., “The role of common and private signals in two-sided matching with interviews” in WINE 2014: International Conference on Web and Internet Economics, T. Y. Liu, Q. Qi, Y. Ye, Eds. (Lecture Notes in Computer Science, Springer, Cham, Switzerland, 2014), vol. 8877, pp. 492–497.
  • 22.Lu T., Boutilier C., “Learning Mallows models with pairwise preferences” in ICML’11: Proceedings of the 28th International Conference on Machine Learning, L. Getoor, T. Scheffer, Eds. (Omnipress, Madison, WI, 2011), pp. 145–152.
  • 23.Kendall M. G., A new measure of rank correlation. Biometrika 30, 81–93 (1938).
