Decision 5
       S1   S2   S3   S4
A      10    9    9    0
B       8    3    3    3
And imagine we’re using the maximin rule. Then the rule says that A does better than B in
S1, while B does better than A in S4. The rule also says that B does better than A overall,
since its worst case scenario is 3, while A’s worst case scenario is 0. But we can also compare
A and B with respect to pairs of states. Conditional on us just being in S1 or S2, A is
better, because between those two states its worst case is 9, while B’s worst case is 3.
Now imagine we’ve given up on maximin, and are applying a new rule we’ll call maxiaverage.
The maxiaverage rule tells us to make the choice that has the highest (or maximum)
average of best case and worst case scenarios. The rule says that B is better overall, since it
has a best case of 8 and a worst case of 3, for an average of 5.5, while A has a best case of 10
and a worst case of 0, for an average of 5.
But if we just know we’re in S1 or S2 , then the rule recommends A over B. That’s because
among those two states, A has a maximum of 10 and a minimum of 9, for an average of 9.5,
while B has a maximum of 8 and a minimum of 3 for an average of 5.5.
And if we just know we’re in S3 or S4 , then the rule also recommends A over B. That’s
because among those two states, A has a maximum of 9 and a minimum of 0, for an average
of 4.5, while B has a maximum of 3 and a minimum of 3 for an average of 3.
This is a fairly odd result. We know that either we’re in one of S1 or S2 , or that we’re in
one of S3 or S4 . And the rule tells us that if we find out which, i.e. if we find out we’re in S1
or S2 , or we find out we’re in S3 or S4 , either way we should choose A. But before we find
this out, we should choose B.
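To make this concrete, here is a short computational sketch of the two rules (the code and function names are mine, not part of the text; any general-purpose language would do). It recomputes the maximin and maxiaverage verdicts from Decision 5, both across all four states and conditional on each half of the partition.

```python
# Payoff table for Decision 5.
table = {
    "A": {"S1": 10, "S2": 9, "S3": 9, "S4": 0},
    "B": {"S1": 8,  "S2": 3, "S3": 3, "S4": 3},
}

def maximin(act, states):
    """Worst-case payoff of act across the given states."""
    return min(table[act][s] for s in states)

def maxiaverage(act, states):
    """Average of best-case and worst-case payoffs across the given states."""
    payoffs = [table[act][s] for s in states]
    return (max(payoffs) + min(payoffs)) / 2

all_states = ["S1", "S2", "S3", "S4"]
print(maximin("A", all_states), maximin("B", all_states))              # 0 3     -> B better
print(maxiaverage("A", all_states), maxiaverage("B", all_states))      # 5.0 5.5 -> B better
print(maxiaverage("A", ["S1", "S2"]), maxiaverage("B", ["S1", "S2"]))  # 9.5 5.5 -> A better
print(maxiaverage("A", ["S3", "S4"]), maxiaverage("B", ["S3", "S4"]))  # 4.5 3.0 -> A better
```

The last three lines display the oddity: B wins overall, but A wins on each cell of the partition.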
Here then is a more general version of dominance. Assume our initial states are {S1 , S2 , ..., Sn }.
Call this set S. A binary partition of S is a pair of sets of states, call them T1 and T2 , such
that every state in S is in exactly one of T1 and T2 . (We’re simplifying a little here - generally
a partition is any way of dividing a collection up into parts such that every member of the
original collection is in one of the ‘parts’. But we’ll only be interested in cases where we di-
vide the original states in two, i.e., into a binary partition.) Then the generalised version of
dominance says that if A is better than B among the states in T1 , and it is better than B among
the states in T2 , where T1 and T2 provide a partition of S, then it is better than B among the
states in S. That’s the principle that maxiaverage violates. A is better than B among the states
{S1 , S2 }. And it is better than B among the states {S3 , S4 }. But it isn’t better than B among
the states {S1 , S2 , S3 , S4 }. That is, it isn’t better than B among the states generally.
We’ll be interested in this principle of generalised dominance because, unlike dominance
itself, it leads to slightly counterintuitive results in some cases. For this reason
some theorists have been interested in theories which, although they satisfy dominance, do
not satisfy this general version of dominance.
On the other hand, maximise expected utility does respect this principle. In fact, it
respects an even stronger principle, one that we’ll state using the notion of conditional
expected utility. Recall that as well as probabilities, we defined conditional probabilities
above. Well conditional expected utilities are just the expectations of the utility function
with respect to a conditional probability. More formally, if there are states S1 , S2 , ..., Sn , then
the expected utility of A conditional on E, which we’ll write Exp(U(A|E)), is
Exp(U(A|E)) = Pr(S1 |E)U(S1 |A) + Pr(S2 |E)U(S2 |A) + ... + Pr(Sn |E)U(Sn |A)
That is, we just replace the probabilities in the definition of expected utility with conditional
probabilities. (You might wonder why we didn’t also replace the utilities with conditional
utilities. That’s because we’re assuming that states are defined so that given an action, the
state has a fixed utility. If we didn’t make this simplifying assumption, we’d have to be more
careful here.) Now we can prove the following theorem.
• If Exp(U(A|E)) > Exp(U(B|E)), and Exp(U(A|¬E)) > Exp(U(B|¬E)), then Exp(U(A)) >
Exp(U(B)).
We’ll prove this by proving something else that will be useful in many contexts: for any state Si,

Pr(Si) = Pr(E)Pr(Si|E) + Pr(¬E)Pr(Si|¬E)

This holds because Pr(E)Pr(Si|E) = Pr(Si ∧ E) and Pr(¬E)Pr(Si|¬E) = Pr(Si ∧ ¬E), and Si is
equivalent to the disjunction of the incompatible propositions Si ∧ E and Si ∧ ¬E.
And now we’ll use this when we’re expanding Exp(U(A|E))Pr(E).
Exp(U(A|E))Pr(E) = Pr(E)[Pr(S1|E)U(S1|A) + Pr(S2|E)U(S2|A) + ... + Pr(Sn|E)U(Sn|A)]
                 = Pr(E)Pr(S1|E)U(S1|A) + Pr(E)Pr(S2|E)U(S2|A) + ... + Pr(E)Pr(Sn|E)U(Sn|A)

Exp(U(A|¬E))Pr(¬E) = Pr(¬E)[Pr(S1|¬E)U(S1|A) + Pr(S2|¬E)U(S2|A) + ... + Pr(Sn|¬E)U(Sn|A)]
                   = Pr(¬E)Pr(S1|¬E)U(S1|A) + Pr(¬E)Pr(S2|¬E)U(S2|A) + ... + Pr(¬E)Pr(Sn|¬E)U(Sn|A)

Adding these, we get

Exp(U(A|E))Pr(E) + Exp(U(A|¬E))Pr(¬E)
    = Pr(E)Pr(S1|E)U(S1|A) + ... + Pr(E)Pr(Sn|E)U(Sn|A) + Pr(¬E)Pr(S1|¬E)U(S1|A) + ... + Pr(¬E)Pr(Sn|¬E)U(Sn|A)
    = (Pr(E)Pr(S1|E) + Pr(¬E)Pr(S1|¬E))U(S1|A) + ... + (Pr(E)Pr(Sn|E) + Pr(¬E)Pr(Sn|¬E))U(Sn|A)
    = Pr(S1)U(S1|A) + Pr(S2)U(S2|A) + ... + Pr(Sn)U(Sn|A)
    = Exp(U(A))
Now if Exp(U(A|E)) > Exp(U(B|E)), and Exp(U(A|¬E)) > Exp(U(B|¬E)), then the following
two inequalities hold.
Exp(U(A|E))Pr(E) ≥ Exp(U(B|E))Pr(E)
Exp(U(A|¬E))Pr(¬E) ≥ Exp(U(B|¬E))Pr(¬E)
In each case we have equality only if the probability in question (Pr(E) in the first line,
Pr(¬E) in the second) is zero. Since not both Pr(E) and Pr(¬E) are zero, one of those is a
strict inequality. (That is, the left hand side is greater than, not merely greater than or equal
to, the right hand side.) So adding up the two lines, and using the fact that in one case we
have a strict inequality, we get

Exp(U(A|E))Pr(E) + Exp(U(A|¬E))Pr(¬E) > Exp(U(B|E))Pr(E) + Exp(U(B|¬E))Pr(¬E)

By the expansion above, the left hand side is Exp(U(A)), and by the same expansion the
right hand side is Exp(U(B)). So Exp(U(A)) > Exp(U(B)).
That is, if A is better than B conditional on E, and it is better than B conditional on ¬E, then
it is simply better than B.
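Since that expansion carries the whole proof, a quick numeric check may be reassuring. The probabilities and utilities below are invented purely for illustration, and exp_u is a helper of my own devising, not anything defined in the text.

```python
# Toy model: four states; E = {S1, S2}, so Pr(E) = 0.3 and Pr(not-E) = 0.7.
pr = {"S1": 0.1, "S2": 0.2, "S3": 0.3, "S4": 0.4}
u = {  # u[X][Si] plays the role of U(Si|X)
    "A": {"S1": 10, "S2": 9, "S3": 9, "S4": 5},
    "B": {"S1": 8,  "S2": 3, "S3": 3, "S4": 3},
}
E, not_E = ["S1", "S2"], ["S3", "S4"]

def exp_u(act, states=None):
    """Expected utility of act, conditional on the true state being in `states`."""
    states = states or list(pr)
    total = sum(pr[s] for s in states)   # probability of the conditioning event
    return sum(pr[s] / total * u[act][s] for s in states)

# A beats B conditional on E, and conditional on not-E ...
assert exp_u("A", E) > exp_u("B", E)
assert exp_u("A", not_E) > exp_u("B", not_E)
# ... so, as the theorem says, A beats B unconditionally.
assert exp_u("A") > exp_u("B")
# The key expansion: Exp(U(A)) = Exp(U(A|E))Pr(E) + Exp(U(A|not-E))Pr(not-E).
assert abs(exp_u("A") - (exp_u("A", E) * 0.3 + exp_u("A", not_E) * 0.7)) < 1e-9
```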
The principle we have been working up to is usually stated in terms of preferences rather
than expected utilities, and is known as the Sure Thing Principle:

Sure Thing Principle: If AE ⪰ BE and A¬E ⪰ B¬E, then A ⪰ B.

The terminology there could use some spelling out. By A ≻ B we mean that A is preferred
to B. By A ⪰ B we mean that A is regarded as at least as good as B. The relation between ≻
and ⪰ is like the relation between > and ≥. In each case the line at the bottom means that
we’re allowing equality between the values on either side.
The odd thing here is using AE ⪰ BE rather than something that’s explicitly conditional.
We should read the terms on each side of the inequality sign as conjunctions. It means that A
and E is regarded as at least as good an outcome as B and E. But that sounds like something
that’s true just in case the agent prefers A to B conditional on E obtaining. So we can use
preferences over conjunctions like AE as a proxy for conditional preferences.
So we can read the Sure Thing Principle as saying that if A is at least as good as B con-
ditional on E, and conditional on ¬E, then it really is at least as good as B. Again, this looks
fairly plausible in the abstract, though we’ll soon see some reasons to worry about it.
Expected Utility maximisation satisfies the Sure Thing Principle. I won’t go over the
proof here because it’s really just the same as the proof from the previous section with >
replaced by ≥ in a lot of places. But if we regard the Sure Thing Principle as a plausible
principle of decision making, then it is a good feature of Expected Utility maximisation that
it satisfies it.
It is tempting to think of the Sure Thing Principle as a generalisation of a principle of
logical implication we all learned in propositional logic. The principle in question said that
from X → Z, and Y → Z, and X ∨ Y, we can infer Z. If we let Z be that A is better than
B, let X be E, and Y be ¬E, it looks like we have all the premises, and the reasoning looks
intuitively right. But this analogy is misleading for two reasons.
First, for technical reasons we can’t get into in depth here, preferring A to B conditional
on E isn’t the same as it being true that if E is true you prefer A to B. To see some problems
with this, think about cases where you don’t know E is true, and A is something quite
horrible that mitigates the effects of the unpleasant E. In this case you do prefer AE to BE, and
E is true, but you don’t prefer A to B. But we’ll set this question, which is largely a logical
question about the nature of conditionals, to one side.
The bigger problem is that the analogy with logic would suggest that the following
generalisation of the Sure Thing Principle will hold.
Disjunction Principle If AE1 ⪰ BE1 and AE2 ⪰ BE2 , and Pr(E1 ∨ E2 ) = 1 then A ⪰ B.
But this “Disjunction Principle” seems no good in cases like the following. I’m going to toss
two coins. Let p be the proposition that they will land differently, i.e. one heads and one
tails. I offer you a bet that pays you $2 if p, and costs you $3 if ¬p. This looks like a bad
bet, since Pr(p) = 0.5, and losing $3 is worse than gaining $2. But consider the following
argument.
Let E1 be that at least one of the coins lands heads. It isn’t too hard to show that
Pr(p|E1 ) = 2/3. So conditional on E1 , the expected return of the bet is 2/3 × 2 – 1/3 × 3 =
4/3 – 1 = 1/3. That’s a positive return. So if we let A be taking the bet, and B be declining the
bet, then conditional on E1 , A is better than B, because the expected return is positive.
Let E2 be that at least one of the coins lands tails. It isn’t too hard to show that
Pr(p|E2) = 2/3. So conditional on E2, the expected return of the bet is 2/3 × 2 – 1/3 × 3 =
4/3 – 1 = 1/3. That’s a positive return. So if we let A be taking the bet, and B be declining the
bet, then conditional on E2 , A is better than B, because the expected return is positive.
Now if E1 fails, then both of the coins land tails. That means that at least one of the
coins lands tails. That means that E2 is true. So if E1 fails, E2 is true. So one of E1 and E2
has to be true, i.e. Pr(E1 ∨ E2 ) = 1. And AE1 ⪰ BE1 and AE2 ⪰ BE2 . Indeed AE1 ≻ BE1
and AE2 ≻ BE2 . But B ≻ A. So the disjunction principle isn’t in general true.
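All the probabilities and expected returns in this example are easy to verify by brute force, since the two tosses have just four equally likely outcomes. Here is one way to do it (the helper names are mine):

```python
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, each with probability 1/4

def pr(event):
    """Probability of an event (given as a test on outcomes) under the uniform measure."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

def pr_given(event, cond):
    """Conditional probability Pr(event | cond)."""
    return pr(lambda o: event(o) and cond(o)) / pr(cond)

p  = lambda o: o[0] != o[1]   # the coins land differently
E1 = lambda o: "H" in o       # at least one heads
E2 = lambda o: "T" in o       # at least one tails

print(pr_given(p, E1), pr_given(p, E2))   # both 2/3

# Conditional on E1 or on E2, the $2/-$3 bet has positive expected return ...
for cond in (E1, E2):
    q = pr_given(p, cond)
    print(q * 2 - (1 - q) * 3)            # 1/3 in each case
# ... but unconditionally the bet is a loser.
print(pr(p) * 2 - (1 - pr(p)) * 3)        # 0.5 * 2 - 0.5 * 3 = -0.5
```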
It’s a deep philosophical question how seriously we should worry about this. If the Sure
Thing Principle isn’t any more plausible intuitively than the Disjunction Principle, and the
Disjunction Principle seems false, does that mean we should be sceptical of the Sure Thing
Principle? As I said, that’s a very hard question, and it’s one we’ll return to a few times in
what follows.
First, the subjects are offered a choice between options A and B, whose payouts depend on
the colour of a ball drawn from an urn. In effect, they are offered a choice between an 11%
shot at $1,000,000 (A), and a 10% shot at $5,000,000 (B). Second, the subjects are offered
the following choice between C and D, which are dependent on drawings from a similarly
constructed urn.
In effect, they are offered a choice between $1,000,000 for sure (C), and a complex bet (D)
that gives them a 10% shot at $5,000,000, an 89% shot at $1,000,000, and a 1% chance of
striking out and getting nothing.
Now if we were trying to maximise expected dollars, then we’d have to choose both B
and D. But, and this is an important point that we’ll come back to, dollars aren’t utilities.
Getting $2,000,000 isn’t twice as good as getting $1,000,000. Pretty clearly if you were offered
a million dollars or a 50% chance at two million dollars you would, and should, take the
million for sure. That’s because the two million isn’t twice as useful to you as the million.
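As a quick check of the claim that expected-dollar maximisation picks B and D, here is the arithmetic, encoded from the percentages stated above (the encoding itself is mine):

```python
# Each option as a map from prize (in dollars) to its probability.
lotteries = {
    "A": {1_000_000: 0.11, 0: 0.89},
    "B": {5_000_000: 0.10, 0: 0.90},
    "C": {1_000_000: 1.00},
    "D": {5_000_000: 0.10, 1_000_000: 0.89, 0: 0.01},
}

for name, lottery in lotteries.items():
    ev = sum(prize * prob for prize, prob in lottery.items())
    print(name, ev)
# A 110000.0, B 500000.0, C 1000000.0, D 1390000.0
# In expected dollars B beats A and D beats C; yet most people
# choose B over A and C over D, which is the puzzle discussed below.
```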
Without a way of figuring out the utility of $1,000,000 versus the utility of $5,000,000, we
can’t say whether A is better than B. But we can say one thing. You can’t consistently hold
the following three views.
• B≻A
• C≻D
• The Sure Thing Principle holds
This is relevant because a lot of people think B ≻ A and C ≻ D. Let’s finish by working
through the proof of this.
Let E be that either a white or yellow ball is drawn. So ¬E is that a black ball is drawn.
Now note that A¬E is identical to B¬E. In either case you get nothing. So A¬E ⪰ B¬E. So
if AE ⪰ BE then, by Sure Thing, A ⪰ B. Equivalently, if B ≻ A, then BE ≻ AE. Since we’ve
assumed B ≻ A, then BE ≻ AE.
Also note that C¬E is identical to D¬E. In either case you get a million dollars. So
D¬E ⪰ C¬E. So if DE ⪰ CE then, by Sure Thing, D ⪰ C. Equivalently, if C ≻ D, then
CE ≻ DE. Since we’ve assumed C ≻ D, then CE ≻ DE.
But now we have a problem, since BE = DE, and AE = CE. Given E, the choice between
A and B just is the choice between C and D. So holding simultaneously that BE ≻ AE and
CE ≻ DE is incoherent.
It’s hard to say for sure just what’s going on here. Part of what’s going on is that we have
a ‘certainty premium’. We prefer options like C that guarantee a positive result. Now having
a certainly good result is a kind of holistic property of C. The Sure Thing Principle in effect
rules out assigning value to holistic properties like that. The value of the whole need not be
identical to the value of the parts, but any comparisons between the values of the parts have
to be reflected in the value of the whole. Some theorists have thought that a lesson of the
Allais paradox is that this is a mistake.
We won’t be looking in this course at theories which violate the Sure Thing Principle,
but we will be looking at justifications of the Sure Thing Principle, so it is worth thinking
about reasons you might have for rejecting it.
10.4 Exercises
10.4.1 Calculate Expected Utilities
In the following example Pr(S1 ) = 0.4, Pr(S2 ) = 0.3, Pr(S3 ) = 0.2 and Pr(S4 ) = 0.1. The table
gives the utility of each of the possible actions (A, B, C, D and E) in each state. What is the
expected utility of each action?
       S1   S2   S3   S4
A       0    2   10    2
B       6    2    1    7
C       1    8    9    7
D       3    1    8    6
E       4    7    1    4
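If you want to check your answers, the probability-weighted sums can be computed in a few lines; the encoding below is just one way to set this up.

```python
probs = [0.4, 0.3, 0.2, 0.1]  # Pr(S1), Pr(S2), Pr(S3), Pr(S4)
utilities = {
    "A": [0, 2, 10, 2],
    "B": [6, 2, 1, 7],
    "C": [1, 8, 9, 7],
    "D": [3, 1, 8, 6],
    "E": [4, 7, 1, 4],
}

# Expected utility of each act: sum over states of Pr(Si) * U(Si|act).
for act, us in utilities.items():
    print(act, sum(p * u for p, u in zip(probs, us)))
```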
Chapter 11
Understanding Probability
11.1 Kinds of Probability
As might be clear from the discussion of what probability functions are, there are a lot of
probability functions. For instance, the following is a probability function for any (logically
independent) p and q.
p q Pr
T T 0.97
T F 0.01
F T 0.01
F F 0.01
But if p actually is that the moon is made of green cheese, and q is that there are little green
men on Mars, you probably won’t want to use this probability function in decision making.
That would commit you to making some bets that are intuitively quite crazy.
So we have to put some constraints on the kinds of probability we use if the “Maximise
Expected Utility” rule is to make sense. As it is sometimes put, we need to have an
interpretation of the Pr in the expected utility rule. We’ll look at three possible interpretations
that might be used.
11.2 Frequency
Historically probabilities were often identified with frequencies. If we say that the
probability that this F is a G is, say, 2/3, that means that the proportion of F’s that are G’s is 2/3.
Such an approach is plausible in a lot of cases. If we want to know what the probability
is that a particular student will catch influenza this winter, a good first step would be to find
out the proportion of students who will catch influenza this winter. Let’s say this is 1/10. Then,
to a first approximation, if we need to feed into our expected utility calculator the probability
that this student will catch influenza this winter, using 1/10 is not a bad first step. Indeed, the
insurance industry does not do a bad job using frequencies as guides to probabilities in just this
way.
But that can hardly be the end of the story. If we know that this particular student has
not had an influenza shot, and that their boyfriend and their roommate have both caught
influenza, then the probability of them catching influenza would now be much higher. With
that new information, you wouldn’t want to take a bet that paid $1 if they didn’t catch
influenza, but lost you $8 if they did catch influenza. At those odds, that now looks like a
bad bet.
Perhaps the thing to say is that the relevant group is not all students. Perhaps the relevant
group is students who haven’t had influenza shots and whose roommates and boyfriends
have also caught influenza. And if, say, 2/3 of such students have caught influenza, then
perhaps the probability that this student will catch influenza is 2/3.
You might be able to see where this story is going by now. We can always imagine more
details that will make that number look inappropriate as well. Perhaps the student in
question is spending most of the winter doing field work in South America, so they have little
chance to catch influenza from their infected friends. And now the probability should be
lower. Or perhaps we can imagine that they have a genetic predisposition to catch influenza,
so the probability should be higher. There is always more information that could be relevant.
The problem for using frequencies as probabilities then is that there could always be
more precise information that is relevant to the probability. Every time we find that the
person in question isn’t merely an F (a student, say), but is a particular kind of F (a student
who hasn’t had an influenza shot, whose close contacts are infected, who has a genetic
predisposition to influenza), we want to know the proportion not of F’s who are G’s, but the
proportion of the more narrowly defined class who are G’s. But eventually this will leave us
with no useful probabilities at all, because we’ll have found a way of describing the student
in question such that they are the only person in history who satisfies this description.
This is hardly a merely theoretical concern. If we are interested in the probability that a
particular bank will go bankrupt, or that a particular Presidential candidate will win the
election, it isn’t too hard to come up with a list of characteristics of the bank or candidate in
question in such a way that they are the only one in history to meet that description. So the
frequency that such banks will go bankrupt is either 1 (1 out of 1 go bankrupt) or 0 (0 out
of 1 do). But those aren’t particularly useful probabilities. So we should look elsewhere for
an interpretation of the Pr that goes into our definition of expected utility.
In the literature there are two objections to using frequencies as probabilities that seem
related to the argument we’re looking at here.
One of these is the Reference Class Problem. This is the problem that if we’re interested
in the probability that a particular person is G, then the frequency of G-hood amongst the
different classes the person is in might differ.
The other is the Single Case Problem. This is the problem that we’re often interested in
one-off events, like bank failures, elections, wars etc, that don’t naturally fit into any natural
broader category.
I think the reflections here support the idea that these are two sides of a serious problem
for the view that probabilities are frequencies. In general, there actually is a natural solution
to the Reference Class Problem. We look to the most narrowly drawn reference class we
have available. So if we’re interested in whether a particular person will survive for 30 years,
and we know they are a 52 year old man who smokes, we want to look not to the survival
frequencies of people in general, or men in general, or 52 year old men in general, but 52
year old male smokers.
Perhaps by looking at cases like this, we can convince ourselves that there is a natural
solution to the Reference Class Problem. But the solution makes the Single Case Problem
pressing. Pretty much anything that we care about is distinct in some way or another.
That’s to say, if we look closely we’ll find that the most natural reference class for it just
contains that one thing. That’s to say, it’s a single case in some respect. And one-off events
don’t have interesting frequencies. So frequencies aren’t what we should be looking to as
probabilities.
11.3 Degrees of Belief
Take a bet that pays $1 if p, and nothing otherwise. If your degree of belief in p is Pr(p),
the expected dollar return of the bet is Pr(p) × 1 + (1 – Pr(p)) × 0 = Pr(p). So if you pay
$Pr(p) for the bet, your expected return is exactly 0. Obviously if you pay more, you’re
worse off, and if you pay less, you’re better off. $Pr(p) is the break even point, so that’s the
fair price for the bet.
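The same arithmetic as a small sketch (the function name is mine, for illustration):

```python
def expected_return(pr_p, price):
    """Expected dollar return of paying `price` for a bet that pays $1 if p."""
    return pr_p * (1 - price) + (1 - pr_p) * (0 - price)  # = pr_p - price

print(expected_return(0.7, 0.7))  #  0.0 -> break even exactly when price = Pr(p)
print(expected_return(0.7, 0.8))  # -0.1 -> paying more than Pr(p) is a losing deal
print(expected_return(0.7, 0.6))  #  0.1 -> paying less than Pr(p) is a winning deal
```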
And that’s how we measure degrees of belief. We look at the agent’s ‘fair price’ for a bet
that returns $1 if p. (Alternatively, we look at the maximum they’ll pay for such a bet.) And
that’s their degree of belief that p. If we’re taking probabilities to be degrees of belief, if we
are (as it is sometimes put) interpreting probability subjectively, then that’s the probability
of p.
This might look suspiciously circular. The expected utility rule was meant to give us
guidance as to how we should make decisions. But the rule needed a probability as an input.
And now we’re taking that probability to not only be a subjective state of the agent, but a
subjective state that is revealed in virtue of the agent’s own decisions. Something seems odd
here.
Perhaps we can make it look even odder. Let p be some proposition that might be true
and might be false, and assume that the agent’s choice is to take or decline a bet on p that has
some chance of winning and some chance of losing. Then if the agent takes the bet, that’s
a sign that their degree of belief in p was higher than the odds of the bet on p, so therefore
they are increasing their expected utility by taking the bet, so they are doing the right thing.
On the other hand, if they decline the bet, that’s a sign that their degree of belief in p was
lower than the odds of the bet on p, so therefore they are increasing their expected utility
by declining the bet, so they are doing the right thing. So either way, they do the right thing.
But a rule that says they did the right thing whatever they do isn’t much of a rule.
There are two important responses to this, which are related to one another. The first
is that although the rule does (more or less) put no restrictions at all on what you do when
faced with a single choice, it can put quite firm constraints on your sets of choices when
you have to make multiple decisions. The second is that the rule should be thought of as a
procedural rather than substantive rule of rationality. We’ll look at these more closely.
If we take probabilities to be subjective probabilities, i.e. degrees of belief, then the
maximise expected utility rule turns out to be something like a consistency constraint. Compare
it to a rule like Have Consistent Beliefs. As long as we’re talking about logically contingent
matters, this doesn’t put any constraint at all on what you do when faced with a single
question of whether to believe p or ¬p. But it does put constraints on what further beliefs you
can have once you believe p. For instance, you can’t now believe ¬p.
The maximise expected utility rule is like this. Indeed we already saw this in the Allais
paradox. The rule, far from being empty, rules out the pair of choices that many people
intuitively think is best. So if the objection is that the rule has no teeth, that objection can’t
hold up.
We can see this too in simpler cases. Let’s say I offer the agent a ticket that pays $1 if p,
and she pays 60c for it. So her degree of belief in p must be at least 0.6. Then I offer her a
ticket that pays $1 if ¬p, and she pays 60c for it too. So her degree of belief in ¬p must be at
least 0.6. But, and here’s the constraint, we think degrees of belief have to be probabilities.
And if Pr(p) ≥ 0.6, then Pr(¬p) ≤ 0.4. So if Pr(¬p) ≥ 0.6, we have an inconsistency. That’s
bad, and it’s the kind of badness it is the job of the theory to rule out.
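One vivid way to see the badness: a bookie who sells this agent both tickets at 60c each is guaranteed a profit, since exactly one of the tickets pays out. This is the standard ‘Dutch book’ observation, sketched below.

```python
price = 0.60
for p_is_true in (True, False):
    payout_p     = 1.0 if p_is_true else 0.0   # ticket that pays $1 if p
    payout_not_p = 0.0 if p_is_true else 1.0   # ticket that pays $1 if not-p
    net = (payout_p - price) + (payout_not_p - price)
    print(p_is_true, round(net, 2))            # -0.2 either way: a sure loss
```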
One way to think about the expected utility rule is to compare it to norms of means-
end rationality. At times when we’re thinking about what someone should do, we really
focus on what the best means is to their preferred end. So we might say If you want to go to
Harlem, you should take the A train, without it even being a relevant question whether they
should, in the circumstances, want to go to Harlem.
The point being made here is quite striking when we consider people with manifestly
crazy beliefs. If we’re just focussing on means to an end, then we might look at someone
who, say, wants to crawl from the southern tip of Broadway to its northern tip. And we’ll
say “You should get some kneepads so you don’t scrape your knees, and you should take lots
of water, and you should catch the 1 train down to near where Broadway starts, etc.” But
if we’re not just offering procedural advice, but are taking a more substantive look at their
position, we’ll say “You should come up with a better idea about what to do, because that’s
an absolutely crazy thing to want.”
As we’ll see, the combination of the maximise expected utility rule with the use of
degrees of belief as probabilities leads to a similar set of judgments. On the one hand, it is a
very good guide to procedural questions. But it leaves some substantive questions
worryingly unanswered. Next time we’ll come back to this distinction, and see if there’s a better
way to think about probability.
Chapter 12
Objective Probabilities
12.1 Credences and Norms
We ended last time by looking at the idea that the probabilities in expected utility
calculations should be subjective. As it is sometimes put, they should be degrees of belief. Or, as
it is also sometimes put, they should be credences. We noted that under this interpretation,
the maximise expected utility rule doesn’t put any constraints on certain simple decisions.
That’s because we use the rule to calculate what credences are, and then use the very same
credences to say what the rule requires. But the rule isn’t useless. It puts constraints, often
sharp constraints, on sets of decisions. In this respect it is more like the rule Have Consistent
Beliefs than like the rule Believe What’s True, or Believe What Your Evidence Supports. And
we compared it to procedural, as opposed to substantive norms.
What’s left from all that are two large questions.
• Do we get the right procedural/consistency constraints from the expected utility rule?
In particular (a) should credences be probabilities, and (b) should we make complex
decisions by the expected utility rule? We’ll look a bit in what follows at each of these
questions.
• Is a purely procedural constraint all we’re looking for in a decision theory?
And intuitively the answer to the second question is No. Let’s consider a particular case.
Alex is very confident that the Kansas City Royals will win baseball’s World Series next year.
In fact, Alex’s credence in this is 0.9, very close to 1. Unfortunately, there is little reason for
this confidence. Kansas City has been one of the worst teams in baseball for many years,
the players they have next year will be largely the same as the players they had when doing
poorly this year, and many other teams have players who have performed much much better.
Even if Kansas City were a good team, there are 30 teams in baseball, and relatively random
events play a big role in baseball, making it unwise to be too confident that any one team
will win.
Now, Alex is offered a bet that leads to a $1 win if Kansas City win the World Series, and
a $1 loss if they do not. The expected return of that bet, given Alex’s credences, is +80c. So
should Alex make the bet?
Intuitively, Alex should not. It’s true that given Alex’s credences, the bet is a good one.
But it’s also true that Alex has crazy credences. Given more sensible credences, the bet has
a negative expected return. So Alex should not make the bet.
It’s worth stepping away from probabilities, expected values and the like to think about
this in a simpler context. Imagine a person has some crazy beliefs about what is an effective
way to get some good end. And assume they, quite properly, want that good end. In fact,
however, acting on their crazy beliefs will be counterproductive; it will just make things
worse for everyone. And their evidence supports this. Should they act on their beliefs?
Intuitively not. To be sure, if they didn’t act on their beliefs, there would be some
inconsistency between their beliefs and their actions. But inconsistency isn’t the worst thing in the
world. They should, instead, have different beliefs.
Similarly Alex should have different credences in the case in question. The question,
what should Alex do given these credences, seems less interesting than the question, what
should Alex do? And that’s what we’ll look at.
12.2 Evidential Probability
The probability of each proposition is a measure of how strongly it is supported by the
evidence.
That’s different from what a rational person would believe in two respects. For one thing,
there is a fact about how strongly the evidence supports p, even if different people might
disagree about just how strongly that is. For another thing, it isn’t true that the evidence
supports that you are perfectly rational, even though you would believe that if you were
perfectly rational. So the two objections we just mentioned are not an issue here.
From now on then, when we talk about probability in the context of expected utility, we’ll
talk about evidential probabilities. There’s an issue, one we’ll return to later, about whether
we can numerically measure strengths of evidence. That is, there’s an issue about whether
strengths of evidence are the right kind of thing to be put on a numerical scale. Even if they
are, there’s a tricky issue about how we can even guess what they are. I’m going to cheat a
little here. Despite the arguments above that evidential probabilities can’t be identified with
betting odds of perfectly rational agents, I’m going to assume that, unless we have reason to
the contrary, those betting odds will be our first approximation. So when we have to guess
what the evidential probability of p is, we’ll start with what odds a perfectly rational agent
(with your evidence) would look for before betting on p.
the friend might have taken in the past. These thoughts won’t be thoughts about chances in
the physicists’ sense; they’ll be about evidential probabilities.
Finally, chances are objective. The evidential probability that p is true might be different
for me than for you. For instance, the evidence a juror has might make it quite likely, for
her, that the suspect is guilty, even if he is not. But the evidence the suspect has makes it
extremely likely that he is innocent. Evidential probabilities differ between different people.
Chances do not. Someone might not know what the chance of a particular outcome is, but
what they are ignorant of is a matter of objective fact.
The upshot seems to be that chances are quite different things from evidential
probabilities, and the best thing to do is simply to take them to be distinct basic concepts.