Decision 5
       S1   S2   S3   S4
A      10    9    9    0
B       8    3    3    3
And imagine we’re using the maximin rule. Then the rule says that A does better than B in
S1, while B does better than A in S4. The rule also says that B does better than A overall,
since its worst case scenario is 3, while A’s worst case scenario is 0. But we can also compare
A and B with respect to pairs of states. Conditional on us just being in S1 or S2, A is
better, because between those two states its worst case is 9, while B’s worst case is 3.
Now imagine we’ve given up on maximin, and are applying a new rule we’ll call maxiaverage.
The maxiaverage rule tells us to make the choice that has the highest (or maximum)
average of best case and worst case scenarios. The rule says that B is better overall, since it
has a best case of 8 and a worst case of 3, for an average of 5.5, while A has a best case of 10
and a worst case of 0, for an average of 5.
But if we just know we’re in S1 or S2 , then the rule recommends A over B. That’s because
among those two states, A has a maximum of 10 and a minimum of 9, for an average of 9.5,
while B has a maximum of 8 and a minimum of 3 for an average of 5.5.
And if we just know we’re in S3 or S4 , then the rule also recommends A over B. That’s
because among those two states, A has a maximum of 9 and a minimum of 0, for an average
of 4.5, while B has a maximum of 3 and a minimum of 3 for an average of 3.
This is a fairly odd result. We know that either we’re in one of S1 or S2 , or that we’re in
one of S3 or S4 . And the rule tells us that if we find out which, i.e. if we find out we’re in S1
or S2 , or we find out we’re in S3 or S4 , either way we should choose A. But before we find
this out, we should choose B.
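To make this concrete, here is a short computational sketch of the two rules (the code and function names are mine, not part of the text; any general-purpose language would do). It recomputes the maximin and maxiaverage verdicts from Decision 5, both across all four states and conditional on each half of the partition.

```python
# Payoff table for Decision 5.
table = {
    "A": {"S1": 10, "S2": 9, "S3": 9, "S4": 0},
    "B": {"S1": 8,  "S2": 3, "S3": 3, "S4": 3},
}

def maximin(act, states):
    """Worst-case payoff of act across the given states."""
    return min(table[act][s] for s in states)

def maxiaverage(act, states):
    """Average of best-case and worst-case payoffs across the given states."""
    payoffs = [table[act][s] for s in states]
    return (max(payoffs) + min(payoffs)) / 2

all_states = ["S1", "S2", "S3", "S4"]
print(maximin("A", all_states), maximin("B", all_states))              # 0 3     -> B better
print(maxiaverage("A", all_states), maxiaverage("B", all_states))      # 5.0 5.5 -> B better
print(maxiaverage("A", ["S1", "S2"]), maxiaverage("B", ["S1", "S2"]))  # 9.5 5.5 -> A better
print(maxiaverage("A", ["S3", "S4"]), maxiaverage("B", ["S3", "S4"]))  # 4.5 3.0 -> A better
```

The last three lines display the oddity: B wins overall, but A wins on each cell of the partition.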
Here then is a more general version of dominance. Assume our initial states are {S1 , S2 , ..., Sn }.
Call this set S. A binary partition of S is a pair of sets of states, call them T1 and T2 , such
that every state in S is in exactly one of T1 and T2 . (We’re simplifying a little here - generally
a partition is any way of dividing a collection up into parts such that every member of the
original collection is in one of the ‘parts’. But we’ll only be interested in cases where we di-
vide the original states in two, i.e., into a binary partition.) Then the generalised version of
dominance says that if A is better than B among the states in T1 , and it is better than B among
the states in T2 , where T1 and T2 provide a partition of S, then it is better than B among the
states in S. That’s the principle that maxiaverage violates. A is better than B among the states
{S1 , S2 }. And it is better than B among the states {S3 , S4 }. But it isn’t better than B among
the states {S1 , S2 , S3 , S4 }. That is, it isn’t better than B among the states generally.
We’ll be interested in this principle of generalised dominance because, unlike dominance
itself, it leads to slightly counterintuitive results in some cases. For this reason
some theorists have been interested in theories which, although they satisfy dominance, do
not satisfy this general version of dominance.
On the other hand, maximise expected utility does respect this principle. In fact, it
respects an even stronger principle, one that we’ll state using the notion of conditional
expected utility. Recall that as well as probabilities, we defined conditional probabilities
above. Well conditional expected utilities are just the expectations of the utility function
with respect to a conditional probability. More formally, if there are states S1 , S2 , ..., Sn , then
the expected utility of A conditional on E, which we’ll write Exp(U(A|E)), is
Exp(U(A|E)) = Pr(S1 |E)U(S1 |A) + Pr(S2 |E)U(S2 |A) + ... + Pr(Sn |E)U(Sn |A)
That is, we just replace the probabilities in the definition of expected utility with conditional
probabilities. (You might wonder why we didn’t also replace the utilities with conditional
utilities. That’s because we’re assuming that states are defined so that given an action, the
state has a fixed utility. If we didn’t make this simplifying assumption, we’d have to be more
careful here.) Now we can prove the following theorem.
• If Exp(U(A|E)) > Exp(U(B|E)), and Exp(U(A|¬E)) > Exp(U(B|¬E)), then Exp(U(A)) >
Exp(U(B)).
We’ll prove this by proving something else that will be useful in many contexts: for any state Si,

Pr(Si) = Pr(E)Pr(Si|E) + Pr(¬E)Pr(Si|¬E)

This holds because Pr(E)Pr(Si|E) = Pr(Si ∧ E) and Pr(¬E)Pr(Si|¬E) = Pr(Si ∧ ¬E), and Si is
equivalent to the disjunction of the incompatible propositions Si ∧ E and Si ∧ ¬E.
And now we’ll use this when we’re expanding Exp(U(A|E))Pr(E).
Exp(U(A|E))Pr(E) = Pr(E)[Pr(S1|E)U(S1|A) + Pr(S2|E)U(S2|A) + ... + Pr(Sn|E)U(Sn|A)]
                 = Pr(E)Pr(S1|E)U(S1|A) + Pr(E)Pr(S2|E)U(S2|A) + ... + Pr(E)Pr(Sn|E)U(Sn|A)

Exp(U(A|¬E))Pr(¬E) = Pr(¬E)[Pr(S1|¬E)U(S1|A) + Pr(S2|¬E)U(S2|A) + ... + Pr(Sn|¬E)U(Sn|A)]
                   = Pr(¬E)Pr(S1|¬E)U(S1|A) + Pr(¬E)Pr(S2|¬E)U(S2|A) + ... + Pr(¬E)Pr(Sn|¬E)U(Sn|A)

Adding these, we get

Exp(U(A|E))Pr(E) + Exp(U(A|¬E))Pr(¬E)
    = Pr(E)Pr(S1|E)U(S1|A) + ... + Pr(E)Pr(Sn|E)U(Sn|A) + Pr(¬E)Pr(S1|¬E)U(S1|A) + ... + Pr(¬E)Pr(Sn|¬E)U(Sn|A)
    = (Pr(E)Pr(S1|E) + Pr(¬E)Pr(S1|¬E))U(S1|A) + ... + (Pr(E)Pr(Sn|E) + Pr(¬E)Pr(Sn|¬E))U(Sn|A)
    = Pr(S1)U(S1|A) + Pr(S2)U(S2|A) + ... + Pr(Sn)U(Sn|A)
    = Exp(U(A))
Now if Exp(U(A|E)) > Exp(U(B|E)), and Exp(U(A|¬E)) > Exp(U(B|¬E)), then the following
two inequalities hold.
Exp(U(A|E))Pr(E) ≥ Exp(U(B|E))Pr(E)
Exp(U(A|¬E))Pr(¬E) ≥ Exp(U(B|¬E))Pr(¬E)
In each case we have equality only if the probability in question (Pr(E) in the first line,
Pr(¬E) in the second) is zero. Since not both Pr(E) and Pr(¬E) are zero, one of those is a
strict inequality. (That is, the left hand side is greater than, not merely greater than or equal
to, the right hand side.) So adding up the two lines, and using the fact that in one case we
have a strict inequality, we get

Exp(U(A|E))Pr(E) + Exp(U(A|¬E))Pr(¬E) > Exp(U(B|E))Pr(E) + Exp(U(B|¬E))Pr(¬E)

By the expansion above, the left hand side is Exp(U(A)), and by the same expansion the
right hand side is Exp(U(B)). So Exp(U(A)) > Exp(U(B)).
That is, if A is better than B conditional on E, and it is better than B conditional on ¬E, then
it is simply better than B.
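Since that expansion carries the whole proof, a quick numeric check may be reassuring. The probabilities and utilities below are invented purely for illustration, and exp_u is a helper of my own devising, not anything defined in the text.

```python
# Toy model: four states; E = {S1, S2}, so Pr(E) = 0.3 and Pr(not-E) = 0.7.
pr = {"S1": 0.1, "S2": 0.2, "S3": 0.3, "S4": 0.4}
u = {  # u[X][Si] plays the role of U(Si|X)
    "A": {"S1": 10, "S2": 9, "S3": 9, "S4": 5},
    "B": {"S1": 8,  "S2": 3, "S3": 3, "S4": 3},
}
E, not_E = ["S1", "S2"], ["S3", "S4"]

def exp_u(act, states=None):
    """Expected utility of act, conditional on the true state being in `states`."""
    states = states or list(pr)
    total = sum(pr[s] for s in states)   # probability of the conditioning event
    return sum(pr[s] / total * u[act][s] for s in states)

# A beats B conditional on E, and conditional on not-E ...
assert exp_u("A", E) > exp_u("B", E)
assert exp_u("A", not_E) > exp_u("B", not_E)
# ... so, as the theorem says, A beats B unconditionally.
assert exp_u("A") > exp_u("B")
# The key expansion: Exp(U(A)) = Exp(U(A|E))Pr(E) + Exp(U(A|not-E))Pr(not-E).
assert abs(exp_u("A") - (exp_u("A", E) * 0.3 + exp_u("A", not_E) * 0.7)) < 1e-9
```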
The principle we have been working up to is usually stated in terms of preferences rather
than expected utilities, and is known as the Sure Thing Principle:

Sure Thing Principle: If AE ⪰ BE and A¬E ⪰ B¬E, then A ⪰ B.

The terminology there could use some spelling out. By A ≻ B we mean that A is preferred
to B. By A ⪰ B we mean that A is regarded as at least as good as B. The relation between ≻
and ⪰ is like the relation between > and ≥. In each case the line at the bottom means that
we’re allowing equality between the values on either side.
The odd thing here is using AE ⪰ BE rather than something that’s explicitly conditional.
We should read the terms on each side of the inequality sign as conjunctions. It means that A
and E is regarded as at least as good an outcome as B and E. But that sounds like something
that’s true just in case the agent prefers A to B conditional on E obtaining. So we can use
preferences over conjunctions like AE as a proxy for conditional preferences.
So we can read the Sure Thing Principle as saying that if A is at least as good as B con-
ditional on E, and conditional on ¬E, then it really is at least as good as B. Again, this looks
fairly plausible in the abstract, though we’ll soon see some reasons to worry about it.
Expected Utility maximisation satisfies the Sure Thing Principle. I won’t go over the
proof here because it’s really just the same as the proof from the previous section with >
replaced by ≥ in a lot of places. But if we regard the Sure Thing Principle as a plausible
principle of decision making, then it is a good feature of Expected Utility maximisation that
it satisfies it.
It is tempting to think of the Sure Thing Principle as a generalisation of a principle of
logical implication we all learned in propositional logic. The principle in question said that
from X → Z, and Y → Z, and X ∨ Y, we can infer Z. If we let Z be that A is better than
B, let X be E, and Y be ¬E, it looks like we have all the premises, and the reasoning looks
intuitively right. But this analogy is misleading for two reasons.
First, for technical reasons we can’t get into in depth here, preferring A to B conditional
on E isn’t the same as it being true that if E is true you prefer A to B. To see some problems
with this, think about cases where you don’t know E is true, and A is something quite
horrible that mitigates the effects of the unpleasant E. In this case you do prefer AE to BE, and
E is true, but you don’t prefer A to B. But we’ll set this question, which is largely a logical
question about the nature of conditionals, to one side.
The bigger problem is that the analogy with logic would suggest that the following
generalisation of the Sure Thing Principle will hold.
Disjunction Principle If AE1 ⪰ BE1 and AE2 ⪰ BE2 , and Pr(E1 ∨ E2 ) = 1 then A ⪰ B.
But this “Disjunction Principle” seems no good in cases like the following. I’m going to toss
two coins. Let p be the proposition that they will land differently, i.e. one heads and one
tails. I offer you a bet that pays you $2 if p, and costs you $3 if ¬p. This looks like a bad
bet, since Pr(p) = 0.5, and losing $3 is worse than gaining $2. But consider the following
argument.
Let E1 be that at least one of the coins lands heads. It isn’t too hard to show that
Pr(p|E1 ) = 2/3. So conditional on E1 , the expected return of the bet is 2/3 × 2 – 1/3 × 3 =
4/3 – 1 = 1/3. That’s a positive return. So if we let A be taking the bet, and B be declining the
bet, then conditional on E1 , A is better than B, because the expected return is positive.
Let E2 be that at least one of the coins lands tails. It isn’t too hard to show that
Pr(p|E2) = 2/3. So conditional on E2, the expected return of the bet is 2/3 × 2 – 1/3 × 3 =
4/3 – 1 = 1/3. That’s a positive return. So if we let A be taking the bet, and B be declining the
bet, then conditional on E2 , A is better than B, because the expected return is positive.
Now if E1 fails, then both of the coins land tails. That means that at least one of the
coins lands tails. That means that E2 is true. So if E1 fails, E2 is true. So one of E1 and E2
has to be true, i.e. Pr(E1 ∨ E2 ) = 1. And AE1 ⪰ BE1 and AE2 ⪰ BE2 . Indeed AE1 ≻ BE1
and AE2 ≻ BE2 . But B ≻ A. So the disjunction principle isn’t in general true.
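All the probabilities and expected returns in this example are easy to verify by brute force, since the two tosses have just four equally likely outcomes. Here is one way to do it (the helper names are mine):

```python
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, each with probability 1/4

def pr(event):
    """Probability of an event (given as a test on outcomes) under the uniform measure."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

def pr_given(event, cond):
    """Conditional probability Pr(event | cond)."""
    return pr(lambda o: event(o) and cond(o)) / pr(cond)

p  = lambda o: o[0] != o[1]   # the coins land differently
E1 = lambda o: "H" in o       # at least one heads
E2 = lambda o: "T" in o       # at least one tails

print(pr_given(p, E1), pr_given(p, E2))   # both 2/3

# Conditional on E1 or on E2, the $2/-$3 bet has positive expected return ...
for cond in (E1, E2):
    q = pr_given(p, cond)
    print(q * 2 - (1 - q) * 3)            # 1/3 in each case
# ... but unconditionally the bet is a loser.
print(pr(p) * 2 - (1 - pr(p)) * 3)        # 0.5 * 2 - 0.5 * 3 = -0.5
```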
It’s a deep philosophical question how seriously we should worry about this. If the Sure
Thing Principle isn’t any more plausible intuitively than the Disjunction Principle, and the
Disjunction Principle seems false, does that mean we should be sceptical of the Sure Thing
Principle? As I said, that’s a very hard question, and it’s one we’ll return to a few times in
what follows.
First, the subjects are offered a choice between options A and B, whose payouts depend on
the colour of a ball drawn from an urn. In effect, they are offered a choice between an 11%
shot at $1,000,000 (A), and a 10% shot at $5,000,000 (B). Second, the subjects are offered
the following choice between C and D, which are dependent on drawings from a similarly
constructed urn.
In effect, they are offered a choice between $1,000,000 for sure (C), and a complex bet (D)
that gives them a 10% shot at $5,000,000, an 89% shot at $1,000,000, and a 1% chance of
striking out and getting nothing.
Now if we were trying to maximise expected dollars, then we’d have to choose both B
and D. But, and this is an important point that we’ll come back to, dollars aren’t utilities.
Getting $2,000,000 isn’t twice as good as getting $1,000,000. Pretty clearly if you were offered
a million dollars or a 50% chance at two million dollars you would, and should, take the
million for sure. That’s because the two million isn’t twice as useful to you as the million.
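As a quick check of the claim that expected-dollar maximisation picks B and D, here is the arithmetic, encoded from the percentages stated above (the encoding itself is mine):

```python
# Each option as a map from prize (in dollars) to its probability.
lotteries = {
    "A": {1_000_000: 0.11, 0: 0.89},
    "B": {5_000_000: 0.10, 0: 0.90},
    "C": {1_000_000: 1.00},
    "D": {5_000_000: 0.10, 1_000_000: 0.89, 0: 0.01},
}

for name, lottery in lotteries.items():
    ev = sum(prize * prob for prize, prob in lottery.items())
    print(name, ev)
# A 110000.0, B 500000.0, C 1000000.0, D 1390000.0
# In expected dollars B beats A and D beats C; yet most people
# choose B over A and C over D, which is the puzzle discussed below.
```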
Without a way of figuring out the utility of $1,000,000 versus the utility of $5,000,000, we
can’t say whether A is better than B. But we can say one thing. You can’t consistently hold
the following three views.
• B≻A
• C≻D
• The Sure Thing Principle holds
This is relevant because a lot of people think B ≻ A and C ≻ D. Let’s finish by working
through the proof of this.
Let E be that either a white or yellow ball is drawn. So ¬E is that a black ball is drawn.
Now note that A¬E is identical to B¬E. In either case you get nothing. So A¬E ⪰ B¬E. So
if AE ⪰ BE then, by Sure Thing, A ⪰ B. Equivalently, if B ≻ A, then BE ≻ AE. Since we’ve
assumed B ≻ A, then BE ≻ AE.
Also note that C¬E is identical to D¬E. In either case you get a million dollars. So
D¬E ⪰ C¬E. So if DE ⪰ CE then, by Sure Thing, D ⪰ C. Equivalently, if C ≻ D, then
CE ≻ DE. Since we’ve assumed C ≻ D, then CE ≻ DE.
But now we have a problem, since BE = DE, and AE = CE. Given E, the choice between
A and B just is the choice between C and D. So holding simultaneously that BE ≻ AE and
CE ≻ DE is incoherent.
It’s hard to say for sure just what’s going on here. Part of what’s going on is that we have
a ‘certainty premium’. We prefer options like C that guarantee a positive result. Now having
a certainly good result is a kind of holistic property of C. The Sure Thing Principle in effect
rules out assigning value to holistic properties like that. The value of the whole need not be
identical to the value of the parts, but any comparisons between the values of the parts have
to be reflected in the value of the whole. Some theorists have thought that a lesson of the
Allais paradox is that this is a mistake.
We won’t be looking in this course at theories which violate the Sure Thing Principle,
but we will be looking at justifications of the Sure Thing Principle, so it is worth thinking
about reasons you might have for rejecting it.
10.4 Exercises
10.4.1 Calculate Expected Utilities
In the following example Pr(S1 ) = 0.4, Pr(S2 ) = 0.3, Pr(S3 ) = 0.2 and Pr(S4 ) = 0.1. The table
gives the utility of each of the possible actions (A, B, C, D and E) in each state. What is the
expected utility of each action?
       S1   S2   S3   S4
A       0    2   10    2
B       6    2    1    7
C       1    8    9    7
D       3    1    8    6
E       4    7    1    4
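If you want to check your answers, the probability-weighted sums can be computed in a few lines; the encoding below is just one way to set this up.

```python
probs = [0.4, 0.3, 0.2, 0.1]  # Pr(S1), Pr(S2), Pr(S3), Pr(S4)
utilities = {
    "A": [0, 2, 10, 2],
    "B": [6, 2, 1, 7],
    "C": [1, 8, 9, 7],
    "D": [3, 1, 8, 6],
    "E": [4, 7, 1, 4],
}

# Expected utility of each act: sum over states of Pr(Si) * U(Si|act).
for act, us in utilities.items():
    print(act, sum(p * u for p, u in zip(probs, us)))
```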
Chapter 11
Understanding Probability
11.1 Kinds of Probability
As might be clear from the discussion of what probability functions are, there are a lot of
probability functions. For instance, the following is a probability function for any (logically
independent) p and q.
p q Pr
T T 0.97
T F 0.01
F T 0.01
F F 0.01
But if p actually is that the moon is made of green cheese, and q is that there are little green
men on Mars, you probably won’t want to use this probability function in decision making.
That would commit you to making some bets that are intuitively quite crazy.
So we have to put some constraints on the kinds of probability we use if the “Maximise
Expected Utility” rule is to make sense. As it is sometimes put, we need to have an
interpretation of the Pr in the expected utility rule. We’ll look at three possible interpretations
that might be used.
11.2 Frequency
Historically probabilities were often identified with frequencies. If we say that the
probability that this F is a G is, say, 2/3, that means that the proportion of F’s that are G’s is 2/3.
Such an approach is plausible in a lot of cases. If we want to know what the probability
is that a particular student will catch influenza this winter, a good first step would be to find
out the proportion of students who will catch influenza this winter. Let’s say this is 1/10. Then,
to a first approximation, if we need to feed into our expected utility calculator the probability
that this student will catch influenza this winter, using 1/10 is not a bad first step. Indeed, the
insurance industry does not do a bad job using frequencies as guides to probabilities in just this
way.
But that can hardly be the end of the story. If we know that this particular student has
not had an influenza shot, and that their boyfriend and their roommate have both caught
influenza, then the probability of them catching influenza would now be much higher. With
that new information, you wouldn’t want to take a bet that paid $1 if they didn’t catch
influenza, but lost you $8 if they did catch influenza. At those odds, that now looks like a
bad bet.
Perhaps the thing to say is that the relevant group is not all students. Perhaps the relevant
group is students who haven’t had influenza shots and whose roommates and boyfriends
have also caught influenza. And if, say, 2/3 of such students have caught influenza, then
perhaps the probability that this student will catch influenza is 2/3.
You might be able to see where this story is going by now. We can always imagine more
details that will make that number look inappropriate as well. Perhaps the student in
question is spending most of the winter doing field work in South America, so they have little
chance to catch influenza from their infected friends. And now the probability should be
lower. Or perhaps we can imagine that they have a genetic predisposition to catch influenza,
so the probability should be higher. There is always more information that could be relevant.
The problem for using frequencies as probabilities then is that there could always be
more precise information that is relevant to the probability. Every time we find that the
person in question isn’t merely an F (a student, say), but is a particular kind of F (a student
who hasn’t had an influenza shot, whose close contacts are infected, who has a genetic
predisposition to influenza), we want to know the proportion not of F’s who are G’s, but the
proportion of the more narrowly defined class who are G’s. But eventually this will leave us
with no useful probabilities at all, because we’ll have found a way of describing the student
in question such that they are the only person in history who satisfies this description.
This is hardly a merely theoretical concern. If we are interested in the probability that a
particular bank will go bankrupt, or that a particular Presidential candidate will win the
election, it isn’t too hard to come up with a list of characteristics of the bank or candidate in
question in such a way that they are the only one in history to meet that description. So the
frequency that such banks will go bankrupt is either 1 (1 out of 1 go bankrupt) or 0 (0 out
of 1 do). But those aren’t particularly useful probabilities. So we should look elsewhere for
an interpretation of the Pr that goes into our definition of expected utility.
In the literature there are two objections to using frequencies as probabilities that seem
related to the argument we’re looking at here.
One of these is the Reference Class Problem. This is the problem that if we’re interested
in the probability that a particular person is G, then the frequency of G-hood amongst the
different classes the person is in might differ.
The other is the Single Case Problem. This is the problem that we’re often interested in
one-off events, like bank failures, elections, wars etc, that don’t naturally fit into any natural
broader category.
I think the reflections here support the idea that these are two sides of a serious problem
for the view that probabilities are frequencies. In general, there actually is a natural solution
to the Reference Class Problem. We look to the most narrowly drawn reference class we
have available. So if we’re interested in whether a particular person will survive for 30 years,
and we know they are a 52 year old man who smokes, we want to look not to the survival
frequencies of people in general, or men in general, or 52 year old men in general, but 52
year old male smokers.
Perhaps by looking at cases like this, we can convince ourselves that there is a natural
solution to the Reference Class Problem. But the solution makes the Single Case Problem
pressing. Pretty much anything that we care about is distinct in some way or another.
That’s to say, if we look closely we’ll find that the most natural reference class for it just
contains that one thing. That’s to say, it’s a single case in some respect. And one-off events
don’t have interesting frequencies. So frequencies aren’t what we should be looking to as
probabilities.
11.3 Degrees of Belief
Take a bet that pays $1 if p, and nothing otherwise. If your degree of belief in p is Pr(p),
the expected dollar return of the bet is Pr(p) × 1 + (1 – Pr(p)) × 0 = Pr(p). So if you pay
$Pr(p) for the bet, your expected return is exactly 0. Obviously if you pay more, you’re
worse off, and if you pay less, you’re better off. $Pr(p) is the break even point, so that’s the
fair price for the bet.
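The same arithmetic as a small sketch (the function name is mine, for illustration):

```python
def expected_return(pr_p, price):
    """Expected dollar return of paying `price` for a bet that pays $1 if p."""
    return pr_p * (1 - price) + (1 - pr_p) * (0 - price)  # = pr_p - price

print(expected_return(0.7, 0.7))  #  0.0 -> break even exactly when price = Pr(p)
print(expected_return(0.7, 0.8))  # -0.1 -> paying more than Pr(p) is a losing deal
print(expected_return(0.7, 0.6))  #  0.1 -> paying less than Pr(p) is a winning deal
```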
And that’s how we measure degrees of belief. We look at the agent’s ‘fair price’ for a bet
that returns $1 if p. (Alternatively, we look at the maximum they’ll pay for such a bet.) And
that’s their degree of belief that p. If we’re taking probabilities to be degrees of belief, if we
are (as it is sometimes put) interpreting probability subjectively, then that’s the probability
of p.
This might look suspiciously circular. The expected utility rule was meant to give us
guidance as to how we should make decisions. But the rule needed a probability as an input.
And now we’re taking that probability to not only be a subjective state of the agent, but a
subjective state that is revealed in virtue of the agent’s own decisions. Something seems odd
here.
Perhaps we can make it look even odder. Let p be some proposition that might be true
and might be false, and assume that the agent’s choice is to take or decline a bet on p that has
some chance of winning and some chance of losing. Then if the agent takes the bet, that’s
a sign that their degree of belief in p was higher than the odds of the bet on p, so therefore
they are increasing their expected utility by taking the bet, so they are doing the right thing.
On the other hand, if they decline the bet, that’s a sign that their degree of belief in p was
lower than the odds of the bet on p, so therefore they are increasing their expected utility
by declining the bet, so they are doing the right thing. So either way, they do the right thing.
But a rule that says they did the right thing whatever they do isn’t much of a rule.
There are two important responses to this, which are related to one another. The first
is that although the rule does (more or less) put no restrictions at all on what you do when
faced with a single choice, it can put quite firm constraints on your sets of choices when
you have to make multiple decisions. The second is that the rule should be thought of as a
procedural rather than substantive rule of rationality. We’ll look at these more closely.
If we take probabilities to be subjective probabilities, i.e. degrees of belief, then the
maximise expected utility rule turns out to be something like a consistency constraint. Compare
it to a rule like Have Consistent Beliefs. As long as we’re talking about logically contingent
matters, this doesn’t put any constraint at all on what you do when faced with a single
question of whether to believe p or ¬p. But it does put constraints on what further beliefs you
can have once you believe p. For instance, you can’t now believe ¬p.
The maximise expected utility rule is like this. Indeed we already saw this in the Allais
paradox. The rule, far from being empty, rules out the pair of choices that many people
intuitively think is best. So if the objection is that the rule has no teeth, that objection can’t
hold up.
We can see this too in simpler cases. Let’s say I offer the agent a ticket that pays $1 if p,
and she pays 60c for it. So her degree of belief in p must be at least 0.6. Then I offer her a
ticket that pays $1 if ¬p, and she pays 60c for it too. So her degree of belief in ¬p must be at
least 0.6. But, and here’s the constraint, we think degrees of belief have to be probabilities.
And if Pr(p) ≥ 0.6, then Pr(¬p) ≤ 0.4. So if Pr(¬p) ≥ 0.6, we have an inconsistency. That’s
bad, and it’s the kind of badness it is the job of the theory to rule out.
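One vivid way to see the badness: a bookie who sells this agent both tickets at 60c each is guaranteed a profit, since exactly one of the tickets pays out. This is the standard ‘Dutch book’ observation, sketched below.

```python
price = 0.60
for p_is_true in (True, False):
    payout_p     = 1.0 if p_is_true else 0.0   # ticket that pays $1 if p
    payout_not_p = 0.0 if p_is_true else 1.0   # ticket that pays $1 if not-p
    net = (payout_p - price) + (payout_not_p - price)
    print(p_is_true, round(net, 2))            # -0.2 either way: a sure loss
```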
One way to think about the expected utility rule is to compare it to norms of means-
end rationality. At times when we’re thinking about what someone should do, we really
focus on what the best means is to their preferred end. So we might say If you want to go to
Harlem, you should take the A train, without it even being a relevant question whether they
should, in the circumstances, want to go to Harlem.
The point being made here is quite striking when we consider people with manifestly
crazy beliefs. If we’re just focussing on means to an end, then we might look at someone
who, say, wants to crawl from the southern tip of Broadway to its northern tip. And we’ll
say “You should get some kneepads so you don’t scrape your knees, and you should take lots
of water, and you should catch the 1 train down to near where Broadway starts, etc.” But
if we’re not just offering procedural advice, but are taking a more substantive look at their
position, we’ll say “You should come up with a better idea about what to do, because that’s
an absolutely crazy thing to want.”
As we’ll see, the combination of the maximise expected utility rule with the use of
degrees of belief as probabilities leads to a similar set of judgments. On the one hand, it is a
very good guide to procedural questions. But it leaves some substantive questions
worryingly unanswered. Next time we’ll come back to this distinction, and see if there’s a better
way to think about probability.
Chapter 12
Objective Probabilities
12.1 Credences and Norms
We ended last time by looking at the idea that the probabilities in expected utility
calculations should be subjective. As it is sometimes put, they should be degrees of belief. Or, as
it is also sometimes put, they should be credences. We noted that under this interpretation,
the maximise expected utility rule doesn’t put any constraints on certain simple decisions.
That’s because we use the rule to calculate what credences are, and then use the very same
credences to say what the rule requires. But the rule isn’t useless. It puts constraints, often
sharp constraints, on sets of decisions. In this respect it is more like the rule Have Consistent
Beliefs than like the rule Believe What’s True, or Believe What Your Evidence Supports. And
we compared it to procedural, as opposed to substantive norms.
What’s left from all that are two large questions.
• Do we get the right procedural/consistency constraints from the expected utility rule?
In particular (a) should credences be probabilities, and (b) should we make complex
decisions by the expected utility rule? We’ll look a bit in what follows at each of these
questions.
• Is a purely procedural constraint all we’re looking for in a decision theory?
And intuitively the answer to the second question is No. Let’s consider a particular case.
Alex is very confident that the Kansas City Royals will win baseball’s World Series next year.
In fact, Alex’s credence in this is 0.9, very close to 1. Unfortunately, there is little reason for
this confidence. Kansas City has been one of the worst teams in baseball for many years,
the players they have next year will be largely the same as the players they had when doing
poorly this year, and many other teams have players who have performed much much better.
Even if Kansas City were a good team, there are 30 teams in baseball, and relatively random
events play a big role in baseball, making it unwise to be too confident that any one team
will win.
Now, Alex is offered a bet that leads to a $1 win if Kansas City win the World Series, and
a $1 loss if they do not. The expected return of that bet, given Alex’s credences, is +80c. So
should Alex make the bet?
Intuitively, Alex should not. It’s true that given Alex’s credences, the bet is a good one.
But it’s also true that Alex has crazy credences. Given more sensible credences, the bet has
a negative expected return. So Alex should not make the bet.
It’s worth stepping away from probabilities, expected values and the like to think about
this in a simpler context. Imagine a person has some crazy beliefs about what is an effective
way to get some good end. And assume they, quite properly, want that good end. In fact,
however, acting on their crazy beliefs will be counterproductive; it will just make things
worse for everyone. And their evidence supports this. Should they act on their beliefs?
Intuitively not. To be sure, if they didn’t act on their beliefs, there would be some
inconsistency between their beliefs and their actions. But inconsistency isn’t the worst thing in the
world. They should, instead, have different beliefs.
Similarly Alex should have different credences in the case in question. The question,
what should Alex do given these credences, seems less interesting than the question, what
should Alex do? And that’s what we’ll look at.
12.2 Evidential Probability
The probability of each proposition is a measure of how strongly it is supported by the
evidence.
That’s different from what a rational person would believe in two respects. For one thing,
there is a fact about how strongly the evidence supports p, even if different people might
disagree about just how strongly that is. For another thing, it isn’t true that the evidence
supports that you are perfectly rational, even though you would believe that if you were
perfectly rational. So the two objections we just mentioned are not an issue here.
From now on then, when we talk about probability in the context of expected utility, we’ll
talk about evidential probabilities. There’s an issue, one we’ll return to later, about whether
we can numerically measure strengths of evidence. That is, there’s an issue about whether
strengths of evidence are the right kind of thing to be put on a numerical scale. Even if they
are, there’s a tricky issue about how we can even guess what they are. I’m going to cheat a
little here. Despite the arguments above that evidential probabilities can’t be identified with
betting odds of perfectly rational agents, I’m going to assume that, unless we have reason to
the contrary, those betting odds will be our first approximation. So when we have to guess
what the evidential probability of p is, we’ll start with what odds a perfectly rational agent
(with your evidence) would look for before betting on p.
the friend might have taken in the past. These thoughts won’t be thoughts about chances in
the physicists’ sense; they’ll be about evidential probabilities.
Finally, chances are objective. The evidential probability that p is true might be different
for me than for you. For instance, the evidence a juror has might make it quite likely, for
her, that the suspect is guilty, even if he is not. But the evidence the suspect has makes it
extremely likely that he is innocent. Evidential probabilities differ between different people.
Chances do not. Someone might not know what the chance of a particular outcome is, but
what they are ignorant of is a matter of objective fact.
The upshot seems to be that chances are quite different things from evidential
probabilities, and the best thing to do is simply to take them to be distinct basic concepts.