OCR S3 Revision Notes

OCR Statistics 3 Module Revision Sheet
The S3 exam is 1 hour 30 minutes long. You are allowed a graphics calculator.
Before you go into the exam make sure you are fully aware of the contents of the formula booklet
you receive. Also be sure not to panic; it is not uncommon to get stuck on a question (I’ve
been there!). Just continue with what you can do and return at the end to the question(s)
you have found hard. If you have time check all your work, especially the first question you
attempted. . . always an area prone to error.
J .M .S .
Preliminaries
x2
P
2
• In S1 when calculating the variance you will mostly have used σ = − x̄2 . This was
n
(x − x̄)2
P
2
for ease of calculation. However in S3 the equivalent formula σ = appears to
n
2
P
make a storming comeback. You will often be given (x − x̄) summary data and you
must know how to handle it.
• The unbiased estimator of variance from a sample (s2 ) simplifies to

P 2
(x − x̄)2 (x − x̄)2
P P
2 n x 2 n
s ≡ − x̄ = = .
n−1 n n−1 n n−1
(nx −1)s2x +(ny −1)s2y

• Because of this, if you need to calculate (for a two sample t-test) s2p = nx +ny −2
and are given (x − x̄)2 and (y − ȳ)2 then s2p simplifies thus
P P
(nx − 1)s2x + (ny − 1)s2y

s2p =
nx + ny − 2
P P
(x−x̄)2 (y−ȳ)2
(nx − 1) nx −1 + (n y − 1) ny −1
=
nx + ny − 2
x̄)2 + (y − ȳ)2
P P
(x −
= .
nx + ny − 2
Continuous Random Variables

• In S2 you met probability density functions (pdf) f (x). They measured where events were
more likely to occur than others. To find P(a < X < b) we needed to calculate the area
Rb
between x = a and x = b; i.e. a f (x) dx. In S3 we have cumulative distribution functions
(cdf) F (x) which are defined F (x) ≡ P(X 6 x). We can think of F (x) as the area to the
left of x in the pdf. So F (4) is the area to the left of 4 and F (3) is the area to the left of
3. Therefore P(3 < X < 4) = F (4) − F (3). This is an example of:
P(a < X < b) = F (b) − F (a).
• Cdfs make calculating the median (M ) very easy. You just solve F (M ) = 12 . Likewise the
upper (Q3 ) and lower (Q1 ) quartiles are very easy to calculate; F (Q1 ) = 41 and F (Q3 ) = 34 .
www.MathsHelper.co.uk 1 J.M.Stone
You must understand the concept of percentiles and how to get them from a cdf. The
85th percentile (say) is such that 85% of the data lies to the left of that point. Therefore
85
F (P85 ) = 100 .
• You cannot write x
x
x3 x3 1
Z
2
x dx = = − .
1 3 1 3 3
You must use a dummy variable thus:
Z x 3 x
2 t x3 1
t dt = = − .
1 3 1 3 3
Basically whenever you find yourself putting an x on the upper limit of an integral, change
all future x’s to t’s.
• To calculate f (x) from F (x) is easy; just differentiate F (x). For example given

 0 x<0
F (x) = x3
06x63
 27
1 x > 3.
x3 x2
When we differentiate the constants 0 and 1 they become 0. The 27 becomes 9 so the
pdf is x2
9 06x63
f (x) =
0 otherwise.
• To calculate F (x) from f (x) is a little trickier. You must remember that F (x) is the entire
area to the left of a point. Therefore given

 k 06x<2
f (x) = k(x − 1) 2 6 x 6 3
0 otherwise.

Rx
Firstly
2 x we calculate1 k = 27 . For the section 0 6 x < 2 we Rdo the expected 0 27 dt =
2 x 2
7 t 0 = 7 x. However, for the next region we do not just do 2 7 (x − 1) dt. We need to
add in the contribution from the first part (i.e. the value of F (2) from the first result; 74
x
in this case). So we do 47 + 2 27 (t − 1) dt = 71 (x2 − 2x + 4). Therefore
R

 0 x<0
 2

7 x 06x<2
F (x) = 1 2
(x − 2x + 4) 2 6 x 6 3
 7


1 x > 3.
• Once you have calculated your F (x) a nice check to see whether your cdf is correct is to
see if your F (x) is continuous2 which it must be. For example let’s say you discovered that

 0 x<0
 1

3 x 0 6x<1
F (x) = 2 − 5x + 2 1 6 x 6 2

 x 2
1 x > 2.

You then check the ‘boundary’ values where the functions are being joined; here they are
x = 0, x = 1 and x = 2. In this case there is no problem for x = 0 nor x = 2, but when
we look at x = 1 there is a problem. 13 x gives 31 but x2 − 25 x + 2 gives 12 . Therefore we
must have made a mistake which must be fixed.
1
R∞
By remembering −∞ f (x) dx = 1.
2
A function is continuous if you can draw it without taking your pen off the paper. . . basically.
• Given a cdf F (x) you can find a related cdf F (y) where X and Y are related; i.e Y =
g(X). The idea here is that F (x) ≡ P(X 6 x). Start with the original cdf. Then write
F (y) = P(Y 6 y) = P(g(X) 6 y) = P(X 6 g −1 (y)). Then replace every x in the original
cdf by g−1 (y) (even the ones in the limits).
For example given 
 0 x<2
1 2
F (x) = 8 (x − 2x) 2 6 x 6 4
1 x > 4.

and Y = 4X 2 find F (y).

√
y
So, F (y) = P(Y 6 y) = P(4X 2 6 y) = P(X 6 2 ) (we don’t have to worry about ± when
square rooting because the cdf is only defined for positive x). Therefore
√
y

 0 √

√ 2 < √
2
1 y 2 y y
F (y) = 8 (( 2 ) − 2( 2 )) 2 6 2 64
√
 y
1 2 > 4.

And so 
 0 y < 16
1 √
F (y) = 32 (y − 4 y) 16 6 y 6 64
1 y > 64.

• You must be a little careful if (say) Y = X1 . You start F (y) = P(Y 6 y) = P( X1 6 y) =

P(X > y1 ) = 1 − P(X 6 y1 ). Notice this reversal of the inequality sign; this is because if
a c b d
b > d then a < c .
R∞
• In S2 expectation for a pdf f (x) is E(X) = −∞ xf (x) dx. In S3 you can find the expec-
tation of any function g(X) of the pdf f (x) by the formula
Z ∞
E(g(X)) = g(x)f (x) dx.
−∞
For example find E(X 2 + 1) of

(
ex−1
e−1 16x62
f (x) =
0 otherwise.
So,
2
ex−1
Z
2
E(X + 1) = (x2 + 1) dx
1 e−1
Z 2
1
= x2 ex−1 + ex−1 dx
e−1 1
= int by parts twice on first bit. . . good exercise for you to do. . .
3e − 2
= .
e−1
Linear Combinations Random Variables

• Any random variable X can be transformed to become a new random variable Y = aX + b
where a and b are constants. It can be shown that
E(Y ) = E(aX + b) = aE(X) + b.
It can also be shown that
Var(Y ) = Var(aX + b) = a2 Var(X).
The b ‘disappears’ because it only has the effect of moving X up or down the number line
and does not therefore alter the spread (i.e. variance). Note also that the a gets squared
when one ‘pulls it out’ of the variance. Therefore Var(−2X) = (−2)2 Var(X) = 4Var(X).
It also makes sense with Var(−X) = (−1)2 Var(X) = Var(X) because if one makes all the
values of X negative from where they were they are just as spread out.
• Take any two random variables X and Y . If they are combined in a linear fashion aX +bY
for constant a and b then it is always true (even when X and Y are not independent)
that
E(aX + bY ) = aE(X) + bE(Y ).
If X and Y are independent then
Var(aX + bY ) = a2 Var(X) + b2 Var(Y ).
It is particularly useful to note that Var(X − Y ) = Var(X)+ Var(Y ). These results extend
(rather obviously) to more than two variables
E(a1 X1 + a2 X2 + · · · + an Xn ) = a1 E(X1 ) + a2 E(X2 ) + + · · · + an E(Xn ),

Var(a1 X1 + a2 X2 + · · · + an Xn ) = a21 Var(X1 ) + a22 Var(X2 ) + · · · + a2n Var(Xn ).
The second (of course) true if all independent.
• If X and Y are independent and normally distributed then aX + bY is also normally

distributed. Because E(aX + bY ) = aE(X) + bE(Y ) and Var(aX + bY ) = a2 Var(X) +
b2 Var(Y ) we find
X ∼ N (µ1 , σ12 ) and Y ∼ N (µ2 , σ22 ) ⇒ aX + bY ∼ N (aµ1 + bµ2 , a2 σ12 + b2 σ22 ).
For example when Jon throws a shot put his distance is J ∼ N (11, 4). When Ali throws
a shot his distance is A ∼ N (12, 9). Find the probability on one throw that Jon beats
Ali. So we need J − A ∼ N (11 − 12, 4 + 9) which gives J − A ∼ N (−1, 13). Notice the
variances have been added and that the expected value is negative (on average Jon will
lose to Ali). Now

0 − (−1)
P(J − A > 0) = P Z > √
13
= P(Z > 0.277)
= 1 − P(Z < 0.277) = 0.3909
• Given a random variable X you must fully appreciate the difference between two indepen-
dent samplings of this random variable (X1 and X2 ) and two times this random variable
(2X). For example given a random variable X such that
x 1 2
1 1 .
P(X = x) 2 2
The random variable 2X is doubling the outcome of one sampling of X, but X1 + X2

is adding two independent samplings of X. Thus 2X can only take values 2 and 4 with
probabilities 12 each. But X1 + X2 can take values 2, 3 and 4 with probabilities 14 , 12 and
1
4respectively. Note that the expected values for 2X and X1 + X2 are the same (because
E(2X) = 2E(X) and E(X1 + X2 ) = E(X1 ) + E(X2 ) = 2E(X)), but that the variances are
not the same; i.e. Var(2X) 6= Var(X1 + X2 ). This is because Var(2X) = 4Var(X) and
Var(X1 + X2 ) = Var(X1 ) + Var(X2 ) = 2Var(X).
For example given the above shot put example J ∼ N (11, 4). If Jon was to throw the
shot put three times (independently) and the total of all three throws recorded we would
need J1 + J2 + J3 ∼ N (33, 3 × 4) and not 3J ∼ N (33, 9 × 4).
• Given Poisson distributed X and Y it is even simpler. Here aX + bY is not distributed

Poisson3 . However the special case of X + Y is distributed Poisson.
X ∼ P o(λ1 ) and Y ∼ P o(λ2 ) ⇒ X + Y ∼ P o(λ1 + λ2 ).
For example if Candy makes on average 3 typing errors per hour and Tiffany makes 4
typing errors per hour find the probability of fewer than 12 errors in total in a two hour
period. Here we have P o(14) so P(X < 12) = P(X 6 11) = 0.2600 (tables).
Student’s t -Distribution
• In S2 you learnt that if you take a sample froma normal
population of known variance
2 σ2
σ then no matter what the sample size X̄ ∼ N µ, n exactly.
X̄ − c
The test statistic for H0 : µ = c is Z = q .
σ2
n
• You also learnt that if you take a sample ofsize n> 30 from any population distribution
2
where you know σ 2 then (by CLT) X̄ ∼ N µ, σn approximately.
X̄ − c
σ2
n
• You also learnt that if you take a sample of size n > 30 from any population distribution

2
with unknown σ 2 then you estimate σ 2 by calculating s2 and (by CLT) X̄ ∼ N µ, sn
approximately.
X̄ − c
s2
n
• You would therefore think that if you were drawing from a normal population
with un-
2 2 2 s2
known σ then you would estimate σ by calculating s and X̄ ∼ N µ, n . But this is
not the case!!! In fact X̄ is exactly described by Student’s t-distribution4 .
X̄ − c
The test statistic for H0 : µ = c is T = q .
s2
n
3
Because with the Poisson we require the expectation and the variance to be the same and given X ∼ P o(λ1 )
and Y ∼ P o(λ2 ) we have E(aX + bY ) = aE(X) + bE(Y ) = aλ1 + bλ2 and Var(aX + bY ) = a2 Var(X) + b2 Var(Y ) =
a2 λ1 + b2 λ2 and the only time aX + bY = a2 X + b2 Y is when a = b = 1.
4
Named after W.S.Gosset who wrote under the pen name ‘Student’. Gosset devised the t-test as a way to
cheaply monitor the quality of stout. Good bloke.
• (You will notice the apparent contradiction between the last two bullet points. If a large
sample (n > 30) is taken from a normal population with unknown variance then how
can X̄ be distributed both normally and as a t-distribution? Well, as the sample size
gets larger, the t-distribution converges to the normal distribution. Just remember that
technically if you have a normal population with unknown variance then X̄ is exactly a
t-distribution, but if n > 30 then CLT lets us approximate X̄ as a normal. In practice the
t-distribution is used only with small sample sizes.)
• There is the new concept of the degree of freedom (denoted ν) of the t-distribution. As ν
gets larger the t-distribution tends towards the standard normal distribution. However if
ν is small enough, then the difference between t and z becomes quite marked (as you can
see yourself from the tables).
• We can do hypothesis tests here just like we did in S2, only instead of using the normal
tables we use the t tables (with correct degrees of freedom ν) to find tcrit and compare
X̄ − c
the test statistic q against tcrit . Here ν = n − 1.
s2
n
• For example a machine is producing circular disks whose radius is normally distributed.
Their radius historically has been 5cm. The factory foreman believes that the machine is
now producing disks that are too small. A sample of 9 disks are taken and their radii are
4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0
Test at the 10% level whether the foreman has a case.
Let µ = the population mean radii of the disks.

H0 : µ = 5,
H1 : µ < 5.
n = 9, so ν = 9 − 1 = 8.
α = 10%. Therefore in t8 we lookup 90% (because one tailed) and discover 1.397. But
because it is a “<” test tcrit must be negative to tcrit = −1.397.
P
x
x̄ = n= 43.9
9 = 4.87̇.
P 2
x 9 214.43
s2 = n−1
n 2 = − 4.87̇2 = 0.03694.

n − x̄ 8 9
tobs = x̄−c
q = q4.87̇−5 = −1.908.
s2 0.03694
n 9
−1.908 < −1.397. This value lies in the rejection region of the test and therefore at
the 10% level we have sufficient evidence to reject H0 and conclude that the machine is
probably not working fine.
Testing For Difference Between Means

2

• The central pillar in this section is that if X̄ ∼ N µx , nσxx (which is either exactly true if
σ2

X is itself normal, or approximately true if nx > 30 from CLT) and Ȳ ∼ N µy , nyy then
(provided X and Y are independent)
!
σx2 σy2
X̄ − Ȳ ∼ N µx − µy , + .
nx ny
• If X and Y are normally distributed with known variances (σx2 and σy2 ) and we are testing
H0 : µx − µy = c the test statistic is
X̄ − Ȳ − c
Z= q 2 .
σx σy2
nx + ny
For example5 it is known that French people’s heights (in cm) are normally distributed
N (µf , 25). It is also known that German people’s heights are normally distributed N (µg , 20).
It is wished to test whether or not German people are taller than French people (at the
2 21 % level). A random sample of 10 French people’s heights are and their mean height
recorded (f¯). Similarly 8 German people’s heights are taken and their mean recorded (ḡ).
1. State appropriate null and alternative hypotheses.

2. Find the set of values for ḡ − f¯ for which we would reject the null hypothesis.
3. If in fact Germans are 7cm taller on average then find the probability of a Type II
error.
1. H0 : µg − µf = 0,
H1 : µg − µf > 0.
X̄−Ȳ −c
2. Given Z = r we obtain
2
σx σ2
nx
+ ny
y
(Ḡ − F̄ )crit
Zcrit = 1.960 = q .
25 20
10 + 8
Therefore critical value is (ḡ − f¯)crit = 4.383. We therefore reject the null hypothesis
if ḡ − f¯ > 4.383.
3. For a Type II error we must lie in the acceptance region of the original test given the
new information. Here we require P(ḡ − f¯ < 4.383 | µg − µf = 7), so
 
4.383 − 7 
P(ḡ − f¯ < 4.383 | µg − µf = 7) = P Z < q
25 20
10 + 8
= P(Z < −1.170)

= 1 − P(Z < 1.170)
= 1 − 0.8790 = 0.121
• If X and Y are not normally distributed we need the samples to be large (then CLT ap-
plies). If the variances are known then the above is still correct. However if the population
variances are unknown we replace the σx and σy by their estimators sx and sy .
For example, Dr. Evil believes that people’s attention spans are different in Japan and
America. HeP samples 80 Japanese
P 2 people and finds their attention spans are described
(in
P minutes) j =
P 2800 and j = 12000. He samples 100 people in America and finds
a = 850 and a = 11200. Test at the 5% level whether Dr Evil is justified in his
claim. So
H0 : µj − µa = 0.
H1 : µj − µa 6= 0.
5
It’s well worth thinking very hard about this example. It stumped me the first time I saw a similar question.
α = 5%.
j̄ = 10, ā = 8.5.
80 12000
s2j = 2 = 50.63.

79 80 − 10
100 11200
s2a = 2 = 40.15.

99 100 − 8.5
X̄−Ȳ −c q 10−8.5
Zobs = s = 50.63
= 1.475.
s2
j s2 80
+ 40.15
100
nj
+ na
a
Zcrit = ±1.960. Therefore we reject if |Zobs | > 1.960.

1.475 < 1.960, so at the 5% level we have no reason to reject H0 and conclude that Dr Evil
is probably mistaken in his claim that the two countries have different attention spans.
• If X and Y are normally distributed with an unknown, common variance and we are
testing H0 : µx − µy = c we use a two-sample t-test. The test statistic here is
X̄ − Ȳ − c
T =r .
2 1 1
s p nx + ny
Here s2p is the unbiased pooled estimate of the common variance, defined
(nx − 1)s2x + (ny − 1)s2y

s2p = .
nx + ny − 2
Also ν = nx + ny − 2. For example a scientist wishes to test whether new heart medication
reduces blood pressure. 10P patients with high
P blood pressure were given the medication
and their summary data is x = 1271 and (x− x̄)2 = P 640.9. 8 patientsP
with high blood
pressure were given a placebo and their summary data is y = 1036 and (y− ȳ)2 = 222.
Carry out a hypothesis test at the 10% level to see if the medication is working.
H0 : µx − µy = 0.
H1 : µx − µy < 0.
α = 10%.
x̄ = 127.1, ȳ = 129.5.
s2x = 10 640.9

9 10 = 71.21̇.
2 8 222

sy = 7 8 = 31.71.
9×71.21̇+7×31.71
s2p = 16 = 53.93.
r X̄− Ȳ −c q127.1−129.5
Tobs = = = −0.689.
53.93( 10
1
+ 81 )

s2p n1 + n1
x y
ν = 16 so Tcrit = −1.337.
−0.689 > −1.337, so at the 10% level we have no reason to reject H0 and conclude that
the medication is probably not lowering blood pressure.
• Also look for ‘paired’ data. This can only happen if nx = ny and if every piece of data in
x is somehow linked to a piece of data in y. Ask yourself ‘would it matter if you changed
the ordering of the xi but not the yi ?’ If yes, then paired. If the data is paired then you
create a new set of data di = xi − yi .
1. If the population of differences is distributed normally (or assumed to be distributed
normally) then the test statistic for H0 : µd = c is
D̄ − c
T = q 2 with ν = n − 1.
sd
n
For example, Dwayne believes that his mystical crystals can boost IQs. He takes
10 students and records their IQs before and after they have been ‘blessed’ by the
crystals. The results are
Victim 1 2 3 4 5 6 7 8 9 10
IQ Before 107 124 161 89 96 120 109 98 147 89
IQ After 108 124 159 100 101 119 110 101 146 94
Test at the 5% level Dwayne’s claim. The data is clearly paired and thus we create
di = IQafter − IQbefore giving
1, 0, −2, 11, 5, −1, 1, 3, −1, 5.
H0 : µd = 0,
H1 : µd > 0.
α = 5%
ν = 10 − 1 = 9.
d¯ = 22
10 =
2.2
P 2
2 n d ¯2 = 10 188
− 2.22 = 15.51̇.

sd = n−1 n − d 9 10
D̄−c
Tobs = r = 1.766.
s2
d
n
Tcrit = 1.833 (tables)

1.766 < 1.833 therefore at the 5% level no reason to reject H0 and conclude that the
crystals probably don’t significantly increase IQ.
2. If the population of differences is not distributed normally, but the sample size is
large, then CLT applies and the test statistic for H0 : µd = c is
D̄ − c
Z= q 2 .
sd
n
• If testing for differences in population proportions there are two cases, each requiring
independent, large samples (CLT).
1. For H0 : px = py (i.e. no difference in population proportions) the test statistic is
Psx − Psy
Z=r .
pq n1x + n1y
x+y x
Here p is the value of the common population proportion p = . Also psx = nx
nx + ny
y
and psy = ny .
2. For H0 : px − py = c the test statistic is
Psx − Psy − c
Z=q .
Psx Qsx Psy Qsy
nx + ny
Here qsy = 1 − psx and qsy = 1 − psy .
Confidence Intervals
• It has been described to me by someone I respect that a confidence interval is like an
‘egg-cup’ of a certain width that we throw down onto the number-line. Of all possible
‘egg-cups’ we want 90% (or some other percentage) of those egg cups to contain the true
mean µ. This does not mean that a confidence interval has a 90% chance of containing
the mean; it either contains the mean or it doesn’t.
• A confidence interval is denoted [a, b] which means a < x < b. In S3 we only consider
symmetric confidence intervals about the sample mean (because x̄ is an unbiased estimate
of µ). They basically represent the acceptance region of a hypothesis test where H0 :
µ = x̄.
• To find the required z or t values in all of the following confidence intervals is easy. If you
want (say) a 90% confidence interval then you (sort of) want to contain 90% of the data,
so you must have 10% not contained which means that there must be 5% at each end of
the distribution. Therefore you look up, either in the little table beneath the big normal
table or in the correct line of the t table, 95%. This then gives you the z or t value to the
left of which 95% of the data lies.
• This is fine for certain special values (90%, 95%, 99% etc.) and for the t-distribution this
is all you can do. However for z values we can also do a ‘reverse look-up’ in the main
normal tables to find more ‘exotic’ values. For example if I wanted a 78% confidence
interval with z, then 11% would be in each end. Therefore I would reverse look-up 0.8900
within the main body of the table to find z = 1.226.
• If you are drawing from a normal of known variance σ 2 then the confidence interval will
be
σ σ
x̄ − z √ , x̄ + z √ .
n n
This result is true even for small sample sizes.
For example, an α% confidence interval is calculated from a normal population whose vari-
ance is known to be 9. The sample size is 16 and the confidence interval is [19.68675, 22.31325].
Find α. The midpoint of the interval is 21. Therefore the confidence interval is [21 −
z √316 , 21 + z √316 ]. We can then solve 21 + z √316 = 22.31325 to find z = 1.751. A forward
lookup in the table reveals 0.96. Therefore there exists 4% at either end, so α = 8; i.e. it
is an 92% confidence interval.
• If you are drawing from a normal of unknown variance then the confidence interval will
be
s s
x̄ − t √ , x̄ + t √ .
n n
The degrees of freedom here will be ν = n − 1.
• If you are drawing from an unknown distribution then (provided n > 30 to invoke the
CLT) then the confidence interval will be

s s
x̄ − z √ , x̄ + z √ .
n n
• If, instead of means, we are taking a sample proportion then the confidence interval will
be r r
ps q s ps q s
ps − z , ps + z .
n n
• If instead of single samples we are looking for a confidence interval for the difference
between two populations we use the following, depending on the situation.
1. Difference in means being zero from two normals of known variances

" s s #
σx2 σy2 σx2 σy2
x̄ − ȳ − z + , x̄ − ȳ + z + .
nx ny nx ny
Or for difference in means x̄ − ȳ being c,

" s s #
σx2 σy2 σx2 σy2
x̄ − ȳ − c − z + , x̄ − ȳ − c + z + .
nx ny nx ny
This can also be used for non-normal populations of known variance if the samples
are large (CLT).
2. The above can be altered if the samples are large (CLT) and the variances are not
known to " s s #
s2x s2y s2x s2y
x̄ − ȳ − z + , x̄ − ȳ + z + .
nx ny nx ny
3. Difference in means being zero from two normals of the same, unknown variance
" s s #
1 1 1 1
x̄ − ȳ − tsp + , x̄ − ȳ + tsp + .
nx ny nx ny
Or for difference in means x̄ − ȳ being c,

" s s #
1 1 1 1
x̄ − ȳ − c − tsp + , x̄ − ȳ − c + tsp + .
nx ny nx ny
(nx −1)s2x +(ny −1)s2y

Here sp is the unbiased pooled estimate of the common variance s2p = nx +ny −2 .
The degrees of freedom is ν = nx + ny − 2.
4. If dealing with difference in population proportions we use

psx qsx psy qsy psx qsx psy qsy
r r
psx − psy − z + , psx − psy + z + .
nx ny nx ny
χ2 -Tests
• χ2 tests measure how good data fits a given distribution. The test statistic here is
X (O − E)2
X2 = .
E
Here O is the observed frequency and E the expected frequency. The larger X 2 becomes
the more likely it is that the observed data does not come from the expected values that
we have calculated.
• As with the t-distribution, the χ2 distribution has a degree of freedom associated with it
still denoted ν. This is calculated
ν = number of classes − number of constraints.
• Given observed frequencies you need to calculate expected frequencies from theoretical
probabilities. Expected frequencies are the expected probability times the total number
of trials. The convention is that if an expected value is less than 5, then you combine
with a larger expected value such that all values end up greater than 5. For example if
you had
OBS 22 38 24 18 9 2 1 0
EXP 23.4 35.1 27.2 16.1 7.2 3.1 0.9 0.2
you would combine the final four columns to get
OBS 22 38 24 18 12
EXP 23.4 35.1 27.2 16.1 11.4
Because of this combining the total number of classes would be 5 and not 8.
• Fitting a Distribution
– As with any hypothesis tests, the expected values are computed supposing that H0
is correct. For example given the data
Outcome 0 1 2 3 4 5
Obs Frequency 22 37 23 10 6 2
test at the 5% level the hypotheses
H0 : The data is well modelled by B(5, 41 ),
H1 : The data is not well modelled by B(5, 41 ).
So, under H0 we have B(5, 14 ). We calculate the probabilities of the six outcomes
from S1:
x 0 1 2 3 4 5
243 405 135 45 15 1
P(X = x) 1024 1024 512 512 1024 1024
Then we note that the total number in the observed data is 100, so we multiply the
expected probabilities by 100 to obtain expected frequencies (to 1dp).
Outcome 0 1 2 3 4 5
Exp Frequency 23.7 39.6 26.3 8.8 1.5 0.1
We see that the expected frequencies have dropped below five, so we combine the
last 3 columns to obtain:
OBS 22 37 23 18
EXP 23.7 39.6 26.3 10.4
So X 2 = 2.89 6.76 10.89 57.76

23.7 + 39.6 + 26.3 + 10.4 = 6.26.
Now the only constraint here is the total observed frequencies of 100, so ν = 4−1 = 3.
In the tables we observe P(χ23 6 7.815) = 0.95. Therefore the critical X 2 value is
7.815. So 6.26 < 7.815 and we therefore have no reason to reject H0 and conclude
that B(5, 41 ) is probably a good model for the data.
– Parameter Estimation. It is important to note that there is a difference in ν in
the following situations:
∗ H0 : The data can be modelled by a Poisson distribution with λ = 3.1.
∗ H0 : The data can be modelled by a Poisson distribution.
The second has an extra constraint because you will need to estimate the value of λ
from your observed data. In general just remember that if you estimate a parameter
from observed data then this provides another constraint.
∗ If you need to estimate p from a frequency table for testing the goodness of fit
of a binomial distribution you calculate x̄ from the data in the usual way and
equate this with np because that is the expectation of a binomial. For example,
estimate p from the following observed data:
x 0 1 2 3 4
.
Obs frequency 12 16 6 2 1
So np = x̄ = 0×12+1×16+2×6+3×2+4×1
37 = 3738
. Therefore p = 37×4 38
= 0.257 (to
3dp).
∗ If you need to estimate λ from a frequency table for testing the goodness of fit
of a Poisson distribution you calculate x̄ from the data in the usual way and
equate this with λ. The only potential difficulty lies in the fact that the Poisson
distribution has an infinite number of outcomes {0, 1, 2, 3 . . . }. However, the
examiners will take pity and give you a scenario such as
x 0 1 2 3 4 or more
where the “4 or more” frequency will be zero. Therefore λ = 0×5+1×11+2×10+3×3
29 =
1.38 (to 2dp).
∗ Likewise the geometric distribution takes an infinite number of possible outcomes
1
{1, 2, 3, 4 . . . }, and E(X) = 1p , so to estimate p we calculate E(X) . For example
given
x 1 2 3 4 5 or more
1×26+2×20+3×13+4×6 129 65
So, x̄ = 65 = 65 . Therefore p = 129 .
– For example for the following, test at the 1% level the following hypotheses:
H0 : The data is well modelled by a Poisson,
H1 : The data is not well modelled by a Poisson.
x 0 1 2 3 4 5or more
Obs frequency 14 23 14 7 2 0
So we estimate from the data (as above) λ = 34 . Now we calculate the first five
x
expected values using total × P(X = x) = e−λ λx! . The final total we calculate by 60
subtract the other five totals.
x 0 1 2 3 4 5or more
.
Exp frequency 15.8 21.1 14.1 6.2 2.1 0.7
So combining columns so that the expected values equal at least five we obtain.
OBS 14 23 14 9
EXP 15.8 21.1 14.1 9.0
Now X 2 = 0.377. ν = 4 − 2 = 2 (2 constraints because of 60 total and estimation of
λ).
From tables P(χ22 < 9.210) = 0.99. 0.377 < 9.210 and therefore at the 1% level we
have no reason to reject H0 and conclude that the data is probably well described by
a Poisson.
• Contingency Tables
– we are looking for independence (or, equivalently, dependence) between two variables.
Remember that two events (A and B) are independent if P(A|B) = P(A|B ′ ) =
P(A). Coupling this with the formula P(A|B) = P(A∩B) P(B) (which drops out easily
from a Venn diagram with A and B overlapping) we discover that independence
implies P(A) × P(B) = P(A ∩ B). Therefore given any contingency table showing
observed values we wish to calculate the values that would be expected if they were
independent. Then carry out the analysis as before.
– For example 81 children are asked which of football, rugby or netball is their favourite.
OBS Football Rugby Netball Total
Boy 17 25 3 45
Girl 9 3 24 36
Total 26 28 27 81
Now, if the sex and choice of favourite were independent then P(rugby and girl) =
P(rugby) × P(girl) = 28 36
81 × 81 . Therefore the number of girls who like rugby best
28
should be 81 × 81 × 81 . The 81 cancels to give an expected number of 28×36
36
81 . This
is an example of the general result
column total × row total
expected number = .
grand total
Therefore in our example we have
EXP Football Rugby Netball Total
26×45
Boy 81 = 14 94 28×45
81= 15 59 27×45
81 = 15 45
26×36
Girl 81 = 11 95 28×36
81= 12 49 27×36
81 = 12 36
Total 26 28 27 81
None of the expected values are less than 5, so no need to combine columns. Therefore
X (O − E)2
X2 = = 35.52 (to 2 dp). Make sure you can get my answer. A table
E 2
often helps you build up to the answer. Use columns O, E, (O − E)2 , (O−E) E .
– In an m × n contingency table the degrees of freedom is
ν = (m − 1)(n − 1).
So in the above example ν = (3 − 1) × (2 − 1) = 2. So if we were to carry out a

hypothesis test (at the 5% level) of
H0 : The variables ‘sex’ and ‘favourite sport’ are independent;
H1 : The variables ‘sex’ and ‘favourite sport’ are not independent.
We would use the correct row in the χ2 tables to discover that P(χ22 > 5.991) = 0.05.
Now 35.52 > 5.991 so we reject H0 and conclude that ‘sex’ and ‘favourite sport’ are
not independent.
– If you have a 2 × 2 contingency table you must apply Yates’s correction. Here you
reduce each value of |O − E| by 12 . Again a table helps you build up to the answer.
2 (|O−E|− 21 )2
Use columns O, E, |O − E|, |O − E| − 21 , E .
For example carry out a hypothesis test to see if hair colour and attractiveness are
independent.
OBS Blonde Not blonde Total
Fit 24 16 40
Minging 14 46 60
Total 38 62 100
Expected values are calculated as before.
EXP Blonde Not blonde Total
38×40 62×40
Fit 100 = 15.2 100 = 24.8 40
38×60 62×60
Minging 100 = 22.8 100 = 37.2 60
Total 38 62 100
Therefore the table would be
1 2
(|O−E|− 21 )2
O E |O − E| |O − E| − 2 E
24 15.2 8.8 68.89 4.532
16 24.8 8.8 68.89 2.778
14 22.8 8.8 68.89 3.021
46 37.2 8.8 68.89 1.852
12.183
X 2 = 12.183 and ν = 1 and you use these values in any subsequent hypothesis test.
(Note that X 2 is pretty high here and for any significance level in the tables we
would reject the hypothesis that hair colour and fitness were independent. Blondes
are hot.)

OCR S3 Revision Notes

Uploaded by

Copyright:

Available Formats

OCR S3 Revision Notes

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

OCR S3 Revision Notes

Uploaded by

Copyright:

Available Formats

OCR Statistics 3 Module Revision Sheet

• The unbiased estimator of variance from a sample (s2 ) simplifies to

(nx −1)s2x +(ny −1)s2y

(nx − 1)s2x + (ny − 1)s2y

Continuous Random Variables

P(a < X < b) = F (b) − F (a).

and Y = 4X 2 find F (y).

• You must be a little careful if (say) Y = X1 . You start F (y) = P(Y 6 y) = P( X1 6 y) =

For example find E(X 2 + 1) of

Linear Combinations Random Variables

E(Y ) = E(aX + b) = aE(X) + b.

Var(Y ) = Var(aX + b) = a2 Var(X).

Var(aX + bY ) = a2 Var(X) + b2 Var(Y ).

E(a1 X1 + a2 X2 + · · · + an Xn ) = a1 E(X1 ) + a2 E(X2 ) + + · · · + an E(Xn ),

The second (of course) true if all independent.

• If X and Y are independent and normally distributed then aX + bY is also normally

X ∼ N (µ1 , σ12 ) and Y ∼ N (µ2 , σ22 ) ⇒ aX + bY ∼ N (aµ1 + bµ2 , a2 σ12 + b2 σ22 ).

The random variable 2X is doubling the outcome of one sampling of X, but X1 + X2

• Given Poisson distributed X and Y it is even simpler. Here aX + bY is not distributed

X ∼ P o(λ1 ) and Y ∼ P o(λ2 ) ⇒ X + Y ∼ P o(λ1 + λ2 ).

4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0

Test at the 10% level whether the foreman has a case.

Let µ = the population mean radii of the disks.

Testing For Difference Between Means

1. State appropriate null and alternative hypotheses.

= P(Z < −1.170)

Zcrit = ±1.960. Therefore we reject if |Zobs | > 1.960.

(nx − 1)s2x + (ny − 1)s2y

Tcrit = 1.833 (tables)

Here qsy = 1 − psx and qsy = 1 − psy .

1. Difference in means being zero from two normals of known variances

Or for difference in means x̄ − ȳ being c,

Or for difference in means x̄ − ȳ being c,

(nx −1)s2x +(ny −1)s2y

ν = number of classes − number of constraints.

you would combine the final four columns to get

So X 2 = 2.89 6.76 10.89 57.76

So in the above example ν = (3 − 1) × (2 − 1) = 2. So if we were to carry out a

You might also like